1% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved. 2% Copying of this file is authorized only if (1) you are D. E. Knuth, or if 3% (2) you make absolutely no changes to your copy. (The WEB system provides 4% for alterations via an auxiliary file; the master file should stay intact.) 5% See Appendix H of the WEB manual for hints on how to install this program. 6% And see Appendix A of the TRIP manual for details about how to validate it. 7 8% TeX is a trademark of the American Mathematical Society. 9% METAFONT is a trademark of Addison-Wesley Publishing Company. 10 11% Version 0 was released in September 1982 after it passed a variety of tests. 12% Version 1 was released in November 1983 after thorough testing. 13% Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984). 14% Version 1.2 allowed `0' in response to an error, et alia (October 1984). 15% Version 1.3 made memory allocation more flexible and local (November 1984). 16% Version 1.4 fixed accents right after line breaks, et alia (April 1985). 17% Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985). 18% Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986). 19% Version 2.1 corrected anomalies in discretionary breaks (January 1987). 20% Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987). 21% Version 2.3 avoided incomplete page in premature termination (August 1987). 22% Version 2.4 fixed \noaligned rules in indented displays (August 1987). 23% Version 2.5 saved cur_order when expanding tokens (September 1987). 24% Version 2.6 added 10sp slop when shipping leaders (November 1987). 25% Version 2.7 improved rounding of negative-width characters (November 1987). 26% Version 2.8 fixed weird bug if no \patterns are used (December 1987). 27% Version 2.9 made \csname\endcsname's "relax" local (December 1987). 28% Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988). 29% Version 2.92 fixed \patterns, also file names with complex macros (May 1988). 30% Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988). 31% Version 2.94 kept open_log_file from calling fatal_error (November 1988). 32% Version 2.95 solved that problem a better way (December 1988). 33% Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989). 34% Version 2.97 corrected blunder in creating 2.95 (February 1989). 35% Version 2.98 omitted save_for_after at outer level (March 1989). 36% Version 2.99 caught $$\begingroup\halign..$$ (June 1989). 37% Version 2.991 caught .5\ifdim.6... (June 1989). 38% Version 2.992 introduced major changes for 8-bit extensions (September 1989). 39% Version 2.993 fixed a save_stack synchronization bug et alia (December 1989). 40% Version 3.0 fixed unusual displays; was more \output robust (March 1990). 41% Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990). 42% Version 3.14 fixed unprintable font names and corrected typos (March 1991). 43% Version 3.141 more of same; reconstituted ligatures better (March 1992). 44% Version 3.1415 preserved nonexplicit kerns, tidied up (February 1993). 45% Version 3.14159 allowed fontmemsize to change; bulletproofing (March 1995). 46% Version 3.141592 fixed \xleaders, glueset, weird alignments (December 2002). 47% Version 3.1415926 was a general cleanup with minor fixes (February 2008). 48% Version 3.14159265 was similar (January 2014). 49 50% A reward of $327.68 will be paid to the first finder of any remaining bug. 51 52% Although considerable effort has been expended to make the TeX program 53% correct and reliable, no warranty is implied; the author disclaims any 54% obligation or liability for damages, including but not limited to 55% special, indirect, or consequential damages arising out of or in 56% connection with the use or performance of this software. This work has 57% been a ``labor of love'' and the author hopes that users enjoy it. 58 59% Here is TeX material that gets inserted after \input webmac 60\def\hang{\hangindent 3em\noindent\ignorespaces} 61\def\hangg#1 {\hang\hbox{#1 }} 62\def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} 63\font\ninerm=cmr9 64\let\mc=\ninerm % medium caps for names like SAIL 65\def\PASCAL{Pascal} 66\def\ph{\hbox{Pascal-H}} 67\def\pct!{{\char`\%}} % percent sign in ordinary text 68\font\logo=logo10 % font used for the METAFONT logo 69\def\MF{{\logo META}\-{\logo FONT}} 70\def\<#1>{$\langle#1\rangle$} 71\def\section{\mathhexbox278} 72 73\def\(#1){} % this is used to make section names sort themselves better 74\def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> 75 76\outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section 77 \def\rhead{PART #2:\uppercase{#3}} % define running headline 78 \message{*\modno} % progress report 79 \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next 80 \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces} 81\let\?=\relax % we want to be able to \write a \? 82 83\def\title{\TeX82} 84\def\topofcontents{\hsize 5.5in 85 \vglue 0pt plus 1fil minus 1.5in 86 \def\?##1]{\hbox to 1in{\hfil##1.\ }} 87 } 88\def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} 89\pageno=3 90\def\glob{13} % this should be the section number of "<Global...>" 91\def\gglob{20, 26} % this should be the next two sections of "<Global...>" 92 93@* \[1] Introduction. 94This is \TeX, a document compiler intended to produce typesetting of high 95quality. 96The \PASCAL\ program that follows is the definition of \TeX82, a standard 97@:PASCAL}{\PASCAL@> 98@!@:TeX82}{\TeX82@> 99version of \TeX\ that is designed to be highly portable so that identical output 100will be obtainable on a great variety of computers. 101 102The main purpose of the following program is to explain the algorithms of \TeX\ 103as clearly as possible. As a result, the program will not necessarily be very 104efficient when a particular \PASCAL\ compiler has translated it into a 105particular machine language. However, the program has been written so that it 106can be tuned to run efficiently in a wide variety of operating environments 107by making comparatively few changes. Such flexibility is possible because 108the documentation that follows is written in the \.{WEB} language, which is 109at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} 110to \PASCAL\ is able to introduce most of the necessary refinements. 111Semi-automatic translation to other languages is also feasible, because the 112program below does not make extensive use of features that are peculiar to 113\PASCAL. 114 115A large piece of software like \TeX\ has inherent complexity that cannot 116be reduced below a certain level of difficulty, although each individual 117part is fairly simple by itself. The \.{WEB} language is intended to make 118the algorithms as readable as possible, by reflecting the way the 119individual program pieces fit together and by providing the 120cross-references that connect different parts. Detailed comments about 121what is going on, and about why things were done in certain ways, have 122been liberally sprinkled throughout the program. These comments explain 123features of the implementation, but they rarely attempt to explain the 124\TeX\ language itself, since the reader is supposed to be familiar with 125{\sl The \TeX book}. 126@.WEB@> 127@:TeXbook}{\sl The \TeX book@> 128 129@ The present implementation has a long ancestry, beginning in the summer 130of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded 131a prototype 132@^Plass, Michael Frederick@> 133@^Liang, Franklin Mark@> 134@^Knuth, Donald Ervin@> 135based on some specifications that the author had made in May of that year. 136This original proto\TeX\ included macro definitions and elementary 137manipulations on boxes and glue, but it did not have line-breaking, 138page-breaking, mathematical formulas, alignment routines, error recovery, 139or the present semantic nest; furthermore, 140it used character lists instead of token lists, so that a control sequence 141like \.{\\halign} was represented by a list of seven characters. A 142complete version of \TeX\ was designed and coded by the author in late 1431977 and early 1978; that program, like its prototype, was written in the 144{\mc SAIL} language, for which an excellent debugging system was 145available. Preliminary plans to convert the {\mc SAIL} code into a form 146somewhat like the present ``web'' were developed by Luis Trabb~Pardo and 147@^Trabb Pardo, Luis Isidoro@> 148the author at the beginning of 1979, and a complete implementation was 149created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which 150@^Zabala Salelles, Ignacio Andr\'es@> 151was written by the author during the latter part of 1981 and the early 152part of 1982, also incorporates ideas from the 1979 implementation of 153@^Guibas, Leonidas Ioannis@> 154@^Sedgewick, Robert@> 155@^Wyatt, Douglas Kirk@> 156\TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick, 157and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred 158refinements were introduced into \TeX82 based on the experiences gained with 159the original implementations, so that essentially every part of the system 160has been substantially improved. After the appearance of ``Version 0'' in 161September 1982, this program benefited greatly from the comments of 162many other people, notably David~R. Fuchs and Howard~W. Trickey. 163A final revision in September 1989 extended the input character set to 164eight-bit codes and introduced the ability to hyphenate words from 165different languages, based on some ideas of Michael~J. Ferguson. 166@^Fuchs, David Raymond@> 167@^Trickey, Howard Wellington@> 168@^Ferguson, Michael John@> 169 170No doubt there still is plenty of room for improvement, but the author 171is firmly committed to keeping \TeX82 ``frozen'' from now on; stability 172and reliability are to be its main virtues. 173 174On the other hand, the \.{WEB} description can be extended without changing 175the core of \TeX82 itself, and the program has been designed so that such 176extensions are not extremely difficult to make. 177The |banner| string defined here should be changed whenever \TeX\ 178undergoes any modifications, so that it will be clear which version of 179\TeX\ might be the guilty party when a problem arises. 180@^extensions to \TeX@> 181@^system dependencies@> 182 183If this program is changed, the resulting system should not be called 184`\TeX'; the official name `\TeX' by itself is reserved 185for software systems that are fully compatible with each other. 186A special test suite called the ``\.{TRIP} test'' is available for 187helping to determine whether a particular implementation deserves to be 188known as `\TeX' [cf.~Stanford Computer Science report CS1027, 189November 1984]. 190 191@d banner=='This is TeX, Version 3.14159265' {printed when \TeX\ starts} 192 193@ Different \PASCAL s have slightly different conventions, and the present 194@!@:PASCAL H}{\ph@> 195program expresses \TeX\ in terms of the \PASCAL\ that was 196available to the author in 1982. Constructions that apply to 197this particular compiler, which we shall call \ph, should help the 198reader see how to make an appropriate interface for other systems 199if necessary. (\ph\ is Charles Hedrick's modification of a compiler 200@^Hedrick, Charles Locke@> 201for the DECsystem-10 that was originally developed at the University of 202Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976), 20329--42. The \TeX\ program below is intended to be adaptable, without 204extensive changes, to most other versions of \PASCAL, so it does not fully 205use the admirable features of \ph. Indeed, a conscious effort has been 206made here to avoid using several idiosyncratic features of standard 207\PASCAL\ itself, so that most of the code can be translated mechanically 208into other high-level languages. For example, the `\&{with}' and `\\{new}' 209features are not used, nor are pointer types, set types, or enumerated 210scalar types; there are no `\&{var}' parameters, except in the case of files; 211there are no tag fields on variant records; there are no assignments 212|real:=integer|; no procedures are declared local to other procedures.) 213 214The portions of this program that involve system-dependent code, where 215changes might be necessary because of differences between \PASCAL\ compilers 216and/or differences between 217operating systems, can be identified by looking at the sections whose 218numbers are listed under `system dependencies' in the index. Furthermore, 219the index entries for `dirty \PASCAL' list all places where the restrictions 220of \PASCAL\ have not been followed perfectly, for one reason or another. 221@!@^system dependencies@> 222@!@^dirty \PASCAL@> 223 224Incidentally, \PASCAL's standard |round| function can be problematical, 225because it disagrees with the IEEE floating-point standard. 226Many implementors have 227therefore chosen to substitute their own home-grown rounding procedure. 228 229@ The program begins with a normal \PASCAL\ program heading, whose 230components will be filled in later, using the conventions of \.{WEB}. 231@.WEB@> 232For example, the portion of the program called `\X\glob:Global 233variables\X' below will be replaced by a sequence of variable declarations 234that starts in $\section\glob$ of this documentation. In this way, we are able 235to define each individual global variable when we are prepared to 236understand what it means; we do not have to define all of the globals at 237once. Cross references in $\section\glob$, where it says ``See also 238sections \gglob, \dots,'' also make it possible to look at the set of 239all global variables, if desired. Similar remarks apply to the other 240portions of the program heading. 241 242Actually the heading shown here is not quite normal: The |program| line 243does not mention any |output| file, because \ph\ would ask the \TeX\ user 244to specify a file name if |output| were specified here. 245@:PASCAL H}{\ph@> 246@^system dependencies@> 247 248@d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:} 249@f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'} 250@f type==true {but `|type|' will not be treated as a reserved word} 251 252@p @t\4@>@<Compiler directives@>@/ 253program TEX; {all file names are defined dynamically} 254label @<Labels in the outer block@>@/ 255const @<Constants in the outer block@>@/ 256mtype @<Types in the outer block@>@/ 257var @<Global variables@>@/ 258@# 259procedure initialize; {this procedure gets things started properly} 260 var @<Local variables for initialization@>@/ 261 begin @<Initialize whatever \TeX\ might access@>@; 262 end;@# 263@t\4@>@<Basic printing procedures@>@/ 264@t\4@>@<Error handling procedures@>@/ 265 266@ The overall \TeX\ program begins with the heading just shown, after which 267comes a bunch of procedure declarations and function declarations. 268Finally we will get to the main program, which begins with the 269comment `|start_here|'. If you want to skip down to the 270main program now, you can look up `|start_here|' in the index. 271But the author suggests that the best way to understand this program 272is to follow pretty much the order of \TeX's components as they appear in the 273\.{WEB} description you are now reading, since the present ordering is 274intended to combine the advantages of the ``bottom up'' and ``top down'' 275approaches to the problem of understanding a somewhat complicated system. 276 277@ Three labels must be declared in the main program, so we give them 278symbolic names. 279 280@d start_of_TEX=1 {go here when \TeX's variables are initialized} 281@d end_of_TEX=9998 {go here to close files and terminate gracefully} 282@d final_end=9999 {this label marks the ending of the program} 283 284@<Labels in the out...@>= 285start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end; 286 {key control points} 287 288@ Some of the code below is intended to be used only when diagnosing the 289strange behavior that sometimes occurs when \TeX\ is being installed or 290when system wizards are fooling around with \TeX\ without quite knowing 291what they are doing. Such code will not normally be compiled; it is 292delimited by the codewords `$|debug|\ldots|gubed|$', with apologies 293to people who wish to preserve the purity of English. 294 295Similarly, there is some conditional code delimited by 296`$|stat|\ldots|tats|$' that is intended for use when statistics are to be 297kept about \TeX's memory usage. The |stat| $\ldots$ |tats| code also 298implements diagnostic information for \.{\\tracingparagraphs} and 299\.{\\tracingpages}. 300@^debugging@> 301 302@d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} 303@d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} 304@f debug==begin 305@f gubed==end 306@# 307@d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering 308 usage statistics} 309@d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering 310 usage statistics} 311@f stat==begin 312@f tats==end 313 314@ This program has two important variations: (1) There is a long and slow 315version called \.{INITEX}, which does the extra calculations needed to 316@.INITEX@> 317initialize \TeX's internal tables; and (2)~there is a shorter and faster 318production version, which cuts the initialization to a bare minimum. 319Parts of the program that are needed in (1) but not in (2) are delimited by 320the codewords `$|init|\ldots|tini|$'. 321 322@d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version} 323@d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version} 324@f init==begin 325@f tini==end 326 327@<Initialize whatever...@>= 328@<Set initial values of key variables@>@/ 329@!init @<Initialize table entries (done by \.{INITEX} only)@>@;@+tini 330 331@ If the first character of a \PASCAL\ comment is a dollar sign, 332\ph\ treats the comment as a list of ``compiler directives'' that will 333affect the translation of this program into machine language. The 334directives shown below specify full checking and inclusion of the \PASCAL\ 335debugger when \TeX\ is being debugged, but they cause range checking and other 336redundant code to be eliminated when the production system is being generated. 337Arithmetic overflow will be detected in all cases. 338@:PASCAL H}{\ph@> 339@^system dependencies@> 340@^overflow in arithmetic@> 341 342@<Compiler directives@>= 343@{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} 344@!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} 345 346@ This \TeX\ implementation conforms to the rules of the {\sl Pascal User 347@:PASCAL}{\PASCAL@> 348@^system dependencies@> 349Manual} published by Jensen and Wirth in 1975, except where system-dependent 350@^Wirth, Niklaus@> 351@^Jensen, Kathleen@> 352code is necessary to make a useful system program, and except in another 353respect where such conformity would unnecessarily obscure the meaning 354and clutter up the code: We assume that |case| statements may include a 355default case that applies if no matching label is found. Thus, we shall use 356constructions like 357$$\vbox{\halign{\ignorespaces#\hfil\cr 358|case x of|\cr 3591: $\langle\,$code for $x=1\,\rangle$;\cr 3603: $\langle\,$code for $x=3\,\rangle$;\cr 361|othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr 362|endcases|\cr}}$$ 363since most \PASCAL\ compilers have plugged this hole in the language by 364incorporating some sort of default mechanism. For example, the \ph\ 365compiler allows `|others|:' as a default label, and other \PASCAL s allow 366syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The 367definitions of |othercases| and |endcases| should be changed to agree with 368local conventions. Note that no semicolon appears before |endcases| in 369this program, so the definition of |endcases| should include a semicolon 370if the compiler wants one. (Of course, if no default mechanism is 371available, the |case| statements of \TeX\ will have to be laboriously 372extended by listing all remaining cases. People who are stuck with such 373\PASCAL s have, in fact, done this, successfully but not happily!) 374@:PASCAL H}{\ph@> 375 376@d othercases == others: {default for cases not listed explicitly} 377@d endcases == @+end {follows the default case in an extended |case| statement} 378@f othercases == else 379@f endcases == end 380 381@ The following parameters can be changed at compile time to extend or 382reduce \TeX's capacity. They may have different values in \.{INITEX} and 383in production versions of \TeX. 384@.INITEX@> 385@^system dependencies@> 386 387@<Constants...@>= 388@!mem_max=30000; {greatest index in \TeX's internal |mem| array; 389 must be strictly less than |max_halfword|; 390 must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|} 391@!mem_min=0; {smallest index in \TeX's internal |mem| array; 392 must be |min_halfword| or more; 393 must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|} 394@!buf_size=500; {maximum number of characters simultaneously present in 395 current lines of open files and in control sequences between 396 \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|} 397@!error_line=72; {width of context lines on terminal error messages} 398@!half_error_line=42; {width of first lines of contexts in terminal 399 error messages; should be between 30 and |error_line-15|} 400@!max_print_line=79; {width of longest text lines output; should be at least 60} 401@!stack_size=200; {maximum number of simultaneous input sources} 402@!max_in_open=6; {maximum number of input files and error insertions that 403 can be going on simultaneously} 404@!font_max=75; {maximum internal font number; must not exceed |max_quarterword| 405 and must be at most |font_base+256|} 406@!font_mem_size=20000; {number of words of |font_info| for all fonts} 407@!param_size=60; {maximum number of simultaneous macro parameters} 408@!nest_size=40; {maximum number of semantic levels simultaneously active} 409@!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|} 410@!string_vacancies=8000; {the minimum number of characters that should be 411 available for the user's control sequences and font names, 412 after \TeX's own error messages are stored} 413@!pool_size=32000; {maximum number of characters in strings, including all 414 error messages and help texts, and the names of all fonts and 415 control sequences; must exceed |string_vacancies| by the total 416 length of \TeX's own strings, which is currently about 23000} 417@!save_size=600; {space for saving values outside of current group; must be 418 at most |max_halfword|} 419@!trie_size=8000; {space for hyphenation patterns; should be larger for 420 \.{INITEX} than it is in production versions of \TeX} 421@!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns} 422@!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8} 423@!file_name_size=40; {file names shouldn't be longer than this} 424@!pool_name='TeXformats:TEX.POOL '; 425 {string of length |file_name_size|; tells where the string pool appears} 426@.TeXformats@> 427 428@ Like the preceding parameters, the following quantities can be changed 429at compile time to extend or reduce \TeX's capacity. But if they are changed, 430it is necessary to rerun the initialization program \.{INITEX} 431@.INITEX@> 432to generate new tables for the production \TeX\ program. 433One can't simply make helter-skelter changes to the following constants, 434since certain rather complex initialization 435numbers are computed from them. They are defined here using 436\.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to 437emphasize this distinction. 438 439@d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX}; 440 must not be less than |mem_min|} 441@d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX}; 442 must be substantially larger than |mem_bot| 443 and not greater than |mem_max|} 444@d font_base=0 {smallest internal font number; must not be less 445 than |min_quarterword|} 446@d hash_size=2100 {maximum number of control sequences; it should be at most 447 about |(mem_max-mem_min)/10|} 448@d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|} 449@d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions} 450@^system dependencies@> 451 452@ In case somebody has inadvertently made bad settings of the ``constants,'' 453\TeX\ checks them using a global variable called |bad|. 454 455This is the first of many sections of \TeX\ where global variables are 456defined. 457 458@<Glob...@>= 459@!bad:integer; {is some ``constant'' wrong?} 460 461@ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=14|', 462or something similar. (We can't do that until |max_halfword| has been defined.) 463 464@<Check the ``constant'' values for consistency@>= 465bad:=0; 466if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1; 467if max_print_line<60 then bad:=2; 468if dvi_buf_size mod 8<>0 then bad:=3; 469if mem_bot+1100>mem_top then bad:=4; 470if hash_prime>hash_size then bad:=5; 471if max_in_open>=128 then bad:=6; 472if mem_top<256+11 then bad:=7; {we will want |null_list>255|} 473 474@ Labels are given symbolic names by the following definitions, so that 475occasional |goto| statements will be meaningful. We insert the label 476`|exit|' just before the `\ignorespaces|end|\unskip' of a procedure in 477which we have used the `|return|' statement defined below; the label 478`|restart|' is occasionally used at the very beginning of a procedure; and 479the label `|reswitch|' is occasionally used just prior to a |case| 480statement in which some cases change the conditions and we wish to branch 481to the newly applicable case. Loops that are set up with the |loop| 482construction defined below are commonly exited by going to `|done|' or to 483`|found|' or to `|not_found|', and they are sometimes repeated by going to 484`|continue|'. If two or more parts of a subroutine start differently but 485end up the same, the shared code may be gathered together at 486`|common_ending|'. 487 488Incidentally, this program never declares a label that isn't actually used, 489because some fussy \PASCAL\ compilers will complain about redundant labels. 490 491@d exit=10 {go here to leave a procedure} 492@d restart=20 {go here to start a procedure again} 493@d reswitch=21 {go here to start a case statement again} 494@d continue=22 {go here to resume a loop} 495@d done=30 {go here to exit a loop} 496@d done1=31 {like |done|, when there is more than one loop} 497@d done2=32 {for exiting the second loop in a long block} 498@d done3=33 {for exiting the third loop in a very long block} 499@d done4=34 {for exiting the fourth loop in an extremely long block} 500@d done5=35 {for exiting the fifth loop in an immense block} 501@d done6=36 {for exiting the sixth loop in a block} 502@d found=40 {go here when you've found it} 503@d found1=41 {like |found|, when there's more than one per routine} 504@d found2=42 {like |found|, when there's more than two per routine} 505@d not_found=45 {go here when you've found nothing} 506@d common_ending=50 {go here when you want to merge with another branch} 507 508@ Here are some macros for common programming idioms. 509 510@d incr(#) == #:=#+1 {increase a variable by unity} 511@d decr(#) == #:=#-1 {decrease a variable by unity} 512@d negate(#) == #:=-# {change the sign of a variable} 513@d loop == @+ while true do@+ {repeat over and over until a |goto| happens} 514@f loop == xclause 515 {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} 516@d do_nothing == {empty statement} 517@d return == goto exit {terminate a procedure call} 518@f return == nil 519@d empty=0 {symbolic name for a null constant} 520 521@* \[2] The character set. 522In order to make \TeX\ readily portable to a wide variety of 523computers, all of its input text is converted to an internal eight-bit 524code that includes standard ASCII, the ``American Standard Code for 525Information Interchange.'' This conversion is done immediately when each 526character is read in. Conversely, characters are converted from ASCII to 527the user's external representation just before they are output to a 528text file. 529 530Such an internal code is relevant to users of \TeX\ primarily because it 531governs the positions of characters in the fonts. For example, the 532character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets 533this letter it specifies character number 65 in the current font. 534If that font actually has `\.A' in a different position, \TeX\ doesn't 535know what the real position is; the program that does the actual printing from 536\TeX's device-independent files is responsible for converting from ASCII to 537a particular font encoding. 538@^ASCII code@> 539 540\TeX's internal code also defines the value of constants 541that begin with a reverse apostrophe; and it provides an index to the 542\.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} 543tables. 544 545@ Characters of text that have been converted to \TeX's internal form 546are said to be of type |ASCII_code|, which is a subrange of the integers. 547 548@<Types...@>= 549@!ASCII_code=0..255; {eight-bit numbers} 550 551@ The original \PASCAL\ compiler was designed in the late 60s, when six-bit 552character sets were common, so it did not make provision for lowercase 553letters. Nowadays, of course, we need to deal with both capital and small 554letters in a convenient way, especially in a program for typesetting; 555so the present specification of \TeX\ has been written under the assumption 556that the \PASCAL\ compiler and run-time system permit the use of text files 557with more than 64 distinguishable characters. More precisely, we assume that 558the character set contains at least the letters and symbols associated 559with ASCII codes @'40 through @'176; all of these characters are now 560available on most computer terminals. 561 562Since we are dealing with more characters than were present in the first 563\PASCAL\ compilers, we have to decide what to call the associated data 564type. Some \PASCAL s use the original name |char| for the 565characters in text files, even though there now are more than 64 such 566characters, while other \PASCAL s consider |char| to be a 64-element 567subrange of a larger data type that has some other name. 568 569In order to accommodate this difference, we shall use the name |text_char| 570to stand for the data type of the characters that are converted to and 571from |ASCII_code| when they are input and output. We shall also assume 572that |text_char| consists of the elements |chr(first_text_char)| through 573|chr(last_text_char)|, inclusive. The following definitions should be 574adjusted if necessary. 575@^system dependencies@> 576 577@d text_char == char {the data type of characters in text files} 578@d first_text_char=0 {ordinal number of the smallest element of |text_char|} 579@d last_text_char=255 {ordinal number of the largest element of |text_char|} 580 581@<Local variables for init...@>= 582@!i:integer; 583 584@ The \TeX\ processor converts between ASCII code and 585the user's external character set by means of arrays |xord| and |xchr| 586that are analogous to \PASCAL's |ord| and |chr| functions. 587 588@<Glob...@>= 589@!xord: array [text_char] of ASCII_code; 590 {specifies conversion of input characters} 591@!xchr: array [ASCII_code] of text_char; 592 {specifies conversion of output characters} 593 594@ Since we are assuming that our \PASCAL\ system is able to read and 595write the visible characters of standard ASCII (although not 596necessarily using the ASCII codes to represent them), the following 597assignment statements initialize the standard part of the |xchr| array 598properly, without needing any system-dependent changes. On the other 599hand, it is possible to implement \TeX\ with less complete character 600sets, and in such cases it will be necessary to change something here. 601@^system dependencies@> 602 603@<Set init...@>= 604xchr[@'40]:=' '; 605xchr[@'41]:='!'; 606xchr[@'42]:='"'; 607xchr[@'43]:='#'; 608xchr[@'44]:='$'; 609xchr[@'45]:='%'; 610xchr[@'46]:='&'; 611xchr[@'47]:='''';@/ 612xchr[@'50]:='('; 613xchr[@'51]:=')'; 614xchr[@'52]:='*'; 615xchr[@'53]:='+'; 616xchr[@'54]:=','; 617xchr[@'55]:='-'; 618xchr[@'56]:='.'; 619xchr[@'57]:='/';@/ 620xchr[@'60]:='0'; 621xchr[@'61]:='1'; 622xchr[@'62]:='2'; 623xchr[@'63]:='3'; 624xchr[@'64]:='4'; 625xchr[@'65]:='5'; 626xchr[@'66]:='6'; 627xchr[@'67]:='7';@/ 628xchr[@'70]:='8'; 629xchr[@'71]:='9'; 630xchr[@'72]:=':'; 631xchr[@'73]:=';'; 632xchr[@'74]:='<'; 633xchr[@'75]:='='; 634xchr[@'76]:='>'; 635xchr[@'77]:='?';@/ 636xchr[@'100]:='@@'; 637xchr[@'101]:='A'; 638xchr[@'102]:='B'; 639xchr[@'103]:='C'; 640xchr[@'104]:='D'; 641xchr[@'105]:='E'; 642xchr[@'106]:='F'; 643xchr[@'107]:='G';@/ 644xchr[@'110]:='H'; 645xchr[@'111]:='I'; 646xchr[@'112]:='J'; 647xchr[@'113]:='K'; 648xchr[@'114]:='L'; 649xchr[@'115]:='M'; 650xchr[@'116]:='N'; 651xchr[@'117]:='O';@/ 652xchr[@'120]:='P'; 653xchr[@'121]:='Q'; 654xchr[@'122]:='R'; 655xchr[@'123]:='S'; 656xchr[@'124]:='T'; 657xchr[@'125]:='U'; 658xchr[@'126]:='V'; 659xchr[@'127]:='W';@/ 660xchr[@'130]:='X'; 661xchr[@'131]:='Y'; 662xchr[@'132]:='Z'; 663xchr[@'133]:='['; 664xchr[@'134]:='\'; 665xchr[@'135]:=']'; 666xchr[@'136]:='^'; 667xchr[@'137]:='_';@/ 668xchr[@'140]:='`'; 669xchr[@'141]:='a'; 670xchr[@'142]:='b'; 671xchr[@'143]:='c'; 672xchr[@'144]:='d'; 673xchr[@'145]:='e'; 674xchr[@'146]:='f'; 675xchr[@'147]:='g';@/ 676xchr[@'150]:='h'; 677xchr[@'151]:='i'; 678xchr[@'152]:='j'; 679xchr[@'153]:='k'; 680xchr[@'154]:='l'; 681xchr[@'155]:='m'; 682xchr[@'156]:='n'; 683xchr[@'157]:='o';@/ 684xchr[@'160]:='p'; 685xchr[@'161]:='q'; 686xchr[@'162]:='r'; 687xchr[@'163]:='s'; 688xchr[@'164]:='t'; 689xchr[@'165]:='u'; 690xchr[@'166]:='v'; 691xchr[@'167]:='w';@/ 692xchr[@'170]:='x'; 693xchr[@'171]:='y'; 694xchr[@'172]:='z'; 695xchr[@'173]:='{'; 696xchr[@'174]:='|'; 697xchr[@'175]:='}'; 698xchr[@'176]:='~';@/ 699 700@ Some of the ASCII codes without visible characters have been given symbolic 701names in this program because they are used with a special meaning. 702 703@d null_code=@'0 {ASCII code that might disappear} 704@d carriage_return=@'15 {ASCII code used at end of line} 705@d invalid_code=@'177 {ASCII code that many systems prohibit in text files} 706 707@ The ASCII code is ``standard'' only to a certain extent, since many 708computer installations have found it advantageous to have ready access 709to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} 710gives a complete specification of the intended correspondence between 711characters and \TeX's internal representation. 712@:TeXbook}{\sl The \TeX book@> 713 714If \TeX\ is being used 715on a garden-variety \PASCAL\ for which only standard ASCII 716codes will appear in the input and output files, it doesn't really matter 717what codes are specified in |xchr[0..@'37]|, but the safest policy is to 718blank everything out by using the code shown below. 719 720However, other settings of |xchr| will make \TeX\ more friendly on 721computers that have an extended character set, so that users can type things 722like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can 723assign codes arbitrarily, giving an |xchr| equivalent to whatever 724characters the users of \TeX\ are allowed to have in their input files. 725It is best to make the codes correspond to the intended interpretations as 726shown in Appendix~C whenever possible; but this is not necessary. For 727example, in countries with an alphabet of more than 26 letters, it is 728usually best to map the additional letters into codes less than~@'40. 729To get the most ``permissive'' character set, change |' '| on the 730right of these assignment statements to |chr(i)|. 731@^character set dependencies@> 732@^system dependencies@> 733 734@<Set init...@>= 735for i:=0 to @'37 do xchr[i]:=' '; 736for i:=@'177 to @'377 do xchr[i]:=' '; 737 738@ The following system-independent code makes the |xord| array contain a 739suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| 740where |i<j<@'177|, the value of |xord[xchr[i]]| will turn out to be 741|j| or more; hence, standard ASCII code numbers will be used instead of 742codes below @'40 in case there is a coincidence. 743 744@<Set init...@>= 745for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code; 746for i:=@'200 to @'377 do xord[xchr[i]]:=i; 747for i:=0 to @'176 do xord[xchr[i]]:=i; 748 749@* \[3] Input and output. 750The bane of portability is the fact that different operating systems treat 751input and output quite differently, perhaps because computer scientists 752have not given sufficient attention to this problem. People have felt somehow 753that input and output are not part of ``real'' programming. Well, it is true 754that some kinds of programming are more fun than others. With existing 755input/output conventions being so diverse and so messy, the only sources of 756joy in such parts of the code are the rare occasions when one can find a 757way to make the program a little less bad than it might have been. We have 758two choices, either to attack I/O now and get it over with, or to postpone 759I/O until near the end. Neither prospect is very attractive, so let's 760get it over with. 761 762The basic operations we need to do are (1)~inputting and outputting of 763text, to or from a file or the user's terminal; (2)~inputting and 764outputting of eight-bit bytes, to or from a file; (3)~instructing the 765operating system to initiate (``open'') or to terminate (``close'') input or 766output from a specified file; (4)~testing whether the end of an input 767file has been reached. 768 769\TeX\ needs to deal with two kinds of files. 770We shall use the term |alpha_file| for a file that contains textual data, 771and the term |byte_file| for a file that contains eight-bit binary information. 772These two types turn out to be the same on many computers, but 773sometimes there is a significant distinction, so we shall be careful to 774distinguish between them. Standard protocols for transferring 775such files from computer to computer, via high-speed networks, are 776now becoming available to more and more communities of users. 777 778The program actually makes use also of a third kind of file, called a 779|word_file|, when dumping and reloading base information for its own 780initialization. We shall define a word file later; but it will be possible 781for us to specify simple operations on word files before they are defined. 782 783@<Types...@>= 784@!eight_bits=0..255; {unsigned one-byte quantity} 785@!alpha_file=packed file of text_char; {files that contain textual data} 786@!byte_file=packed file of eight_bits; {files that contain binary data} 787 788@ Most of what we need to do with respect to input and output can be handled 789by the I/O facilities that are standard in \PASCAL, i.e., the routines 790called |get|, |put|, |eof|, and so on. But 791standard \PASCAL\ does not allow file variables to be associated with file 792names that are determined at run time, so it cannot be used to implement 793\TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| 794is crucial for our purposes. We shall assume that |name_of_file| is a variable 795of an appropriate type such that the \PASCAL\ run-time system being used to 796implement \TeX\ can open a file whose external name is specified by 797|name_of_file|. 798@^system dependencies@> 799 800@<Glob...@>= 801@!name_of_file:packed array[1..file_name_size] of char;@;@/ 802 {on some systems this may be a \&{record} variable} 803@!name_length:0..file_name_size;@/{this many characters are actually 804 relevant in |name_of_file| (the rest are blank)} 805 806@ The \ph\ compiler with which the present version of \TeX\ was prepared has 807extended the rules of \PASCAL\ in a very convenient way. To open file~|f|, 808we can write 809$$\vbox{\halign{#\hfil\qquad&#\hfil\cr 810|reset(f,@t\\{name}@>,'/O')|&for input;\cr 811|rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$ 812The `\\{name}' parameter, which is of type `{\bf packed array 813$[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of 814the external file that is being opened for input or output. 815Blank spaces that might appear in \\{name} are ignored. 816 817The `\.{/O}' parameter tells the operating system not to issue its own 818error messages if something goes wrong. If a file of the specified name 819cannot be found, or if such a file cannot be opened for some other reason 820(e.g., someone may already be trying to write the same file), we will have 821|@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|. This allows 822\TeX\ to undertake appropriate corrective action. 823@:PASCAL H}{\ph@> 824@^system dependencies@> 825 826\TeX's file-opening procedures return |false| if no file identified by 827|name_of_file| could be opened. 828 829@d reset_OK(#)==erstat(#)=0 830@d rewrite_OK(#)==erstat(#)=0 831 832@p function a_open_in(var f:alpha_file):boolean; 833 {open a text file for input} 834begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f); 835end; 836@# 837function a_open_out(var f:alpha_file):boolean; 838 {open a text file for output} 839begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f); 840end; 841@# 842function b_open_in(var f:byte_file):boolean; 843 {open a binary file for input} 844begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f); 845end; 846@# 847function b_open_out(var f:byte_file):boolean; 848 {open a binary file for output} 849begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f); 850end; 851@# 852function w_open_in(var f:word_file):boolean; 853 {open a word file for input} 854begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f); 855end; 856@# 857function w_open_out(var f:word_file):boolean; 858 {open a word file for output} 859begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f); 860end; 861 862@ Files can be closed with the \ph\ routine `|close(f)|', which 863@:PASCAL H}{\ph@> 864@^system dependencies@> 865should be used when all input or output with respect to |f| has been completed. 866This makes |f| available to be opened again, if desired; and if |f| was used for 867output, the |close| operation makes the corresponding external file appear 868on the user's area, ready to be read. 869 870These procedures should not generate error messages if a file is 871being closed before it has been successfully opened. 872 873@p procedure a_close(var f:alpha_file); {close a text file} 874begin close(f); 875end; 876@# 877procedure b_close(var f:byte_file); {close a binary file} 878begin close(f); 879end; 880@# 881procedure w_close(var f:word_file); {close a word file} 882begin close(f); 883end; 884 885@ Binary input and output are done with \PASCAL's ordinary |get| and |put| 886procedures, so we don't have to make any other special arrangements for 887binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. 888The treatment of text input is more difficult, however, because 889of the necessary translation to |ASCII_code| values. 890\TeX's conventions should be efficient, and they should 891blend nicely with the user's operating environment. 892 893@ Input from text files is read one line at a time, using a routine called 894|input_ln|. This function is defined in terms of global variables called 895|buffer|, |first|, and |last| that will be described in detail later; for 896now, it suffices for us to know that |buffer| is an array of |ASCII_code| 897values, and that |first| and |last| are indices into this array 898representing the beginning and ending of a line of text. 899 900@<Glob...@>= 901@!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read} 902@!first:0..buf_size; {the first unused position in |buffer|} 903@!last:0..buf_size; {end of the line just input to |buffer|} 904@!max_buf_stack:0..buf_size; {largest index used in |buffer|} 905 906@ The |input_ln| function brings the next line of input from the specified 907file into available positions of the buffer array and returns the value 908|true|, unless the file has already been entirely read, in which case it 909returns |false| and sets |last:=first|. In general, the |ASCII_code| 910numbers that represent the next line of the file are input into 911|buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the 912global variable |last| is set equal to |first| plus the length of the 913line. Trailing blanks are removed from the line; thus, either |last=first| 914(in which case the line was entirely blank) or |buffer[last-1]<>" "|. 915 916An overflow error is given, however, if the normal actions of |input_ln| 917would make |last>=buf_size|; this is done so that other parts of \TeX\ 918can safely look at the contents of |buffer[last+1]| without overstepping 919the bounds of the |buffer| array. Upon entry to |input_ln|, the condition 920|first<buf_size| will always hold, so that there is always room for an 921``empty'' line. 922 923The variable |max_buf_stack|, which is used to keep track of how large 924the |buf_size| parameter must be to accommodate the present job, is 925also kept up to date by |input_ln|. 926 927If the |bypass_eoln| parameter is |true|, |input_ln| will do a |get| 928before looking at the first character of the line; this skips over 929an |eoln| that was in |f^|. The procedure does not do a |get| when it 930reaches the end of the line; therefore it can be used to acquire input 931from the user's terminal as well as from ordinary text files. 932 933Standard \PASCAL\ says that a file should have |eoln| immediately 934before |eof|, but \TeX\ needs only a weaker restriction: If |eof| 935occurs in the middle of a line, the system function |eoln| should return 936a |true| result (even though |f^| will be undefined). 937 938Since the inner loop of |input_ln| is part of \TeX's ``inner loop''---each 939character of input comes in at this place---it is wise to reduce system 940overhead by making use of special routines that read in an entire array 941of characters at once, if such routines are available. The following 942code uses standard \PASCAL\ to illustrate what needs to be done, but 943finer tuning is often possible at well-developed \PASCAL\ sites. 944@^inner loop@> 945 946@p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean; 947 {inputs the next line or returns |false|} 948var last_nonblank:0..buf_size; {|last| with trailing blanks removed} 949begin if bypass_eoln then if not eof(f) then get(f); 950 {input the first character of the line into |f^|} 951last:=first; {cf.\ Matthew 19\thinspace:\thinspace30} 952if eof(f) then input_ln:=false 953else begin last_nonblank:=first; 954 while not eoln(f) do 955 begin if last>=max_buf_stack then 956 begin max_buf_stack:=last+1; 957 if max_buf_stack=buf_size then 958 @<Report overflow of the input buffer, and abort@>; 959 end; 960 buffer[last]:=xord[f^]; get(f); incr(last); 961 if buffer[last-1]<>" " then last_nonblank:=last; 962 end; 963 last:=last_nonblank; input_ln:=true; 964 end; 965end; 966 967@ The user's terminal acts essentially like other files of text, except 968that it is used both for input and for output. When the terminal is 969considered an input file, the file variable is called |term_in|, and when it 970is considered an output file the file variable is |term_out|. 971@^system dependencies@> 972 973@<Glob...@>= 974@!term_in:alpha_file; {the terminal as an input file} 975@!term_out:alpha_file; {the terminal as an output file} 976 977@ Here is how to open the terminal files 978in \ph. The `\.{/I}' switch suppresses the first |get|. 979@:PASCAL H}{\ph@> 980@^system dependencies@> 981 982@d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input} 983@d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output} 984 985@ Sometimes it is necessary to synchronize the input/output mixture that 986happens on the user's terminal, and three system-dependent 987procedures are used for this 988purpose. The first of these, |update_terminal|, is called when we want 989to make sure that everything we have output to the terminal so far has 990actually left the computer's internal buffers and been sent. 991The second, |clear_terminal|, is called when we wish to cancel any 992input that the user may have typed ahead (since we are about to 993issue an unexpected error message). The third, |wake_up_terminal|, 994is supposed to revive the terminal if the user has disabled it by 995some instruction to the operating system. The following macros show how 996these operations can be specified in \ph: 997@:PASCAL H}{\ph@> 998@^system dependencies@> 999 1000@d update_terminal == break(term_out) {empty the terminal output buffer} 1001@d clear_terminal == break_in(term_in,true) {clear the terminal input buffer} 1002@d wake_up_terminal == do_nothing {cancel the user's cancellation of output} 1003 1004@ We need a special routine to read the first line of \TeX\ input from 1005the user's terminal. This line is different because it is read before we 1006have opened the transcript file; there is sort of a ``chicken and 1007egg'' problem here. If the user types `\.{\\input paper}' on the first 1008line, or if some macro invoked by that line does such an \.{\\input}, 1009the transcript file will be named `\.{paper.log}'; but if no \.{\\input} 1010commands are performed during the first line of terminal input, the transcript 1011file will acquire its default name `\.{texput.log}'. (The transcript file 1012will not contain error messages generated by the first line before the 1013first \.{\\input} command.) 1014@.texput@> 1015 1016The first line is even more special if we are lucky enough to have an operating 1017system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object 1018program. It's nice to let the user start running a \TeX\ job by typing 1019a command line like `\.{tex paper}'; in such a case, \TeX\ will operate 1020as if the first line of input were `\.{paper}', i.e., the first line will 1021consist of the remainder of the command line, after the part that invoked 1022\TeX. 1023 1024The first line is special also because it may be read before \TeX\ has 1025input a format file. In such cases, normal error messages cannot yet 1026be given. The following code uses concepts that will be explained later. 1027(If the \PASCAL\ compiler does not support non-local |@!goto|\unskip, the 1028@^system dependencies@> 1029statement `|goto final_end|' should be replaced by something that 1030quietly terminates the program.) 1031 1032@<Report overflow of the input buffer, and abort@>= 1033if format_ident=0 then 1034 begin write_ln(term_out,'Buffer size exceeded!'); goto final_end; 1035@.Buffer size exceeded@> 1036 end 1037else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1; 1038 overflow("buffer size",buf_size); 1039@:TeX capacity exceeded buffer size}{\quad buffer size@> 1040 end 1041 1042@ Different systems have different ways to get started. But regardless of 1043what conventions are adopted, the routine that initializes the terminal 1044should satisfy the following specifications: 1045 1046\yskip\textindent{1)}It should open file |term_in| for input from the 1047 terminal. (The file |term_out| will already be open for output to the 1048 terminal.) 1049 1050\textindent{2)}If the user has given a command line, this line should be 1051 considered the first line of terminal input. Otherwise the 1052 user should be prompted with `\.{**}', and the first line of input 1053 should be whatever is typed in response. 1054 1055\textindent{3)}The first line of input, which might or might not be a 1056 command line, should appear in locations |first| to |last-1| of the 1057 |buffer| array. 1058 1059\textindent{4)}The global variable |loc| should be set so that the 1060 character to be read next by \TeX\ is in |buffer[loc]|. This 1061 character should not be blank, and we should have |loc<last|. 1062 1063\yskip\noindent(It may be necessary to prompt the user several times 1064before a non-blank line comes in. The prompt is `\.{**}' instead of the 1065later `\.*' because the meaning is slightly different: `\.{\\input}' need 1066not be typed immediately after~`\.{**}'.) 1067 1068@d loc==cur_input.loc_field {location of first unread character in |buffer|} 1069 1070@ The following program does the required initialization 1071without retrieving a possible command line. 1072It should be clear how to modify this routine to deal with command lines, 1073if the system permits them. 1074@^system dependencies@> 1075 1076@p function init_terminal:boolean; {gets the terminal input started} 1077label exit; 1078begin t_open_in; 1079loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal; 1080@.**@> 1081 if not input_ln(term_in,true) then {this shouldn't happen} 1082 begin write_ln(term_out); 1083 write(term_out,'! End of file on the terminal... why?'); 1084@.End of file on the terminal@> 1085 init_terminal:=false; return; 1086 end; 1087 loc:=first; 1088 while (loc<last)and(buffer[loc]=" ") do incr(loc); 1089 if loc<last then 1090 begin init_terminal:=true; 1091 return; {return unless the line was all blank} 1092 end; 1093 write_ln(term_out,'Please type the name of your input file.'); 1094 end; 1095exit:end; 1096 1097@* \[4] String handling. 1098Control sequence names and diagnostic messages are variable-length strings 1099of eight-bit characters. Since \PASCAL\ does not have a well-developed string 1100mechanism, \TeX\ does all of its string processing by homegrown methods. 1101 1102Elaborate facilities for dynamic strings are not needed, so all of the 1103necessary operations can be handled with a simple data structure. 1104The array |str_pool| contains all of the (eight-bit) ASCII codes in all 1105of the strings, and the array |str_start| contains indices of the starting 1106points of each string. Strings are referred to by integer numbers, so that 1107string number |s| comprises the characters |str_pool[j]| for 1108|str_start[s]<=j<str_start[s+1]|. Additional integer variables 1109|pool_ptr| and |str_ptr| indicate the number of entries used so far 1110in |str_pool| and |str_start|, respectively; locations 1111|str_pool[pool_ptr]| and |str_start[str_ptr]| are 1112ready for the next string to be allocated. 1113 1114String numbers 0 to 255 are reserved for strings that correspond to single 1115ASCII characters. This is in accordance with the conventions of \.{WEB}, 1116@.WEB@> 1117which converts single-character strings into the ASCII code number of the 1118single character involved, while it converts other strings into integers 1119and builds a string pool file. Thus, when the string constant \.{"."} appears 1120in the program below, \.{WEB} converts it into the integer 46, which is the 1121ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} 1122into some integer greater than~255. String number 46 will presumably be the 1123single character `\..'; but some ASCII codes have no standard visible 1124representation, and \TeX\ sometimes needs to be able to print an arbitrary 1125ASCII character, so the first 256 strings are used to specify exactly what 1126should be printed for each of the 256 possibilities. 1127 1128Elements of the |str_pool| array must be ASCII codes that can actually 1129be printed; i.e., they must have an |xchr| equivalent in the local 1130character set. (This restriction applies only to preloaded strings, 1131not to those generated dynamically by the user.) 1132 1133Some \PASCAL\ compilers won't pack integers into a single byte unless the 1134integers lie in the range |-128..127|. To accommodate such systems 1135we access the string pool only via macros that can easily be redefined. 1136@^system dependencies@> 1137 1138@d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|} 1139@d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|} 1140 1141@<Types...@>= 1142@!pool_pointer = 0..pool_size; {for variables that point into |str_pool|} 1143@!str_number = 0..max_strings; {for variables that point into |str_start|} 1144@!packed_ASCII_code = 0..255; {elements of |str_pool| array} 1145 1146@ @<Glob...@>= 1147@!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters} 1148@!str_start : array[str_number] of pool_pointer; {the starting pointers} 1149@!pool_ptr : pool_pointer; {first unused position in |str_pool|} 1150@!str_ptr : str_number; {number of the current string being created} 1151@!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|} 1152@!init_str_ptr : str_number; {the starting value of |str_ptr|} 1153 1154@ Several of the elementary string operations are performed using \.{WEB} 1155macros instead of \PASCAL\ procedures, because many of the 1156operations are done quite frequently and we want to avoid the 1157overhead of procedure calls. For example, here is 1158a simple macro that computes the length of a string. 1159@.WEB@> 1160 1161@d length(#)==(str_start[#+1]-str_start[#]) {the number of characters 1162 in string number \#} 1163 1164@ The length of the current string is called |cur_length|: 1165 1166@d cur_length == (pool_ptr - str_start[str_ptr]) 1167 1168@ Strings are created by appending character codes to |str_pool|. 1169The |append_char| macro, defined here, does not check to see if the 1170value of |pool_ptr| has gotten too high; this test is supposed to be 1171made before |append_char| is used. There is also a |flush_char| 1172macro, which erases the last character appended. 1173 1174To test if there is room to append |l| more characters to |str_pool|, 1175we shall write |str_room(l)|, which aborts \TeX\ and gives an 1176apologetic error message if there isn't enough room. 1177 1178@d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|} 1179begin str_pool[pool_ptr]:=si(#); incr(pool_ptr); 1180end 1181@d flush_char == decr(pool_ptr) {forget the last character in the pool} 1182@d str_room(#) == {make sure that the pool hasn't overflowed} 1183 begin if pool_ptr+# > pool_size then 1184 overflow("pool size",pool_size-init_pool_ptr); 1185@:TeX capacity exceeded pool size}{\quad pool size@> 1186 end 1187 1188@ Once a sequence of characters has been appended to |str_pool|, it 1189officially becomes a string when the function |make_string| is called. 1190This function returns the identification number of the new string as its 1191value. 1192 1193@p function make_string : str_number; {current string enters the pool} 1194begin if str_ptr=max_strings then 1195 overflow("number of strings",max_strings-init_str_ptr); 1196@:TeX capacity exceeded number of strings}{\quad number of strings@> 1197incr(str_ptr); str_start[str_ptr]:=pool_ptr; 1198make_string:=str_ptr-1; 1199end; 1200 1201@ To destroy the most recently made string, we say |flush_string|. 1202 1203@d flush_string==begin decr(str_ptr); pool_ptr:=str_start[str_ptr]; 1204 end 1205 1206@ The following subroutine compares string |s| with another string of the 1207same length that appears in |buffer| starting at position |k|; 1208the result is |true| if and only if the strings are equal. 1209Empirical tests indicate that |str_eq_buf| is used in such a way that 1210it tends to return |true| about 80 percent of the time. 1211 1212@p function str_eq_buf(@!s:str_number;@!k:integer):boolean; 1213 {test equality of strings} 1214label not_found; {loop exit} 1215var j: pool_pointer; {running index} 1216@!result: boolean; {result of comparison} 1217begin j:=str_start[s]; 1218while j<str_start[s+1] do 1219 begin if so(str_pool[j])<>buffer[k] then 1220 begin result:=false; goto not_found; 1221 end; 1222 incr(j); incr(k); 1223 end; 1224result:=true; 1225not_found: str_eq_buf:=result; 1226end; 1227 1228@ Here is a similar routine, but it compares two strings in the string pool, 1229and it does not assume that they have the same length. 1230 1231@p function str_eq_str(@!s,@!t:str_number):boolean; 1232 {test equality of strings} 1233label not_found; {loop exit} 1234var j,@!k: pool_pointer; {running indices} 1235@!result: boolean; {result of comparison} 1236begin result:=false; 1237if length(s)<>length(t) then goto not_found; 1238j:=str_start[s]; k:=str_start[t]; 1239while j<str_start[s+1] do 1240 begin if str_pool[j]<>str_pool[k] then goto not_found; 1241 incr(j); incr(k); 1242 end; 1243result:=true; 1244not_found: str_eq_str:=result; 1245end; 1246 1247@ The initial values of |str_pool|, |str_start|, |pool_ptr|, 1248and |str_ptr| are computed by the \.{INITEX} program, based in part 1249on the information that \.{WEB} has output while processing \TeX. 1250@.INITEX@> 1251@^string pool@> 1252 1253@p @!init function get_strings_started:boolean; {initializes the string pool, 1254 but returns |false| if something goes wrong} 1255label done,exit; 1256var k,@!l:0..255; {small indices or counters} 1257@!m,@!n:text_char; {characters input from |pool_file|} 1258@!g:str_number; {garbage} 1259@!a:integer; {accumulator for check sum} 1260@!c:boolean; {check sum has been checked} 1261begin pool_ptr:=0; str_ptr:=0; str_start[0]:=0; 1262@<Make the first 256 strings@>; 1263@<Read the other strings from the \.{TEX.POOL} file and return |true|, 1264 or give an error message and return |false|@>; 1265exit:end; 1266tini 1267 1268@ @d app_lc_hex(#)==l:=#; 1269 if l<10 then append_char(l+"0")@+else append_char(l-10+"a") 1270 1271@<Make the first 256...@>= 1272for k:=0 to 255 do 1273 begin if (@<Character |k| cannot be printed@>) then 1274 begin append_char("^"); append_char("^"); 1275 if k<@'100 then append_char(k+@'100) 1276 else if k<@'200 then append_char(k-@'100) 1277 else begin app_lc_hex(k div 16); app_lc_hex(k mod 16); 1278 end; 1279 end 1280 else append_char(k); 1281 g:=make_string; 1282 end 1283 1284@ The first 128 strings will contain 95 standard ASCII characters, and the 1285other 33 characters will be printed in three-symbol form like `\.{\^\^A}' 1286unless a system-dependent change is made here. Installations that have 1287an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, 1288would like string @'32 to be the single character @'32 instead of the 1289three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, 1290even people with an extended character set will want to represent string 1291@'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to 1292produce visible strings instead of tabs or line-feeds or carriage-returns 1293or bell-rings or characters that are treated anomalously in text files. 1294 1295Unprintable characters of codes 128--255 are, similarly, rendered 1296\.{\^\^80}--\.{\^\^ff}. 1297 1298The boolean expression defined here should be |true| unless \TeX\ 1299internal code number~|k| corresponds to a non-troublesome visible 1300symbol in the local character set. An appropriate formula for the 1301extended character set recommended in {\sl The \TeX book\/} would, for 1302example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'. 1303If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or 1304|k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46, 1305@'60..@'71, @'136, @'141..@'146, @'160..@'171]| must be printable. 1306Thus, at least 81 printable characters are needed. 1307@:TeXbook}{\sl The \TeX book@> 1308@^character set dependencies@> 1309@^system dependencies@> 1310 1311@<Character |k| cannot be printed@>= 1312 (k<" ")or(k>"~") 1313 1314@ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB} 1315description that you are now reading, it outputs the \PASCAL\ program 1316\.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX} 1317@.WEB@>@.INITEX@> 1318program reads the latter file, where each string appears as a two-digit decimal 1319length followed by the string itself, and the information is recorded in 1320\TeX's string memory. 1321 1322@<Glob...@>= 1323@!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}} 1324tini 1325 1326@ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#); 1327 a_close(pool_file); get_strings_started:=false; return; 1328 end 1329@<Read the other strings...@>= 1330name_of_file:=pool_name; {we needn't set |name_length|} 1331if a_open_in(pool_file) then 1332 begin c:=false; 1333 repeat @<Read one string, but return |false| if the 1334 string memory space is getting too tight for comfort@>; 1335 until c; 1336 a_close(pool_file); get_strings_started:=true; 1337 end 1338else bad_pool('! I can''t read TEX.POOL.') 1339@.I can't read TEX.POOL@> 1340 1341@ @<Read one string...@>= 1342begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.'); 1343@.TEX.POOL has no check sum@> 1344read(pool_file,m,n); {read two digits of string length} 1345if m='*' then @<Check the pool check sum@> 1346else begin if (xord[m]<"0")or(xord[m]>"9")or@| 1347 (xord[n]<"0")or(xord[n]>"9") then 1348 bad_pool('! TEX.POOL line doesn''t begin with two digits.'); 1349@.TEX.POOL line doesn't...@> 1350 l:=xord[m]*10+xord[n]-"0"*11; {compute the length} 1351 if pool_ptr+l+string_vacancies>pool_size then 1352 bad_pool('! You have to increase POOLSIZE.'); 1353@.You have to increase POOLSIZE@> 1354 for k:=1 to l do 1355 begin if eoln(pool_file) then m:=' '@+else read(pool_file,m); 1356 append_char(xord[m]); 1357 end; 1358 read_ln(pool_file); g:=make_string; 1359 end; 1360end 1361 1362@ The \.{WEB} operation \.{@@\$} denotes the value that should be at the 1363end of this \.{TEX.POOL} file; any other value means that the wrong pool 1364file has been loaded. 1365@^check sum@> 1366 1367@<Check the pool check sum@>= 1368begin a:=0; k:=1; 1369loop@+ begin if (xord[n]<"0")or(xord[n]>"9") then 1370 bad_pool('! TEX.POOL check sum doesn''t have nine digits.'); 1371@.TEX.POOL check sum...@> 1372 a:=10*a+xord[n]-"0"; 1373 if k=9 then goto done; 1374 incr(k); read(pool_file,n); 1375 end; 1376done: if a<>@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.'); 1377@.TEX.POOL doesn't match@> 1378c:=true; 1379end 1380 1381@* \[5] On-line and off-line printing. 1382Messages that are sent to a user's terminal and to the transcript-log file 1383are produced by several `|print|' procedures. These procedures will 1384direct their output to a variety of places, based on the setting of 1385the global variable |selector|, which has the following possible 1386values: 1387 1388\yskip 1389\hang |term_and_log|, the normal setting, prints on the terminal and on the 1390 transcript file. 1391 1392\hang |log_only|, prints only on the transcript file. 1393 1394\hang |term_only|, prints only on the terminal. 1395 1396\hang |no_print|, doesn't print at all. This is used only in rare cases 1397 before the transcript file is open. 1398 1399\hang |pseudo|, puts output into a cyclic buffer that is used 1400 by the |show_context| routine; when we get to that routine we shall discuss 1401 the reasoning behind this curious mode. 1402 1403\hang |new_string|, appends the output to the current string in the 1404 string pool. 1405 1406\hang 0 to 15, prints on one of the sixteen files for \.{\\write} output. 1407 1408\yskip 1409\noindent The symbolic names `|term_and_log|', etc., have been assigned 1410numeric codes that satisfy the convenient relations |no_print+1=term_only|, 1411|no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|. 1412 1413Three additional global variables, |tally| and |term_offset| and 1414|file_offset|, record the number of characters that have been printed 1415since they were most recently cleared to zero. We use |tally| to record 1416the length of (possibly very long) stretches of printing; |term_offset| 1417and |file_offset|, on the other hand, keep track of how many characters 1418have appeared so far on the current line that has been output to the 1419terminal or to the transcript file, respectively. 1420 1421@d no_print=16 {|selector| setting that makes data disappear} 1422@d term_only=17 {printing is destined for the terminal only} 1423@d log_only=18 {printing is destined for the transcript file only} 1424@d term_and_log=19 {normal |selector| setting} 1425@d pseudo=20 {special |selector| setting for |show_context|} 1426@d new_string=21 {printing is deflected to the string pool} 1427@d max_selector=21 {highest selector setting} 1428 1429@<Glob...@>= 1430@!log_file : alpha_file; {transcript of \TeX\ session} 1431@!selector : 0..max_selector; {where to print a message} 1432@!dig : array[0..22] of 0..15; {digits in a number being output} 1433@!tally : integer; {the number of characters recently printed} 1434@!term_offset : 0..max_print_line; 1435 {the number of characters on the current terminal line} 1436@!file_offset : 0..max_print_line; 1437 {the number of characters on the current file line} 1438@!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for 1439 pseudoprinting} 1440@!trick_count: integer; {threshold for pseudoprinting, explained later} 1441@!first_count: integer; {another variable for pseudoprinting} 1442 1443@ @<Initialize the output routines@>= 1444selector:=term_only; tally:=0; term_offset:=0; file_offset:=0; 1445 1446@ Macro abbreviations for output to the terminal and to the log file are 1447defined here for convenience. Some systems need special conventions 1448for terminal output, and it is possible to adhere to those conventions 1449by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section. 1450@^system dependencies@> 1451 1452@d wterm(#)==write(term_out,#) 1453@d wterm_ln(#)==write_ln(term_out,#) 1454@d wterm_cr==write_ln(term_out) 1455@d wlog(#)==write(log_file,#) 1456@d wlog_ln(#)==write_ln(log_file,#) 1457@d wlog_cr==write_ln(log_file) 1458 1459@ To end a line of text output, we call |print_ln|. 1460 1461@<Basic print...@>= 1462procedure print_ln; {prints an end-of-line} 1463begin case selector of 1464term_and_log: begin wterm_cr; wlog_cr; 1465 term_offset:=0; file_offset:=0; 1466 end; 1467log_only: begin wlog_cr; file_offset:=0; 1468 end; 1469term_only: begin wterm_cr; term_offset:=0; 1470 end; 1471no_print,pseudo,new_string: do_nothing; 1472othercases write_ln(write_file[selector]) 1473endcases;@/ 1474end; {|tally| is not affected} 1475 1476@ The |print_char| procedure sends one character to the desired destination, 1477using the |xchr| array to map it into an external character compatible with 1478|input_ln|. All printing comes through |print_ln| or |print_char|. 1479 1480@<Basic printing...@>= 1481procedure print_char(@!s:ASCII_code); {prints a single character} 1482label exit; 1483begin if @<Character |s| is the current new-line character@> then 1484 if selector<pseudo then 1485 begin print_ln; return; 1486 end; 1487case selector of 1488term_and_log: begin wterm(xchr[s]); wlog(xchr[s]); 1489 incr(term_offset); incr(file_offset); 1490 if term_offset=max_print_line then 1491 begin wterm_cr; term_offset:=0; 1492 end; 1493 if file_offset=max_print_line then 1494 begin wlog_cr; file_offset:=0; 1495 end; 1496 end; 1497log_only: begin wlog(xchr[s]); incr(file_offset); 1498 if file_offset=max_print_line then print_ln; 1499 end; 1500term_only: begin wterm(xchr[s]); incr(term_offset); 1501 if term_offset=max_print_line then print_ln; 1502 end; 1503no_print: do_nothing; 1504pseudo: if tally<trick_count then trick_buf[tally mod error_line]:=s; 1505new_string: begin if pool_ptr<pool_size then append_char(s); 1506 end; {we drop characters if the string space is full} 1507othercases write(write_file[selector],xchr[s]) 1508endcases;@/ 1509incr(tally); 1510exit:end; 1511 1512@ An entire string is output by calling |print|. Note that if we are outputting 1513the single standard ASCII character \.c, we could call |print("c")|, since 1514|"c"=99| is the number of a single-character string, as explained above. But 1515|print_char("c")| is quicker, so \TeX\ goes directly to the |print_char| 1516routine when it knows that this is safe. (The present implementation 1517assumes that it is always safe to print a visible ASCII character.) 1518@^system dependencies@> 1519 1520@<Basic print...@>= 1521procedure print(@!s:integer); {prints string |s|} 1522label exit; 1523var j:pool_pointer; {current character code position} 1524@!nl:integer; {new-line character to restore} 1525begin if s>=str_ptr then s:="???" {this can't happen} 1526@.???@> 1527else if s<256 then 1528 if s<0 then s:="???" {can't happen} 1529 else begin if selector>pseudo then 1530 begin print_char(s); return; {internal strings are not expanded} 1531 end; 1532 if (@<Character |s| is the current new-line character@>) then 1533 if selector<pseudo then 1534 begin print_ln; return; 1535 end; 1536 nl:=new_line_char; new_line_char:=-1; 1537 {temporarily disable new-line character} 1538 j:=str_start[s]; 1539 while j<str_start[s+1] do 1540 begin print_char(so(str_pool[j])); incr(j); 1541 end; 1542 new_line_char:=nl; return; 1543 end; 1544j:=str_start[s]; 1545while j<str_start[s+1] do 1546 begin print_char(so(str_pool[j])); incr(j); 1547 end; 1548exit:end; 1549 1550@ Control sequence names, file names, and strings constructed with 1551\.{\\string} might contain |ASCII_code| values that can't 1552be printed using |print_char|. Therefore we use |slow_print| for them: 1553 1554@<Basic print...@>= 1555procedure slow_print(@!s:integer); {prints string |s|} 1556var j:pool_pointer; {current character code position} 1557begin if (s>=str_ptr) or (s<256) then print(s) 1558else begin j:=str_start[s]; 1559 while j<str_start[s+1] do 1560 begin print(so(str_pool[j])); incr(j); 1561 end; 1562 end; 1563end; 1564 1565@ Here is the very first thing that \TeX\ prints: a headline that identifies 1566the version number and format package. The |term_offset| variable is temporarily 1567incorrect, but the discrepancy is not serious since we assume that the banner 1568and format identifier together will occupy at most |max_print_line| 1569character positions. 1570 1571@<Initialize the output...@>= 1572wterm(banner); 1573if format_ident=0 then wterm_ln(' (no format preloaded)') 1574else begin slow_print(format_ident); print_ln; 1575 end; 1576update_terminal; 1577 1578@ The procedure |print_nl| is like |print|, but it makes sure that the 1579string appears at the beginning of a new line. 1580 1581@<Basic print...@>= 1582procedure print_nl(@!s:str_number); {prints string |s| at beginning of line} 1583begin if ((term_offset>0)and(odd(selector)))or@| 1584 ((file_offset>0)and(selector>=log_only)) then print_ln; 1585print(s); 1586end; 1587 1588@ The procedure |print_esc| prints a string that is preceded by 1589the user's escape character (which is usually a backslash). 1590 1591@<Basic print...@>= 1592procedure print_esc(@!s:str_number); {prints escape character, then |s|} 1593var c:integer; {the escape character code} 1594begin @<Set variable |c| to the current escape character@>; 1595if c>=0 then if c<256 then print(c); 1596slow_print(s); 1597end; 1598 1599@ An array of digits in the range |0..15| is printed by |print_the_digs|. 1600 1601@<Basic print...@>= 1602procedure print_the_digs(@!k:eight_bits); 1603 {prints |dig[k-1]|$\,\ldots\,$|dig[0]|} 1604begin while k>0 do 1605 begin decr(k); 1606 if dig[k]<10 then print_char("0"+dig[k]) 1607 else print_char("A"-10+dig[k]); 1608 end; 1609end; 1610 1611@ The following procedure, which prints out the decimal representation of a 1612given integer |n|, has been written carefully so that it works properly 1613if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div| 1614to negative arguments, since such operations are not implemented consistently 1615by all \PASCAL\ compilers. 1616 1617@<Basic print...@>= 1618procedure print_int(@!n:integer); {prints an integer in decimal form} 1619var k:0..23; {index to current digit; we assume that $|n|<10^{23}$} 1620@!m:integer; {used to negate |n| in possibly dangerous cases} 1621begin k:=0; 1622if n<0 then 1623 begin print_char("-"); 1624 if n>-100000000 then negate(n) 1625 else begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1; 1626 if m<10 then dig[0]:=m 1627 else begin dig[0]:=0; incr(n); 1628 end; 1629 end; 1630 end; 1631repeat dig[k]:=n mod 10; n:=n div 10; incr(k); 1632until n=0; 1633print_the_digs(k); 1634end; 1635 1636@ Here is a trivial procedure to print two digits; it is usually called with 1637a parameter in the range |0<=n<=99|. 1638 1639@p procedure print_two(@!n:integer); {prints two least significant digits} 1640begin n:=abs(n) mod 100; print_char("0"+(n div 10)); 1641print_char("0"+(n mod 10)); 1642end; 1643 1644@ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|. 1645 1646@p procedure print_hex(@!n:integer); 1647 {prints a positive integer in hexadecimal form} 1648var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$} 1649begin k:=0; print_char(""""); 1650repeat dig[k]:=n mod 16; n:=n div 16; incr(k); 1651until n=0; 1652print_the_digs(k); 1653end; 1654 1655@ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function 1656is now subsumed by |print|. We retain the old name here as a possible aid to 1657future software arch\ae ologists. 1658 1659@d print_ASCII == print 1660 1661@ Roman numerals are produced by the |print_roman_int| routine. Readers 1662who like puzzles might enjoy trying to figure out how this tricky code 1663works; therefore no explanation will be given. Notice that 1990 yields 1664\.{mcmxc}, not \.{mxm}. 1665 1666@p procedure print_roman_int(@!n:integer); 1667label exit; 1668var j,@!k: pool_pointer; {mysterious indices into |str_pool|} 1669@!u,@!v: nonnegative_integer; {mysterious numbers} 1670begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000; 1671loop@+ begin while n>=v do 1672 begin print_char(so(str_pool[j])); n:=n-v; 1673 end; 1674 if n<=0 then return; {nonpositive input produces no output} 1675 k:=j+2; u:=v div (so(str_pool[k-1])-"0"); 1676 if str_pool[k-1]=si("2") then 1677 begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); 1678 end; 1679 if n+u>=v then 1680 begin print_char(so(str_pool[k])); n:=n+u; 1681 end 1682 else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); 1683 end; 1684 end; 1685exit:end; 1686 1687@ The |print| subroutine will not print a string that is still being 1688created. The following procedure will. 1689 1690@p procedure print_current_string; {prints a yet-unmade string} 1691var j:pool_pointer; {points to current character code} 1692begin j:=str_start[str_ptr]; 1693while j<pool_ptr do 1694 begin print_char(so(str_pool[j])); incr(j); 1695 end; 1696end; 1697 1698@ Here is a procedure that asks the user to type a line of input, 1699assuming that the |selector| setting is either |term_only| or |term_and_log|. 1700The input is placed into locations |first| through |last-1| of the 1701|buffer| array, and echoed on the transcript file if appropriate. 1702 1703This procedure is never called when |interaction<scroll_mode|. 1704 1705@d prompt_input(#)==begin wake_up_terminal; print(#); term_input; 1706 end {prints a string and gets a line of input} 1707 1708@p procedure term_input; {gets a line from the terminal} 1709var k:0..buf_size; {index into |buffer|} 1710begin update_terminal; {now the user sees the prompt for sure} 1711if not input_ln(term_in,true) then fatal_error("End of file on the terminal!"); 1712@.End of file on the terminal@> 1713term_offset:=0; {the user's line ended with \<\rm return>} 1714decr(selector); {prepare to echo the input} 1715if last<>first then for k:=first to last-1 do print(buffer[k]); 1716print_ln; incr(selector); {restore previous status} 1717end; 1718 1719@* \[6] Reporting errors. 1720When something anomalous is detected, \TeX\ typically does something like this: 1721$$\vbox{\halign{#\hfil\cr 1722|print_err("Something anomalous has been detected");|\cr 1723|help3("This is the first line of my offer to help.")|\cr 1724|("This is the second line. I'm trying to")|\cr 1725|("explain the best way for you to proceed.");|\cr 1726|error;|\cr}}$$ 1727A two-line help message would be given using |help2|, etc.; these informal 1728helps should use simple vocabulary that complements the words used in the 1729official error message that was printed. (Outside the U.S.A., the help 1730messages should preferably be translated into the local vernacular. Each 1731line of help is at most 60 characters long, in the present implementation, 1732so that |max_print_line| will not be exceeded.) 1733 1734The |print_err| procedure supplies a `\.!' before the official message, 1735and makes sure that the terminal is awake if a stop is going to occur. 1736The |error| procedure supplies a `\..' after the official message, then it 1737shows the location of the error; and if |interaction=error_stop_mode|, 1738it also enters into a dialog with the user, during which time the help 1739message may be printed. 1740@^system dependencies@> 1741 1742@ The global variable |interaction| has four settings, representing increasing 1743amounts of user interaction: 1744 1745@d batch_mode=0 {omits all stops and omits terminal output} 1746@d nonstop_mode=1 {omits all stops} 1747@d scroll_mode=2 {omits error stops} 1748@d error_stop_mode=3 {stops at every opportunity to interact} 1749@d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal; 1750 print_nl("! "); print(#); 1751 end 1752 1753@<Glob...@>= 1754@!interaction:batch_mode..error_stop_mode; {current level of interaction} 1755 1756@ @<Set init...@>=interaction:=error_stop_mode; 1757 1758@ \TeX\ is careful not to call |error| when the print |selector| setting 1759might be unusual. The only possible values of |selector| at the time of 1760error messages are 1761 1762\yskip\hang|no_print| (when |interaction=batch_mode| 1763 and |log_file| not yet open); 1764 1765\hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open); 1766 1767\hang|log_only| (when |interaction=batch_mode| and |log_file| is open); 1768 1769\hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open). 1770 1771@<Initialize the print |selector| based on |interaction|@>= 1772if interaction=batch_mode then selector:=no_print@+else selector:=term_only 1773 1774@ A global variable |deletions_allowed| is set |false| if the |get_next| 1775routine is active when |error| is called; this ensures that |get_next| 1776and related routines like |get_token| will never be called recursively. 1777A similar interlock is provided by |set_box_allowed|. 1778@^recursion@> 1779 1780The global variable |history| records the worst level of error that 1781has been detected. It has four possible values: |spotless|, |warning_issued|, 1782|error_message_issued|, and |fatal_error_stop|. 1783 1784Another global variable, |error_count|, is increased by one when an 1785|error| occurs without an interactive dialog, and it is reset to zero at 1786the end of every paragraph. If |error_count| reaches 100, \TeX\ decides 1787that there is no point in continuing further. 1788 1789@d spotless=0 {|history| value when nothing has been amiss yet} 1790@d warning_issued=1 {|history| value when |begin_diagnostic| has been called} 1791@d error_message_issued=2 {|history| value when |error| has been called} 1792@d fatal_error_stop=3 {|history| value when termination was premature} 1793 1794@<Glob...@>= 1795@!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?} 1796@!set_box_allowed:boolean; {is it safe to do a \.{\\setbox} assignment?} 1797@!history:spotless..fatal_error_stop; {has the source input been clean so far?} 1798@!error_count:-1..100; {the number of scrolled errors since the 1799 last paragraph ended} 1800 1801@ The value of |history| is initially |fatal_error_stop|, but it will 1802be changed to |spotless| if \TeX\ survives the initialization process. 1803 1804@<Set init...@>= 1805deletions_allowed:=true; set_box_allowed:=true; 1806error_count:=0; {|history| is initialized elsewhere} 1807 1808@ Since errors can be detected almost anywhere in \TeX, we want to declare the 1809error procedures near the beginning of the program. But the error procedures 1810in turn use some other procedures, which need to be declared |forward| 1811before we get to |error| itself. 1812 1813It is possible for |error| to be called recursively if some error arises 1814when |get_token| is being used to delete a token, and/or if some fatal error 1815occurs while \TeX\ is trying to fix a non-fatal one. But such recursion 1816@^recursion@> 1817is never more than two levels deep. 1818 1819@<Error handling...@>= 1820procedure@?normalize_selector; forward;@t\2@>@/ 1821procedure@?get_token; forward;@t\2@>@/ 1822procedure@?term_input; forward;@t\2@>@/ 1823procedure@?show_context; forward;@t\2@>@/ 1824procedure@?begin_file_reading; forward;@t\2@>@/ 1825procedure@?open_log_file; forward;@t\2@>@/ 1826procedure@?close_files_and_terminate; forward;@t\2@>@/ 1827procedure@?clear_for_error_prompt; forward;@t\2@>@/ 1828procedure@?give_err_help; forward;@t\2@>@/ 1829@t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help; 1830 forward;@;@+gubed 1831 1832@ Individual lines of help are recorded in the array |help_line|, which 1833contains entries in positions |0..(help_ptr-1)|. They should be printed 1834in reverse order, i.e., with |help_line[0]| appearing last. 1835 1836@d hlp1(#)==help_line[0]:=#;@+end 1837@d hlp2(#)==help_line[1]:=#; hlp1 1838@d hlp3(#)==help_line[2]:=#; hlp2 1839@d hlp4(#)==help_line[3]:=#; hlp3 1840@d hlp5(#)==help_line[4]:=#; hlp4 1841@d hlp6(#)==help_line[5]:=#; hlp5 1842@d help0==help_ptr:=0 {sometimes there might be no help} 1843@d help1==@+begin help_ptr:=1; hlp1 {use this with one help line} 1844@d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines} 1845@d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines} 1846@d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines} 1847@d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines} 1848@d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines} 1849 1850@<Glob...@>= 1851@!help_line:array[0..5] of str_number; {helps for the next |error|} 1852@!help_ptr:0..6; {the number of help lines present} 1853@!use_err_help:boolean; {should the |err_help| list be shown?} 1854 1855@ @<Set init...@>= 1856help_ptr:=0; use_err_help:=false; 1857 1858@ The |jump_out| procedure just cuts across all active procedure levels and 1859goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the 1860whole program. It is used when there is no recovery from a particular error. 1861 1862Some \PASCAL\ compilers do not implement non-local |goto| statements. 1863@^system dependencies@> 1864In such cases the body of |jump_out| should simply be 1865`|close_files_and_terminate|;\thinspace' followed by a call on some system 1866procedure that quietly terminates the program. 1867 1868@<Error hand...@>= 1869procedure jump_out; 1870begin goto end_of_TEX; 1871end; 1872 1873@ Here now is the general |error| routine. 1874 1875@<Error hand...@>= 1876procedure error; {completes the job of error reporting} 1877label continue,exit; 1878var c:ASCII_code; {what the user types} 1879@!s1,@!s2,@!s3,@!s4:integer; 1880 {used to save global variables when deleting tokens} 1881begin if history<error_message_issued then history:=error_message_issued; 1882print_char("."); show_context; 1883if interaction=error_stop_mode then @<Get user's advice and |return|@>; 1884incr(error_count); 1885if error_count=100 then 1886 begin print_nl("(That makes 100 errors; please try again.)"); 1887@.That makes 100 errors...@> 1888 history:=fatal_error_stop; jump_out; 1889 end; 1890@<Put help message on the transcript file@>; 1891exit:end; 1892 1893@ @<Get user's advice...@>= 1894loop@+begin continue: clear_for_error_prompt; prompt_input("? "); 1895@.?\relax@> 1896 if last=first then return; 1897 c:=buffer[first]; 1898 if c>="a" then c:=c+"A"-"a"; {convert to uppercase} 1899 @<Interpret code |c| and |return| if done@>; 1900 end 1901 1902@ It is desirable to provide an `\.E' option here that gives the user 1903an easy way to return from \TeX\ to the system editor, with the offending 1904line ready to be edited. But such an extension requires some system 1905wizardry, so the present implementation simply types out the name of the 1906file that should be 1907edited and the relevant line number. 1908@^system dependencies@> 1909 1910There is a secret `\.D' option available when the debugging routines haven't 1911been commented~out. 1912@^debugging@> 1913 1914@<Interpret code |c| and |return| if done@>= 1915case c of 1916"0","1","2","3","4","5","6","7","8","9": if deletions_allowed then 1917 @<Delete \(c)|c-"0"| tokens and |goto continue|@>; 1918@t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/ 1919"E": if base_ptr>0 then 1920 begin print_nl("You want to edit file "); 1921@.You want to edit file x@> 1922 slow_print(input_stack[base_ptr].name_field); 1923 print(" at line "); print_int(line); 1924 interaction:=scroll_mode; jump_out; 1925 end; 1926"H": @<Print the help information and |goto continue|@>; 1927"I":@<Introduce new material from the terminal and |return|@>; 1928"Q","R","S":@<Change the interaction level and |return|@>; 1929"X":begin interaction:=scroll_mode; jump_out; 1930 end; 1931othercases do_nothing 1932endcases;@/ 1933@<Print the menu of available options@> 1934 1935@ @<Print the menu...@>= 1936begin print("Type <return> to proceed, S to scroll future error messages,");@/ 1937@.Type <return> to proceed...@> 1938print_nl("R to run without stopping, Q to run quietly,");@/ 1939print_nl("I to insert something, "); 1940if base_ptr>0 then print("E to edit your file,"); 1941if deletions_allowed then 1942 print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); 1943print_nl("H for help, X to quit."); 1944end 1945 1946@ Here the author of \TeX\ apologizes for making use of the numerical 1947relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings 1948|batch_mode|, |nonstop_mode|, |scroll_mode|. 1949@^Knuth, Donald Ervin@> 1950 1951@<Change the interaction...@>= 1952begin error_count:=0; interaction:=batch_mode+c-"Q"; 1953print("OK, entering "); 1954case c of 1955"Q":begin print_esc("batchmode"); decr(selector); 1956 end; 1957"R":print_esc("nonstopmode"); 1958"S":print_esc("scrollmode"); 1959end; {there are no other cases} 1960print("..."); print_ln; update_terminal; return; 1961end 1962 1963@ When the following code is executed, |buffer[(first+1)..(last-1)]| may 1964contain the material inserted by the user; otherwise another prompt will 1965be given. In order to understand this part of the program fully, you need 1966to be familiar with \TeX's input stacks. 1967 1968@<Introduce new material...@>= 1969begin begin_file_reading; {enter a new syntactic level for terminal input} 1970{now |state=mid_line|, so an initial blank space will count as a blank} 1971if last>first+1 then 1972 begin loc:=first+1; buffer[first]:=" "; 1973 end 1974else begin prompt_input("insert>"); loc:=first; 1975@.insert>@> 1976 end; 1977first:=last; 1978cur_input.limit_field:=last-1; {no |end_line_char| ends this line} 1979return; 1980end 1981 1982@ We allow deletion of up to 99 tokens at a time. 1983 1984@<Delete \(c)|c-"0"| tokens...@>= 1985begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state; 1986align_state:=1000000; OK_to_interrupt:=false; 1987if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then 1988 c:=c*10+buffer[first+1]-"0"*11 1989else c:=c-"0"; 1990while c>0 do 1991 begin get_token; {one-level recursive call of |error| is possible} 1992 decr(c); 1993 end; 1994cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true; 1995help2("I have just deleted some text, as you asked.")@/ 1996("You can now delete more, or insert, or whatever."); 1997show_context; goto continue; 1998end 1999 2000@ @<Print the help info...@>= 2001begin if use_err_help then 2002 begin give_err_help; use_err_help:=false; 2003 end 2004else begin if help_ptr=0 then 2005 help2("Sorry, I don't know how to help in this situation.")@/ 2006 @t\kern1em@>("Maybe you should try asking a human?"); 2007 repeat decr(help_ptr); print(help_line[help_ptr]); print_ln; 2008 until help_ptr=0; 2009 end; 2010help4("Sorry, I already gave what help I could...")@/ 2011 ("Maybe you should try asking a human?")@/ 2012 ("An error might have occurred before I noticed any problems.")@/ 2013 ("``If all else fails, read the instructions.''");@/ 2014goto continue; 2015end 2016 2017@ @<Put help message on the transcript file@>= 2018if interaction>batch_mode then decr(selector); {avoid terminal output} 2019if use_err_help then 2020 begin print_ln; give_err_help; 2021 end 2022else while help_ptr>0 do 2023 begin decr(help_ptr); print_nl(help_line[help_ptr]); 2024 end; 2025print_ln; 2026if interaction>batch_mode then incr(selector); {re-enable terminal output} 2027print_ln 2028 2029@ A dozen or so error messages end with a parenthesized integer, so we 2030save a teeny bit of program space by declaring the following procedure: 2031 2032@p procedure int_error(@!n:integer); 2033begin print(" ("); print_int(n); print_char(")"); error; 2034end; 2035 2036@ In anomalous cases, the print selector might be in an unknown state; 2037the following subroutine is called to fix things just enough to keep 2038running a bit longer. 2039 2040@p procedure normalize_selector; 2041begin if log_opened then selector:=term_and_log 2042else selector:=term_only; 2043if job_name=0 then open_log_file; 2044if interaction=batch_mode then decr(selector); 2045end; 2046 2047@ The following procedure prints \TeX's last words before dying. 2048 2049@d succumb==begin if interaction=error_stop_mode then 2050 interaction:=scroll_mode; {no more interaction} 2051 if log_opened then error; 2052 @!debug if interaction>batch_mode then debug_help;@+gubed@;@/ 2053 history:=fatal_error_stop; jump_out; {irrecoverable error} 2054 end 2055 2056@<Error hand...@>= 2057procedure fatal_error(@!s:str_number); {prints |s|, and that's it} 2058begin normalize_selector;@/ 2059print_err("Emergency stop"); help1(s); succumb; 2060@.Emergency stop@> 2061end; 2062 2063@ Here is the most dreaded error message. 2064 2065@<Error hand...@>= 2066procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness} 2067begin normalize_selector; 2068print_err("TeX capacity exceeded, sorry ["); 2069@.TeX capacity exceeded ...@> 2070print(s); print_char("="); print_int(n); print_char("]"); 2071help2("If you really absolutely need more capacity,")@/ 2072 ("you can ask a wizard to enlarge me."); 2073succumb; 2074end; 2075 2076@ The program might sometime run completely amok, at which point there is 2077no choice but to stop. If no previous error has been detected, that's bad 2078news; a message is printed that is really intended for the \TeX\ 2079maintenance person instead of the user (unless the user has been 2080particularly diabolical). The index entries for `this can't happen' may 2081help to pinpoint the problem. 2082@^dry rot@> 2083 2084@<Error hand...@>= 2085procedure confusion(@!s:str_number); 2086 {consistency check violated; |s| tells where} 2087begin normalize_selector; 2088if history<error_message_issued then 2089 begin print_err("This can't happen ("); print(s); print_char(")"); 2090@.This can't happen@> 2091 help1("I'm broken. Please show this to someone who can fix can fix"); 2092 end 2093else begin print_err("I can't go on meeting you like this"); 2094@.I can't go on...@> 2095 help2("One of your faux pas seems to have wounded me deeply...")@/ 2096 ("in fact, I'm barely conscious. Please fix it and try again."); 2097 end; 2098succumb; 2099end; 2100 2101@ Users occasionally want to interrupt \TeX\ while it's running. 2102If the \PASCAL\ runtime system allows this, one can implement 2103a routine that sets the global variable |interrupt| to some nonzero value 2104when such an interrupt is signalled. Otherwise there is probably at least 2105a way to make |interrupt| nonzero using the \PASCAL\ debugger. 2106@^system dependencies@> 2107@^debugging@> 2108 2109@d check_interrupt==begin if interrupt<>0 then pause_for_instructions; 2110 end 2111 2112@<Global...@>= 2113@!interrupt:integer; {should \TeX\ pause for instructions?} 2114@!OK_to_interrupt:boolean; {should interrupts be observed?} 2115 2116@ @<Set init...@>= 2117interrupt:=0; OK_to_interrupt:=true; 2118 2119@ When an interrupt has been detected, the program goes into its 2120highest interaction level and lets the user have nearly the full flexibility of 2121the |error| routine. \TeX\ checks for interrupts only at times when it is 2122safe to do this. 2123 2124@p procedure pause_for_instructions; 2125begin if OK_to_interrupt then 2126 begin interaction:=error_stop_mode; 2127 if (selector=log_only)or(selector=no_print) then 2128 incr(selector); 2129 print_err("Interruption"); 2130@.Interruption@> 2131 help3("You rang?")@/ 2132 ("Try to insert some instructions for me (e.g.,`I\showlists'),")@/ 2133 ("unless you just want to quit by typing `X'."); 2134 deletions_allowed:=false; error; deletions_allowed:=true; 2135 interrupt:=0; 2136 end; 2137end; 2138 2139@* \[7] Arithmetic with scaled dimensions. 2140The principal computations performed by \TeX\ are done entirely in terms of 2141integers less than $2^{31}$ in magnitude; and divisions are done only when both 2142dividend and divisor are nonnegative. Thus, the arithmetic specified in this 2143program can be carried out in exactly the same way on a wide variety of 2144computers, including some small ones. Why? Because the arithmetic 2145calculations need to be spelled out precisely in order to guarantee that 2146\TeX\ will produce identical output on different machines. If some 2147quantities were rounded differently in different implementations, we would 2148find that line breaks and even page breaks might occur in different places. 2149Hence the arithmetic of \TeX\ has been designed with care, and systems that 2150claim to be implementations of \TeX82 should follow precisely the 2151@:TeX82}{\TeX82@> 2152calculations as they appear in the present program. 2153 2154(Actually there are three places where \TeX\ uses |div| with a possibly negative 2155numerator. These are harmless; see |div| in the index. Also if the user 2156sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic 2157information will involve negative-numerator division. The same remarks 2158apply for |mod| as well as for |div|.) 2159 2160@ Here is a routine that calculates half of an integer, using an 2161unambiguous convention with respect to signed odd numbers. 2162 2163@p function half(@!x:integer):integer; 2164begin if odd(x) then half:=(x+1) div 2 2165else half:=x @!div 2; 2166end; 2167 2168@ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples 2169of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit 2170positions from the right end of a binary computer word. 2171 2172@d unity == @'200000 {$2^{16}$, represents 1.00000} 2173@d two == @'400000 {$2^{17}$, represents 2.00000} 2174 2175@<Types...@>= 2176@!scaled = integer; {this type is used for scaled integers} 2177@!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$} 2178@!small_number=0..63; {this type is self-explanatory} 2179 2180@ The following function is used to create a scaled integer from a given decimal 2181fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is 2182given in |dig[i]|, and the calculation produces a correctly rounded result. 2183 2184@p function round_decimals(@!k:small_number) : scaled; 2185 {converts a decimal fraction} 2186var a:integer; {the accumulator} 2187begin a:=0; 2188while k>0 do 2189 begin decr(k); a:=(a+dig[k]*two) div 10; 2190 end; 2191round_decimals:=(a+1) div 2; 2192end; 2193 2194@ Conversely, here is a procedure analogous to |print_int|. If the output 2195of this procedure is subsequently read by \TeX\ and converted by the 2196|round_decimals| routine above, it turns out that the original value will 2197be reproduced exactly; the ``simplest'' such decimal number is output, 2198but there is always at least one digit following the decimal point. 2199 2200The invariant relation in the \&{repeat} loop is that a sequence of 2201decimal digits yet to be printed will yield the original number if and only if 2202they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}f<s$. 2203We can stop if and only if $f=0$ satisfies this condition; the loop will 2204terminate before $s$ can possibly become zero. 2205 2206@p procedure print_scaled(@!s:scaled); {prints scaled real, rounded to five 2207 digits} 2208var delta:scaled; {amount of allowable inaccuracy} 2209begin if s<0 then 2210 begin print_char("-"); negate(s); {print the sign, if negative} 2211 end; 2212print_int(s div unity); {print the integer part} 2213print_char("."); 2214s:=10*(s mod unity)+5; delta:=10; 2215repeat if delta>unity then s:=s+@'100000-50000; {round the last digit} 2216print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10; 2217until s<=delta; 2218end; 2219 2220@ Physical sizes that a \TeX\ user specifies for portions of documents are 2221represented internally as scaled points. Thus, if we define an `sp' (scaled 2222@^sp@> 2223point) as a unit equal to $2^{-16}$ printer's points, every dimension 2224inside of \TeX\ is an integer number of sp. There are exactly 22254,736,286.72 sp per inch. Users are not allowed to specify dimensions 2226larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 2227meters); two such quantities can be added without overflow on a 32-bit 2228computer. 2229 2230The present implementation of \TeX\ does not check for overflow when 2231@^overflow in arithmetic@> 2232dimensions are added or subtracted. This could be done by inserting a 2233few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then 2234@t\\{report\_overflow}@>|', but the chance of overflow is so remote that 2235such tests do not seem worthwhile. 2236 2237\TeX\ needs to do only a few arithmetic operations on scaled quantities, 2238other than addition and subtraction, and the following subroutines do most of 2239the work. A single computation might use several subroutine calls, and it is 2240desirable to avoid producing multiple error messages in case of arithmetic 2241overflow; so the routines set the global variable |arith_error| to |true| 2242instead of reporting errors directly to the user. Another global variable, 2243|remainder|, holds the remainder after a division. 2244 2245@<Glob...@>= 2246@!arith_error:boolean; {has arithmetic overflow occurred recently?} 2247@!remainder:scaled; {amount subtracted to get an exact division} 2248 2249@ The first arithmetical subroutine we need computes $nx+y$, where |x| 2250and~|y| are |scaled| and |n| is an integer. We will also use it to 2251multiply integers. 2252 2253@d nx_plus_y(#)==mult_and_add(#,@'7777777777) 2254@d mult_integers(#)==mult_and_add(#,0,@'17777777777) 2255 2256@p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled; 2257begin if n<0 then 2258 begin negate(x); negate(n); 2259 end; 2260if n=0 then mult_and_add:=y 2261else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then 2262 mult_and_add:=n*x+y 2263else begin arith_error:=true; mult_and_add:=0; 2264 end; 2265end; 2266 2267@ We also need to divide scaled dimensions by integers. 2268 2269@p function x_over_n(@!x:scaled;@!n:integer):scaled; 2270var negative:boolean; {should |remainder| be negated?} 2271begin negative:=false; 2272if n=0 then 2273 begin arith_error:=true; x_over_n:=0; remainder:=x; 2274 end 2275else begin if n<0 then 2276 begin negate(x); negate(n); negative:=true; 2277 end; 2278 if x>=0 then 2279 begin x_over_n:=x div n; remainder:=x mod n; 2280 end 2281 else begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n); 2282 end; 2283 end; 2284if negative then negate(remainder); 2285end; 2286 2287@ Then comes the multiplication of a scaled number by a fraction |n/d|, 2288where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is 2289positive. It would be too dangerous to multiply by~|n| and then divide 2290by~|d|, in separate operations, since overflow might well occur; and it 2291would be too inaccurate to divide by |d| and then multiply by |n|. Hence 2292this subroutine simulates 1.5-precision arithmetic. 2293 2294@p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled; 2295var positive:boolean; {was |x>=0|?} 2296@!t,@!u,@!v:nonnegative_integer; {intermediate quantities} 2297begin if x>=0 then positive:=true 2298else begin negate(x); positive:=false; 2299 end; 2300t:=(x mod @'100000)*n; 2301u:=(x div @'100000)*n+(t div @'100000); 2302v:=(u mod d)*@'100000 + (t mod @'100000); 2303if u div d>=@'100000 then arith_error:=true 2304else u:=@'100000*(u div d) + (v div d); 2305if positive then 2306 begin xn_over_d:=u; remainder:=v mod d; 2307 end 2308else begin xn_over_d:=-u; remainder:=-(v mod d); 2309 end; 2310end; 2311 2312@ The next subroutine is used to compute the ``badness'' of glue, when a 2313total~|t| is supposed to be made from amounts that sum to~|s|. According 2314to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$; 2315however, badness is simply a heuristic, so we need not squeeze out the 2316last drop of accuracy when computing it. All we really want is an 2317approximation that has similar properties. 2318@:TeXbook}{\sl The \TeX book@> 2319 2320The actual method used to compute the badness is easier to read from the 2321program than to describe in words. It produces an integer value that is a 2322reasonably close approximation to $100(t/s)^3$, and all implementations 2323of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is 2324treated as infinitely bad, and represented by 10000. 2325 2326It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s) 2327>=badness(t,s+1)|}.$$ The badness function defined here is capable of 2328computing at most 1095 distinct values, but that is plenty. 2329 2330@d inf_bad = 10000 {infinitely bad value} 2331 2332@p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|} 2333var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx 2334 100\cdot2^{18}$} 2335begin if t=0 then badness:=0 2336else if s<=0 then badness:=inf_bad 2337else begin if t<=7230584 then r:=(t*297) div s {$297^3=99.94\times2^{18}$} 2338 else if s>=1663497 then r:=t div (s div 297) 2339 else r:=t; 2340 if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$} 2341 else badness:=(r*r*r+@'400000) div @'1000000; 2342 end; {that was $r^3/2^{18}$, rounded to the nearest integer} 2343end; 2344 2345@ When \TeX\ ``packages'' a list into a box, it needs to calculate the 2346proportionality ratio by which the glue inside the box should stretch 2347or shrink. This calculation does not affect \TeX's decision making, 2348so the precise details of rounding, etc., in the glue calculation are not 2349of critical importance for the consistency of results on different computers. 2350 2351We shall use the type |glue_ratio| for such proportionality ratios. 2352A glue ratio should take the same amount of memory as an 2353|integer| (usually 32 bits) if it is to blend smoothly with \TeX's 2354other data structures. Thus |glue_ratio| should be equivalent to 2355|short_real| in some implementations of \PASCAL. Alternatively, 2356it is possible to deal with glue ratios using nothing but fixed-point 2357arithmetic; see {\sl TUGboat \bf3},1 (March 1982), 10--27. (But the 2358routines cited there must be modified to allow negative glue ratios.) 2359@^system dependencies@> 2360 2361@d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio} 2362@d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio} 2363@d float(#) == # {convert from |glue_ratio| to type |real|} 2364@d unfloat(#) == # {convert from |real| to type |glue_ratio|} 2365@d float_constant(#) == #.0 {convert |integer| constant to |real|} 2366 2367@<Types...@>= 2368@!glue_ratio=real; {one-word representation of a glue expansion factor} 2369 2370@* \[8] Packed data. 2371In order to make efficient use of storage space, \TeX\ bases its major data 2372structures on a |memory_word|, which contains either a (signed) integer, 2373possibly scaled, or a (signed) |glue_ratio|, or a small number of 2374fields that are one half or one quarter of the size used for storing 2375integers. 2376 2377If |x| is a variable of type |memory_word|, it contains up to four 2378fields that can be referred to as follows: 2379$$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr 2380|x|&.|int|&(an |integer|)\cr 2381|x|&.|sc|\qquad&(a |scaled| integer)\cr 2382|x|&.|gr|&(a |glue_ratio|)\cr 2383|x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr 2384|x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword 2385 field)\cr 2386|x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt 2387 &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ 2388This is somewhat cumbersome to write, and not very readable either, but 2389macros will be used to make the notation shorter and more transparent. 2390The \PASCAL\ code below gives a formal definition of |memory_word| and 2391its subsidiary types, using packed variant records. \TeX\ makes no 2392assumptions about the relative positions of the fields within a word. 2393 2394Since we are assuming 32-bit integers, a halfword must contain at least 239516 bits, and a quarterword must contain at least 8 bits. 2396@^system dependencies@> 2397But it doesn't hurt to have more bits; for example, with enough 36-bit 2398words you might be able to have |mem_max| as large as 262142, which is 2399eight times as much memory as anybody had during the first four years of 2400\TeX's existence. 2401 2402N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled 2403by a \PASCAL\ that packs all of the |memory_word| variants into 2404the space of a single integer. This means, for example, that |glue_ratio| 2405words should be |short_real| instead of |real| on some computers. Some 2406\PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into 2407an eight-bit field, but others insist on allocating space for an additional 2408sign bit; on such systems you can get 256 values into a quarterword only 2409if the subrange is `|-128..127|'. 2410 2411The present implementation tries to accommodate as many variations as possible, 2412so it makes few assumptions. If integers having the subrange 2413`|min_quarterword..max_quarterword|' can be packed into a quarterword, 2414and if integers having the subrange `|min_halfword..max_halfword|' 2415can be packed into a halfword, everything should work satisfactorily. 2416 2417It is usually most efficient to have |min_quarterword=min_halfword=0|, 2418so one should try to achieve this unless it causes a severe problem. 2419The values defined here are recommended for most 32-bit computers. 2420 2421@d min_quarterword=0 {smallest allowable value in a |quarterword|} 2422@d max_quarterword=255 {largest allowable value in a |quarterword|} 2423@d min_halfword==0 {smallest allowable value in a |halfword|} 2424@d max_halfword==65535 {largest allowable value in a |halfword|} 2425 2426@ Here are the inequalities that the quarterword and halfword values 2427must satisfy (or rather, the inequalities that they mustn't satisfy): 2428 2429@<Check the ``constant''...@>= 2430init if (mem_min<>mem_bot)or(mem_max<>mem_top) then bad:=10;@+tini@;@/ 2431if (mem_min>mem_bot)or(mem_max<mem_top) then bad:=10; 2432if (min_quarterword>0)or(max_quarterword<127) then bad:=11; 2433if (min_halfword>0)or(max_halfword<32767) then bad:=12; 2434if (min_quarterword<min_halfword)or@| 2435 (max_quarterword>max_halfword) then bad:=13; 2436if (mem_min<min_halfword)or(mem_max>=max_halfword)or@| 2437 (mem_bot-mem_min>max_halfword+1) then bad:=14; 2438if (font_base<min_quarterword)or(font_max>max_quarterword) then bad:=15; 2439if font_max>font_base+256 then bad:=16; 2440if (save_size>max_halfword)or(max_strings>max_halfword) then bad:=17; 2441if buf_size>max_halfword then bad:=18; 2442if max_quarterword-min_quarterword<255 then bad:=19; 2443 2444@ The operation of adding or subtracting |min_quarterword| occurs quite 2445frequently in \TeX, so it is convenient to abbreviate this operation 2446by using the macros |qi| and |qo| for input and output to and from 2447quarterword format. 2448 2449The inner loop of \TeX\ will run faster with respect to compilers 2450that don't optimize expressions like `|x+0|' and `|x-0|', if these 2451macros are simplified in the obvious way when |min_quarterword=0|. 2452@^inner loop@>@^system dependencies@> 2453 2454@d qi(#)==#+min_quarterword 2455 {to put an |eight_bits| item into a quarterword} 2456@d qo(#)==#-min_quarterword 2457 {to take an |eight_bits| item out of a quarterword} 2458@d hi(#)==#+min_halfword 2459 {to put a sixteen-bit item into a halfword} 2460@d ho(#)==#-min_halfword 2461 {to take a sixteen-bit item from a halfword} 2462 2463@ The reader should study the following definitions closely: 2464@^system dependencies@> 2465 2466@d sc==int {|scaled| data is equivalent to |integer|} 2467 2468@<Types...@>= 2469@!quarterword = min_quarterword..max_quarterword; {1/4 of a word} 2470@!halfword=min_halfword..max_halfword; {1/2 of a word} 2471@!two_choices = 1..2; {used when there are two variants in a record} 2472@!four_choices = 1..4; {used when there are four variants in a record} 2473@!two_halves = packed record@;@/ 2474 @!rh:halfword; 2475 case two_choices of 2476 1: (@!lh:halfword); 2477 2: (@!b0:quarterword; @!b1:quarterword); 2478 end; 2479@!four_quarters = packed record@;@/ 2480 @!b0:quarterword; 2481 @!b1:quarterword; 2482 @!b2:quarterword; 2483 @!b3:quarterword; 2484 end; 2485@!memory_word = record@;@/ 2486 case four_choices of 2487 1: (@!int:integer); 2488 2: (@!gr:glue_ratio); 2489 3: (@!hh:two_halves); 2490 4: (@!qqqq:four_quarters); 2491 end; 2492@!word_file = file of memory_word; 2493 2494@ When debugging, we may want to print a |memory_word| without knowing 2495what type it is; so we print it in all modes. 2496@^dirty \PASCAL@>@^debugging@> 2497 2498@p @!debug procedure print_word(@!w:memory_word); 2499 {prints |w| in all ways} 2500begin print_int(w.int); print_char(" ");@/ 2501print_scaled(w.sc); print_char(" ");@/ 2502print_scaled(round(unity*float(w.gr))); print_ln;@/ 2503@^real multiplication@> 2504print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":"); 2505print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/ 2506print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":"); 2507print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3); 2508end; 2509gubed 2510 2511@* \[9] Dynamic memory allocation. 2512The \TeX\ system does nearly all of its own memory allocation, so that it 2513can readily be transported into environments that do not have automatic 2514facilities for strings, garbage collection, etc., and so that it can be in 2515control of what error messages the user receives. The dynamic storage 2516requirements of \TeX\ are handled by providing a large array |mem| in 2517which consecutive blocks of words are used as nodes by the \TeX\ routines. 2518 2519Pointer variables are indices into this array, or into another array 2520called |eqtb| that will be explained later. A pointer variable might 2521also be a special flag that lies outside the bounds of |mem|, so we 2522allow pointers to assume any |halfword| value. The minimum halfword 2523value represents a null pointer. \TeX\ does not assume that |mem[null]| exists. 2524 2525@d pointer==halfword {a flag or a location in |mem| or |eqtb|} 2526@d null==min_halfword {the null pointer} 2527 2528@<Glob...@>= 2529@!temp_ptr:pointer; {a pointer variable for occasional emergency use} 2530 2531@ The |mem| array is divided into two regions that are allocated separately, 2532but the dividing line between these two regions is not fixed; they grow 2533together until finding their ``natural'' size in a particular job. 2534Locations less than or equal to |lo_mem_max| are used for storing 2535variable-length records consisting of two or more words each. This region 2536is maintained using an algorithm similar to the one described in exercise 25372.5--19 of {\sl The Art of Computer Programming}. However, no size field 2538appears in the allocated nodes; the program is responsible for knowing the 2539relevant size when a node is freed. Locations greater than or equal to 2540|hi_mem_min| are used for storing one-word records; a conventional 2541\.{AVAIL} stack is used for allocation in this region. 2542 2543Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part 2544of preloaded format files, by the \.{INITEX} preprocessor. 2545@.INITEX@> 2546Production versions of \TeX\ may extend the memory at both ends in order to 2547provide more space; locations between |mem_min| and |mem_bot| are always 2548used for variable-size nodes, and locations between |mem_top| and |mem_max| 2549are always used for single-word nodes. 2550 2551The key pointers that govern |mem| allocation have a prescribed order: 2552$$\advance\thickmuskip-2mu 2553\hbox{|null<=mem_min<=mem_bot<lo_mem_max< 2554 hi_mem_min<mem_top<=mem_end<=mem_max|.}$$ 2555 2556Empirical tests show that the present implementation of \TeX\ tends to 2557spend about 9\pct! of its running time allocating nodes, and about 6\pct! 2558deallocating them after their use. 2559 2560@<Glob...@>= 2561@!mem : array[mem_min..mem_max] of memory_word; {the big dynamic storage area} 2562@!lo_mem_max : pointer; {the largest location of variable-size memory in use} 2563@!hi_mem_min : pointer; {the smallest location of one-word memory in use} 2564 2565@ In order to study the memory requirements of particular applications, it 2566is possible to prepare a version of \TeX\ that keeps track of current and 2567maximum memory usage. When code between the delimiters |@!stat| $\ldots$ 2568|tats| is not ``commented out,'' \TeX\ will run a bit slower but it will 2569report these statistics when |tracing_stats| is sufficiently large. 2570 2571@<Glob...@>= 2572@!var_used, @!dyn_used : integer; {how much memory is in use} 2573 2574@ Let's consider the one-word memory region first, since it's the 2575simplest. The pointer variable |mem_end| holds the highest-numbered location 2576of |mem| that has ever been used. The free locations of |mem| that 2577occur between |hi_mem_min| and |mem_end|, inclusive, are of type 2578|two_halves|, and we write |info(p)| and |link(p)| for the |lh| 2579and |rh| fields of |mem[p]| when it is of this type. The single-word 2580free locations form a linked list 2581$$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$ 2582terminated by |null|. 2583 2584@d link(#) == mem[#].hh.rh {the |link| field of a memory word} 2585@d info(#) == mem[#].hh.lh {the |info| field of a memory word} 2586 2587@<Glob...@>= 2588@!avail : pointer; {head of the list of available one-word nodes} 2589@!mem_end : pointer; {the last one-word node used in |mem|} 2590 2591@ If memory is exhausted, it might mean that the user has forgotten 2592a right brace. We will define some procedures later that try to help 2593pinpoint the trouble. 2594 2595@p @<Declare the procedure called |show_token_list|@>@/ 2596@<Declare the procedure called |runaway|@> 2597 2598@ The function |get_avail| returns a pointer to a new one-word node whose 2599|link| field is null. However, \TeX\ will halt if there is no more room left. 2600@^inner loop@> 2601 2602If the available-space list is empty, i.e., if |avail=null|, 2603we try first to increase |mem_end|. If that cannot be done, i.e., if 2604|mem_end=mem_max|, we try to decrease |hi_mem_min|. If that cannot be 2605done, i.e., if |hi_mem_min=lo_mem_max+1|, we have to quit. 2606 2607@p function get_avail : pointer; {single-word node allocation} 2608var p:pointer; {the new node being got} 2609begin p:=avail; {get top location in the |avail| stack} 2610if p<>null then avail:=link(avail) {and pop it off} 2611else if mem_end<mem_max then {or go into virgin territory} 2612 begin incr(mem_end); p:=mem_end; 2613 end 2614else begin decr(hi_mem_min); p:=hi_mem_min; 2615 if hi_mem_min<=lo_mem_max then 2616 begin runaway; {if memory is exhausted, display possible runaway text} 2617 overflow("main memory size",mem_max+1-mem_min); 2618 {quit; all one-word nodes are busy} 2619@:TeX capacity exceeded main memory size}{\quad main memory size@> 2620 end; 2621 end; 2622link(p):=null; {provide an oft-desired initialization of the new node} 2623@!stat incr(dyn_used);@+tats@;{maintain statistics} 2624get_avail:=p; 2625end; 2626 2627@ Conversely, a one-word node is recycled by calling |free_avail|. 2628This routine is part of \TeX's ``inner loop,'' so we want it to be fast. 2629@^inner loop@> 2630 2631@d free_avail(#)== {single-word node liberation} 2632 begin link(#):=avail; avail:=#; 2633 @!stat decr(dyn_used);@+tats@/ 2634 end 2635 2636@ There's also a |fast_get_avail| routine, which saves the procedure-call 2637overhead at the expense of extra programming. This routine is used in 2638the places that would otherwise account for the most calls of |get_avail|. 2639@^inner loop@> 2640 2641@d fast_get_avail(#)==@t@>@;@/ 2642 begin #:=avail; {avoid |get_avail| if possible, to save time} 2643 if #=null then #:=get_avail 2644 else begin avail:=link(#); link(#):=null; 2645 @!stat incr(dyn_used);@+tats@/ 2646 end; 2647 end 2648 2649@ The procedure |flush_list(p)| frees an entire linked list of 2650one-word nodes that starts at position |p|. 2651@^inner loop@> 2652 2653@p procedure flush_list(@!p:pointer); {makes list of single-word nodes 2654 available} 2655var @!q,@!r:pointer; {list traversers} 2656begin if p<>null then 2657 begin r:=p; 2658 repeat q:=r; r:=link(r); @!stat decr(dyn_used);@+tats@/ 2659 until r=null; {now |q| is the last node on the list} 2660 link(q):=avail; avail:=p; 2661 end; 2662end; 2663 2664@ The available-space list that keeps track of the variable-size portion 2665of |mem| is a nonempty, doubly-linked circular list of empty nodes, 2666pointed to by the roving pointer |rover|. 2667 2668Each empty node has size 2 or more; the first word contains the special 2669value |max_halfword| in its |link| field and the size in its |info| field; 2670the second word contains the two pointers for double linking. 2671 2672Each nonempty node also has size 2 or more. Its first word is of type 2673|two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|. 2674Otherwise there is complete flexibility with respect to the contents 2675of its other fields and its other words. 2676 2677(We require |mem_max<max_halfword| because terrible things can happen 2678when |max_halfword| appears in the |link| field of a nonempty node.) 2679 2680@d empty_flag == max_halfword {the |link| of an empty variable-size node} 2681@d is_empty(#) == (link(#)=empty_flag) {tests for empty node} 2682@d node_size == info {the size field in empty variable-size nodes} 2683@d llink(#) == info(#+1) {left link in doubly-linked list of empty nodes} 2684@d rlink(#) == link(#+1) {right link in doubly-linked list of empty nodes} 2685 2686@<Glob...@>= 2687@!rover : pointer; {points to some node in the list of empties} 2688 2689@ A call to |get_node| with argument |s| returns a pointer to a new node 2690of size~|s|, which must be 2~or more. The |link| field of the first word 2691of this new node is set to null. An overflow stop occurs if no suitable 2692space exists. 2693 2694If |get_node| is called with $s=2^{30}$, it simply merges adjacent free 2695areas and returns the value |max_halfword|. 2696 2697@p function get_node(@!s:integer):pointer; {variable-size node allocation} 2698label found,exit,restart; 2699var p:pointer; {the node currently under inspection} 2700@!q:pointer; {the node physically after node |p|} 2701@!r:integer; {the newly allocated node, or a candidate for this honor} 2702@!t:integer; {temporary register} 2703begin restart: p:=rover; {start at some free node in the ring} 2704repeat @<Try to allocate within node |p| and its physical successors, 2705 and |goto found| if allocation was possible@>; 2706@^inner loop@> 2707p:=rlink(p); {move to the next node in the ring} 2708until p=rover; {repeat until the whole list has been traversed} 2709if s=@'10000000000 then 2710 begin get_node:=max_halfword; return; 2711 end; 2712if lo_mem_max+2<hi_mem_min then if lo_mem_max+2<=mem_bot+max_halfword then 2713 @<Grow more variable-size memory and |goto restart|@>; 2714overflow("main memory size",mem_max+1-mem_min); 2715 {sorry, nothing satisfactory is left} 2716@:TeX capacity exceeded main memory size}{\quad main memory size@> 2717found: link(r):=null; {this node is now nonempty} 2718@!stat var_used:=var_used+s; {maintain usage statistics} 2719tats@;@/ 2720get_node:=r; 2721exit:end; 2722 2723@ The lower part of |mem| grows by 1000 words at a time, unless 2724we are very close to going under. When it grows, we simply link 2725a new node into the available-space list. This method of controlled 2726growth helps to keep the |mem| usage consecutive when \TeX\ is 2727implemented on ``virtual memory'' systems. 2728@^virtual memory@> 2729 2730@<Grow more variable-size memory and |goto restart|@>= 2731begin if hi_mem_min-lo_mem_max>=1998 then t:=lo_mem_max+1000 2732else t:=lo_mem_max+1+(hi_mem_min-lo_mem_max) div 2; 2733 {|lo_mem_max+2<=t<hi_mem_min|} 2734p:=llink(rover); q:=lo_mem_max; rlink(p):=q; llink(rover):=q;@/ 2735if t>mem_bot+max_halfword then t:=mem_bot+max_halfword; 2736rlink(q):=rover; llink(q):=p; link(q):=empty_flag; node_size(q):=t-lo_mem_max;@/ 2737lo_mem_max:=t; link(lo_mem_max):=null; info(lo_mem_max):=null; 2738rover:=q; goto restart; 2739end 2740 2741@ Empirical tests show that the routine in this section performs a 2742node-merging operation about 0.75 times per allocation, on the average, 2743after which it finds that |r>p+1| about 95\pct! of the time. 2744 2745@<Try to allocate...@>= 2746q:=p+node_size(p); {find the physical successor} 2747@^inner loop@> 2748while is_empty(q) do {merge node |p| with node |q|} 2749 begin t:=rlink(q); 2750 if q=rover then rover:=t; 2751 llink(t):=llink(q); rlink(llink(q)):=t;@/ 2752 q:=q+node_size(q); 2753 end; 2754r:=q-s; 2755if r>p+1 then @<Allocate from the top of node |p| and |goto found|@>; 2756if r=p then if rlink(p)<>p then 2757 @<Allocate entire node |p| and |goto found|@>; 2758node_size(p):=q-p {reset the size in case it grew} 2759 2760@ @<Allocate from the top...@>= 2761begin node_size(p):=r-p; {store the remaining size} 2762@^inner loop@> 2763rover:=p; {start searching here next time} 2764goto found; 2765end 2766 2767@ Here we delete node |p| from the ring, and let |rover| rove around. 2768 2769@<Allocate entire...@>= 2770begin rover:=rlink(p); t:=llink(p); 2771llink(rover):=t; rlink(t):=rover; 2772goto found; 2773end 2774 2775@ Conversely, when some variable-size node |p| of size |s| is no longer needed, 2776the operation |free_node(p,s)| will make its words available, by inserting 2777|p| as a new empty node just before where |rover| now points. 2778@^inner loop@> 2779 2780@p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node 2781 liberation} 2782var q:pointer; {|llink(rover)|} 2783begin node_size(p):=s; link(p):=empty_flag; 2784q:=llink(rover); llink(p):=q; rlink(p):=rover; {set both links} 2785llink(rover):=p; rlink(q):=p; {insert |p| into the ring} 2786@!stat var_used:=var_used-s;@+tats@;{maintain statistics} 2787end; 2788 2789@ Just before \.{INITEX} writes out the memory, it sorts the doubly linked 2790available space list. The list is probably very short at such times, so a 2791simple insertion sort is used. The smallest available location will be 2792pointed to by |rover|, the next-smallest by |rlink(rover)|, etc. 2793 2794@p @!init procedure sort_avail; {sorts the available variable-size nodes 2795 by location} 2796var p,@!q,@!r: pointer; {indices into |mem|} 2797@!old_rover:pointer; {initial |rover| setting} 2798begin p:=get_node(@'10000000000); {merge adjacent free areas} 2799p:=rlink(rover); rlink(rover):=max_halfword; old_rover:=rover; 2800while p<>old_rover do @<Sort \(p)|p| into the list starting at |rover| 2801 and advance |p| to |rlink(p)|@>; 2802p:=rover; 2803while rlink(p)<>max_halfword do 2804 begin llink(rlink(p)):=p; p:=rlink(p); 2805 end; 2806rlink(p):=rover; llink(rover):=p; 2807end; 2808tini 2809 2810@ The following |while| loop is guaranteed to 2811terminate, since the list that starts at 2812|rover| ends with |max_halfword| during the sorting procedure. 2813 2814@<Sort \(p)|p|...@>= 2815if p<rover then 2816 begin q:=p; p:=rlink(q); rlink(q):=rover; rover:=q; 2817 end 2818else begin q:=rover; 2819 while rlink(q)<p do q:=rlink(q); 2820 r:=rlink(p); rlink(p):=rlink(q); rlink(q):=p; p:=r; 2821 end 2822 2823@* \[10] Data structures for boxes and their friends. 2824From the computer's standpoint, \TeX's chief mission is to create 2825horizontal and vertical lists. We shall now investigate how the elements 2826of these lists are represented internally as nodes in the dynamic memory. 2827 2828A horizontal or vertical list is linked together by |link| fields in 2829the first word of each node. Individual nodes represent boxes, glue, 2830penalties, or special things like discretionary hyphens; because of this 2831variety, some nodes are longer than others, and we must distinguish different 2832kinds of nodes. We do this by putting a `|type|' field in the first word, 2833together with the link and an optional `|subtype|'. 2834 2835@d type(#) == mem[#].hh.b0 {identifies what kind of node this is} 2836@d subtype(#) == mem[#].hh.b1 {secondary identification in some cases} 2837 2838@ A |@!char_node|, which represents a single character, is the most important 2839kind of node because it accounts for the vast majority of all boxes. 2840Special precautions are therefore taken to ensure that a |char_node| does 2841not take up much memory space. Every such node is one word long, and in fact 2842it is identifiable by this property, since other kinds of nodes have at least 2843two words, and they appear in |mem| locations less than |hi_mem_min|. 2844This makes it possible to omit the |type| field in a |char_node|, leaving 2845us room for two bytes that identify a |font| and a |character| within 2846that font. 2847 2848Note that the format of a |char_node| allows for up to 256 different 2849fonts and up to 256 characters per font; but most implementations will 2850probably limit the total number of fonts to fewer than 75 per job, 2851and most fonts will stick to characters whose codes are 2852less than 128 (since higher codes 2853are more difficult to access on most keyboards). 2854 2855Extensions of \TeX\ intended for oriental languages will need even more 2856than $256\times256$ possible characters, when we consider different sizes 2857@^oriental characters@>@^Chinese characters@>@^Japanese characters@> 2858and styles of type. It is suggested that Chinese and Japanese fonts be 2859handled by representing such characters in two consecutive |char_node| 2860entries: The first of these has |font=font_base|, and its |link| points 2861to the second; 2862the second identifies the font and the character dimensions. 2863The saving feature about oriental characters is that most of them have 2864the same box dimensions. The |character| field of the first |char_node| 2865is a ``\\{charext}'' that distinguishes between graphic symbols whose 2866dimensions are identical for typesetting purposes. (See the \MF\ manual.) 2867Such an extension of \TeX\ would not be difficult; further details are 2868left to the reader. 2869 2870In order to make sure that the |character| code fits in a quarterword, 2871\TeX\ adds the quantity |min_quarterword| to the actual code. 2872 2873Character nodes appear only in horizontal lists, never in vertical lists. 2874 2875@d is_char_node(#) == (#>=hi_mem_min) 2876 {does the argument point to a |char_node|?} 2877@d font == type {the font code in a |char_node|} 2878@d character == subtype {the character code in a |char_node|} 2879 2880@ An |hlist_node| stands for a box that was made from a horizontal list. 2881Each |hlist_node| is seven words long, and contains the following fields 2882(in addition to the mandatory |type| and |link|, which we shall not 2883mention explicitly when discussing the other node types): The |height| and 2884|width| and |depth| are scaled integers denoting the dimensions of the 2885box. There is also a |shift_amount| field, a scaled integer indicating 2886how much this box should be lowered (if it appears in a horizontal list), 2887or how much it should be moved to the right (if it appears in a vertical 2888list). There is a |list_ptr| field, which points to the beginning of the 2889list from which this box was fabricated; if |list_ptr| is |null|, the box 2890is empty. Finally, there are three fields that represent the setting of 2891the glue: |glue_set(p)| is a word of type |glue_ratio| that represents 2892the proportionality constant for glue setting; |glue_sign(p)| is 2893|stretching| or |shrinking| or |normal| depending on whether or not the 2894glue should stretch or shrink or remain rigid; and |glue_order(p)| 2895specifies the order of infinity to which glue setting applies (|normal|, 2896|fil|, |fill|, or |filll|). The |subtype| field is not used. 2897 2898@d hlist_node=0 {|type| of hlist nodes} 2899@d box_node_size=7 {number of words to allocate for a box node} 2900@d width_offset=1 {position of |width| field in a box node} 2901@d depth_offset=2 {position of |depth| field in a box node} 2902@d height_offset=3 {position of |height| field in a box node} 2903@d width(#) == mem[#+width_offset].sc {width of the box, in sp} 2904@d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp} 2905@d height(#) == mem[#+height_offset].sc {height of the box, in sp} 2906@d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp} 2907@d list_offset=5 {position of |list_ptr| field in a box node} 2908@d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box} 2909@d glue_order(#) == subtype(#+list_offset) {applicable order of infinity} 2910@d glue_sign(#) == type(#+list_offset) {stretching or shrinking} 2911@d normal=0 {the most common case when several cases are named} 2912@d stretching = 1 {glue setting applies to the stretch components} 2913@d shrinking = 2 {glue setting applies to the shrink components} 2914@d glue_offset = 6 {position of |glue_set| in a box node} 2915@d glue_set(#) == mem[#+glue_offset].gr 2916 {a word of type |glue_ratio| for glue setting} 2917 2918@ The |new_null_box| function returns a pointer to an |hlist_node| in 2919which all subfields have the values corresponding to `\.{\\hbox\{\}}'. 2920The |subtype| field is set to |min_quarterword|, since that's the desired 2921|span_count| value if this |hlist_node| is changed to an |unset_node|. 2922 2923@p function new_null_box:pointer; {creates a new box node} 2924var p:pointer; {the new node} 2925begin p:=get_node(box_node_size); type(p):=hlist_node; 2926subtype(p):=min_quarterword; 2927width(p):=0; depth(p):=0; height(p):=0; shift_amount(p):=0; list_ptr(p):=null; 2928glue_sign(p):=normal; glue_order(p):=normal; set_glue_ratio_zero(glue_set(p)); 2929new_null_box:=p; 2930end; 2931 2932@ A |vlist_node| is like an |hlist_node| in all respects except that it 2933contains a vertical list. 2934 2935@d vlist_node=1 {|type| of vlist nodes} 2936 2937@ A |rule_node| stands for a solid black rectangle; it has |width|, 2938|depth|, and |height| fields just as in an |hlist_node|. However, if 2939any of these dimensions is $-2^{30}$, the actual value will be determined 2940by running the rule up to the boundary of the innermost enclosing box. 2941This is called a ``running dimension.'' The |width| is never running in 2942an hlist; the |height| and |depth| are never running in a~vlist. 2943 2944@d rule_node=2 {|type| of rule nodes} 2945@d rule_node_size=4 {number of words to allocate for a rule node} 2946@d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item} 2947@d is_running(#) == (#=null_flag) {tests for a running dimension} 2948 2949@ A new rule node is delivered by the |new_rule| function. It 2950makes all the dimensions ``running,'' so you have to change the 2951ones that are not allowed to run. 2952 2953@p function new_rule:pointer; 2954var p:pointer; {the new node} 2955begin p:=get_node(rule_node_size); type(p):=rule_node; 2956subtype(p):=0; {the |subtype| is not used} 2957width(p):=null_flag; depth(p):=null_flag; height(p):=null_flag; 2958new_rule:=p; 2959end; 2960 2961@ Insertions are represented by |ins_node| records, where the |subtype| 2962indicates the corresponding box number. For example, `\.{\\insert 250}' 2963leads to an |ins_node| whose |subtype| is |250+min_quarterword|. 2964The |height| field of an |ins_node| is slightly misnamed; it actually holds 2965the natural height plus depth of the vertical list being inserted. 2966The |depth| field holds the |split_max_depth| to be used in case this 2967insertion is split, and the |split_top_ptr| points to the corresponding 2968|split_top_skip|. The |float_cost| field holds the |floating_penalty| that 2969will be used if this insertion floats to a subsequent page after a 2970split insertion of the same class. There is one more field, the 2971|ins_ptr|, which points to the beginning of the vlist for the insertion. 2972 2973@d ins_node=3 {|type| of insertion nodes} 2974@d ins_node_size=5 {number of words to allocate for an insertion} 2975@d float_cost(#)==mem[#+1].int {the |floating_penalty| to be used} 2976@d ins_ptr(#)==info(#+4) {the vertical list to be inserted} 2977@d split_top_ptr(#)==link(#+4) {the |split_top_skip| to be used} 2978 2979@ A |mark_node| has a |mark_ptr| field that points to the reference count 2980of a token list that contains the user's \.{\\mark} text. 2981This field occupies a full word instead of a halfword, because 2982there's nothing to put in the other halfword; it is easier in \PASCAL\ to 2983use the full word than to risk leaving garbage in the unused half. 2984 2985@d mark_node=4 {|type| of a mark node} 2986@d small_node_size=2 {number of words to allocate for most node types} 2987@d mark_ptr(#)==mem[#+1].int {head of the token list for a mark} 2988 2989@ An |adjust_node|, which occurs only in horizontal lists, 2990specifies material that will be moved out into the surrounding 2991vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}' 2992operation. The |adjust_ptr| field points to the vlist containing this 2993material. 2994 2995@d adjust_node=5 {|type| of an adjust node} 2996@d adjust_ptr==mark_ptr {vertical list to be moved out of horizontal list} 2997 2998@ A |ligature_node|, which occurs only in horizontal lists, specifies 2999a character that was fabricated from the interaction of two or more 3000actual characters. The second word of the node, which is called the 3001|lig_char| word, contains |font| and |character| fields just as in a 3002|char_node|. The characters that generated the ligature have not been 3003forgotten, since they are needed for diagnostic messages and for 3004hyphenation; the |lig_ptr| field points to a linked list of character 3005nodes for all original characters that have been deleted. (This list 3006might be empty if the characters that generated the ligature were 3007retained in other nodes.) 3008 3009The |subtype| field is 0, plus 2 and/or 1 if the original source of the 3010ligature included implicit left and/or right boundaries. 3011 3012@d ligature_node=6 {|type| of a ligature node} 3013@d lig_char(#)==#+1 {the word where the ligature is to be found} 3014@d lig_ptr(#)==link(lig_char(#)) {the list of characters} 3015 3016@ The |new_ligature| function creates a ligature node having given 3017contents of the |font|, |character|, and |lig_ptr| fields. We also have 3018a |new_lig_item| function, which returns a two-word node having a given 3019|character| field. Such nodes are used for temporary processing as ligatures 3020are being created. 3021 3022@p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer; 3023var p:pointer; {the new node} 3024begin p:=get_node(small_node_size); type(p):=ligature_node; 3025font(lig_char(p)):=f; character(lig_char(p)):=c; lig_ptr(p):=q; 3026subtype(p):=0; new_ligature:=p; 3027end; 3028@# 3029function new_lig_item(@!c:quarterword):pointer; 3030var p:pointer; {the new node} 3031begin p:=get_node(small_node_size); character(p):=c; lig_ptr(p):=null; 3032new_lig_item:=p; 3033end; 3034 3035@ A |disc_node|, which occurs only in horizontal lists, specifies a 3036``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text 3037that starts at |pre_break(p)| will precede the break, the text that starts at 3038|post_break(p)| will follow the break, and text that appears in the next 3039|replace_count(p)| nodes will be ignored. For example, an ordinary 3040discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with 3041|pre_break| pointing to a |char_node| containing a hyphen, |post_break=null|, 3042and |replace_count=0|. All three of the discretionary texts must be 3043lists that consist entirely of character, kern, box, rule, and ligature nodes. 3044 3045If |pre_break(p)=null|, the |ex_hyphen_penalty| will be charged for this 3046break. Otherwise the |hyphen_penalty| will be charged. The texts will 3047actually be substituted into the list by the line-breaking algorithm if it 3048decides to make the break, and the discretionary node will disappear at 3049that time; thus, the output routine sees only discretionaries that were 3050not chosen. 3051 3052@d disc_node=7 {|type| of a discretionary node} 3053@d replace_count==subtype {how many subsequent nodes to replace} 3054@d pre_break==llink {text that precedes a discretionary break} 3055@d post_break==rlink {text that follows a discretionary break} 3056 3057@p function new_disc:pointer; {creates an empty |disc_node|} 3058var p:pointer; {the new node} 3059begin p:=get_node(small_node_size); type(p):=disc_node; 3060replace_count(p):=0; pre_break(p):=null; post_break(p):=null; 3061new_disc:=p; 3062end; 3063 3064@ A |whatsit_node| is a wild card reserved for extensions to \TeX. The 3065|subtype| field in its first word says what `\\{whatsit}' it is, and 3066implicitly determines the node size (which must be 2 or more) and the 3067format of the remaining words. When a |whatsit_node| is encountered 3068in a list, special actions are invoked; knowledgeable people who are 3069careful not to mess up the rest of \TeX\ are able to make \TeX\ do new 3070things by adding code at the end of the program. For example, there 3071might be a `\TeX nicolor' extension to specify different colors of ink, 3072@^extensions to \TeX@> 3073and the whatsit node might contain the desired parameters. 3074 3075The present implementation of \TeX\ treats the features associated with 3076`\.{\\write}' and `\.{\\special}' as if they were extensions, in order to 3077illustrate how such routines might be coded. We shall defer further 3078discussion of extensions until the end of this program. 3079 3080@d whatsit_node=8 {|type| of special extension nodes} 3081 3082@ A |math_node|, which occurs only in horizontal lists, appears before and 3083after mathematical formulas. The |subtype| field is |before| before the 3084formula and |after| after it. There is a |width| field, which represents 3085the amount of surrounding space inserted by \.{\\mathsurround}. 3086 3087@d math_node=9 {|type| of a math node} 3088@d before=0 {|subtype| for math node that introduces a formula} 3089@d after=1 {|subtype| for math node that winds up a formula} 3090 3091@p function new_math(@!w:scaled;@!s:small_number):pointer; 3092var p:pointer; {the new node} 3093begin p:=get_node(small_node_size); type(p):=math_node; 3094subtype(p):=s; width(p):=w; new_math:=p; 3095end; 3096 3097@ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|, 3098|rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|, 3099|disc_node|, |whatsit_node|, and |math_node| are at the low end of the 3100type codes, by permitting a break at glue in a list if and only if the 3101|type| of the previous node is less than |math_node|. Furthermore, a 3102node is discarded after a break if its type is |math_node| or~more. 3103 3104@d precedes_break(#)==(type(#)<math_node) 3105@d non_discardable(#)==(type(#)<math_node) 3106 3107@ A |glue_node| represents glue in a list. However, it is really only 3108a pointer to a separate glue specification, since \TeX\ makes use of the 3109fact that many essentially identical nodes of glue are usually present. 3110If |p| points to a |glue_node|, |glue_ptr(p)| points to 3111another packet of words that specify the stretch and shrink components, etc. 3112 3113Glue nodes also serve to represent leaders; the |subtype| is used to 3114distinguish between ordinary glue (which is called |normal|) and the three 3115kinds of leaders (which are called |a_leaders|, |c_leaders|, and |x_leaders|). 3116The |leader_ptr| field points to a rule node or to a box node containing the 3117leaders; it is set to |null| in ordinary glue nodes. 3118 3119Many kinds of glue are computed from \TeX's ``skip'' parameters, and 3120it is helpful to know which parameter has led to a particular glue node. 3121Therefore the |subtype| is set to indicate the source of glue, whenever 3122it originated as a parameter. We will be defining symbolic names for the 3123parameter numbers later (e.g., |line_skip_code=0|, |baseline_skip_code=1|, 3124etc.); it suffices for now to say that the |subtype| of parametric glue 3125will be the same as the parameter number, plus~one. 3126 3127In math formulas there are two more possibilities for the |subtype| in a 3128glue node: |mu_glue| denotes an \.{\\mskip} (where the units are scaled \.{mu} 3129instead of scaled \.{pt}); and |cond_math_glue| denotes the `\.{\\nonscript}' 3130feature that cancels the glue node immediately following if it appears 3131in a subscript. 3132 3133@d glue_node=10 {|type| of node that points to a glue specification} 3134@d cond_math_glue=98 {special |subtype| to suppress glue in the next node} 3135@d mu_glue=99 {|subtype| for math glue} 3136@d a_leaders=100 {|subtype| for aligned leaders} 3137@d c_leaders=101 {|subtype| for centered leaders} 3138@d x_leaders=102 {|subtype| for expanded leaders} 3139@d glue_ptr==llink {pointer to a glue specification} 3140@d leader_ptr==rlink {pointer to box or rule node for leaders} 3141 3142@ A glue specification has a halfword reference count in its first word, 3143@^reference counts@> 3144representing |null| plus the number of glue nodes that point to it (less one). 3145Note that the reference count appears in the same position as 3146the |link| field in list nodes; this is the field that is initialized 3147to |null| when a node is allocated, and it is also the field that is flagged 3148by |empty_flag| in empty nodes. 3149 3150Glue specifications also contain three |scaled| fields, for the |width|, 3151|stretch|, and |shrink| dimensions. Finally, there are two one-byte 3152fields called |stretch_order| and |shrink_order|; these contain the 3153orders of infinity (|normal|, |fil|, |fill|, or |filll|) 3154corresponding to the stretch and shrink values. 3155 3156@d glue_spec_size=4 {number of words to allocate for a glue specification} 3157@d glue_ref_count(#) == link(#) {reference count of a glue specification} 3158@d stretch(#) == mem[#+2].sc {the stretchability of this glob of glue} 3159@d shrink(#) == mem[#+3].sc {the shrinkability of this glob of glue} 3160@d stretch_order == type {order of infinity for stretching} 3161@d shrink_order == subtype {order of infinity for shrinking} 3162@d fil=1 {first-order infinity} 3163@d fill=2 {second-order infinity} 3164@d filll=3 {third-order infinity} 3165 3166@<Types...@>= 3167@!glue_ord=normal..filll; {infinity to the 0, 1, 2, or 3 power} 3168 3169@ Here is a function that returns a pointer to a copy of a glue spec. 3170The reference count in the copy is |null|, because there is assumed 3171to be exactly one reference to the new specification. 3172 3173@p function new_spec(@!p:pointer):pointer; {duplicates a glue specification} 3174var q:pointer; {the new spec} 3175begin q:=get_node(glue_spec_size);@/ 3176mem[q]:=mem[p]; glue_ref_count(q):=null;@/ 3177width(q):=width(p); stretch(q):=stretch(p); shrink(q):=shrink(p); 3178new_spec:=q; 3179end; 3180 3181@ And here's a function that creates a glue node for a given parameter 3182identified by its code number; for example, 3183|new_param_glue(line_skip_code)| returns a pointer to a glue node for the 3184current \.{\\lineskip}. 3185 3186@p function new_param_glue(@!n:small_number):pointer; 3187var p:pointer; {the new node} 3188@!q:pointer; {the glue specification} 3189begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=n+1; 3190leader_ptr(p):=null;@/ 3191q:=@<Current |mem| equivalent of glue parameter number |n|@>@t@>; 3192glue_ptr(p):=q; incr(glue_ref_count(q)); 3193new_param_glue:=p; 3194end; 3195 3196@ Glue nodes that are more or less anonymous are created by |new_glue|, 3197whose argument points to a glue specification. 3198 3199@p function new_glue(@!q:pointer):pointer; 3200var p:pointer; {the new node} 3201begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=normal; 3202leader_ptr(p):=null; glue_ptr(p):=q; incr(glue_ref_count(q)); 3203new_glue:=p; 3204end; 3205 3206@ Still another subroutine is needed: This one is sort of a combination 3207of |new_param_glue| and |new_glue|. It creates a glue node for one of 3208the current glue parameters, but it makes a fresh copy of the glue 3209specification, since that specification will probably be subject to change, 3210while the parameter will stay put. The global variable |temp_ptr| is 3211set to the address of the new spec. 3212 3213@p function new_skip_param(@!n:small_number):pointer; 3214var p:pointer; {the new node} 3215begin temp_ptr:=new_spec(@<Current |mem| equivalent of glue parameter...@>); 3216p:=new_glue(temp_ptr); glue_ref_count(temp_ptr):=null; subtype(p):=n+1; 3217new_skip_param:=p; 3218end; 3219 3220@ A |kern_node| has a |width| field to specify a (normally negative) 3221amount of spacing. This spacing correction appears in horizontal lists 3222between letters like A and V when the font designer said that it looks 3223better to move them closer together or further apart. A kern node can 3224also appear in a vertical list, when its `|width|' denotes additional 3225spacing in the vertical direction. The |subtype| is either |normal| (for 3226kerns inserted from font information or math mode calculations) or |explicit| 3227(for kerns inserted from \.{\\kern} and \.{\\/} commands) or |acc_kern| 3228(for kerns inserted from non-math accents) or |mu_glue| (for kerns 3229inserted from \.{\\mkern} specifications in math formulas). 3230 3231@d kern_node=11 {|type| of a kern node} 3232@d explicit=1 {|subtype| of kern nodes from \.{\\kern} and \.{\\/}} 3233@d acc_kern=2 {|subtype| of kern nodes from accents} 3234 3235@ The |new_kern| function creates a kern node having a given width. 3236 3237@p function new_kern(@!w:scaled):pointer; 3238var p:pointer; {the new node} 3239begin p:=get_node(small_node_size); type(p):=kern_node; 3240subtype(p):=normal; 3241width(p):=w; 3242new_kern:=p; 3243end; 3244 3245@ A |penalty_node| specifies the penalty associated with line or page 3246breaking, in its |penalty| field. This field is a fullword integer, but 3247the full range of integer values is not used: Any penalty |>=10000| is 3248treated as infinity, and no break will be allowed for such high values. 3249Similarly, any penalty |<=-10000| is treated as negative infinity, and a 3250break will be forced. 3251 3252@d penalty_node=12 {|type| of a penalty node} 3253@d inf_penalty=inf_bad {``infinite'' penalty value} 3254@d eject_penalty=-inf_penalty {``negatively infinite'' penalty value} 3255@d penalty(#) == mem[#+1].int {the added cost of breaking a list here} 3256 3257@ Anyone who has been reading the last few sections of the program will 3258be able to guess what comes next. 3259 3260@p function new_penalty(@!m:integer):pointer; 3261var p:pointer; {the new node} 3262begin p:=get_node(small_node_size); type(p):=penalty_node; 3263subtype(p):=0; {the |subtype| is not used} 3264penalty(p):=m; new_penalty:=p; 3265end; 3266 3267@ You might think that we have introduced enough node types by now. Well, 3268almost, but there is one more: An |unset_node| has nearly the same format 3269as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign} 3270or \.{\\valign} that are not yet in their final form, since the box 3271dimensions are their ``natural'' sizes before any glue adjustment has been 3272made. The |glue_set| word is not present; instead, we have a |glue_stretch| 3273field, which contains the total stretch of order |glue_order| that is 3274present in the hlist or vlist being boxed. 3275Similarly, the |shift_amount| field is replaced by a |glue_shrink| field, 3276containing the total shrink of order |glue_sign| that is present. 3277The |subtype| field is called |span_count|; an unset box typically 3278contains the data for |qo(span_count)+1| columns. 3279Unset nodes will be changed to box nodes when alignment is completed. 3280 3281@d unset_node=13 {|type| for an unset node} 3282@d glue_stretch(#)==mem[#+glue_offset].sc {total stretch in an unset node} 3283@d glue_shrink==shift_amount {total shrink in an unset node} 3284@d span_count==subtype {indicates the number of spanned columns} 3285 3286@ In fact, there are still more types coming. When we get to math formula 3287processing we will see that a |style_node| has |type=14|; and a number 3288of larger type codes will also be defined, for use in math mode only. 3289 3290@ Warning: If any changes are made to these data structure layouts, such as 3291changing any of the node sizes or even reordering the words of nodes, 3292the |copy_node_list| procedure and the memory initialization code 3293below may have to be changed. Such potentially dangerous parts of the 3294program are listed in the index under `data structure assumptions'. 3295@!@^data structure assumptions@> 3296However, other references to the nodes are made symbolically in terms of 3297the \.{WEB} macro definitions above, so that format changes will leave 3298\TeX's other algorithms intact. 3299@^system dependencies@> 3300 3301@* \[11] Memory layout. 3302Some areas of |mem| are dedicated to fixed usage, since static allocation is 3303more efficient than dynamic allocation when we can get away with it. For 3304example, locations |mem_bot| to |mem_bot+3| are always used to store the 3305specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The 3306following macro definitions accomplish the static allocation by giving 3307symbolic names to the fixed positions. Static variable-size nodes appear 3308in locations |mem_bot| through |lo_mem_stat_max|, and static single-word nodes 3309appear in locations |hi_mem_stat_min| through |mem_top|, inclusive. It is 3310harmless to let |lig_trick| and |garbage| share the same location of |mem|. 3311 3312@d zero_glue==mem_bot {specification for \.{0pt plus 0pt minus 0pt}} 3313@d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}} 3314@d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}} 3315@d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}} 3316@d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}} 3317@d lo_mem_stat_max==fil_neg_glue+glue_spec_size-1 {largest statically 3318 allocated word in the variable-size |mem|} 3319@# 3320@d page_ins_head==mem_top {list of insertion data for current page} 3321@d contrib_head==mem_top-1 {vlist of items not yet on current page} 3322@d page_head==mem_top-2 {vlist for current page} 3323@d temp_head==mem_top-3 {head of a temporary list of some kind} 3324@d hold_head==mem_top-4 {head of a temporary list of another kind} 3325@d adjust_head==mem_top-5 {head of adjustment list returned by |hpack|} 3326@d active==mem_top-7 {head of active list in |line_break|, needs two words} 3327@d align_head==mem_top-8 {head of preamble list for alignments} 3328@d end_span==mem_top-9 {tail of spanned-width lists} 3329@d omit_template==mem_top-10 {a constant token list} 3330@d null_list==mem_top-11 {permanently empty list} 3331@d lig_trick==mem_top-12 {a ligature masquerading as a |char_node|} 3332@d garbage==mem_top-12 {used for scrap information} 3333@d backup_head==mem_top-13 {head of token list built by |scan_keyword|} 3334@d hi_mem_stat_min==mem_top-13 {smallest statically allocated word in 3335 the one-word |mem|} 3336@d hi_mem_stat_usage=14 {the number of one-word nodes always present} 3337 3338@ The following code gets |mem| off to a good start, when \TeX\ is 3339initializing itself the slow~way. 3340 3341@<Local variables for init...@>= 3342@!k:integer; {index into |mem|, |eqtb|, etc.} 3343 3344@ @<Initialize table entries...@>= 3345for k:=mem_bot+1 to lo_mem_stat_max do mem[k].sc:=0; 3346 {all glue dimensions are zeroed} 3347@^data structure assumptions@> 3348k:=mem_bot;@+while k<=lo_mem_stat_max do 3349 {set first words of glue specifications} 3350 begin glue_ref_count(k):=null+1; 3351 stretch_order(k):=normal; shrink_order(k):=normal; 3352 k:=k+glue_spec_size; 3353 end; 3354stretch(fil_glue):=unity; stretch_order(fil_glue):=fil;@/ 3355stretch(fill_glue):=unity; stretch_order(fill_glue):=fill;@/ 3356stretch(ss_glue):=unity; stretch_order(ss_glue):=fil;@/ 3357shrink(ss_glue):=unity; shrink_order(ss_glue):=fil;@/ 3358stretch(fil_neg_glue):=-unity; stretch_order(fil_neg_glue):=fil;@/ 3359rover:=lo_mem_stat_max+1; 3360link(rover):=empty_flag; {now initialize the dynamic memory} 3361node_size(rover):=1000; {which is a 1000-word available node} 3362llink(rover):=rover; rlink(rover):=rover;@/ 3363lo_mem_max:=rover+1000; link(lo_mem_max):=null; info(lo_mem_max):=null;@/ 3364for k:=hi_mem_stat_min to mem_top do 3365 mem[k]:=mem[lo_mem_max]; {clear list heads} 3366@<Initialize the special list heads and constant nodes@>; 3367avail:=null; mem_end:=mem_top; 3368hi_mem_min:=hi_mem_stat_min; {initialize the one-word memory} 3369var_used:=lo_mem_stat_max+1-mem_bot; dyn_used:=hi_mem_stat_usage; 3370 {initialize statistics} 3371 3372@ If \TeX\ is extended improperly, the |mem| array might get screwed up. 3373For example, some pointers might be wrong, or some ``dead'' nodes might not 3374have been freed when the last reference to them disappeared. Procedures 3375|check_mem| and |search_mem| are available to help diagnose such 3376problems. These procedures make use of two arrays called |free| and 3377|was_free| that are present only if \TeX's debugging routines have 3378been included. (You may want to decrease the size of |mem| while you 3379@^debugging@> 3380are debugging.) 3381 3382@<Glob...@>= 3383@!debug @!free: packed array [mem_min..mem_max] of boolean; {free cells} 3384@t\hskip10pt@>@!was_free: packed array [mem_min..mem_max] of boolean; 3385 {previously free cells} 3386@t\hskip10pt@>@!was_mem_end,@!was_lo_max,@!was_hi_min: pointer; 3387 {previous |mem_end|, |lo_mem_max|, and |hi_mem_min|} 3388@t\hskip10pt@>@!panicking:boolean; {do we want to check memory constantly?} 3389gubed 3390 3391@ @<Set initial...@>= 3392@!debug was_mem_end:=mem_min; {indicate that everything was previously free} 3393was_lo_max:=mem_min; was_hi_min:=mem_max; 3394panicking:=false; 3395gubed 3396 3397@ Procedure |check_mem| makes sure that the available space lists of 3398|mem| are well formed, and it optionally prints out all locations 3399that are reserved now but were free the last time this procedure was called. 3400 3401@p @!debug procedure check_mem(@!print_locs : boolean); 3402label done1,done2; {loop exits} 3403var p,@!q:pointer; {current locations of interest in |mem|} 3404@!clobbered:boolean; {is something amiss?} 3405begin for p:=mem_min to lo_mem_max do free[p]:=false; {you can probably 3406 do this faster} 3407for p:=hi_mem_min to mem_end do free[p]:=false; {ditto} 3408@<Check single-word |avail| list@>; 3409@<Check variable-size |avail| list@>; 3410@<Check flags of unavailable nodes@>; 3411if print_locs then @<Print newly busy locations@>; 3412for p:=mem_min to lo_mem_max do was_free[p]:=free[p]; 3413for p:=hi_mem_min to mem_end do was_free[p]:=free[p]; 3414 {|was_free:=free| might be faster} 3415was_mem_end:=mem_end; was_lo_max:=lo_mem_max; was_hi_min:=hi_mem_min; 3416end; 3417gubed 3418 3419@ @<Check single-word...@>= 3420p:=avail; q:=null; clobbered:=false; 3421while p<>null do 3422 begin if (p>mem_end)or(p<hi_mem_min) then clobbered:=true 3423 else if free[p] then clobbered:=true; 3424 if clobbered then 3425 begin print_nl("AVAIL list clobbered at "); 3426@.AVAIL list clobbered...@> 3427 print_int(q); goto done1; 3428 end; 3429 free[p]:=true; q:=p; p:=link(q); 3430 end; 3431done1: 3432 3433@ @<Check variable-size...@>= 3434p:=rover; q:=null; clobbered:=false; 3435repeat if (p>=lo_mem_max)or(p<mem_min) then clobbered:=true 3436 else if (rlink(p)>=lo_mem_max)or(rlink(p)<mem_min) then clobbered:=true 3437 else if not(is_empty(p))or(node_size(p)<2)or@| 3438 (p+node_size(p)>lo_mem_max)or@| (llink(rlink(p))<>p) then clobbered:=true; 3439 if clobbered then 3440 begin print_nl("Double-AVAIL list clobbered at "); 3441 print_int(q); goto done2; 3442 end; 3443for q:=p to p+node_size(p)-1 do {mark all locations free} 3444 begin if free[q] then 3445 begin print_nl("Doubly free location at "); 3446@.Doubly free location...@> 3447 print_int(q); goto done2; 3448 end; 3449 free[q]:=true; 3450 end; 3451q:=p; p:=rlink(p); 3452until p=rover; 3453done2: 3454 3455@ @<Check flags...@>= 3456p:=mem_min; 3457while p<=lo_mem_max do {node |p| should not be empty} 3458 begin if is_empty(p) then 3459 begin print_nl("Bad flag at "); print_int(p); 3460@.Bad flag...@> 3461 end; 3462 while (p<=lo_mem_max) and not free[p] do incr(p); 3463 while (p<=lo_mem_max) and free[p] do incr(p); 3464 end 3465 3466@ @<Print newly busy...@>= 3467begin print_nl("New busy locs:"); 3468for p:=mem_min to lo_mem_max do 3469 if not free[p] and ((p>was_lo_max) or was_free[p]) then 3470 begin print_char(" "); print_int(p); 3471 end; 3472for p:=hi_mem_min to mem_end do 3473 if not free[p] and 3474 ((p<was_hi_min) or (p>was_mem_end) or was_free[p]) then 3475 begin print_char(" "); print_int(p); 3476 end; 3477end 3478 3479@ The |search_mem| procedure attempts to answer the question ``Who points 3480to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem| 3481that might not be of type |two_halves|. Strictly speaking, this is 3482@^dirty \PASCAL@> 3483undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to 3484point to |p| purely by coincidence). But for debugging purposes, we want 3485to rule out the places that do {\sl not\/} point to |p|, so a few false 3486drops are tolerable. 3487 3488@p @!debug procedure search_mem(@!p:pointer); {look for pointers to |p|} 3489var q:integer; {current position being searched} 3490begin for q:=mem_min to lo_mem_max do 3491 begin if link(q)=p then 3492 begin print_nl("LINK("); print_int(q); print_char(")"); 3493 end; 3494 if info(q)=p then 3495 begin print_nl("INFO("); print_int(q); print_char(")"); 3496 end; 3497 end; 3498for q:=hi_mem_min to mem_end do 3499 begin if link(q)=p then 3500 begin print_nl("LINK("); print_int(q); print_char(")"); 3501 end; 3502 if info(q)=p then 3503 begin print_nl("INFO("); print_int(q); print_char(")"); 3504 end; 3505 end; 3506@<Search |eqtb| for equivalents equal to |p|@>; 3507@<Search |save_stack| for equivalents that point to |p|@>; 3508@<Search |hyph_list| for pointers to |p|@>; 3509end; 3510gubed 3511 3512@* \[12] Displaying boxes. 3513We can reinforce our knowledge of the data structures just introduced 3514by considering two procedures that display a list in symbolic form. 3515The first of these, called |short_display|, is used in ``overfull box'' 3516messages to give the top-level description of a list. The other one, 3517called |show_node_list|, prints a detailed description of exactly what 3518is in the data structure. 3519 3520The philosophy of |short_display| is to ignore the fine points about exactly 3521what is inside boxes, except that ligatures and discretionary breaks are 3522expanded. As a result, |short_display| is a recursive procedure, but the 3523recursion is never more than one level deep. 3524@^recursion@> 3525 3526A global variable |font_in_short_display| keeps track of the font code that 3527is assumed to be present when |short_display| begins; deviations from this 3528font will be printed. 3529 3530@<Glob...@>= 3531@!font_in_short_display:integer; {an internal font number} 3532 3533@ Boxes, rules, inserts, whatsits, marks, and things in general that are 3534sort of ``complicated'' are indicated only by printing `\.{[]}'. 3535 3536@p procedure short_display(@!p:integer); {prints highlights of list |p|} 3537var n:integer; {for replacement counts} 3538begin while p>mem_min do 3539 begin if is_char_node(p) then 3540 begin if p<=mem_end then 3541 begin if font(p)<>font_in_short_display then 3542 begin if (font(p)<font_base)or(font(p)>font_max) then 3543 print_char("*") 3544@.*\relax@> 3545 else @<Print the font identifier for |font(p)|@>; 3546 print_char(" "); font_in_short_display:=font(p); 3547 end; 3548 print_ASCII(qo(character(p))); 3549 end; 3550 end 3551 else @<Print a short indication of the contents of node |p|@>; 3552 p:=link(p); 3553 end; 3554end; 3555 3556@ @<Print a short indication of the contents of node |p|@>= 3557case type(p) of 3558hlist_node,vlist_node,ins_node,whatsit_node,mark_node,adjust_node, 3559 unset_node: print("[]"); 3560rule_node: print_char("|"); 3561glue_node: if glue_ptr(p)<>zero_glue then print_char(" "); 3562math_node: print_char("$"); 3563ligature_node: short_display(lig_ptr(p)); 3564disc_node: begin short_display(pre_break(p)); 3565 short_display(post_break(p));@/ 3566 n:=replace_count(p); 3567 while n>0 do 3568 begin if link(p)<>null then p:=link(p); 3569 decr(n); 3570 end; 3571 end; 3572othercases do_nothing 3573endcases 3574 3575@ The |show_node_list| routine requires some auxiliary subroutines: one to 3576print a font-and-character combination, one to print a token list without 3577its reference count, and one to print a rule dimension. 3578 3579@p procedure print_font_and_char(@!p:integer); {prints |char_node| data} 3580begin if p>mem_end then print_esc("CLOBBERED.") 3581else begin if (font(p)<font_base)or(font(p)>font_max) then print_char("*") 3582@.*\relax@> 3583 else @<Print the font identifier for |font(p)|@>; 3584 print_char(" "); print_ASCII(qo(character(p))); 3585 end; 3586end; 3587@# 3588procedure print_mark(@!p:integer); {prints token list data in braces} 3589begin print_char("{"); 3590if (p<hi_mem_min)or(p>mem_end) then print_esc("CLOBBERED.") 3591else show_token_list(link(p),null,max_print_line-10); 3592print_char("}"); 3593end; 3594@# 3595procedure print_rule_dimen(@!d:scaled); {prints dimension in rule node} 3596begin if is_running(d) then print_char("*") else print_scaled(d); 3597@.*\relax@> 3598end; 3599 3600@ Then there is a subroutine that prints glue stretch and shrink, possibly 3601followed by the name of finite units: 3602 3603@p procedure print_glue(@!d:scaled;@!order:integer;@!s:str_number); 3604 {prints a glue component} 3605begin print_scaled(d); 3606if (order<normal)or(order>filll) then print("foul") 3607else if order>normal then 3608 begin print("fil"); 3609 while order>fil do 3610 begin print_char("l"); decr(order); 3611 end; 3612 end 3613else if s<>0 then print(s); 3614end; 3615 3616@ The next subroutine prints a whole glue specification. 3617 3618@p procedure print_spec(@!p:integer;@!s:str_number); 3619 {prints a glue specification} 3620begin if (p<mem_min)or(p>=lo_mem_max) then print_char("*") 3621@.*\relax@> 3622else begin print_scaled(width(p)); 3623 if s<>0 then print(s); 3624 if stretch(p)<>0 then 3625 begin print(" plus "); print_glue(stretch(p),stretch_order(p),s); 3626 end; 3627 if shrink(p)<>0 then 3628 begin print(" minus "); print_glue(shrink(p),shrink_order(p),s); 3629 end; 3630 end; 3631end; 3632 3633@ We also need to declare some procedures that appear later in this 3634documentation. 3635 3636@p @<Declare procedures needed for displaying the elements of mlists@>@; 3637@<Declare the procedure called |print_skip_param|@> 3638 3639@ Since boxes can be inside of boxes, |show_node_list| is inherently recursive, 3640@^recursion@> 3641up to a given maximum number of levels. The history of nesting is indicated 3642by the current string, which will be printed at the beginning of each line; 3643the length of this string, namely |cur_length|, is the depth of nesting. 3644 3645Recursive calls on |show_node_list| therefore use the following pattern: 3646 3647@d node_list_display(#)== 3648 begin append_char("."); show_node_list(#); flush_char; 3649 end {|str_room| need not be checked; see |show_box| below} 3650 3651@ A global variable called |depth_threshold| is used to record the maximum 3652depth of nesting for which |show_node_list| will show information. If we 3653have |depth_threshold=0|, for example, only the top level information will 3654be given and no sublists will be traversed. Another global variable, called 3655|breadth_max|, tells the maximum number of items to show at each level; 3656|breadth_max| had better be positive, or you won't see anything. 3657 3658@<Glob...@>= 3659@!depth_threshold : integer; {maximum nesting depth in box displays} 3660@!breadth_max : integer; {maximum number of items shown at the same list level} 3661 3662@ Now we are ready for |show_node_list| itself. This procedure has been 3663written to be ``extra robust'' in the sense that it should not crash or get 3664into a loop even if the data structures have been messed up by bugs in 3665the rest of the program. You can safely call its parent routine 3666|show_box(p)| for arbitrary values of |p| when you are debugging \TeX. 3667However, in the presence of bad data, the procedure may 3668@^dirty \PASCAL@>@^debugging@> 3669fetch a |memory_word| whose variant is different from the way it was stored; 3670for example, it might try to read |mem[p].hh| when |mem[p]| 3671contains a scaled integer, if |p| is a pointer that has been 3672clobbered or chosen at random. 3673 3674@p procedure show_node_list(@!p:integer); {prints a node list symbolically} 3675label exit; 3676var n:integer; {the number of items already printed at this level} 3677@!g:real; {a glue ratio, as a floating point number} 3678begin if cur_length>depth_threshold then 3679 begin if p>null then print(" []"); 3680 {indicate that there's been some truncation} 3681 return; 3682 end; 3683n:=0; 3684while p>mem_min do 3685 begin print_ln; print_current_string; {display the nesting history} 3686 if p>mem_end then {pointer out of range} 3687 begin print("Bad link, display aborted."); return; 3688@.Bad link...@> 3689 end; 3690 incr(n); if n>breadth_max then {time to stop} 3691 begin print("etc."); return; 3692@.etc@> 3693 end; 3694 @<Display node |p|@>; 3695 p:=link(p); 3696 end; 3697exit: 3698end; 3699 3700@ @<Display node |p|@>= 3701if is_char_node(p) then print_font_and_char(p) 3702else case type(p) of 3703 hlist_node,vlist_node,unset_node: @<Display box |p|@>; 3704 rule_node: @<Display rule |p|@>; 3705 ins_node: @<Display insertion |p|@>; 3706 whatsit_node: @<Display the whatsit node |p|@>; 3707 glue_node: @<Display glue |p|@>; 3708 kern_node: @<Display kern |p|@>; 3709 math_node: @<Display math node |p|@>; 3710 ligature_node: @<Display ligature |p|@>; 3711 penalty_node: @<Display penalty |p|@>; 3712 disc_node: @<Display discretionary |p|@>; 3713 mark_node: @<Display mark |p|@>; 3714 adjust_node: @<Display adjustment |p|@>; 3715 @t\4@>@<Cases of |show_node_list| that arise in mlists only@>@; 3716 othercases print("Unknown node type!") 3717 endcases 3718 3719@ @<Display box |p|@>= 3720begin if type(p)=hlist_node then print_esc("h") 3721else if type(p)=vlist_node then print_esc("v") 3722else print_esc("unset"); 3723print("box("); print_scaled(height(p)); print_char("+"); 3724print_scaled(depth(p)); print(")x"); print_scaled(width(p)); 3725if type(p)=unset_node then 3726 @<Display special fields of the unset node |p|@> 3727else begin @<Display the value of |glue_set(p)|@>; 3728 if shift_amount(p)<>0 then 3729 begin print(", shifted "); print_scaled(shift_amount(p)); 3730 end; 3731 end; 3732node_list_display(list_ptr(p)); {recursive call} 3733end 3734 3735@ @<Display special fields of the unset node |p|@>= 3736begin if span_count(p)<>min_quarterword then 3737 begin print(" ("); print_int(qo(span_count(p))+1); 3738 print(" columns)"); 3739 end; 3740if glue_stretch(p)<>0 then 3741 begin print(", stretch "); print_glue(glue_stretch(p),glue_order(p),0); 3742 end; 3743if glue_shrink(p)<>0 then 3744 begin print(", shrink "); print_glue(glue_shrink(p),glue_sign(p),0); 3745 end; 3746end 3747 3748@ The code will have to change in this place if |glue_ratio| is 3749a structured type instead of an ordinary |real|. Note that this routine 3750should avoid arithmetic errors even if the |glue_set| field holds an 3751arbitrary random value. The following code assumes that a properly 3752formed nonzero |real| number has absolute value $2^{20}$ or more when 3753it is regarded as an integer; this precaution was adequate to prevent 3754floating point underflow on the author's computer. 3755@^system dependencies@> 3756@^dirty \PASCAL@> 3757 3758@<Display the value of |glue_set(p)|@>= 3759g:=float(glue_set(p)); 3760if (g<>float_constant(0))and(glue_sign(p)<>normal) then 3761 begin print(", glue set "); 3762 if glue_sign(p)=shrinking then print("- "); 3763 if abs(mem[p+glue_offset].int)<@'4000000 then print("?.?") 3764 else if abs(g)>float_constant(20000) then 3765 begin if g>float_constant(0) then print_char(">") 3766 else print("< -"); 3767 print_glue(20000*unity,glue_order(p),0); 3768 end 3769 else print_glue(round(unity*g),glue_order(p),0); 3770@^real multiplication@> 3771 end 3772 3773@ @<Display rule |p|@>= 3774begin print_esc("rule("); print_rule_dimen(height(p)); print_char("+"); 3775print_rule_dimen(depth(p)); print(")x"); print_rule_dimen(width(p)); 3776end 3777 3778@ @<Display insertion |p|@>= 3779begin print_esc("insert"); print_int(qo(subtype(p))); 3780print(", natural size "); print_scaled(height(p)); 3781print("; split("); print_spec(split_top_ptr(p),0); 3782print_char(","); print_scaled(depth(p)); 3783print("); float cost "); print_int(float_cost(p)); 3784node_list_display(ins_ptr(p)); {recursive call} 3785end 3786 3787@ @<Display glue |p|@>= 3788if subtype(p)>=a_leaders then @<Display leaders |p|@> 3789else begin print_esc("glue"); 3790 if subtype(p)<>normal then 3791 begin print_char("("); 3792 if subtype(p)<cond_math_glue then 3793 print_skip_param(subtype(p)-1) 3794 else if subtype(p)=cond_math_glue then print_esc("nonscript") 3795 else print_esc("mskip"); 3796 print_char(")"); 3797 end; 3798 if subtype(p)<>cond_math_glue then 3799 begin print_char(" "); 3800 if subtype(p)<cond_math_glue then print_spec(glue_ptr(p),0) 3801 else print_spec(glue_ptr(p),"mu"); 3802 end; 3803 end 3804 3805@ @<Display leaders |p|@>= 3806begin print_esc(""); 3807if subtype(p)=c_leaders then print_char("c") 3808else if subtype(p)=x_leaders then print_char("x"); 3809print("leaders "); print_spec(glue_ptr(p),0); 3810node_list_display(leader_ptr(p)); {recursive call} 3811end 3812 3813@ An ``explicit'' kern value is indicated implicitly by an explicit space. 3814 3815@<Display kern |p|@>= 3816if subtype(p)<>mu_glue then 3817 begin print_esc("kern"); 3818 if subtype(p)<>normal then print_char(" "); 3819 print_scaled(width(p)); 3820 if subtype(p)=acc_kern then print(" (for accent)"); 3821@.for accent@> 3822 end 3823else begin print_esc("mkern"); print_scaled(width(p)); print("mu"); 3824 end 3825 3826@ @<Display math node |p|@>= 3827begin print_esc("math"); 3828if subtype(p)=before then print("on") 3829else print("off"); 3830if width(p)<>0 then 3831 begin print(", surrounded "); print_scaled(width(p)); 3832 end; 3833end 3834 3835@ @<Display ligature |p|@>= 3836begin print_font_and_char(lig_char(p)); print(" (ligature "); 3837if subtype(p)>1 then print_char("|"); 3838font_in_short_display:=font(lig_char(p)); short_display(lig_ptr(p)); 3839if odd(subtype(p)) then print_char("|"); 3840print_char(")"); 3841end 3842 3843@ @<Display penalty |p|@>= 3844begin print_esc("penalty "); print_int(penalty(p)); 3845end 3846 3847@ The |post_break| list of a discretionary node is indicated by a prefixed 3848`\.{\char'174}' instead of the `\..' before the |pre_break| list. 3849 3850@<Display discretionary |p|@>= 3851begin print_esc("discretionary"); 3852if replace_count(p)>0 then 3853 begin print(" replacing "); print_int(replace_count(p)); 3854 end; 3855node_list_display(pre_break(p)); {recursive call} 3856append_char("|"); show_node_list(post_break(p)); flush_char; {recursive call} 3857end 3858 3859@ @<Display mark |p|@>= 3860begin print_esc("mark"); print_mark(mark_ptr(p)); 3861end 3862 3863@ @<Display adjustment |p|@>= 3864begin print_esc("vadjust"); node_list_display(adjust_ptr(p)); {recursive call} 3865end 3866 3867@ The recursive machinery is started by calling |show_box|. 3868@^recursion@> 3869 3870@p procedure show_box(@!p:pointer); 3871begin @<Assign the values |depth_threshold:=show_box_depth| and 3872 |breadth_max:=show_box_breadth|@>; 3873if breadth_max<=0 then breadth_max:=5; 3874if pool_ptr+depth_threshold>=pool_size then 3875 depth_threshold:=pool_size-pool_ptr-1; 3876 {now there's enough room for prefix string} 3877show_node_list(p); {the show starts at |p|} 3878print_ln; 3879end; 3880 3881@* \[13] Destroying boxes. 3882When we are done with a node list, we are obliged to return it to free 3883storage, including all of its sublists. The recursive procedure 3884|flush_node_list| does this for us. 3885 3886@ First, however, we shall consider two non-recursive procedures that do 3887simpler tasks. The first of these, |delete_token_ref|, is called when 3888a pointer to a token list's reference count is being removed. This means 3889that the token list should disappear if the reference count was |null|, 3890otherwise the count should be decreased by one. 3891@^reference counts@> 3892 3893@d token_ref_count(#) == info(#) {reference count preceding a token list} 3894 3895@p procedure delete_token_ref(@!p:pointer); {|p| points to the reference count 3896 of a token list that is losing one reference} 3897begin if token_ref_count(p)=null then flush_list(p) 3898else decr(token_ref_count(p)); 3899end; 3900 3901@ Similarly, |delete_glue_ref| is called when a pointer to a glue 3902specification is being withdrawn. 3903@^reference counts@> 3904@d fast_delete_glue_ref(#)==@t@>@;@/ 3905 begin if glue_ref_count(#)=null then free_node(#,glue_spec_size) 3906 else decr(glue_ref_count(#)); 3907 end 3908 3909@p procedure delete_glue_ref(@!p:pointer); {|p| points to a glue specification} 3910fast_delete_glue_ref(p); 3911 3912@ Now we are ready to delete any node list, recursively. 3913In practice, the nodes deleted are usually charnodes (about 2/3 of the time), 3914and they are glue nodes in about half of the remaining cases. 3915@^recursion@> 3916 3917@p procedure flush_node_list(@!p:pointer); {erase list of nodes starting at |p|} 3918label done; {go here when node |p| has been freed} 3919var q:pointer; {successor to node |p|} 3920begin while p<>null do 3921@^inner loop@> 3922 begin q:=link(p); 3923 if is_char_node(p) then free_avail(p) 3924 else begin case type(p) of 3925 hlist_node,vlist_node,unset_node: begin flush_node_list(list_ptr(p)); 3926 free_node(p,box_node_size); goto done; 3927 end; 3928 rule_node: begin free_node(p,rule_node_size); goto done; 3929 end; 3930 ins_node: begin flush_node_list(ins_ptr(p)); 3931 delete_glue_ref(split_top_ptr(p)); 3932 free_node(p,ins_node_size); goto done; 3933 end; 3934 whatsit_node: @<Wipe out the whatsit node |p| and |goto done|@>; 3935 glue_node: begin fast_delete_glue_ref(glue_ptr(p)); 3936 if leader_ptr(p)<>null then flush_node_list(leader_ptr(p)); 3937 end; 3938 kern_node,math_node,penalty_node: do_nothing; 3939 ligature_node: flush_node_list(lig_ptr(p)); 3940 mark_node: delete_token_ref(mark_ptr(p)); 3941 disc_node: begin flush_node_list(pre_break(p)); 3942 flush_node_list(post_break(p)); 3943 end; 3944 adjust_node: flush_node_list(adjust_ptr(p)); 3945 @t\4@>@<Cases of |flush_node_list| that arise in mlists only@>@; 3946 othercases confusion("flushing") 3947@:this can't happen flushing}{\quad flushing@> 3948 endcases;@/ 3949 free_node(p,small_node_size); 3950 done:end; 3951 p:=q; 3952 end; 3953end; 3954 3955@* \[14] Copying boxes. 3956Another recursive operation that acts on boxes is sometimes needed: The 3957procedure |copy_node_list| returns a pointer to another node list that has 3958the same structure and meaning as the original. Note that since glue 3959specifications and token lists have reference counts, we need not make 3960copies of them. Reference counts can never get too large to fit in a 3961halfword, since each pointer to a node is in a different memory address, 3962and the total number of memory addresses fits in a halfword. 3963@^recursion@> 3964@^reference counts@> 3965 3966(Well, there actually are also references from outside |mem|; if the 3967|save_stack| is made arbitrarily large, it would theoretically be possible 3968to break \TeX\ by overflowing a reference count. But who would want to do that?) 3969 3970@d add_token_ref(#)==incr(token_ref_count(#)) {new reference to a token list} 3971@d add_glue_ref(#)==incr(glue_ref_count(#)) {new reference to a glue spec} 3972 3973@ The copying procedure copies words en masse without bothering 3974to look at their individual fields. If the node format changes---for 3975example, if the size is altered, or if some link field is moved to another 3976relative position---then this code may need to be changed too. 3977@^data structure assumptions@> 3978 3979@p function copy_node_list(@!p:pointer):pointer; {makes a duplicate of the 3980 node list that starts at |p| and returns a pointer to the new list} 3981var h:pointer; {temporary head of copied list} 3982@!q:pointer; {previous position in new list} 3983@!r:pointer; {current node being fabricated for new list} 3984@!words:0..5; {number of words remaining to be copied} 3985begin h:=get_avail; q:=h; 3986while p<>null do 3987 begin @<Make a copy of node |p| in node |r|@>; 3988 link(q):=r; q:=r; p:=link(p); 3989 end; 3990link(q):=null; q:=link(h); free_avail(h); 3991copy_node_list:=q; 3992end; 3993 3994@ @<Make a copy of node |p|...@>= 3995words:=1; {this setting occurs in more branches than any other} 3996if is_char_node(p) then r:=get_avail 3997else @<Case statement to copy different types and set |words| to the number 3998 of initial words not yet copied@>; 3999while words>0 do 4000 begin decr(words); mem[r+words]:=mem[p+words]; 4001 end 4002 4003@ @<Case statement to copy...@>= 4004case type(p) of 4005hlist_node,vlist_node,unset_node: begin r:=get_node(box_node_size); 4006 mem[r+6]:=mem[p+6]; mem[r+5]:=mem[p+5]; {copy the last two words} 4007 list_ptr(r):=copy_node_list(list_ptr(p)); {this affects |mem[r+5]|} 4008 words:=5; 4009 end; 4010rule_node: begin r:=get_node(rule_node_size); words:=rule_node_size; 4011 end; 4012ins_node: begin r:=get_node(ins_node_size); mem[r+4]:=mem[p+4]; 4013 add_glue_ref(split_top_ptr(p)); 4014 ins_ptr(r):=copy_node_list(ins_ptr(p)); {this affects |mem[r+4]|} 4015 words:=ins_node_size-1; 4016 end; 4017whatsit_node:@<Make a partial copy of the whatsit node |p| and make |r| 4018 point to it; set |words| to the number of initial words not yet copied@>; 4019glue_node: begin r:=get_node(small_node_size); add_glue_ref(glue_ptr(p)); 4020 glue_ptr(r):=glue_ptr(p); leader_ptr(r):=copy_node_list(leader_ptr(p)); 4021 end; 4022kern_node,math_node,penalty_node: begin r:=get_node(small_node_size); 4023 words:=small_node_size; 4024 end; 4025ligature_node: begin r:=get_node(small_node_size); 4026 mem[lig_char(r)]:=mem[lig_char(p)]; {copy |font| and |character|} 4027 lig_ptr(r):=copy_node_list(lig_ptr(p)); 4028 end; 4029disc_node: begin r:=get_node(small_node_size); 4030 pre_break(r):=copy_node_list(pre_break(p)); 4031 post_break(r):=copy_node_list(post_break(p)); 4032 end; 4033mark_node: begin r:=get_node(small_node_size); add_token_ref(mark_ptr(p)); 4034 words:=small_node_size; 4035 end; 4036adjust_node: begin r:=get_node(small_node_size); 4037 adjust_ptr(r):=copy_node_list(adjust_ptr(p)); 4038 end; {|words=1=small_node_size-1|} 4039othercases confusion("copying") 4040@:this can't happen copying}{\quad copying@> 4041endcases 4042 4043@* \[15] The command codes. 4044Before we can go any further, we need to define symbolic names for the internal 4045code numbers that represent the various commands obeyed by \TeX. These codes 4046are somewhat arbitrary, but not completely so. For example, the command 4047codes for character types are fixed by the language, since a user says, 4048e.g., `\.{\\catcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter, 4049and the command code |math_shift| is equal to~3. Some other codes have 4050been made adjacent so that |case| statements in the program need not consider 4051cases that are widely spaced, or so that |case| statements can be replaced 4052by |if| statements. 4053 4054At any rate, here is the list, for future reference. First come the 4055``catcode'' commands, several of which share their numeric codes with 4056ordinary commands when the catcode cannot emerge from \TeX's scanning routine. 4057 4058@d escape=0 {escape delimiter (called \.\\ in {\sl The \TeX book\/})} 4059@:TeXbook}{\sl The \TeX book@> 4060@d relax=0 {do nothing ( \.{\\relax} )} 4061@d left_brace=1 {beginning of a group ( \.\{ )} 4062@d right_brace=2 {ending of a group ( \.\} )} 4063@d math_shift=3 {mathematics shift character ( \.\$ )} 4064@d tab_mark=4 {alignment delimiter ( \.\&, \.{\\span} )} 4065@d car_ret=5 {end of line ( |carriage_return|, \.{\\cr}, \.{\\crcr} )} 4066@d out_param=5 {output a macro parameter} 4067@d mac_param=6 {macro parameter symbol ( \.\# )} 4068@d sup_mark=7 {superscript ( \.{\char'136} )} 4069@d sub_mark=8 {subscript ( \.{\char'137} )} 4070@d ignore=9 {characters to ignore ( \.{\^\^@@} )} 4071@d endv=9 {end of \<v_j> list in alignment template} 4072@d spacer=10 {characters equivalent to blank space ( \.{\ } )} 4073@d letter=11 {characters regarded as letters ( \.{A..Z}, \.{a..z} )} 4074@d other_char=12 {none of the special character types} 4075@d active_char=13 {characters that invoke macros ( \.{\char`\~} )} 4076@d par_end=13 {end of paragraph ( \.{\\par} )} 4077@d match=13 {match a macro parameter} 4078@d comment=14 {characters that introduce comments ( \.\% )} 4079@d end_match=14 {end of parameters to macro} 4080@d stop=14 {end of job ( \.{\\end}, \.{\\dump} )} 4081@d invalid_char=15 {characters that shouldn't appear ( \.{\^\^?} )} 4082@d delim_num=15 {specify delimiter numerically ( \.{\\delimiter} )} 4083@d max_char_code=15 {largest catcode for individual characters} 4084 4085@ Next are the ordinary run-of-the-mill command codes. Codes that are 4086|min_internal| or more represent internal quantities that might be 4087expanded by `\.{\\the}'. 4088 4089@d char_num=16 {character specified numerically ( \.{\\char} )} 4090@d math_char_num=17 {explicit math code ( \.{\\mathchar} )} 4091@d mark=18 {mark definition ( \.{\\mark} )} 4092@d xray=19 {peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)} 4093@d make_box=20 {make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)} 4094@d hmove=21 {horizontal motion ( \.{\\moveleft}, \.{\\moveright} )} 4095@d vmove=22 {vertical motion ( \.{\\raise}, \.{\\lower} )} 4096@d un_hbox=23 {unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )} 4097@d un_vbox=24 {unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )} 4098@d remove_item=25 {nullify last item ( \.{\\unpenalty}, 4099 \.{\\unkern}, \.{\\unskip} )} 4100@d hskip=26 {horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)} 4101@d vskip=27 {vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)} 4102@d mskip=28 {math glue ( \.{\\mskip} )} 4103@d kern=29 {fixed space ( \.{\\kern})} 4104@d mkern=30 {math kern ( \.{\\mkern} )} 4105@d leader_ship=31 {use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)} 4106@d halign=32 {horizontal table alignment ( \.{\\halign} )} 4107@d valign=33 {vertical table alignment ( \.{\\valign} )} 4108@d no_align=34 {temporary escape from alignment ( \.{\\noalign} )} 4109@d vrule=35 {vertical rule ( \.{\\vrule} )} 4110@d hrule=36 {horizontal rule ( \.{\\hrule} )} 4111@d insert=37 {vlist inserted in box ( \.{\\insert} )} 4112@d vadjust=38 {vlist inserted in enclosing paragraph ( \.{\\vadjust} )} 4113@d ignore_spaces=39 {gobble |spacer| tokens ( \.{\\ignorespaces} )} 4114@d after_assignment=40 {save till assignment is done ( \.{\\afterassignment} )} 4115@d after_group=41 {save till group is done ( \.{\\aftergroup} )} 4116@d break_penalty=42 {additional badness ( \.{\\penalty} )} 4117@d start_par=43 {begin paragraph ( \.{\\indent}, \.{\\noindent} )} 4118@d ital_corr=44 {italic correction ( \.{\\/} )} 4119@d accent=45 {attach accent in text ( \.{\\accent} )} 4120@d math_accent=46 {attach accent in math ( \.{\\mathaccent} )} 4121@d discretionary=47 {discretionary texts ( \.{\\-}, \.{\\discretionary} )} 4122@d eq_no=48 {equation number ( \.{\\eqno}, \.{\\leqno} )} 4123@d left_right=49 {variable delimiter ( \.{\\left}, \.{\\right} )} 4124@d math_comp=50 {component of formula ( \.{\\mathbin}, etc.~)} 4125@d limit_switch=51 {diddle limit conventions ( \.{\\displaylimits}, etc.~)} 4126@d above=52 {generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)} 4127@d math_style=53 {style specification ( \.{\\displaystyle}, etc.~)} 4128@d math_choice=54 {choice specification ( \.{\\mathchoice} )} 4129@d non_script=55 {conditional math glue ( \.{\\nonscript} )} 4130@d vcenter=56 {vertically center a vbox ( \.{\\vcenter} )} 4131@d case_shift=57 {force specific case ( \.{\\lowercase}, \.{\\uppercase}~)} 4132@d message=58 {send to user ( \.{\\message}, \.{\\errmessage} )} 4133@d extension=59 {extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)} 4134@d in_stream=60 {files for reading ( \.{\\openin}, \.{\\closein} )} 4135@d begin_group=61 {begin local grouping ( \.{\\begingroup} )} 4136@d end_group=62 {end local grouping ( \.{\\endgroup} )} 4137@d omit=63 {omit alignment template ( \.{\\omit} )} 4138@d ex_space=64 {explicit space ( \.{\\\ } )} 4139@d no_boundary=65 {suppress boundary ligatures ( \.{\\noboundary} )} 4140@d radical=66 {square root and similar signs ( \.{\\radical} )} 4141@d end_cs_name=67 {end control sequence ( \.{\\endcsname} )} 4142@d min_internal=68 {the smallest code that can follow \.{\\the}} 4143@d char_given=68 {character code defined by \.{\\chardef}} 4144@d math_given=69 {math code defined by \.{\\mathchardef}} 4145@d last_item=70 {most recent item ( \.{\\lastpenalty}, 4146 \.{\\lastkern}, \.{\\lastskip} )} 4147@d max_non_prefixed_command=70 {largest command code that can't be \.{\\global}} 4148 4149@ The next codes are special; they all relate to mode-independent 4150assignment of values to \TeX's internal registers or tables. 4151Codes that are |max_internal| or less represent internal quantities 4152that might be expanded by `\.{\\the}'. 4153 4154@d toks_register=71 {token list register ( \.{\\toks} )} 4155@d assign_toks=72 {special token list ( \.{\\output}, \.{\\everypar}, etc.~)} 4156@d assign_int=73 {user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)} 4157@d assign_dimen=74 {user-defined length ( \.{\\hsize}, etc.~)} 4158@d assign_glue=75 {user-defined glue ( \.{\\baselineskip}, etc.~)} 4159@d assign_mu_glue=76 {user-defined muglue ( \.{\\thinmuskip}, etc.~)} 4160@d assign_font_dimen=77 {user-defined font dimension ( \.{\\fontdimen} )} 4161@d assign_font_int=78 {user-defined font integer ( \.{\\hyphenchar}, 4162 \.{\\skewchar} )} 4163@d set_aux=79 {specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )} 4164@d set_prev_graf=80 {specify state info ( \.{\\prevgraf} )} 4165@d set_page_dimen=81 {specify state info ( \.{\\pagegoal}, etc.~)} 4166@d set_page_int=82 {specify state info ( \.{\\deadcycles}, 4167 \.{\\insertpenalties} )} 4168@d set_box_dimen=83 {change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )} 4169@d set_shape=84 {specify fancy paragraph shape ( \.{\\parshape} )} 4170@d def_code=85 {define a character code ( \.{\\catcode}, etc.~)} 4171@d def_family=86 {declare math fonts ( \.{\\textfont}, etc.~)} 4172@d set_font=87 {set current font ( font identifiers )} 4173@d def_font=88 {define a font file ( \.{\\font} )} 4174@d register=89 {internal register ( \.{\\count}, \.{\\dimen}, etc.~)} 4175@d max_internal=89 {the largest code that can follow \.{\\the}} 4176@d advance=90 {advance a register or parameter ( \.{\\advance} )} 4177@d multiply=91 {multiply a register or parameter ( \.{\\multiply} )} 4178@d divide=92 {divide a register or parameter ( \.{\\divide} )} 4179@d prefix=93 {qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )} 4180@d let=94 {assign a command code ( \.{\\let}, \.{\\futurelet} )} 4181@d shorthand_def=95 {code definition ( \.{\\chardef}, \.{\\countdef}, etc.~)} 4182@d read_to_cs=96 {read into a control sequence ( \.{\\read} )} 4183@d def=97 {macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )} 4184@d set_box=98 {set a box ( \.{\\setbox} )} 4185@d hyph_data=99 {hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )} 4186@d set_interaction=100 {define level of interaction ( \.{\\batchmode}, etc.~)} 4187@d max_command=100 {the largest command code seen at |big_switch|} 4188 4189@ The remaining command codes are extra special, since they cannot get through 4190\TeX's scanner to the main control routine. They have been given values higher 4191than |max_command| so that their special nature is easily discernible. 4192The ``expandable'' commands come first. 4193 4194@d undefined_cs=max_command+1 {initial state of most |eq_type| fields} 4195@d expand_after=max_command+2 {special expansion ( \.{\\expandafter} )} 4196@d no_expand=max_command+3 {special nonexpansion ( \.{\\noexpand} )} 4197@d input=max_command+4 {input a source file ( \.{\\input}, \.{\\endinput} )} 4198@d if_test=max_command+5 {conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)} 4199@d fi_or_else=max_command+6 {delimiters for conditionals ( \.{\\else}, etc.~)} 4200@d cs_name=max_command+7 {make a control sequence from tokens ( \.{\\csname} )} 4201@d convert=max_command+8 {convert to text ( \.{\\number}, \.{\\string}, etc.~)} 4202@d the=max_command+9 {expand an internal quantity ( \.{\\the} )} 4203@d top_bot_mark=max_command+10 {inserted mark ( \.{\\topmark}, etc.~)} 4204@d call=max_command+11 {non-long, non-outer control sequence} 4205@d long_call=max_command+12 {long, non-outer control sequence} 4206@d outer_call=max_command+13 {non-long, outer control sequence} 4207@d long_outer_call=max_command+14 {long, outer control sequence} 4208@d end_template=max_command+15 {end of an alignment template} 4209@d dont_expand=max_command+16 {the following token was marked by \.{\\noexpand}} 4210@d glue_ref=max_command+17 {the equivalent points to a glue specification} 4211@d shape_ref=max_command+18 {the equivalent points to a parshape specification} 4212@d box_ref=max_command+19 {the equivalent points to a box node, or is |null|} 4213@d data=max_command+20 {the equivalent is simply a halfword number} 4214 4215@* \[16] The semantic nest. 4216\TeX\ is typically in the midst of building many lists at once. For example, 4217when a math formula is being processed, \TeX\ is in math mode and 4218working on an mlist; this formula has temporarily interrupted \TeX\ from 4219being in horizontal mode and building the hlist of a paragraph; and this 4220paragraph has temporarily interrupted \TeX\ from being in vertical mode 4221and building the vlist for the next page of a document. Similarly, when a 4222\.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily 4223interrupted from working in restricted horizontal mode, and it enters 4224internal vertical mode. The ``semantic nest'' is a stack that 4225keeps track of what lists and modes are currently suspended. 4226 4227At each level of processing we are in one of six modes: 4228 4229\yskip\hang|vmode| stands for vertical mode (the page builder); 4230 4231\hang|hmode| stands for horizontal mode (the paragraph builder); 4232 4233\hang|mmode| stands for displayed formula mode; 4234 4235\hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox}); 4236 4237\hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox}); 4238 4239\hang|-mmode| stands for math formula mode (not displayed). 4240 4241\yskip\noindent The mode is temporarily set to zero while processing \.{\\write} 4242texts in the |ship_out| routine. 4243 4244Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that 4245\TeX's ``big semantic switch'' can select the appropriate thing to 4246do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current 4247mode and |cur_cmd| is the current command code. 4248 4249@d vmode=1 {vertical mode} 4250@d hmode=vmode+max_command+1 {horizontal mode} 4251@d mmode=hmode+max_command+1 {math mode} 4252 4253@p procedure print_mode(@!m:integer); {prints the mode represented by |m|} 4254begin if m>0 then 4255 case m div (max_command+1) of 4256 0:print("vertical"); 4257 1:print("horizontal"); 4258 2:print("display math"); 4259 end 4260else if m=0 then print("no") 4261else case (-m) div (max_command+1) of 4262 0:print("internal vertical"); 4263 1:print("restricted horizontal"); 4264 2:print("math"); 4265 end; 4266print(" mode"); 4267end; 4268 4269@ The state of affairs at any semantic level can be represented by 4270five values: 4271 4272\yskip\hang|mode| is the number representing the semantic mode, as 4273just explained. 4274 4275\yskip\hang|head| is a |pointer| to a list head for the list being built; 4276|link(head)| therefore points to the first element of the list, or 4277to |null| if the list is empty. 4278 4279\yskip\hang|tail| is a |pointer| to the final node of the list being 4280built; thus, |tail=head| if and only if the list is empty. 4281 4282\yskip\hang|prev_graf| is the number of lines of the current paragraph that 4283have already been put into the present vertical list. 4284 4285\yskip\hang|aux| is an auxiliary |memory_word| that gives further information 4286that is needed to characterize the situation. 4287 4288\yskip\noindent 4289In vertical mode, |aux| is also known as |prev_depth|; it is the scaled 4290value representing the depth of the previous box, for use in baseline 4291calculations, or it is |<=-1000|pt if the next box on the vertical list is to 4292be exempt from baseline calculations. In horizontal mode, |aux| is also 4293known as |space_factor| and |clang|; it holds the current space factor used in 4294spacing calculations, and the current language used for hyphenation. 4295(The value of |clang| is undefined in restricted horizontal mode.) 4296In math mode, |aux| is also known as |incompleat_noad|; if 4297not |null|, it points to a record that represents the numerator of a 4298generalized fraction for which the denominator is currently being formed 4299in the current list. 4300 4301There is also a sixth quantity, |mode_line|, which correlates 4302the semantic nest with the user's input; |mode_line| contains the source 4303line number at which the current level of nesting was entered. The negative 4304of this line number is the |mode_line| at the level of the 4305user's output routine. 4306 4307In horizontal mode, the |prev_graf| field is used for initial language data. 4308 4309The semantic nest is an array called |nest| that holds the |mode|, |head|, 4310|tail|, |prev_graf|, |aux|, and |mode_line| values for all semantic levels 4311below the currently active one. Information about the currently active 4312level is kept in the global quantities |mode|, |head|, |tail|, |prev_graf|, 4313|aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to 4314be pushed onto |nest| if necessary. 4315 4316@d ignore_depth==-65536000 {|prev_depth| value that is ignored} 4317 4318@<Types...@>= 4319@!list_state_record=record@!mode_field:-mmode..mmode;@+ 4320 @!head_field,@!tail_field: pointer; 4321 @!pg_field,@!ml_field: integer;@+ 4322 @!aux_field: memory_word; 4323 end; 4324 4325@ @d mode==cur_list.mode_field {current mode} 4326@d head==cur_list.head_field {header node of current list} 4327@d tail==cur_list.tail_field {final node on current list} 4328@d prev_graf==cur_list.pg_field {number of paragraph lines accumulated} 4329@d aux==cur_list.aux_field {auxiliary data about the current list} 4330@d prev_depth==aux.sc {the name of |aux| in vertical mode} 4331@d space_factor==aux.hh.lh {part of |aux| in horizontal mode} 4332@d clang==aux.hh.rh {the other part of |aux| in horizontal mode} 4333@d incompleat_noad==aux.int {the name of |aux| in math mode} 4334@d mode_line==cur_list.ml_field {source file line number at beginning of list} 4335 4336@<Glob...@>= 4337@!nest:array[0..nest_size] of list_state_record; 4338@!nest_ptr:0..nest_size; {first unused location of |nest|} 4339@!max_nest_stack:0..nest_size; {maximum of |nest_ptr| when pushing} 4340@!cur_list:list_state_record; {the ``top'' semantic state} 4341@!shown_mode:-mmode..mmode; {most recent mode shown by \.{\\tracingcommands}} 4342 4343@ Here is a common way to make the current list grow: 4344 4345@d tail_append(#)==begin link(tail):=#; tail:=link(tail); 4346 end 4347 4348@ We will see later that the vertical list at the bottom semantic level is split 4349into two parts; the ``current page'' runs from |page_head| to |page_tail|, 4350and the ``contribution list'' runs from |contrib_head| to |tail| of 4351semantic level zero. The idea is that contributions are first formed in 4352vertical mode, then ``contributed'' to the current page (during which time 4353the page-breaking decisions are made). For now, we don't need to know 4354any more details about the page-building process. 4355 4356@<Set init...@>= 4357nest_ptr:=0; max_nest_stack:=0; 4358mode:=vmode; head:=contrib_head; tail:=contrib_head; 4359prev_depth:=ignore_depth; mode_line:=0; 4360prev_graf:=0; shown_mode:=0; 4361@<Start a new current page@>; 4362 4363@ When \TeX's work on one level is interrupted, the state is saved by 4364calling |push_nest|. This routine changes |head| and |tail| so that 4365a new (empty) list is begun; it does not change |mode| or |aux|. 4366 4367@p procedure push_nest; {enter a new semantic level, save the old} 4368begin if nest_ptr>max_nest_stack then 4369 begin max_nest_stack:=nest_ptr; 4370 if nest_ptr=nest_size then overflow("semantic nest size",nest_size); 4371@:TeX capacity exceeded semantic nest size}{\quad semantic nest size@> 4372 end; 4373nest[nest_ptr]:=cur_list; {stack the record} 4374incr(nest_ptr); head:=get_avail; tail:=head; prev_graf:=0; mode_line:=line; 4375end; 4376 4377@ Conversely, when \TeX\ is finished on the current level, the former 4378state is restored by calling |pop_nest|. This routine will never be 4379called at the lowest semantic level, nor will it be called unless |head| 4380is a node that should be returned to free memory. 4381 4382@p procedure pop_nest; {leave a semantic level, re-enter the old} 4383begin free_avail(head); decr(nest_ptr); cur_list:=nest[nest_ptr]; 4384end; 4385 4386@ Here is a procedure that displays what \TeX\ is working on, at all levels. 4387 4388@p procedure@?print_totals; forward;@t\2@> 4389procedure show_activities; 4390var p:0..nest_size; {index into |nest|} 4391@!m:-mmode..mmode; {mode} 4392@!a:memory_word; {auxiliary} 4393@!q,@!r:pointer; {for showing the current page} 4394@!t:integer; {ditto} 4395begin nest[nest_ptr]:=cur_list; {put the top level into the array} 4396print_nl(""); print_ln; 4397for p:=nest_ptr downto 0 do 4398 begin m:=nest[p].mode_field; a:=nest[p].aux_field; 4399 print_nl("### "); print_mode(m); 4400 print(" entered at line "); print_int(abs(nest[p].ml_field)); 4401 if m=hmode then if nest[p].pg_field <> @'40600000 then 4402 begin print(" (language"); print_int(nest[p].pg_field mod @'200000); 4403 print(":hyphenmin"); print_int(nest[p].pg_field div @'20000000); 4404 print_char(","); print_int((nest[p].pg_field div @'200000) mod @'100); 4405 print_char(")"); 4406 end; 4407 if nest[p].ml_field<0 then print(" (\output routine)"); 4408 if p=0 then 4409 begin @<Show the status of the current page@>; 4410 if link(contrib_head)<>null then 4411 print_nl("### recent contributions:"); 4412 end; 4413 show_box(link(nest[p].head_field)); 4414 @<Show the auxiliary field, |a|@>; 4415 end; 4416end; 4417 4418@ @<Show the auxiliary...@>= 4419case abs(m) div (max_command+1) of 44200: begin print_nl("prevdepth "); 4421 if a.sc<=ignore_depth then print("ignored") 4422 else print_scaled(a.sc); 4423 if nest[p].pg_field<>0 then 4424 begin print(", prevgraf "); 4425 print_int(nest[p].pg_field); print(" line"); 4426 if nest[p].pg_field<>1 then print_char("s"); 4427 end; 4428 end; 44291: begin print_nl("spacefactor "); print_int(a.hh.lh); 4430 if m>0 then@+ if a.hh.rh>0 then 4431 begin print(", current language "); print_int(a.hh.rh);@+ 4432 end; 4433 end; 44342: if a.int<>null then 4435 begin print("this will be denominator of:"); show_box(a.int);@+ 4436 end; 4437end {there are no other cases} 4438 4439@* \[17] The table of equivalents. 4440Now that we have studied the data structures for \TeX's semantic routines, 4441we ought to consider the data structures used by its syntactic routines. In 4442other words, our next concern will be 4443the tables that \TeX\ looks at when it is scanning 4444what the user has written. 4445 4446The biggest and most important such table is called |eqtb|. It holds the 4447current ``equivalents'' of things; i.e., it explains what things mean 4448or what their current values are, for all quantities that are subject to 4449the nesting structure provided by \TeX's grouping mechanism. There are six 4450parts to |eqtb|: 4451 4452\yskip\hangg 1) |eqtb[active_base..(hash_base-1)]| holds the current 4453equivalents of single-character control sequences. 4454 4455\yskip\hangg 2) |eqtb[hash_base..(glue_base-1)]| holds the current 4456equivalents of multiletter control sequences. 4457 4458\yskip\hangg 3) |eqtb[glue_base..(local_base-1)]| holds the current 4459equivalents of glue parameters like the current baselineskip. 4460 4461\yskip\hangg 4) |eqtb[local_base..(int_base-1)]| holds the current 4462equivalents of local halfword quantities like the current box registers, 4463the current ``catcodes,'' the current font, and a pointer to the current 4464paragraph shape. 4465 4466\yskip\hangg 5) |eqtb[int_base..(dimen_base-1)]| holds the current 4467equivalents of fullword integer parameters like the current hyphenation 4468penalty. 4469 4470\yskip\hangg 6) |eqtb[dimen_base..eqtb_size]| holds the current equivalents 4471of fullword dimension parameters like the current hsize or amount of 4472hanging indentation. 4473 4474\yskip\noindent Note that, for example, the current amount of 4475baselineskip glue is determined by the setting of a particular location 4476in region~3 of |eqtb|, while the current meaning of the control sequence 4477`\.{\\baselineskip}' (which might have been changed by \.{\\def} or 4478\.{\\let}) appears in region~2. 4479 4480@ Each entry in |eqtb| is a |memory_word|. Most of these words are of type 4481|two_halves|, and subdivided into three fields: 4482 4483\yskip\hangg 1) The |eq_level| (a quarterword) is the level of grouping at 4484which this equivalent was defined. If the level is |level_zero|, the 4485equivalent has never been defined; |level_one| refers to the outer level 4486(outside of all groups), and this level is also used for global 4487definitions that never go away. Higher levels are for equivalents that 4488will disappear at the end of their group. @^global definitions@> 4489 4490\yskip\hangg 2) The |eq_type| (another quarterword) specifies what kind of 4491entry this is. There are many types, since each \TeX\ primitive like 4492\.{\\hbox}, \.{\\def}, etc., has its own special code. The list of 4493command codes above includes all possible settings of the |eq_type| field. 4494 4495\yskip\hangg 3) The |equiv| (a halfword) is the current equivalent value. 4496This may be a font number, a pointer into |mem|, or a variety of other 4497things. 4498 4499@d eq_level_field(#)==#.hh.b1 4500@d eq_type_field(#)==#.hh.b0 4501@d equiv_field(#)==#.hh.rh 4502@d eq_level(#)==eq_level_field(eqtb[#]) {level of definition} 4503@d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent} 4504@d equiv(#)==equiv_field(eqtb[#]) {equivalent value} 4505@d level_zero=min_quarterword {level for undefined quantities} 4506@d level_one=level_zero+1 {outermost level for defined quantities} 4507 4508@ Many locations in |eqtb| have symbolic names. The purpose of the next 4509paragraphs is to define these names, and to set up the initial values of the 4510equivalents. 4511 4512In the first region we have 256 equivalents for ``active characters'' that 4513act as control sequences, followed by 256 equivalents for single-character 4514control sequences. 4515 4516Then comes region~2, which corresponds to the hash table that we will 4517define later. The maximum address in this region is used for a dummy 4518control sequence that is perpetually undefined. There also are several 4519locations for control sequences that are perpetually defined 4520(since they are used in error recovery). 4521 4522@d active_base=1 {beginning of region 1, for active character equivalents} 4523@d single_base=active_base+256 {equivalents of one-character control sequences} 4524@d null_cs=single_base+256 {equivalent of \.{\\csname\\endcsname}} 4525@d hash_base=null_cs+1 {beginning of region 2, for the hash table} 4526@d frozen_control_sequence=hash_base+hash_size {for error recovery} 4527@d frozen_protection=frozen_control_sequence {inaccessible but definable} 4528@d frozen_cr=frozen_control_sequence+1 {permanent `\.{\\cr}'} 4529@d frozen_end_group=frozen_control_sequence+2 {permanent `\.{\\endgroup}'} 4530@d frozen_right=frozen_control_sequence+3 {permanent `\.{\\right}'} 4531@d frozen_fi=frozen_control_sequence+4 {permanent `\.{\\fi}'} 4532@d frozen_end_template=frozen_control_sequence+5 {permanent `\.{\\endtemplate}'} 4533@d frozen_endv=frozen_control_sequence+6 {second permanent `\.{\\endtemplate}'} 4534@d frozen_relax=frozen_control_sequence+7 {permanent `\.{\\relax}'} 4535@d end_write=frozen_control_sequence+8 {permanent `\.{\\endwrite}'} 4536@d frozen_dont_expand=frozen_control_sequence+9 4537 {permanent `\.{\\notexpanded:}'} 4538@d frozen_null_font=frozen_control_sequence+10 4539 {permanent `\.{\\nullfont}'} 4540@d font_id_base=frozen_null_font-font_base 4541 {begins table of 257 permanent font identifiers} 4542@d undefined_control_sequence=frozen_null_font+257 {dummy location} 4543@d glue_base=undefined_control_sequence+1 {beginning of region 3} 4544 4545@<Initialize table entries...@>= 4546eq_type(undefined_control_sequence):=undefined_cs; 4547equiv(undefined_control_sequence):=null; 4548eq_level(undefined_control_sequence):=level_zero; 4549for k:=active_base to undefined_control_sequence-1 do 4550 eqtb[k]:=eqtb[undefined_control_sequence]; 4551 4552@ Here is a routine that displays the current meaning of an |eqtb| entry 4553in region 1 or~2. (Similar routines for the other regions will appear 4554below.) 4555 4556@<Show equivalent |n|, in region 1 or 2@>= 4557begin sprint_cs(n); print_char("="); print_cmd_chr(eq_type(n),equiv(n)); 4558if eq_type(n)>=call then 4559 begin print_char(":"); show_token_list(link(equiv(n)),null,32); 4560 end; 4561end 4562 4563@ Region 3 of |eqtb| contains the 256 \.{\\skip} registers, as well as the 4564glue parameters defined here. It is important that the ``muskip'' 4565parameters have larger numbers than the others. 4566 4567@d line_skip_code=0 {interline glue if |baseline_skip| is infeasible} 4568@d baseline_skip_code=1 {desired glue between baselines} 4569@d par_skip_code=2 {extra glue just above a paragraph} 4570@d above_display_skip_code=3 {extra glue just above displayed math} 4571@d below_display_skip_code=4 {extra glue just below displayed math} 4572@d above_display_short_skip_code=5 4573 {glue above displayed math following short lines} 4574@d below_display_short_skip_code=6 4575 {glue below displayed math following short lines} 4576@d left_skip_code=7 {glue at left of justified lines} 4577@d right_skip_code=8 {glue at right of justified lines} 4578@d top_skip_code=9 {glue at top of main pages} 4579@d split_top_skip_code=10 {glue at top of split pages} 4580@d tab_skip_code=11 {glue between aligned entries} 4581@d space_skip_code=12 {glue between words (if not |zero_glue|)} 4582@d xspace_skip_code=13 {glue after sentences (if not |zero_glue|)} 4583@d par_fill_skip_code=14 {glue on last line of paragraph} 4584@d thin_mu_skip_code=15 {thin space in math formula} 4585@d med_mu_skip_code=16 {medium space in math formula} 4586@d thick_mu_skip_code=17 {thick space in math formula} 4587@d glue_pars=18 {total number of glue parameters} 4588@d skip_base=glue_base+glue_pars {table of 256 ``skip'' registers} 4589@d mu_skip_base=skip_base+256 {table of 256 ``muskip'' registers} 4590@d local_base=mu_skip_base+256 {beginning of region 4} 4591@# 4592@d skip(#)==equiv(skip_base+#) {|mem| location of glue specification} 4593@d mu_skip(#)==equiv(mu_skip_base+#) {|mem| location of math glue spec} 4594@d glue_par(#)==equiv(glue_base+#) {|mem| location of glue specification} 4595@d line_skip==glue_par(line_skip_code) 4596@d baseline_skip==glue_par(baseline_skip_code) 4597@d par_skip==glue_par(par_skip_code) 4598@d above_display_skip==glue_par(above_display_skip_code) 4599@d below_display_skip==glue_par(below_display_skip_code) 4600@d above_display_short_skip==glue_par(above_display_short_skip_code) 4601@d below_display_short_skip==glue_par(below_display_short_skip_code) 4602@d left_skip==glue_par(left_skip_code) 4603@d right_skip==glue_par(right_skip_code) 4604@d top_skip==glue_par(top_skip_code) 4605@d split_top_skip==glue_par(split_top_skip_code) 4606@d tab_skip==glue_par(tab_skip_code) 4607@d space_skip==glue_par(space_skip_code) 4608@d xspace_skip==glue_par(xspace_skip_code) 4609@d par_fill_skip==glue_par(par_fill_skip_code) 4610@d thin_mu_skip==glue_par(thin_mu_skip_code) 4611@d med_mu_skip==glue_par(med_mu_skip_code) 4612@d thick_mu_skip==glue_par(thick_mu_skip_code) 4613 4614@<Current |mem| equivalent of glue parameter number |n|@>=glue_par(n) 4615 4616@ Sometimes we need to convert \TeX's internal code numbers into symbolic 4617form. The |print_skip_param| routine gives the symbolic name of a glue 4618parameter. 4619 4620@<Declare the procedure called |print_skip_param|@>= 4621procedure print_skip_param(@!n:integer); 4622begin case n of 4623line_skip_code: print_esc("lineskip"); 4624baseline_skip_code: print_esc("baselineskip"); 4625par_skip_code: print_esc("parskip"); 4626above_display_skip_code: print_esc("abovedisplayskip"); 4627below_display_skip_code: print_esc("belowdisplayskip"); 4628above_display_short_skip_code: print_esc("abovedisplayshortskip"); 4629below_display_short_skip_code: print_esc("belowdisplayshortskip"); 4630left_skip_code: print_esc("leftskip"); 4631right_skip_code: print_esc("rightskip"); 4632top_skip_code: print_esc("topskip"); 4633split_top_skip_code: print_esc("splittopskip"); 4634tab_skip_code: print_esc("tabskip"); 4635space_skip_code: print_esc("spaceskip"); 4636xspace_skip_code: print_esc("xspaceskip"); 4637par_fill_skip_code: print_esc("parfillskip"); 4638thin_mu_skip_code: print_esc("thinmuskip"); 4639med_mu_skip_code: print_esc("medmuskip"); 4640thick_mu_skip_code: print_esc("thickmuskip"); 4641othercases print("[unknown glue parameter!]") 4642endcases; 4643end; 4644 4645@ The symbolic names for glue parameters are put into \TeX's hash table 4646by using the routine called |primitive|, defined below. Let us enter them 4647now, so that we don't have to list all those parameter names anywhere else. 4648 4649@<Put each of \TeX's primitives into the hash table@>= 4650primitive("lineskip",assign_glue,glue_base+line_skip_code);@/ 4651@!@:line_skip_}{\.{\\lineskip} primitive@> 4652primitive("baselineskip",assign_glue,glue_base+baseline_skip_code);@/ 4653@!@:baseline_skip_}{\.{\\baselineskip} primitive@> 4654primitive("parskip",assign_glue,glue_base+par_skip_code);@/ 4655@!@:par_skip_}{\.{\\parskip} primitive@> 4656primitive("abovedisplayskip",assign_glue,glue_base+above_display_skip_code);@/ 4657@!@:above_display_skip_}{\.{\\abovedisplayskip} primitive@> 4658primitive("belowdisplayskip",assign_glue,glue_base+below_display_skip_code);@/ 4659@!@:below_display_skip_}{\.{\\belowdisplayskip} primitive@> 4660primitive("abovedisplayshortskip", 4661 assign_glue,glue_base+above_display_short_skip_code);@/ 4662@!@:above_display_short_skip_}{\.{\\abovedisplayshortskip} primitive@> 4663primitive("belowdisplayshortskip", 4664 assign_glue,glue_base+below_display_short_skip_code);@/ 4665@!@:below_display_short_skip_}{\.{\\belowdisplayshortskip} primitive@> 4666primitive("leftskip",assign_glue,glue_base+left_skip_code);@/ 4667@!@:left_skip_}{\.{\\leftskip} primitive@> 4668primitive("rightskip",assign_glue,glue_base+right_skip_code);@/ 4669@!@:right_skip_}{\.{\\rightskip} primitive@> 4670primitive("topskip",assign_glue,glue_base+top_skip_code);@/ 4671@!@:top_skip_}{\.{\\topskip} primitive@> 4672primitive("splittopskip",assign_glue,glue_base+split_top_skip_code);@/ 4673@!@:split_top_skip_}{\.{\\splittopskip} primitive@> 4674primitive("tabskip",assign_glue,glue_base+tab_skip_code);@/ 4675@!@:tab_skip_}{\.{\\tabskip} primitive@> 4676primitive("spaceskip",assign_glue,glue_base+space_skip_code);@/ 4677@!@:space_skip_}{\.{\\spaceskip} primitive@> 4678primitive("xspaceskip",assign_glue,glue_base+xspace_skip_code);@/ 4679@!@:xspace_skip_}{\.{\\xspaceskip} primitive@> 4680primitive("parfillskip",assign_glue,glue_base+par_fill_skip_code);@/ 4681@!@:par_fill_skip_}{\.{\\parfillskip} primitive@> 4682primitive("thinmuskip",assign_mu_glue,glue_base+thin_mu_skip_code);@/ 4683@!@:thin_mu_skip_}{\.{\\thinmuskip} primitive@> 4684primitive("medmuskip",assign_mu_glue,glue_base+med_mu_skip_code);@/ 4685@!@:med_mu_skip_}{\.{\\medmuskip} primitive@> 4686primitive("thickmuskip",assign_mu_glue,glue_base+thick_mu_skip_code);@/ 4687@!@:thick_mu_skip_}{\.{\\thickmuskip} primitive@> 4688 4689@ @<Cases of |print_cmd_chr| for symbolic printing of primitives@>= 4690assign_glue,assign_mu_glue: if chr_code<skip_base then 4691 print_skip_param(chr_code-glue_base) 4692 else if chr_code<mu_skip_base then 4693 begin print_esc("skip"); print_int(chr_code-skip_base); 4694 end 4695 else begin print_esc("muskip"); print_int(chr_code-mu_skip_base); 4696 end; 4697 4698@ All glue parameters and registers are initially `\.{0pt plus0pt minus0pt}'. 4699 4700@<Initialize table entries...@>= 4701equiv(glue_base):=zero_glue; eq_level(glue_base):=level_one; 4702eq_type(glue_base):=glue_ref; 4703for k:=glue_base+1 to local_base-1 do eqtb[k]:=eqtb[glue_base]; 4704glue_ref_count(zero_glue):=glue_ref_count(zero_glue)+local_base-glue_base; 4705 4706@ @<Show equivalent |n|, in region 3@>= 4707if n<skip_base then 4708 begin print_skip_param(n-glue_base); print_char("="); 4709 if n<glue_base+thin_mu_skip_code then print_spec(equiv(n),"pt") 4710 else print_spec(equiv(n),"mu"); 4711 end 4712else if n<mu_skip_base then 4713 begin print_esc("skip"); print_int(n-skip_base); print_char("="); 4714 print_spec(equiv(n),"pt"); 4715 end 4716else begin print_esc("muskip"); print_int(n-mu_skip_base); print_char("="); 4717 print_spec(equiv(n),"mu"); 4718 end 4719 4720@ Region 4 of |eqtb| contains the local quantities defined here. The 4721bulk of this region is taken up by five tables that are indexed by eight-bit 4722characters; these tables are important to both the syntactic and semantic 4723portions of \TeX. There are also a bunch of special things like font and 4724token parameters, as well as the tables of \.{\\toks} and \.{\\box} 4725registers. 4726 4727@d par_shape_loc=local_base {specifies paragraph shape} 4728@d output_routine_loc=local_base+1 {points to token list for \.{\\output}} 4729@d every_par_loc=local_base+2 {points to token list for \.{\\everypar}} 4730@d every_math_loc=local_base+3 {points to token list for \.{\\everymath}} 4731@d every_display_loc=local_base+4 {points to token list for \.{\\everydisplay}} 4732@d every_hbox_loc=local_base+5 {points to token list for \.{\\everyhbox}} 4733@d every_vbox_loc=local_base+6 {points to token list for \.{\\everyvbox}} 4734@d every_job_loc=local_base+7 {points to token list for \.{\\everyjob}} 4735@d every_cr_loc=local_base+8 {points to token list for \.{\\everycr}} 4736@d err_help_loc=local_base+9 {points to token list for \.{\\errhelp}} 4737@d toks_base=local_base+10 {table of 256 token list registers} 4738@d box_base=toks_base+256 {table of 256 box registers} 4739@d cur_font_loc=box_base+256 {internal font number outside math mode} 4740@d math_font_base=cur_font_loc+1 {table of 48 math font numbers} 4741@d cat_code_base=math_font_base+48 4742 {table of 256 command codes (the ``catcodes'')} 4743@d lc_code_base=cat_code_base+256 {table of 256 lowercase mappings} 4744@d uc_code_base=lc_code_base+256 {table of 256 uppercase mappings} 4745@d sf_code_base=uc_code_base+256 {table of 256 spacefactor mappings} 4746@d math_code_base=sf_code_base+256 {table of 256 math mode mappings} 4747@d int_base=math_code_base+256 {beginning of region 5} 4748@# 4749@d par_shape_ptr==equiv(par_shape_loc) 4750@d output_routine==equiv(output_routine_loc) 4751@d every_par==equiv(every_par_loc) 4752@d every_math==equiv(every_math_loc) 4753@d every_display==equiv(every_display_loc) 4754@d every_hbox==equiv(every_hbox_loc) 4755@d every_vbox==equiv(every_vbox_loc) 4756@d every_job==equiv(every_job_loc) 4757@d every_cr==equiv(every_cr_loc) 4758@d err_help==equiv(err_help_loc) 4759@d toks(#)==equiv(toks_base+#) 4760@d box(#)==equiv(box_base+#) 4761@d cur_font==equiv(cur_font_loc) 4762@d fam_fnt(#)==equiv(math_font_base+#) 4763@d cat_code(#)==equiv(cat_code_base+#) 4764@d lc_code(#)==equiv(lc_code_base+#) 4765@d uc_code(#)==equiv(uc_code_base+#) 4766@d sf_code(#)==equiv(sf_code_base+#) 4767@d math_code(#)==equiv(math_code_base+#) 4768 {Note: |math_code(c)| is the true math code plus |min_halfword|} 4769 4770@<Put each...@>= 4771primitive("output",assign_toks,output_routine_loc); 4772@!@:output_}{\.{\\output} primitive@> 4773primitive("everypar",assign_toks,every_par_loc); 4774@!@:every_par_}{\.{\\everypar} primitive@> 4775primitive("everymath",assign_toks,every_math_loc); 4776@!@:every_math_}{\.{\\everymath} primitive@> 4777primitive("everydisplay",assign_toks,every_display_loc); 4778@!@:every_display_}{\.{\\everydisplay} primitive@> 4779primitive("everyhbox",assign_toks,every_hbox_loc); 4780@!@:every_hbox_}{\.{\\everyhbox} primitive@> 4781primitive("everyvbox",assign_toks,every_vbox_loc); 4782@!@:every_vbox_}{\.{\\everyvbox} primitive@> 4783primitive("everyjob",assign_toks,every_job_loc); 4784@!@:every_job_}{\.{\\everyjob} primitive@> 4785primitive("everycr",assign_toks,every_cr_loc); 4786@!@:every_cr_}{\.{\\everycr} primitive@> 4787primitive("errhelp",assign_toks,err_help_loc); 4788@!@:err_help_}{\.{\\errhelp} primitive@> 4789 4790@ @<Cases of |print_cmd_chr|...@>= 4791assign_toks: if chr_code>=toks_base then 4792 begin print_esc("toks"); print_int(chr_code-toks_base); 4793 end 4794else case chr_code of 4795 output_routine_loc: print_esc("output"); 4796 every_par_loc: print_esc("everypar"); 4797 every_math_loc: print_esc("everymath"); 4798 every_display_loc: print_esc("everydisplay"); 4799 every_hbox_loc: print_esc("everyhbox"); 4800 every_vbox_loc: print_esc("everyvbox"); 4801 every_job_loc: print_esc("everyjob"); 4802 every_cr_loc: print_esc("everycr"); 4803 othercases print_esc("errhelp") 4804 endcases; 4805 4806@ We initialize most things to null or undefined values. An undefined font 4807is represented by the internal code |font_base|. 4808 4809However, the character code tables are given initial values based on the 4810conventional interpretation of ASCII code. These initial values should 4811not be changed when \TeX\ is adapted for use with non-English languages; 4812all changes to the initialization conventions should be made in format 4813packages, not in \TeX\ itself, so that global interchange of formats is 4814possible. 4815 4816@d null_font==font_base 4817@d var_code==@'70000 {math code meaning ``use the current family''} 4818 4819@<Initialize table entries...@>= 4820par_shape_ptr:=null; eq_type(par_shape_loc):=shape_ref; 4821eq_level(par_shape_loc):=level_one;@/ 4822for k:=output_routine_loc to toks_base+255 do 4823 eqtb[k]:=eqtb[undefined_control_sequence]; 4824box(0):=null; eq_type(box_base):=box_ref; eq_level(box_base):=level_one; 4825for k:=box_base+1 to box_base+255 do eqtb[k]:=eqtb[box_base]; 4826cur_font:=null_font; eq_type(cur_font_loc):=data; 4827eq_level(cur_font_loc):=level_one;@/ 4828for k:=math_font_base to math_font_base+47 do eqtb[k]:=eqtb[cur_font_loc]; 4829equiv(cat_code_base):=0; eq_type(cat_code_base):=data; 4830eq_level(cat_code_base):=level_one;@/ 4831for k:=cat_code_base+1 to int_base-1 do eqtb[k]:=eqtb[cat_code_base]; 4832for k:=0 to 255 do 4833 begin cat_code(k):=other_char; math_code(k):=hi(k); sf_code(k):=1000; 4834 end; 4835cat_code(carriage_return):=car_ret; cat_code(" "):=spacer; 4836cat_code("\"):=escape; cat_code("%"):=comment; 4837cat_code(invalid_code):=invalid_char; cat_code(null_code):=ignore; 4838for k:="0" to "9" do math_code(k):=hi(k+var_code); 4839for k:="A" to "Z" do 4840 begin cat_code(k):=letter; cat_code(k+"a"-"A"):=letter;@/ 4841 math_code(k):=hi(k+var_code+@"100); 4842 math_code(k+"a"-"A"):=hi(k+"a"-"A"+var_code+@"100);@/ 4843 lc_code(k):=k+"a"-"A"; lc_code(k+"a"-"A"):=k+"a"-"A";@/ 4844 uc_code(k):=k; uc_code(k+"a"-"A"):=k;@/ 4845 sf_code(k):=999; 4846 end; 4847 4848@ @<Show equivalent |n|, in region 4@>= 4849if n=par_shape_loc then 4850 begin print_esc("parshape"); print_char("="); 4851 if par_shape_ptr=null then print_char("0") 4852 else print_int(info(par_shape_ptr)); 4853 end 4854else if n<toks_base then 4855 begin print_cmd_chr(assign_toks,n); print_char("="); 4856 if equiv(n)<>null then show_token_list(link(equiv(n)),null,32); 4857 end 4858else if n<box_base then 4859 begin print_esc("toks"); print_int(n-toks_base); print_char("="); 4860 if equiv(n)<>null then show_token_list(link(equiv(n)),null,32); 4861 end 4862else if n<cur_font_loc then 4863 begin print_esc("box"); print_int(n-box_base); print_char("="); 4864 if equiv(n)=null then print("void") 4865 else begin depth_threshold:=0; breadth_max:=1; show_node_list(equiv(n)); 4866 end; 4867 end 4868else if n<cat_code_base then @<Show the font identifier in |eqtb[n]|@> 4869else @<Show the halfword code in |eqtb[n]|@> 4870 4871@ @<Show the font identifier in |eqtb[n]|@>= 4872begin if n=cur_font_loc then print("current font") 4873else if n<math_font_base+16 then 4874 begin print_esc("textfont"); print_int(n-math_font_base); 4875 end 4876else if n<math_font_base+32 then 4877 begin print_esc("scriptfont"); print_int(n-math_font_base-16); 4878 end 4879else begin print_esc("scriptscriptfont"); print_int(n-math_font_base-32); 4880 end; 4881print_char("=");@/ 4882print_esc(hash[font_id_base+equiv(n)].rh); 4883 {that's |font_id_text(equiv(n))|} 4884end 4885 4886@ @<Show the halfword code in |eqtb[n]|@>= 4887if n<math_code_base then 4888 begin if n<lc_code_base then 4889 begin print_esc("catcode"); print_int(n-cat_code_base); 4890 end 4891 else if n<uc_code_base then 4892 begin print_esc("lccode"); print_int(n-lc_code_base); 4893 end 4894 else if n<sf_code_base then 4895 begin print_esc("uccode"); print_int(n-uc_code_base); 4896 end 4897 else begin print_esc("sfcode"); print_int(n-sf_code_base); 4898 end; 4899 print_char("="); print_int(equiv(n)); 4900 end 4901else begin print_esc("mathcode"); print_int(n-math_code_base); 4902 print_char("="); print_int(ho(equiv(n))); 4903 end 4904 4905@ Region 5 of |eqtb| contains the integer parameters and registers defined 4906here, as well as the |del_code| table. The latter table differs from the 4907|cat_code..math_code| tables that precede it, since delimiter codes are 4908fullword integers while the other kinds of codes occupy at most a 4909halfword. This is what makes region~5 different from region~4. We will 4910store the |eq_level| information in an auxiliary array of quarterwords 4911that will be defined later. 4912 4913@d pretolerance_code=0 {badness tolerance before hyphenation} 4914@d tolerance_code=1 {badness tolerance after hyphenation} 4915@d line_penalty_code=2 {added to the badness of every line} 4916@d hyphen_penalty_code=3 {penalty for break after discretionary hyphen} 4917@d ex_hyphen_penalty_code=4 {penalty for break after explicit hyphen} 4918@d club_penalty_code=5 {penalty for creating a club line} 4919@d widow_penalty_code=6 {penalty for creating a widow line} 4920@d display_widow_penalty_code=7 {ditto, just before a display} 4921@d broken_penalty_code=8 {penalty for breaking a page at a broken line} 4922@d bin_op_penalty_code=9 {penalty for breaking after a binary operation} 4923@d rel_penalty_code=10 {penalty for breaking after a relation} 4924@d pre_display_penalty_code=11 4925 {penalty for breaking just before a displayed formula} 4926@d post_display_penalty_code=12 4927 {penalty for breaking just after a displayed formula} 4928@d inter_line_penalty_code=13 {additional penalty between lines} 4929@d double_hyphen_demerits_code=14 {demerits for double hyphen break} 4930@d final_hyphen_demerits_code=15 {demerits for final hyphen break} 4931@d adj_demerits_code=16 {demerits for adjacent incompatible lines} 4932@d mag_code=17 {magnification ratio} 4933@d delimiter_factor_code=18 {ratio for variable-size delimiters} 4934@d looseness_code=19 {change in number of lines for a paragraph} 4935@d time_code=20 {current time of day} 4936@d day_code=21 {current day of the month} 4937@d month_code=22 {current month of the year} 4938@d year_code=23 {current year of our Lord} 4939@d show_box_breadth_code=24 {nodes per level in |show_box|} 4940@d show_box_depth_code=25 {maximum level in |show_box|} 4941@d hbadness_code=26 {hboxes exceeding this badness will be shown by |hpack|} 4942@d vbadness_code=27 {vboxes exceeding this badness will be shown by |vpack|} 4943@d pausing_code=28 {pause after each line is read from a file} 4944@d tracing_online_code=29 {show diagnostic output on terminal} 4945@d tracing_macros_code=30 {show macros as they are being expanded} 4946@d tracing_stats_code=31 {show memory usage if \TeX\ knows it} 4947@d tracing_paragraphs_code=32 {show line-break calculations} 4948@d tracing_pages_code=33 {show page-break calculations} 4949@d tracing_output_code=34 {show boxes when they are shipped out} 4950@d tracing_lost_chars_code=35 {show characters that aren't in the font} 4951@d tracing_commands_code=36 {show command codes at |big_switch|} 4952@d tracing_restores_code=37 {show equivalents when they are restored} 4953@d uc_hyph_code=38 {hyphenate words beginning with a capital letter} 4954@d output_penalty_code=39 {penalty found at current page break} 4955@d max_dead_cycles_code=40 {bound on consecutive dead cycles of output} 4956@d hang_after_code=41 {hanging indentation changes after this many lines} 4957@d floating_penalty_code=42 {penalty for insertions heldover after a split} 4958@d global_defs_code=43 {override \.{\\global} specifications} 4959@d cur_fam_code=44 {current family} 4960@d escape_char_code=45 {escape character for token output} 4961@d default_hyphen_char_code=46 {value of \.{\\hyphenchar} when a font is loaded} 4962@d default_skew_char_code=47 {value of \.{\\skewchar} when a font is loaded} 4963@d end_line_char_code=48 {character placed at the right end of the buffer} 4964@d new_line_char_code=49 {character that prints as |print_ln|} 4965@d language_code=50 {current hyphenation table} 4966@d left_hyphen_min_code=51 {minimum left hyphenation fragment size} 4967@d right_hyphen_min_code=52 {minimum right hyphenation fragment size} 4968@d holding_inserts_code=53 {do not remove insertion nodes from \.{\\box255}} 4969@d error_context_lines_code=54 {maximum intermediate line pairs shown} 4970@d int_pars=55 {total number of integer parameters} 4971@d count_base=int_base+int_pars {256 user \.{\\count} registers} 4972@d del_code_base=count_base+256 {256 delimiter code mappings} 4973@d dimen_base=del_code_base+256 {beginning of region 6} 4974@# 4975@d del_code(#)==eqtb[del_code_base+#].int 4976@d count(#)==eqtb[count_base+#].int 4977@d int_par(#)==eqtb[int_base+#].int {an integer parameter} 4978@d pretolerance==int_par(pretolerance_code) 4979@d tolerance==int_par(tolerance_code) 4980@d line_penalty==int_par(line_penalty_code) 4981@d hyphen_penalty==int_par(hyphen_penalty_code) 4982@d ex_hyphen_penalty==int_par(ex_hyphen_penalty_code) 4983@d club_penalty==int_par(club_penalty_code) 4984@d widow_penalty==int_par(widow_penalty_code) 4985@d display_widow_penalty==int_par(display_widow_penalty_code) 4986@d broken_penalty==int_par(broken_penalty_code) 4987@d bin_op_penalty==int_par(bin_op_penalty_code) 4988@d rel_penalty==int_par(rel_penalty_code) 4989@d pre_display_penalty==int_par(pre_display_penalty_code) 4990@d post_display_penalty==int_par(post_display_penalty_code) 4991@d inter_line_penalty==int_par(inter_line_penalty_code) 4992@d double_hyphen_demerits==int_par(double_hyphen_demerits_code) 4993@d final_hyphen_demerits==int_par(final_hyphen_demerits_code) 4994@d adj_demerits==int_par(adj_demerits_code) 4995@d mag==int_par(mag_code) 4996@d delimiter_factor==int_par(delimiter_factor_code) 4997@d looseness==int_par(looseness_code) 4998@d time==int_par(time_code) 4999@d day==int_par(day_code) 5000@d month==int_par(month_code) 5001@d year==int_par(year_code) 5002@d show_box_breadth==int_par(show_box_breadth_code) 5003@d show_box_depth==int_par(show_box_depth_code) 5004@d hbadness==int_par(hbadness_code) 5005@d vbadness==int_par(vbadness_code) 5006@d pausing==int_par(pausing_code) 5007@d tracing_online==int_par(tracing_online_code) 5008@d tracing_macros==int_par(tracing_macros_code) 5009@d tracing_stats==int_par(tracing_stats_code) 5010@d tracing_paragraphs==int_par(tracing_paragraphs_code) 5011@d tracing_pages==int_par(tracing_pages_code) 5012@d tracing_output==int_par(tracing_output_code) 5013@d tracing_lost_chars==int_par(tracing_lost_chars_code) 5014@d tracing_commands==int_par(tracing_commands_code) 5015@d tracing_restores==int_par(tracing_restores_code) 5016@d uc_hyph==int_par(uc_hyph_code) 5017@d output_penalty==int_par(output_penalty_code) 5018@d max_dead_cycles==int_par(max_dead_cycles_code) 5019@d hang_after==int_par(hang_after_code) 5020@d floating_penalty==int_par(floating_penalty_code) 5021@d global_defs==int_par(global_defs_code) 5022@d cur_fam==int_par(cur_fam_code) 5023@d escape_char==int_par(escape_char_code) 5024@d default_hyphen_char==int_par(default_hyphen_char_code) 5025@d default_skew_char==int_par(default_skew_char_code) 5026@d end_line_char==int_par(end_line_char_code) 5027@d new_line_char==int_par(new_line_char_code) 5028@d language==int_par(language_code) 5029@d left_hyphen_min==int_par(left_hyphen_min_code) 5030@d right_hyphen_min==int_par(right_hyphen_min_code) 5031@d holding_inserts==int_par(holding_inserts_code) 5032@d error_context_lines==int_par(error_context_lines_code) 5033 5034@<Assign the values |depth_threshold:=show_box_depth|...@>= 5035depth_threshold:=show_box_depth; 5036breadth_max:=show_box_breadth 5037 5038@ We can print the symbolic name of an integer parameter as follows. 5039 5040@p procedure print_param(@!n:integer); 5041begin case n of 5042pretolerance_code:print_esc("pretolerance"); 5043tolerance_code:print_esc("tolerance"); 5044line_penalty_code:print_esc("linepenalty"); 5045hyphen_penalty_code:print_esc("hyphenpenalty"); 5046ex_hyphen_penalty_code:print_esc("exhyphenpenalty"); 5047club_penalty_code:print_esc("clubpenalty"); 5048widow_penalty_code:print_esc("widowpenalty"); 5049display_widow_penalty_code:print_esc("displaywidowpenalty"); 5050broken_penalty_code:print_esc("brokenpenalty"); 5051bin_op_penalty_code:print_esc("binoppenalty"); 5052rel_penalty_code:print_esc("relpenalty"); 5053pre_display_penalty_code:print_esc("predisplaypenalty"); 5054post_display_penalty_code:print_esc("postdisplaypenalty"); 5055inter_line_penalty_code:print_esc("interlinepenalty"); 5056double_hyphen_demerits_code:print_esc("doublehyphendemerits"); 5057final_hyphen_demerits_code:print_esc("finalhyphendemerits"); 5058adj_demerits_code:print_esc("adjdemerits"); 5059mag_code:print_esc("mag"); 5060delimiter_factor_code:print_esc("delimiterfactor"); 5061looseness_code:print_esc("looseness"); 5062time_code:print_esc("time"); 5063day_code:print_esc("day"); 5064month_code:print_esc("month"); 5065year_code:print_esc("year"); 5066show_box_breadth_code:print_esc("showboxbreadth"); 5067show_box_depth_code:print_esc("showboxdepth"); 5068hbadness_code:print_esc("hbadness"); 5069vbadness_code:print_esc("vbadness"); 5070pausing_code:print_esc("pausing"); 5071tracing_online_code:print_esc("tracingonline"); 5072tracing_macros_code:print_esc("tracingmacros"); 5073tracing_stats_code:print_esc("tracingstats"); 5074tracing_paragraphs_code:print_esc("tracingparagraphs"); 5075tracing_pages_code:print_esc("tracingpages"); 5076tracing_output_code:print_esc("tracingoutput"); 5077tracing_lost_chars_code:print_esc("tracinglostchars"); 5078tracing_commands_code:print_esc("tracingcommands"); 5079tracing_restores_code:print_esc("tracingrestores"); 5080uc_hyph_code:print_esc("uchyph"); 5081output_penalty_code:print_esc("outputpenalty"); 5082max_dead_cycles_code:print_esc("maxdeadcycles"); 5083hang_after_code:print_esc("hangafter"); 5084floating_penalty_code:print_esc("floatingpenalty"); 5085global_defs_code:print_esc("globaldefs"); 5086cur_fam_code:print_esc("fam"); 5087escape_char_code:print_esc("escapechar"); 5088default_hyphen_char_code:print_esc("defaulthyphenchar"); 5089default_skew_char_code:print_esc("defaultskewchar"); 5090end_line_char_code:print_esc("endlinechar"); 5091new_line_char_code:print_esc("newlinechar"); 5092language_code:print_esc("language"); 5093left_hyphen_min_code:print_esc("lefthyphenmin"); 5094right_hyphen_min_code:print_esc("righthyphenmin"); 5095holding_inserts_code:print_esc("holdinginserts"); 5096error_context_lines_code:print_esc("errorcontextlines"); 5097othercases print("[unknown integer parameter!]") 5098endcases; 5099end; 5100 5101@ The integer parameter names must be entered into the hash table. 5102 5103@<Put each...@>= 5104primitive("pretolerance",assign_int,int_base+pretolerance_code);@/ 5105@!@:pretolerance_}{\.{\\pretolerance} primitive@> 5106primitive("tolerance",assign_int,int_base+tolerance_code);@/ 5107@!@:tolerance_}{\.{\\tolerance} primitive@> 5108primitive("linepenalty",assign_int,int_base+line_penalty_code);@/ 5109@!@:line_penalty_}{\.{\\linepenalty} primitive@> 5110primitive("hyphenpenalty",assign_int,int_base+hyphen_penalty_code);@/ 5111@!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@> 5112primitive("exhyphenpenalty",assign_int,int_base+ex_hyphen_penalty_code);@/ 5113@!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@> 5114primitive("clubpenalty",assign_int,int_base+club_penalty_code);@/ 5115@!@:club_penalty_}{\.{\\clubpenalty} primitive@> 5116primitive("widowpenalty",assign_int,int_base+widow_penalty_code);@/ 5117@!@:widow_penalty_}{\.{\\widowpenalty} primitive@> 5118primitive("displaywidowpenalty", 5119 assign_int,int_base+display_widow_penalty_code);@/ 5120@!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@> 5121primitive("brokenpenalty",assign_int,int_base+broken_penalty_code);@/ 5122@!@:broken_penalty_}{\.{\\brokenpenalty} primitive@> 5123primitive("binoppenalty",assign_int,int_base+bin_op_penalty_code);@/ 5124@!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@> 5125primitive("relpenalty",assign_int,int_base+rel_penalty_code);@/ 5126@!@:rel_penalty_}{\.{\\relpenalty} primitive@> 5127primitive("predisplaypenalty",assign_int,int_base+pre_display_penalty_code);@/ 5128@!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@> 5129primitive("postdisplaypenalty",assign_int,int_base+post_display_penalty_code);@/ 5130@!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@> 5131primitive("interlinepenalty",assign_int,int_base+inter_line_penalty_code);@/ 5132@!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@> 5133primitive("doublehyphendemerits", 5134 assign_int,int_base+double_hyphen_demerits_code);@/ 5135@!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@> 5136primitive("finalhyphendemerits", 5137 assign_int,int_base+final_hyphen_demerits_code);@/ 5138@!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@> 5139primitive("adjdemerits",assign_int,int_base+adj_demerits_code);@/ 5140@!@:adj_demerits_}{\.{\\adjdemerits} primitive@> 5141primitive("mag",assign_int,int_base+mag_code);@/ 5142@!@:mag_}{\.{\\mag} primitive@> 5143primitive("delimiterfactor",assign_int,int_base+delimiter_factor_code);@/ 5144@!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@> 5145primitive("looseness",assign_int,int_base+looseness_code);@/ 5146@!@:looseness_}{\.{\\looseness} primitive@> 5147primitive("time",assign_int,int_base+time_code);@/ 5148@!@:time_}{\.{\\time} primitive@> 5149primitive("day",assign_int,int_base+day_code);@/ 5150@!@:day_}{\.{\\day} primitive@> 5151primitive("month",assign_int,int_base+month_code);@/ 5152@!@:month_}{\.{\\month} primitive@> 5153primitive("year",assign_int,int_base+year_code);@/ 5154@!@:year_}{\.{\\year} primitive@> 5155primitive("showboxbreadth",assign_int,int_base+show_box_breadth_code);@/ 5156@!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@> 5157primitive("showboxdepth",assign_int,int_base+show_box_depth_code);@/ 5158@!@:show_box_depth_}{\.{\\showboxdepth} primitive@> 5159primitive("hbadness",assign_int,int_base+hbadness_code);@/ 5160@!@:hbadness_}{\.{\\hbadness} primitive@> 5161primitive("vbadness",assign_int,int_base+vbadness_code);@/ 5162@!@:vbadness_}{\.{\\vbadness} primitive@> 5163primitive("pausing",assign_int,int_base+pausing_code);@/ 5164@!@:pausing_}{\.{\\pausing} primitive@> 5165primitive("tracingonline",assign_int,int_base+tracing_online_code);@/ 5166@!@:tracing_online_}{\.{\\tracingonline} primitive@> 5167primitive("tracingmacros",assign_int,int_base+tracing_macros_code);@/ 5168@!@:tracing_macros_}{\.{\\tracingmacros} primitive@> 5169primitive("tracingstats",assign_int,int_base+tracing_stats_code);@/ 5170@!@:tracing_stats_}{\.{\\tracingstats} primitive@> 5171primitive("tracingparagraphs",assign_int,int_base+tracing_paragraphs_code);@/ 5172@!@:tracing_paragraphs_}{\.{\\tracingparagraphs} primitive@> 5173primitive("tracingpages",assign_int,int_base+tracing_pages_code);@/ 5174@!@:tracing_pages_}{\.{\\tracingpages} primitive@> 5175primitive("tracingoutput",assign_int,int_base+tracing_output_code);@/ 5176@!@:tracing_output_}{\.{\\tracingoutput} primitive@> 5177primitive("tracinglostchars",assign_int,int_base+tracing_lost_chars_code);@/ 5178@!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@> 5179primitive("tracingcommands",assign_int,int_base+tracing_commands_code);@/ 5180@!@:tracing_commands_}{\.{\\tracingcommands} primitive@> 5181primitive("tracingrestores",assign_int,int_base+tracing_restores_code);@/ 5182@!@:tracing_restores_}{\.{\\tracingrestores} primitive@> 5183primitive("uchyph",assign_int,int_base+uc_hyph_code);@/ 5184@!@:uc_hyph_}{\.{\\uchyph} primitive@> 5185primitive("outputpenalty",assign_int,int_base+output_penalty_code);@/ 5186@!@:output_penalty_}{\.{\\outputpenalty} primitive@> 5187primitive("maxdeadcycles",assign_int,int_base+max_dead_cycles_code);@/ 5188@!@:max_dead_cycles_}{\.{\\maxdeadcycles} primitive@> 5189primitive("hangafter",assign_int,int_base+hang_after_code);@/ 5190@!@:hang_after_}{\.{\\hangafter} primitive@> 5191primitive("floatingpenalty",assign_int,int_base+floating_penalty_code);@/ 5192@!@:floating_penalty_}{\.{\\floatingpenalty} primitive@> 5193primitive("globaldefs",assign_int,int_base+global_defs_code);@/ 5194@!@:global_defs_}{\.{\\globaldefs} primitive@> 5195primitive("fam",assign_int,int_base+cur_fam_code);@/ 5196@!@:fam_}{\.{\\fam} primitive@> 5197primitive("escapechar",assign_int,int_base+escape_char_code);@/ 5198@!@:escape_char_}{\.{\\escapechar} primitive@> 5199primitive("defaulthyphenchar",assign_int,int_base+default_hyphen_char_code);@/ 5200@!@:default_hyphen_char_}{\.{\\defaulthyphenchar} primitive@> 5201primitive("defaultskewchar",assign_int,int_base+default_skew_char_code);@/ 5202@!@:default_skew_char_}{\.{\\defaultskewchar} primitive@> 5203primitive("endlinechar",assign_int,int_base+end_line_char_code);@/ 5204@!@:end_line_char_}{\.{\\endlinechar} primitive@> 5205primitive("newlinechar",assign_int,int_base+new_line_char_code);@/ 5206@!@:new_line_char_}{\.{\\newlinechar} primitive@> 5207primitive("language",assign_int,int_base+language_code);@/ 5208@!@:language_}{\.{\\language} primitive@> 5209primitive("lefthyphenmin",assign_int,int_base+left_hyphen_min_code);@/ 5210@!@:left_hyphen_min_}{\.{\\lefthyphenmin} primitive@> 5211primitive("righthyphenmin",assign_int,int_base+right_hyphen_min_code);@/ 5212@!@:right_hyphen_min_}{\.{\\righthyphenmin} primitive@> 5213primitive("holdinginserts",assign_int,int_base+holding_inserts_code);@/ 5214@!@:holding_inserts_}{\.{\\holdinginserts} primitive@> 5215primitive("errorcontextlines",assign_int,int_base+error_context_lines_code);@/ 5216@!@:error_context_lines_}{\.{\\errorcontextlines} primitive@> 5217 5218@ @<Cases of |print_cmd_chr|...@>= 5219assign_int: if chr_code<count_base then print_param(chr_code-int_base) 5220 else begin print_esc("count"); print_int(chr_code-count_base); 5221 end; 5222 5223@ The integer parameters should really be initialized by a macro package; 5224the following initialization does the minimum to keep \TeX\ from 5225complete failure. 5226@^null delimiter@> 5227 5228@<Initialize table entries...@>= 5229for k:=int_base to del_code_base-1 do eqtb[k].int:=0; 5230mag:=1000; tolerance:=10000; hang_after:=1; max_dead_cycles:=25; 5231escape_char:="\"; end_line_char:=carriage_return; 5232for k:=0 to 255 do del_code(k):=-1; 5233del_code("."):=0; {this null delimiter is used in error recovery} 5234 5235@ The following procedure, which is called just before \TeX\ initializes its 5236input and output, establishes the initial values of the date and time. 5237@^system dependencies@> 5238Since standard \PASCAL\ cannot provide such information, something special 5239is needed. The program here simply specifies July 4, 1776, at noon; but 5240users probably want a better approximation to the truth. 5241 5242@p procedure fix_date_and_time; 5243begin time:=12*60; {minutes since midnight} 5244day:=4; {fourth day of the month} 5245month:=7; {seventh month of the year} 5246year:=1776; {Anno Domini} 5247end; 5248 5249@ @<Show equivalent |n|, in region 5@>= 5250begin if n<count_base then print_param(n-int_base) 5251else if n<del_code_base then 5252 begin print_esc("count"); print_int(n-count_base); 5253 end 5254else begin print_esc("delcode"); print_int(n-del_code_base); 5255 end; 5256print_char("="); print_int(eqtb[n].int); 5257end 5258 5259@ @<Set variable |c| to the current escape character@>=c:=escape_char 5260 5261@ @<Character |s| is the current new-line character@>=s=new_line_char 5262 5263@ \TeX\ is occasionally supposed to print diagnostic information that 5264goes only into the transcript file, unless |tracing_online| is positive. 5265Here are two routines that adjust the destination of print commands: 5266 5267@p procedure begin_diagnostic; {prepare to do some tracing} 5268begin old_setting:=selector; 5269if (tracing_online<=0)and(selector=term_and_log) then 5270 begin decr(selector); 5271 if history=spotless then history:=warning_issued; 5272 end; 5273end; 5274@# 5275procedure end_diagnostic(@!blank_line:boolean); 5276 {restore proper conditions after tracing} 5277begin print_nl(""); 5278if blank_line then print_ln; 5279selector:=old_setting; 5280end; 5281 5282@ Of course we had better declare another global variable, if the previous 5283routines are going to work. 5284 5285@<Glob...@>= 5286@!old_setting:0..max_selector; 5287 5288@ The final region of |eqtb| contains the dimension parameters defined 5289here, and the 256 \.{\\dimen} registers. 5290 5291@d par_indent_code=0 {indentation of paragraphs} 5292@d math_surround_code=1 {space around math in text} 5293@d line_skip_limit_code=2 {threshold for |line_skip| instead of |baseline_skip|} 5294@d hsize_code=3 {line width in horizontal mode} 5295@d vsize_code=4 {page height in vertical mode} 5296@d max_depth_code=5 {maximum depth of boxes on main pages} 5297@d split_max_depth_code=6 {maximum depth of boxes on split pages} 5298@d box_max_depth_code=7 {maximum depth of explicit vboxes} 5299@d hfuzz_code=8 {tolerance for overfull hbox messages} 5300@d vfuzz_code=9 {tolerance for overfull vbox messages} 5301@d delimiter_shortfall_code=10 {maximum amount uncovered by variable delimiters} 5302@d null_delimiter_space_code=11 {blank space in null delimiters} 5303@d script_space_code=12 {extra space after subscript or superscript} 5304@d pre_display_size_code=13 {length of text preceding a display} 5305@d display_width_code=14 {length of line for displayed equation} 5306@d display_indent_code=15 {indentation of line for displayed equation} 5307@d overfull_rule_code=16 {width of rule that identifies overfull hboxes} 5308@d hang_indent_code=17 {amount of hanging indentation} 5309@d h_offset_code=18 {amount of horizontal offset when shipping pages out} 5310@d v_offset_code=19 {amount of vertical offset when shipping pages out} 5311@d emergency_stretch_code=20 {reduces badnesses on final pass of line-breaking} 5312@d dimen_pars=21 {total number of dimension parameters} 5313@d scaled_base=dimen_base+dimen_pars 5314 {table of 256 user-defined \.{\\dimen} registers} 5315@d eqtb_size=scaled_base+255 {largest subscript of |eqtb|} 5316@# 5317@d dimen(#)==eqtb[scaled_base+#].sc 5318@d dimen_par(#)==eqtb[dimen_base+#].sc {a scaled quantity} 5319@d par_indent==dimen_par(par_indent_code) 5320@d math_surround==dimen_par(math_surround_code) 5321@d line_skip_limit==dimen_par(line_skip_limit_code) 5322@d hsize==dimen_par(hsize_code) 5323@d vsize==dimen_par(vsize_code) 5324@d max_depth==dimen_par(max_depth_code) 5325@d split_max_depth==dimen_par(split_max_depth_code) 5326@d box_max_depth==dimen_par(box_max_depth_code) 5327@d hfuzz==dimen_par(hfuzz_code) 5328@d vfuzz==dimen_par(vfuzz_code) 5329@d delimiter_shortfall==dimen_par(delimiter_shortfall_code) 5330@d null_delimiter_space==dimen_par(null_delimiter_space_code) 5331@d script_space==dimen_par(script_space_code) 5332@d pre_display_size==dimen_par(pre_display_size_code) 5333@d display_width==dimen_par(display_width_code) 5334@d display_indent==dimen_par(display_indent_code) 5335@d overfull_rule==dimen_par(overfull_rule_code) 5336@d hang_indent==dimen_par(hang_indent_code) 5337@d h_offset==dimen_par(h_offset_code) 5338@d v_offset==dimen_par(v_offset_code) 5339@d emergency_stretch==dimen_par(emergency_stretch_code) 5340 5341@p procedure print_length_param(@!n:integer); 5342begin case n of 5343par_indent_code:print_esc("parindent"); 5344math_surround_code:print_esc("mathsurround"); 5345line_skip_limit_code:print_esc("lineskiplimit"); 5346hsize_code:print_esc("hsize"); 5347vsize_code:print_esc("vsize"); 5348max_depth_code:print_esc("maxdepth"); 5349split_max_depth_code:print_esc("splitmaxdepth"); 5350box_max_depth_code:print_esc("boxmaxdepth"); 5351hfuzz_code:print_esc("hfuzz"); 5352vfuzz_code:print_esc("vfuzz"); 5353delimiter_shortfall_code:print_esc("delimitershortfall"); 5354null_delimiter_space_code:print_esc("nulldelimiterspace"); 5355script_space_code:print_esc("scriptspace"); 5356pre_display_size_code:print_esc("predisplaysize"); 5357display_width_code:print_esc("displaywidth"); 5358display_indent_code:print_esc("displayindent"); 5359overfull_rule_code:print_esc("overfullrule"); 5360hang_indent_code:print_esc("hangindent"); 5361h_offset_code:print_esc("hoffset"); 5362v_offset_code:print_esc("voffset"); 5363emergency_stretch_code:print_esc("emergencystretch"); 5364othercases print("[unknown dimen parameter!]") 5365endcases; 5366end; 5367 5368@ @<Put each...@>= 5369primitive("parindent",assign_dimen,dimen_base+par_indent_code);@/ 5370@!@:par_indent_}{\.{\\parindent} primitive@> 5371primitive("mathsurround",assign_dimen,dimen_base+math_surround_code);@/ 5372@!@:math_surround_}{\.{\\mathsurround} primitive@> 5373primitive("lineskiplimit",assign_dimen,dimen_base+line_skip_limit_code);@/ 5374@!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@> 5375primitive("hsize",assign_dimen,dimen_base+hsize_code);@/ 5376@!@:hsize_}{\.{\\hsize} primitive@> 5377primitive("vsize",assign_dimen,dimen_base+vsize_code);@/ 5378@!@:vsize_}{\.{\\vsize} primitive@> 5379primitive("maxdepth",assign_dimen,dimen_base+max_depth_code);@/ 5380@!@:max_depth_}{\.{\\maxdepth} primitive@> 5381primitive("splitmaxdepth",assign_dimen,dimen_base+split_max_depth_code);@/ 5382@!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@> 5383primitive("boxmaxdepth",assign_dimen,dimen_base+box_max_depth_code);@/ 5384@!@:box_max_depth_}{\.{\\boxmaxdepth} primitive@> 5385primitive("hfuzz",assign_dimen,dimen_base+hfuzz_code);@/ 5386@!@:hfuzz_}{\.{\\hfuzz} primitive@> 5387primitive("vfuzz",assign_dimen,dimen_base+vfuzz_code);@/ 5388@!@:vfuzz_}{\.{\\vfuzz} primitive@> 5389primitive("delimitershortfall", 5390 assign_dimen,dimen_base+delimiter_shortfall_code);@/ 5391@!@:delimiter_shortfall_}{\.{\\delimitershortfall} primitive@> 5392primitive("nulldelimiterspace", 5393 assign_dimen,dimen_base+null_delimiter_space_code);@/ 5394@!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@> 5395primitive("scriptspace",assign_dimen,dimen_base+script_space_code);@/ 5396@!@:script_space_}{\.{\\scriptspace} primitive@> 5397primitive("predisplaysize",assign_dimen,dimen_base+pre_display_size_code);@/ 5398@!@:pre_display_size_}{\.{\\predisplaysize} primitive@> 5399primitive("displaywidth",assign_dimen,dimen_base+display_width_code);@/ 5400@!@:display_width_}{\.{\\displaywidth} primitive@> 5401primitive("displayindent",assign_dimen,dimen_base+display_indent_code);@/ 5402@!@:display_indent_}{\.{\\displayindent} primitive@> 5403primitive("overfullrule",assign_dimen,dimen_base+overfull_rule_code);@/ 5404@!@:overfull_rule_}{\.{\\overfullrule} primitive@> 5405primitive("hangindent",assign_dimen,dimen_base+hang_indent_code);@/ 5406@!@:hang_indent_}{\.{\\hangindent} primitive@> 5407primitive("hoffset",assign_dimen,dimen_base+h_offset_code);@/ 5408@!@:h_offset_}{\.{\\hoffset} primitive@> 5409primitive("voffset",assign_dimen,dimen_base+v_offset_code);@/ 5410@!@:v_offset_}{\.{\\voffset} primitive@> 5411primitive("emergencystretch",assign_dimen,dimen_base+emergency_stretch_code);@/ 5412@!@:emergency_stretch_}{\.{\\emergencystretch} primitive@> 5413 5414@ @<Cases of |print_cmd_chr|...@>= 5415assign_dimen: if chr_code<scaled_base then 5416 print_length_param(chr_code-dimen_base) 5417 else begin print_esc("dimen"); print_int(chr_code-scaled_base); 5418 end; 5419 5420@ @<Initialize table entries...@>= 5421for k:=dimen_base to eqtb_size do eqtb[k].sc:=0; 5422 5423@ @<Show equivalent |n|, in region 6@>= 5424begin if n<scaled_base then print_length_param(n-dimen_base) 5425else begin print_esc("dimen"); print_int(n-scaled_base); 5426 end; 5427print_char("="); print_scaled(eqtb[n].sc); print("pt"); 5428end 5429 5430@ Here is a procedure that displays the contents of |eqtb[n]| 5431symbolically. 5432 5433@p@t\4@>@<Declare the procedure called |print_cmd_chr|@>@;@/ 5434@!stat procedure show_eqtb(@!n:pointer); 5435begin if n<active_base then print_char("?") {this can't happen} 5436else if n<glue_base then @<Show equivalent |n|, in region 1 or 2@> 5437else if n<local_base then @<Show equivalent |n|, in region 3@> 5438else if n<int_base then @<Show equivalent |n|, in region 4@> 5439else if n<dimen_base then @<Show equivalent |n|, in region 5@> 5440else if n<=eqtb_size then @<Show equivalent |n|, in region 6@> 5441else print_char("?"); {this can't happen either} 5442end; 5443tats 5444 5445@ The last two regions of |eqtb| have fullword values instead of the 5446three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary, 5447but \TeX\ needs to store the |eq_level| information in another array 5448called |xeq_level|. 5449 5450@<Glob...@>= 5451@!eqtb:array[active_base..eqtb_size] of memory_word; 5452@!xeq_level:array[int_base..eqtb_size] of quarterword; 5453 5454@ @<Set init...@>= 5455for k:=int_base to eqtb_size do xeq_level[k]:=level_one; 5456 5457@ When the debugging routine |search_mem| is looking for pointers having a 5458given value, it is interested only in regions 1 to~3 of~|eqtb|, and in the 5459first part of region~4. 5460 5461@<Search |eqtb| for equivalents equal to |p|@>= 5462for q:=active_base to box_base+255 do 5463 begin if equiv(q)=p then 5464 begin print_nl("EQUIV("); print_int(q); print_char(")"); 5465 end; 5466 end 5467 5468@* \[18] The hash table. 5469Control sequences are stored and retrieved by means of a fairly standard hash 5470table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C 5471in {\sl The Art of Computer Programming\/}). Once a control sequence enters the 5472table, it is never removed, because there are complicated situations 5473involving \.{\\gdef} where the removal of a control sequence at the end of 5474a group would be a mistake preventable only by the introduction of a 5475complicated reference-count mechanism. 5476 5477The actual sequence of letters forming a control sequence identifier is 5478stored in the |str_pool| array together with all the other strings. An 5479auxiliary array |hash| consists of items with two halfword fields per 5480word. The first of these, called |next(p)|, points to the next identifier 5481belonging to the same coalesced list as the identifier corresponding to~|p|; 5482and the other, called |text(p)|, points to the |str_start| entry for 5483|p|'s identifier. If position~|p| of the hash table is empty, we have 5484|text(p)=0|; if position |p| is either empty or the end of a coalesced 5485hash list, we have |next(p)=0|. An auxiliary pointer variable called 5486|hash_used| is maintained in such a way that all locations |p>=hash_used| 5487are nonempty. The global variable |cs_count| tells how many multiletter 5488control sequences have been defined, if statistics are being kept. 5489 5490A global boolean variable called |no_new_control_sequence| is set to 5491|true| during the time that new hash table entries are forbidden. 5492 5493@d next(#) == hash[#].lh {link for coalesced lists} 5494@d text(#) == hash[#].rh {string number for control sequence name} 5495@d hash_is_full == (hash_used=hash_base) {test if all positions are occupied} 5496@d font_id_text(#) == text(font_id_base+#) {a frozen font identifier's name} 5497 5498@<Glob...@>= 5499@!hash: array[hash_base..undefined_control_sequence-1] of two_halves; 5500 {the hash table} 5501@!hash_used:pointer; {allocation pointer for |hash|} 5502@!no_new_control_sequence:boolean; {are new identifiers legal?} 5503@!cs_count:integer; {total number of known identifiers} 5504 5505@ @<Set init...@>= 5506no_new_control_sequence:=true; {new identifiers are usually forbidden} 5507next(hash_base):=0; text(hash_base):=0; 5508for k:=hash_base+1 to undefined_control_sequence-1 do hash[k]:=hash[hash_base]; 5509 5510@ @<Initialize table entries...@>= 5511hash_used:=frozen_control_sequence; {nothing is used} 5512cs_count:=0; 5513eq_type(frozen_dont_expand):=dont_expand; 5514text(frozen_dont_expand):="notexpanded:"; 5515@.notexpanded:@> 5516 5517@ Here is the subroutine that searches the hash table for an identifier 5518that matches a given string of length |l>1| appearing in |buffer[j.. 5519(j+l-1)]|. If the identifier is found, the corresponding hash table address 5520is returned. Otherwise, if the global variable |no_new_control_sequence| 5521is |true|, the dummy address |undefined_control_sequence| is returned. 5522Otherwise the identifier is inserted into the hash table and its location 5523is returned. 5524 5525@p function id_lookup(@!j,@!l:integer):pointer; {search the hash table} 5526label found; {go here if you found it} 5527var h:integer; {hash code} 5528@!d:integer; {number of characters in incomplete current string} 5529@!p:pointer; {index in |hash| array} 5530@!k:pointer; {index in |buffer| array} 5531begin @<Compute the hash code |h|@>; 5532p:=h+hash_base; {we start searching here; note that |0<=h<hash_prime|} 5533loop@+begin if text(p)>0 then if length(text(p))=l then 5534 if str_eq_buf(text(p),j) then goto found; 5535 if next(p)=0 then 5536 begin if no_new_control_sequence then 5537 p:=undefined_control_sequence 5538 else @<Insert a new control sequence after |p|, then make 5539 |p| point to it@>; 5540 goto found; 5541 end; 5542 p:=next(p); 5543 end; 5544found: id_lookup:=p; 5545end; 5546 5547@ @<Insert a new control...@>= 5548begin if text(p)>0 then 5549 begin repeat if hash_is_full then overflow("hash size",hash_size); 5550@:TeX capacity exceeded hash size}{\quad hash size@> 5551 decr(hash_used); 5552 until text(hash_used)=0; {search for an empty location in |hash|} 5553 next(p):=hash_used; p:=hash_used; 5554 end; 5555str_room(l); d:=cur_length; 5556while pool_ptr>str_start[str_ptr] do 5557 begin decr(pool_ptr); str_pool[pool_ptr+l]:=str_pool[pool_ptr]; 5558 end; {move current string up to make room for another} 5559for k:=j to j+l-1 do append_char(buffer[k]); 5560text(p):=make_string; pool_ptr:=pool_ptr+d; 5561@!stat incr(cs_count);@+tats@;@/ 5562end 5563 5564@ The value of |hash_prime| should be roughly 85\pct! of |hash_size|, and it 5565should be a prime number. The theory of hashing tells us to expect fewer 5566than two table probes, on the average, when the search is successful. 5567[See J.~S. Vitter, {\sl Journal of the ACM\/ \bf30} (1983), 231--258.] 5568@^Vitter, Jeffrey Scott@> 5569 5570@<Compute the hash code |h|@>= 5571h:=buffer[j]; 5572for k:=j+1 to j+l-1 do 5573 begin h:=h+h+buffer[k]; 5574 while h>=hash_prime do h:=h-hash_prime; 5575 end 5576 5577@ Single-character control sequences do not need to be looked up in a hash 5578table, since we can use the character code itself as a direct address. 5579The procedure |print_cs| prints the name of a control sequence, given 5580a pointer to its address in |eqtb|. A space is printed after the name 5581unless it is a single nonletter or an active character. This procedure 5582might be invoked with invalid data, so it is ``extra robust.'' The 5583individual characters must be printed one at a time using |print|, since 5584they may be unprintable. 5585 5586@<Basic printing...@>= 5587procedure print_cs(@!p:integer); {prints a purported control sequence} 5588begin if p<hash_base then {single character} 5589 if p>=single_base then 5590 if p=null_cs then 5591 begin print_esc("csname"); print_esc("endcsname"); print_char(" "); 5592 end 5593 else begin print_esc(p-single_base); 5594 if cat_code(p-single_base)=letter then print_char(" "); 5595 end 5596 else if p<active_base then print_esc("IMPOSSIBLE.") 5597@.IMPOSSIBLE@> 5598 else print(p-active_base) 5599else if p>=undefined_control_sequence then print_esc("IMPOSSIBLE.") 5600else if (text(p)<0)or(text(p)>=str_ptr) then print_esc("NONEXISTENT.") 5601@.NONEXISTENT@> 5602else begin print_esc(text(p)); 5603 print_char(" "); 5604 end; 5605end; 5606 5607@ Here is a similar procedure; it avoids the error checks, and it never 5608prints a space after the control sequence. 5609 5610@<Basic printing procedures@>= 5611procedure sprint_cs(@!p:pointer); {prints a control sequence} 5612begin if p<hash_base then 5613 if p<single_base then print(p-active_base) 5614 else if p<null_cs then print_esc(p-single_base) 5615 else begin print_esc("csname"); print_esc("endcsname"); 5616 end 5617else print_esc(text(p)); 5618end; 5619 5620@ We need to put \TeX's ``primitive'' control sequences into the hash 5621table, together with their command code (which will be the |eq_type|) 5622and an operand (which will be the |equiv|). The |primitive| procedure 5623does this, in a way that no \TeX\ user can. The global value |cur_val| 5624contains the new |eqtb| pointer after |primitive| has acted. 5625 5626@p @!init procedure primitive(@!s:str_number;@!c:quarterword;@!o:halfword); 5627var k:pool_pointer; {index into |str_pool|} 5628@!j:small_number; {index into |buffer|} 5629@!l:small_number; {length of the string} 5630begin if s<256 then cur_val:=s+single_base 5631else begin k:=str_start[s]; l:=str_start[s+1]-k; 5632 {we will move |s| into the (empty) |buffer|} 5633 for j:=0 to l-1 do buffer[j]:=so(str_pool[k+j]); 5634 cur_val:=id_lookup(0,l); {|no_new_control_sequence| is |false|} 5635 flush_string; text(cur_val):=s; {we don't want to have the string twice} 5636 end; 5637eq_level(cur_val):=level_one; eq_type(cur_val):=c; equiv(cur_val):=o; 5638end; 5639tini 5640 5641@ Many of \TeX's primitives need no |equiv|, since they are identifiable 5642by their |eq_type| alone. These primitives are loaded into the hash table 5643as follows: 5644 5645@<Put each of \TeX's primitives into the hash table@>= 5646primitive(" ",ex_space,0);@/ 5647@!@:Single-character primitives /}{\quad\.{\\\ }@> 5648primitive("/",ital_corr,0);@/ 5649@!@:Single-character primitives /}{\quad\.{\\/}@> 5650primitive("accent",accent,0);@/ 5651@!@:accent_}{\.{\\accent} primitive@> 5652primitive("advance",advance,0);@/ 5653@!@:advance_}{\.{\\advance} primitive@> 5654primitive("afterassignment",after_assignment,0);@/ 5655@!@:after_assignment_}{\.{\\afterassignment} primitive@> 5656primitive("aftergroup",after_group,0);@/ 5657@!@:after_group_}{\.{\\aftergroup} primitive@> 5658primitive("begingroup",begin_group,0);@/ 5659@!@:begin_group_}{\.{\\begingroup} primitive@> 5660primitive("char",char_num,0);@/ 5661@!@:char_}{\.{\\char} primitive@> 5662primitive("csname",cs_name,0);@/ 5663@!@:cs_name_}{\.{\\csname} primitive@> 5664primitive("delimiter",delim_num,0);@/ 5665@!@:delimiter_}{\.{\\delimiter} primitive@> 5666primitive("divide",divide,0);@/ 5667@!@:divide_}{\.{\\divide} primitive@> 5668primitive("endcsname",end_cs_name,0);@/ 5669@!@:end_cs_name_}{\.{\\endcsname} primitive@> 5670primitive("endgroup",end_group,0); 5671@!@:end_group_}{\.{\\endgroup} primitive@> 5672text(frozen_end_group):="endgroup"; eqtb[frozen_end_group]:=eqtb[cur_val];@/ 5673primitive("expandafter",expand_after,0);@/ 5674@!@:expand_after_}{\.{\\expandafter} primitive@> 5675primitive("font",def_font,0);@/ 5676@!@:font_}{\.{\\font} primitive@> 5677primitive("fontdimen",assign_font_dimen,0);@/ 5678@!@:font_dimen_}{\.{\\fontdimen} primitive@> 5679primitive("halign",halign,0);@/ 5680@!@:halign_}{\.{\\halign} primitive@> 5681primitive("hrule",hrule,0);@/ 5682@!@:hrule_}{\.{\\hrule} primitive@> 5683primitive("ignorespaces",ignore_spaces,0);@/ 5684@!@:ignore_spaces_}{\.{\\ignorespaces} primitive@> 5685primitive("insert",insert,0);@/ 5686@!@:insert_}{\.{\\insert} primitive@> 5687primitive("mark",mark,0);@/ 5688@!@:mark_}{\.{\\mark} primitive@> 5689primitive("mathaccent",math_accent,0);@/ 5690@!@:math_accent_}{\.{\\mathaccent} primitive@> 5691primitive("mathchar",math_char_num,0);@/ 5692@!@:math_char_}{\.{\\mathchar} primitive@> 5693primitive("mathchoice",math_choice,0);@/ 5694@!@:math_choice_}{\.{\\mathchoice} primitive@> 5695primitive("multiply",multiply,0);@/ 5696@!@:multiply_}{\.{\\multiply} primitive@> 5697primitive("noalign",no_align,0);@/ 5698@!@:no_align_}{\.{\\noalign} primitive@> 5699primitive("noboundary",no_boundary,0);@/ 5700@!@:no_boundary_}{\.{\\noboundary} primitive@> 5701primitive("noexpand",no_expand,0);@/ 5702@!@:no_expand_}{\.{\\noexpand} primitive@> 5703primitive("nonscript",non_script,0);@/ 5704@!@:non_script_}{\.{\\nonscript} primitive@> 5705primitive("omit",omit,0);@/ 5706@!@:omit_}{\.{\\omit} primitive@> 5707primitive("parshape",set_shape,0);@/ 5708@!@:par_shape_}{\.{\\parshape} primitive@> 5709primitive("penalty",break_penalty,0);@/ 5710@!@:penalty_}{\.{\\penalty} primitive@> 5711primitive("prevgraf",set_prev_graf,0);@/ 5712@!@:prev_graf_}{\.{\\prevgraf} primitive@> 5713primitive("radical",radical,0);@/ 5714@!@:radical_}{\.{\\radical} primitive@> 5715primitive("read",read_to_cs,0);@/ 5716@!@:read_}{\.{\\read} primitive@> 5717primitive("relax",relax,256); {cf.\ |scan_file_name|} 5718@!@:relax_}{\.{\\relax} primitive@> 5719text(frozen_relax):="relax"; eqtb[frozen_relax]:=eqtb[cur_val];@/ 5720primitive("setbox",set_box,0);@/ 5721@!@:set_box_}{\.{\\setbox} primitive@> 5722primitive("the",the,0);@/ 5723@!@:the_}{\.{\\the} primitive@> 5724primitive("toks",toks_register,0);@/ 5725@!@:toks_}{\.{\\toks} primitive@> 5726primitive("vadjust",vadjust,0);@/ 5727@!@:vadjust_}{\.{\\vadjust} primitive@> 5728primitive("valign",valign,0);@/ 5729@!@:valign_}{\.{\\valign} primitive@> 5730primitive("vcenter",vcenter,0);@/ 5731@!@:vcenter_}{\.{\\vcenter} primitive@> 5732primitive("vrule",vrule,0);@/ 5733@!@:vrule_}{\.{\\vrule} primitive@> 5734 5735@ Each primitive has a corresponding inverse, so that it is possible to 5736display the cryptic numeric contents of |eqtb| in symbolic form. 5737Every call of |primitive| in this program is therefore accompanied by some 5738straightforward code that forms part of the |print_cmd_chr| routine 5739below. 5740 5741@<Cases of |print_cmd_chr|...@>= 5742accent: print_esc("accent"); 5743advance: print_esc("advance"); 5744after_assignment: print_esc("afterassignment"); 5745after_group: print_esc("aftergroup"); 5746assign_font_dimen: print_esc("fontdimen"); 5747begin_group: print_esc("begingroup"); 5748break_penalty: print_esc("penalty"); 5749char_num: print_esc("char"); 5750cs_name: print_esc("csname"); 5751def_font: print_esc("font"); 5752delim_num: print_esc("delimiter"); 5753divide: print_esc("divide"); 5754end_cs_name: print_esc("endcsname"); 5755end_group: print_esc("endgroup"); 5756ex_space: print_esc(" "); 5757expand_after: print_esc("expandafter"); 5758halign: print_esc("halign"); 5759hrule: print_esc("hrule"); 5760ignore_spaces: print_esc("ignorespaces"); 5761insert: print_esc("insert"); 5762ital_corr: print_esc("/"); 5763mark: print_esc("mark"); 5764math_accent: print_esc("mathaccent"); 5765math_char_num: print_esc("mathchar"); 5766math_choice: print_esc("mathchoice"); 5767multiply: print_esc("multiply"); 5768no_align: print_esc("noalign"); 5769no_boundary:print_esc("noboundary"); 5770no_expand: print_esc("noexpand"); 5771non_script: print_esc("nonscript"); 5772omit: print_esc("omit"); 5773radical: print_esc("radical"); 5774read_to_cs: print_esc("read"); 5775relax: print_esc("relax"); 5776set_box: print_esc("setbox"); 5777set_prev_graf: print_esc("prevgraf"); 5778set_shape: print_esc("parshape"); 5779the: print_esc("the"); 5780toks_register: print_esc("toks"); 5781vadjust: print_esc("vadjust"); 5782valign: print_esc("valign"); 5783vcenter: print_esc("vcenter"); 5784vrule: print_esc("vrule"); 5785 5786@ We will deal with the other primitives later, at some point in the program 5787where their |eq_type| and |equiv| values are more meaningful. For example, 5788the primitives for math mode will be loaded when we consider the routines 5789that deal with formulas. It is easy to find where each particular 5790primitive was treated by looking in the index at the end; for example, the 5791section where |"radical"| entered |eqtb| is listed under `\.{\\radical} 5792primitive'. (Primitives consisting of a single nonalphabetic character, 5793@!like `\.{\\/}', are listed under `Single-character primitives'.) 5794@!@^Single-character primitives@> 5795 5796Meanwhile, this is a convenient place to catch up on something we were unable 5797to do before the hash table was defined: 5798 5799@<Print the font identifier for |font(p)|@>= 5800print_esc(font_id_text(font(p))) 5801 5802@* \[19] Saving and restoring equivalents. 5803The nested structure provided by `$\.{\char'173}\ldots\.{\char'175}$' groups 5804in \TeX\ means that |eqtb| entries valid in outer groups should be saved 5805and restored later if they are overridden inside the braces. When a new |eqtb| 5806value is being assigned, the program therefore checks to see if the previous 5807entry belongs to an outer level. In such a case, the old value is placed 5808on the |save_stack| just before the new value enters |eqtb|. At the 5809end of a grouping level, i.e., when the right brace is sensed, the 5810|save_stack| is used to restore the outer values, and the inner ones are 5811destroyed. 5812 5813Entries on the |save_stack| are of type |memory_word|. The top item on 5814this stack is |save_stack[p]|, where |p=save_ptr-1|; it contains three 5815fields called |save_type|, |save_level|, and |save_index|, and it is 5816interpreted in one of four ways: 5817 5818\yskip\hangg 1) If |save_type(p)=restore_old_value|, then 5819|save_index(p)| is a location in |eqtb| whose current value should 5820be destroyed at the end of the current group and replaced by |save_stack[p-1]|. 5821Furthermore if |save_index(p)>=int_base|, then |save_level(p)| 5822should replace the corresponding entry in |xeq_level|. 5823 5824\yskip\hangg 2) If |save_type(p)=restore_zero|, then |save_index(p)| 5825is a location in |eqtb| whose current value should be destroyed at the end 5826of the current group, when it should be 5827replaced by the current value of |eqtb[undefined_control_sequence]|. 5828 5829\yskip\hangg 3) If |save_type(p)=insert_token|, then |save_index(p)| 5830is a token that should be inserted into \TeX's input when the current 5831group ends. 5832 5833\yskip\hangg 4) If |save_type(p)=level_boundary|, then |save_level(p)| 5834is a code explaining what kind of group we were previously in, and 5835|save_index(p)| points to the level boundary word at the bottom of 5836the entries for that group. 5837 5838@d save_type(#)==save_stack[#].hh.b0 {classifies a |save_stack| entry} 5839@d save_level(#)==save_stack[#].hh.b1 5840 {saved level for regions 5 and 6, or group code} 5841@d save_index(#)==save_stack[#].hh.rh 5842 {|eqtb| location or token or |save_stack| location} 5843@d restore_old_value=0 {|save_type| when a value should be restored later} 5844@d restore_zero=1 {|save_type| when an undefined entry should be restored} 5845@d insert_token=2 {|save_type| when a token is being saved for later use} 5846@d level_boundary=3 {|save_type| corresponding to beginning of group} 5847 5848@ Here are the group codes that are used to discriminate between different 5849kinds of groups. They allow \TeX\ to decide what special actions, if any, 5850should be performed when a group ends. 5851\def\grp{\.{\char'173...\char'175}} 5852 5853Some groups are not supposed to be ended by right braces. For example, 5854the `\.\$' that begins a math formula causes a |math_shift_group| to 5855be started, and this should be terminated by a matching `\.\$'. Similarly, 5856a group that starts with \.{\\left} should end with \.{\\right}, and 5857one that starts with \.{\\begingroup} should end with \.{\\endgroup}. 5858 5859@d bottom_level=0 {group code for the outside world} 5860@d simple_group=1 {group code for local structure only} 5861@d hbox_group=2 {code for `\.{\\hbox}\grp'} 5862@d adjusted_hbox_group=3 {code for `\.{\\hbox}\grp' in vertical mode} 5863@d vbox_group=4 {code for `\.{\\vbox}\grp'} 5864@d vtop_group=5 {code for `\.{\\vtop}\grp'} 5865@d align_group=6 {code for `\.{\\halign}\grp', `\.{\\valign}\grp'} 5866@d no_align_group=7 {code for `\.{\\noalign}\grp'} 5867@d output_group=8 {code for output routine} 5868@d math_group=9 {code for, e.g., `\.{\char'136}\grp'} 5869@d disc_group=10 {code for `\.{\\discretionary}\grp\grp\grp'} 5870@d insert_group=11 {code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'} 5871@d vcenter_group=12 {code for `\.{\\vcenter}\grp'} 5872@d math_choice_group=13 {code for `\.{\\mathchoice}\grp\grp\grp\grp'} 5873@d semi_simple_group=14 {code for `\.{\\begingroup...\\endgroup}'} 5874@d math_shift_group=15 {code for `\.{\$...\$}'} 5875@d math_left_group=16 {code for `\.{\\left...\\right}'} 5876@d max_group_code=16 5877 5878@<Types...@>= 5879@!group_code=0..max_group_code; {|save_level| for a level boundary} 5880 5881@ The global variable |cur_group| keeps track of what sort of group we are 5882currently in. Another global variable, |cur_boundary|, points to the 5883topmost |level_boundary| word. And |cur_level| is the current depth of 5884nesting. The routines are designed to preserve the condition that no entry 5885in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|. 5886 5887@ @<Glob...@>= 5888@!save_stack : array[0..save_size] of memory_word; 5889@!save_ptr : 0..save_size; {first unused entry on |save_stack|} 5890@!max_save_stack:0..save_size; {maximum usage of save stack} 5891@!cur_level: quarterword; {current nesting level for groups} 5892@!cur_group: group_code; {current group type} 5893@!cur_boundary: 0..save_size; {where the current level begins} 5894 5895@ At this time it might be a good idea for the reader to review the introduction 5896to |eqtb| that was given above just before the long lists of parameter names. 5897Recall that the ``outer level'' of the program is |level_one|, since 5898undefined control sequences are assumed to be ``defined'' at |level_zero|. 5899 5900@<Set init...@>= 5901save_ptr:=0; cur_level:=level_one; cur_group:=bottom_level; cur_boundary:=0; 5902max_save_stack:=0; 5903 5904@ The following macro is used to test if there is room for up to six more 5905entries on |save_stack|. By making a conservative test like this, we can 5906get by with testing for overflow in only a few places. 5907 5908@d check_full_save_stack==if save_ptr>max_save_stack then 5909 begin max_save_stack:=save_ptr; 5910 if max_save_stack>save_size-6 then overflow("save size",save_size); 5911@:TeX capacity exceeded save size}{\quad save size@> 5912 end 5913 5914@ Procedure |new_save_level| is called when a group begins. The 5915argument is a group identification code like `|hbox_group|'. After 5916calling this routine, it is safe to put five more entries on |save_stack|. 5917 5918In some cases integer-valued items are placed onto the 5919|save_stack| just below a |level_boundary| word, because this is a 5920convenient place to keep information that is supposed to ``pop up'' just 5921when the group has finished. 5922For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt 5923dimension is stored on |save_stack| just before |new_save_level| is 5924called. 5925 5926We use the notation |saved(k)| to stand for an integer item that 5927appears in location |save_ptr+k| of the save stack. 5928 5929@d saved(#)==save_stack[save_ptr+#].int 5930 5931@p procedure new_save_level(@!c:group_code); {begin a new level of grouping} 5932begin check_full_save_stack; 5933save_type(save_ptr):=level_boundary; save_level(save_ptr):=cur_group; 5934save_index(save_ptr):=cur_boundary; 5935if cur_level=max_quarterword then overflow("grouping levels", 5936@:TeX capacity exceeded grouping levels}{\quad grouping levels@> 5937 max_quarterword-min_quarterword); 5938 {quit if |(cur_level+1)| is too big to be stored in |eqtb|} 5939cur_boundary:=save_ptr; incr(cur_level); incr(save_ptr); cur_group:=c; 5940end; 5941 5942@ Just before an entry of |eqtb| is changed, the following procedure should 5943be called to update the other data structures properly. It is important 5944to keep in mind that reference counts in |mem| include references from 5945within |save_stack|, so these counts must be handled carefully. 5946@^reference counts@> 5947 5948@p procedure eq_destroy(@!w:memory_word); {gets ready to forget |w|} 5949var q:pointer; {|equiv| field of |w|} 5950begin case eq_type_field(w) of 5951call,long_call,outer_call,long_outer_call: delete_token_ref(equiv_field(w)); 5952glue_ref: delete_glue_ref(equiv_field(w)); 5953shape_ref: begin q:=equiv_field(w); {we need to free a \.{\\parshape} block} 5954 if q<>null then free_node(q,info(q)+info(q)+1); 5955 end; {such a block is |2n+1| words long, where |n=info(q)|} 5956box_ref: flush_node_list(equiv_field(w)); 5957othercases do_nothing 5958endcases; 5959end; 5960 5961@ To save a value of |eqtb[p]| that was established at level |l|, we 5962can use the following subroutine. 5963 5964@p procedure eq_save(@!p:pointer;@!l:quarterword); {saves |eqtb[p]|} 5965begin check_full_save_stack; 5966if l=level_zero then save_type(save_ptr):=restore_zero 5967else begin save_stack[save_ptr]:=eqtb[p]; incr(save_ptr); 5968 save_type(save_ptr):=restore_old_value; 5969 end; 5970save_level(save_ptr):=l; save_index(save_ptr):=p; incr(save_ptr); 5971end; 5972 5973@ The procedure |eq_define| defines an |eqtb| entry having specified 5974|eq_type| and |equiv| fields, and saves the former value if appropriate. 5975This procedure is used only for entries in the first four regions of |eqtb|, 5976i.e., only for entries that have |eq_type| and |equiv| fields. 5977After calling this routine, it is safe to put four more entries on 5978|save_stack|, provided that there was room for four more entries before 5979the call, since |eq_save| makes the necessary test. 5980 5981@p procedure eq_define(@!p:pointer;@!t:quarterword;@!e:halfword); 5982 {new data for |eqtb|} 5983begin if eq_level(p)=cur_level then eq_destroy(eqtb[p]) 5984else if cur_level>level_one then eq_save(p,eq_level(p)); 5985eq_level(p):=cur_level; eq_type(p):=t; equiv(p):=e; 5986end; 5987 5988@ The counterpart of |eq_define| for the remaining (fullword) positions in 5989|eqtb| is called |eq_word_define|. Since |xeq_level[p]>=level_one| for all 5990|p|, a `|restore_zero|' will never be used in this case. 5991 5992@p procedure eq_word_define(@!p:pointer;@!w:integer); 5993begin if xeq_level[p]<>cur_level then 5994 begin eq_save(p,xeq_level[p]); xeq_level[p]:=cur_level; 5995 end; 5996eqtb[p].int:=w; 5997end; 5998 5999@ The |eq_define| and |eq_word_define| routines take care of local definitions. 6000@^global definitions@> 6001Global definitions are done in almost the same way, but there is no need 6002to save old values, and the new value is associated with |level_one|. 6003 6004@p procedure geq_define(@!p:pointer;@!t:quarterword;@!e:halfword); 6005 {global |eq_define|} 6006begin eq_destroy(eqtb[p]); 6007eq_level(p):=level_one; eq_type(p):=t; equiv(p):=e; 6008end; 6009@# 6010procedure geq_word_define(@!p:pointer;@!w:integer); {global |eq_word_define|} 6011begin eqtb[p].int:=w; xeq_level[p]:=level_one; 6012end; 6013 6014@ Subroutine |save_for_after| puts a token on the stack for save-keeping. 6015 6016@p procedure save_for_after(@!t:halfword); 6017begin if cur_level>level_one then 6018 begin check_full_save_stack; 6019 save_type(save_ptr):=insert_token; save_level(save_ptr):=level_zero; 6020 save_index(save_ptr):=t; incr(save_ptr); 6021 end; 6022end; 6023 6024@ The |unsave| routine goes the other way, taking items off of |save_stack|. 6025This routine takes care of restoration when a level ends; everything 6026belonging to the topmost group is cleared off of the save stack. 6027 6028@p@t\4@>@<Declare the procedure called |restore_trace|@>@;@/ 6029procedure@?back_input; forward; @t\2@> 6030procedure unsave; {pops the top level off the save stack} 6031label done; 6032var p:pointer; {position to be restored} 6033@!l:quarterword; {saved level, if in fullword regions of |eqtb|} 6034@!t:halfword; {saved value of |cur_tok|} 6035begin if cur_level>level_one then 6036 begin decr(cur_level); 6037 @<Clear off top level from |save_stack|@>; 6038 end 6039else confusion("curlevel"); {|unsave| is not used when |cur_group=bottom_level|} 6040@:this can't happen curlevel}{\quad curlevel@> 6041end; 6042 6043@ @<Clear off...@>= 6044loop@+begin decr(save_ptr); 6045 if save_type(save_ptr)=level_boundary then goto done; 6046 p:=save_index(save_ptr); 6047 if save_type(save_ptr)=insert_token then 6048 @<Insert token |p| into \TeX's input@> 6049 else begin if save_type(save_ptr)=restore_old_value then 6050 begin l:=save_level(save_ptr); decr(save_ptr); 6051 end 6052 else save_stack[save_ptr]:=eqtb[undefined_control_sequence]; 6053 @<Store \(s)|save_stack[save_ptr]| in |eqtb[p]|, unless 6054 |eqtb[p]| holds a global value@>; 6055 end; 6056 end; 6057done: cur_group:=save_level(save_ptr); cur_boundary:=save_index(save_ptr) 6058 6059@ A global definition, which sets the level to |level_one|, 6060@^global definitions@> 6061will not be undone by |unsave|. If at least one global definition of 6062|eqtb[p]| has been carried out within the group that just ended, the 6063last such definition will therefore survive. 6064 6065@<Store \(s)|save...@>= 6066if p<int_base then 6067 if eq_level(p)=level_one then 6068 begin eq_destroy(save_stack[save_ptr]); {destroy the saved value} 6069 @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/ 6070 end 6071 else begin eq_destroy(eqtb[p]); {destroy the current value} 6072 eqtb[p]:=save_stack[save_ptr]; {restore the saved value} 6073 @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/ 6074 end 6075else if xeq_level[p]<>level_one then 6076 begin eqtb[p]:=save_stack[save_ptr]; xeq_level[p]:=l; 6077 @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/ 6078 end 6079else begin 6080 @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/ 6081 end 6082 6083@ @<Declare the procedure called |restore_trace|@>= 6084@!stat procedure restore_trace(@!p:pointer;@!s:str_number); 6085 {|eqtb[p]| has just been restored or retained} 6086begin begin_diagnostic; print_char("{"); print(s); print_char(" "); 6087show_eqtb(p); print_char("}"); 6088end_diagnostic(false); 6089end; 6090tats 6091 6092@ When looking for possible pointers to a memory location, it is helpful 6093to look for references from |eqtb| that might be waiting on the 6094save stack. Of course, we might find spurious pointers too; but this 6095routine is merely an aid when debugging, and at such times we are 6096grateful for any scraps of information, even if they prove to be irrelevant. 6097@^dirty \PASCAL@> 6098 6099@<Search |save_stack| for equivalents that point to |p|@>= 6100if save_ptr>0 then for q:=0 to save_ptr-1 do 6101 begin if equiv_field(save_stack[q])=p then 6102 begin print_nl("SAVE("); print_int(q); print_char(")"); 6103 end; 6104 end 6105 6106@ Most of the parameters kept in |eqtb| can be changed freely, but there's 6107an exception: The magnification should not be used with two different 6108values during any \TeX\ job, since a single magnification is applied to an 6109entire run. The global variable |mag_set| is set to the current magnification 6110whenever it becomes necessary to ``freeze'' it at a particular value. 6111 6112@<Glob...@>= 6113@!mag_set:integer; {if nonzero, this magnification should be used henceforth} 6114 6115@ @<Set init...@>= 6116mag_set:=0; 6117 6118@ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag| 6119for magnification. 6120 6121@p procedure prepare_mag; 6122begin if (mag_set>0)and(mag<>mag_set) then 6123 begin print_err("Incompatible magnification ("); print_int(mag); 6124@.Incompatible magnification@> 6125 print(");"); print_nl(" the previous value will be retained"); 6126 help2("I can handle only one magnification ratio per job. So I've")@/ 6127 ("reverted to the magnification you used earlier on this run.");@/ 6128 int_error(mag_set); 6129 geq_word_define(int_base+mag_code,mag_set); {|mag:=mag_set|} 6130 end; 6131if (mag<=0)or(mag>32768) then 6132 begin print_err("Illegal magnification has been changed to 1000");@/ 6133@.Illegal magnification...@> 6134 help1("The magnification ratio must be between 1 and 32768."); 6135 int_error(mag); geq_word_define(int_base+mag_code,1000); 6136 end; 6137mag_set:=mag; 6138end; 6139 6140@* \[20] Token lists. 6141A \TeX\ token is either a character or a control sequence, and it is 6142@^token@> 6143represented internally in one of two ways: (1)~A character whose ASCII 6144code number is |c| and whose command code is |m| is represented as the 6145number $2^8m+c$; the command code is in the range |1<=m<=14|. (2)~A control 6146sequence whose |eqtb| address is |p| is represented as the number 6147|cs_token_flag+p|. Here |cs_token_flag=@t$2^{12}-1$@>| is larger than 6148$2^8m+c$, yet it is small enough that |cs_token_flag+p< max_halfword|; 6149thus, a token fits comfortably in a halfword. 6150 6151A token |t| represents a |left_brace| command if and only if 6152|t<left_brace_limit|; it represents a |right_brace| command if and only if 6153we have |left_brace_limit<=t<right_brace_limit|; and it represents a |match| or 6154|end_match| command if and only if |match_token<=t<=end_match_token|. 6155The following definitions take care of these token-oriented constants 6156and a few others. 6157 6158@d cs_token_flag==@'7777 {amount added to the |eqtb| location in a 6159 token that stands for a control sequence; is a multiple of~256, less~1} 6160@d left_brace_token=@'0400 {$2^8\cdot|left_brace|$} 6161@d left_brace_limit=@'1000 {$2^8\cdot(|left_brace|+1)$} 6162@d right_brace_token=@'1000 {$2^8\cdot|right_brace|$} 6163@d right_brace_limit=@'1400 {$2^8\cdot(|right_brace|+1)$} 6164@d math_shift_token=@'1400 {$2^8\cdot|math_shift|$} 6165@d tab_token=@'2000 {$2^8\cdot|tab_mark|$} 6166@d out_param_token=@'2400 {$2^8\cdot|out_param|$} 6167@d space_token=@'5040 {$2^8\cdot|spacer|+|" "|$} 6168@d letter_token=@'5400 {$2^8\cdot|letter|$} 6169@d other_token=@'6000 {$2^8\cdot|other_char|$} 6170@d match_token=@'6400 {$2^8\cdot|match|$} 6171@d end_match_token=@'7000 {$2^8\cdot|end_match|$} 6172 6173@ @<Check the ``constant''...@>= 6174if cs_token_flag+undefined_control_sequence>max_halfword then bad:=21; 6175 6176@ A token list is a singly linked list of one-word nodes in |mem|, where 6177each word contains a token and a link. Macro definitions, output-routine 6178definitions, marks, \.{\\write} texts, and a few other things 6179are remembered by \TeX\ in the form 6180of token lists, usually preceded by a node with a reference count in its 6181|token_ref_count| field. The token stored in location |p| is called 6182|info(p)|. 6183 6184Three special commands appear in the token lists of macro definitions. 6185When |m=match|, it means that \TeX\ should scan a parameter 6186for the current macro; when |m=end_match|, it means that parameter 6187matching should end and \TeX\ should start reading the macro text; and 6188when |m=out_param|, it means that \TeX\ should insert parameter 6189number |c| into the text at this point. 6190 6191The enclosing \.{\char'173} and \.{\char'175} characters of a macro 6192definition are omitted, but the final right brace of an output routine 6193is included at the end of its token list. 6194 6195Here is an example macro definition that illustrates these conventions. 6196After \TeX\ processes the text 6197$$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$ 6198the definition of \.{\\mac} is represented as a token list containing 6199$$\def\,{\hskip2pt} 6200\vbox{\halign{\hfil#\hfil\cr 6201(reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ , 6202\.{\\b}, |end_match|,\cr 6203|out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#, 6204|other_char|\,\.1,\cr 6205|out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$ 6206The procedure |scan_toks| builds such token lists, and |macro_call| 6207does the parameter matching. 6208@^reference counts@> 6209 6210Examples such as 6211$$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$ 6212explain why reference counts would be needed even if \TeX\ had no \.{\\let} 6213operation: When the token list for \.{\\m} is being read, the redefinition of 6214\.{\\m} changes the |eqtb| entry before the token list has been fully 6215consumed, so we dare not simply destroy a token list when its 6216control sequence is being redefined. 6217 6218If the parameter-matching part of a definition ends with `\.{\#\{}', 6219the corresponding token list will have `\.\{' just before the `|end_match|' 6220and also at the very end. The first `\.\{' is used to delimit the parameter; the 6221second one keeps the first from disappearing. 6222 6223@ The procedure |show_token_list|, which prints a symbolic form of 6224the token list that starts at a given node |p|, illustrates these 6225conventions. The token list being displayed should not begin with a reference 6226count. However, the procedure is intended to be robust, so that if the 6227memory links are awry or if |p| is not really a pointer to a token list, 6228nothing catastrophic will happen. 6229 6230An additional parameter |q| is also given; this parameter is either null 6231or it points to a node in the token list where a certain magic computation 6232takes place that will be explained later. (Basically, |q| is non-null when 6233we are printing the two-line context information at the time of an error 6234message; |q| marks the place corresponding to where the second line 6235should begin.) 6236 6237For example, if |p| points to the node containing the first \.a in the 6238token list above, then |show_token_list| will print the string 6239$$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$ 6240and if |q| points to the node containing the second \.a, 6241the magic computation will be performed just before the second \.a is printed. 6242 6243The generation will stop, and `\.{\\ETC.}' will be printed, if the length 6244of printing exceeds a given limit~|l|. Anomalous entries are printed in the 6245form of control sequences that are not followed by a blank space, e.g., 6246`\.{\\BAD.}'; this cannot be confused with actual control sequences because 6247a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'. 6248 6249@<Declare the procedure called |show_token_list|@>= 6250procedure show_token_list(@!p,@!q:integer;@!l:integer); 6251label exit; 6252var m,@!c:integer; {pieces of a token} 6253@!match_chr:ASCII_code; {character used in a `|match|'} 6254@!n:ASCII_code; {the highest parameter number, as an ASCII digit} 6255begin match_chr:="#"; n:="0"; tally:=0; 6256while (p<>null) and (tally<l) do 6257 begin if p=q then @<Do magic computation@>; 6258 @<Display token |p|, and |return| if there are problems@>; 6259 p:=link(p); 6260 end; 6261if p<>null then print_esc("ETC."); 6262@.ETC@> 6263exit: 6264end; 6265 6266@ @<Display token |p|...@>= 6267if (p<hi_mem_min) or (p>mem_end) then 6268 begin print_esc("CLOBBERED."); return; 6269@.CLOBBERED@> 6270 end; 6271if info(p)>=cs_token_flag then print_cs(info(p)-cs_token_flag) 6272else begin m:=info(p) div @'400; c:=info(p) mod @'400; 6273 if info(p)<0 then print_esc("BAD.") 6274@.BAD@> 6275 else @<Display the token $(|m|,|c|)$@>; 6276 end 6277 6278@ The procedure usually ``learns'' the character code used for macro 6279parameters by seeing one in a |match| command before it runs into any 6280|out_param| commands. 6281 6282@<Display the token ...@>= 6283case m of 6284left_brace,right_brace,math_shift,tab_mark,sup_mark,sub_mark,spacer, 6285 letter,other_char: print(c); 6286mac_param: begin print(c); print(c); 6287 end; 6288out_param: begin print(match_chr); 6289 if c<=9 then print_char(c+"0") 6290 else begin print_char("!"); return; 6291 end; 6292 end; 6293match: begin match_chr:=c; print(c); incr(n); print_char(n); 6294 if n>"9" then return; 6295 end; 6296end_match: print("->"); 6297@.->@> 6298othercases print_esc("BAD.") 6299@.BAD@> 6300endcases 6301 6302@ Here's the way we sometimes want to display a token list, given a pointer 6303to its reference count; the pointer may be null. 6304 6305@p procedure token_show(@!p:pointer); 6306begin if p<>null then show_token_list(link(p),null,10000000); 6307end; 6308 6309@ The |print_meaning| subroutine displays |cur_cmd| and |cur_chr| in 6310symbolic form, including the expansion of a macro or mark. 6311 6312@p procedure print_meaning; 6313begin print_cmd_chr(cur_cmd,cur_chr); 6314if cur_cmd>=call then 6315 begin print_char(":"); print_ln; token_show(cur_chr); 6316 end 6317else if cur_cmd=top_bot_mark then 6318 begin print_char(":"); print_ln; 6319 token_show(cur_mark[cur_chr]); 6320 end; 6321end; 6322 6323@* \[21] Introduction to the syntactic routines. 6324Let's pause a moment now and try to look at the Big Picture. 6325The \TeX\ program consists of three main parts: syntactic routines, 6326semantic routines, and output routines. The chief purpose of the 6327syntactic routines is to deliver the user's input to the semantic routines, 6328one token at a time. The semantic routines act as an interpreter 6329responding to these tokens, which may be regarded as commands. And the 6330output routines are periodically called on to convert box-and-glue 6331lists into a compact set of instructions that will be sent 6332to a typesetter. We have discussed the basic data structures and utility 6333routines of \TeX, so we are good and ready to plunge into the real activity by 6334considering the syntactic routines. 6335 6336Our current goal is to come to grips with the |get_next| procedure, 6337which is the keystone of \TeX's input mechanism. Each call of |get_next| 6338sets the value of three variables |cur_cmd|, |cur_chr|, and |cur_cs|, 6339representing the next input token. 6340$$\vbox{\halign{#\hfil\cr 6341 \hbox{|cur_cmd| denotes a command code from the long list of codes 6342 given above;}\cr 6343 \hbox{|cur_chr| denotes a character code or other modifier of the command 6344 code;}\cr 6345 \hbox{|cur_cs| is the |eqtb| location of the current control sequence,}\cr 6346 \hbox{\qquad if the current token was a control sequence, 6347 otherwise it's zero.}\cr}}$$ 6348Underlying this external behavior of |get_next| is all the machinery 6349necessary to convert from character files to tokens. At a given time we 6350may be only partially finished with the reading of several files (for 6351which \.{\\input} was specified), and partially finished with the expansion 6352of some user-defined macros and/or some macro parameters, and partially 6353finished with the generation of some text in a template for \.{\\halign}, 6354and so on. When reading a character file, special characters must be 6355classified as math delimiters, etc.; comments and extra blank spaces must 6356be removed, paragraphs must be recognized, and control sequences must be 6357found in the hash table. Furthermore there are occasions in which the 6358scanning routines have looked ahead for a word like `\.{plus}' but only 6359part of that word was found, hence a few characters must be put back 6360into the input and scanned again. 6361 6362To handle these situations, which might all be present simultaneously, 6363\TeX\ uses various stacks that hold information about the incomplete 6364activities, and there is a finite state control for each level of the 6365input mechanism. These stacks record the current state of an implicitly 6366recursive process, but the |get_next| procedure is not recursive. 6367Therefore it will not be difficult to translate these algorithms into 6368low-level languages that do not support recursion. 6369 6370@<Glob...@>= 6371@!cur_cmd: eight_bits; {current command set by |get_next|} 6372@!cur_chr: halfword; {operand of current command} 6373@!cur_cs: pointer; {control sequence found here, zero if none found} 6374@!cur_tok: halfword; {packed representative of |cur_cmd| and |cur_chr|} 6375 6376@ The |print_cmd_chr| routine prints a symbolic interpretation of a 6377command code and its modifier. This is used in certain `\.{You can\'t}' 6378error messages, and in the implementation of diagnostic routines like 6379\.{\\show}. 6380 6381The body of |print_cmd_chr| is a rather tedious listing of print 6382commands, and most of it is essentially an inverse to the |primitive| 6383routine that enters a \TeX\ primitive into |eqtb|. Therefore much of 6384this procedure appears elsewhere in the program, 6385together with the corresponding |primitive| calls. 6386 6387@d chr_cmd(#)==begin print(#); print_ASCII(chr_code); 6388 end 6389 6390@<Declare the procedure called |print_cmd_chr|@>= 6391procedure print_cmd_chr(@!cmd:quarterword;@!chr_code:halfword); 6392begin case cmd of 6393left_brace: chr_cmd("begin-group character "); 6394right_brace: chr_cmd("end-group character "); 6395math_shift: chr_cmd("math shift character "); 6396mac_param: chr_cmd("macro parameter character "); 6397sup_mark: chr_cmd("superscript character "); 6398sub_mark: chr_cmd("subscript character "); 6399endv: print("end of alignment template"); 6400spacer: chr_cmd("blank space "); 6401letter: chr_cmd("the letter "); 6402other_char: chr_cmd("the character "); 6403@t\4@>@<Cases of |print_cmd_chr| for symbolic printing of primitives@>@/ 6404othercases print("[unknown command code!]") 6405endcases; 6406end; 6407 6408@ Here is a procedure that displays the current command. 6409 6410@p procedure show_cur_cmd_chr; 6411begin begin_diagnostic; print_nl("{"); 6412if mode<>shown_mode then 6413 begin print_mode(mode); print(": "); shown_mode:=mode; 6414 end; 6415print_cmd_chr(cur_cmd,cur_chr); print_char("}"); 6416end_diagnostic(false); 6417end; 6418 6419@* \[22] Input stacks and states. 6420This implementation of 6421\TeX\ uses two different conventions for representing sequential stacks. 6422@^stack conventions@>@^conventions for representing stacks@> 6423 6424\yskip\hangg 1) If there is frequent access to the top entry, and if the 6425stack is essentially never empty, then the top entry is kept in a global 6426variable (even better would be a machine register), and the other entries 6427appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the 6428semantic stack described above is handled this way, and so is the input 6429stack that we are about to study. 6430 6431\yskip\hangg 2) If there is infrequent top access, the entire stack contents 6432are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack| 6433is treated this way, as we have seen. 6434 6435\yskip\noindent 6436The state of \TeX's input mechanism appears in the input stack, whose 6437entries are records with six fields, called |state|, |index|, |start|, |loc|, 6438|limit|, and |name|. This stack is maintained with 6439convention~(1), so it is declared in the following way: 6440 6441@<Types...@>= 6442@!in_state_record = record 6443 @!state_field, @!index_field: quarterword; 6444 @!start_field,@!loc_field, @!limit_field, @!name_field: halfword; 6445 end; 6446 6447@ @<Glob...@>= 6448@!input_stack : array[0..stack_size] of in_state_record; 6449@!input_ptr : 0..stack_size; {first unused location of |input_stack|} 6450@!max_in_stack: 0..stack_size; {largest value of |input_ptr| when pushing} 6451@!cur_input : in_state_record; 6452 {the ``top'' input state, according to convention (1)} 6453 6454@ We've already defined the special variable |loc==cur_input.loc_field| 6455in our discussion of basic input-output routines. The other components of 6456|cur_input| are defined in the same way: 6457 6458@d state==cur_input.state_field {current scanner state} 6459@d index==cur_input.index_field {reference for buffer information} 6460@d start==cur_input.start_field {starting position in |buffer|} 6461@d limit==cur_input.limit_field {end of current line in |buffer|} 6462@d name==cur_input.name_field {name of the current file} 6463 6464@ Let's look more closely now at the control variables 6465(|state|,~|index|,~|start|,~|loc|,~|limit|,~|name|), 6466assuming that \TeX\ is reading a line of characters that have been input 6467from some file or from the user's terminal. There is an array called 6468|buffer| that acts as a stack of all lines of characters that are 6469currently being read from files, including all lines on subsidiary 6470levels of the input stack that are not yet completed. \TeX\ will return to 6471the other lines when it is finished with the present input file. 6472 6473(Incidentally, on a machine with byte-oriented addressing, it might be 6474appropriate to combine |buffer| with the |str_pool| array, 6475letting the buffer entries grow downward from the top of the string pool 6476and checking that these two tables don't bump into each other.) 6477 6478The line we are currently working on begins in position |start| of the 6479buffer; the next character we are about to read is |buffer[loc]|; and 6480|limit| is the location of the last character present. If |loc>limit|, 6481the line has been completely read. Usually |buffer[limit]| is the 6482|end_line_char|, denoting the end of a line, but this is not 6483true if the current line is an insertion that was entered on the user's 6484terminal in response to an error message. 6485 6486The |name| variable is a string number that designates the name of 6487the current file, if we are reading a text file. It is zero if we 6488are reading from the terminal; it is |n+1| if we are reading from 6489input stream |n|, where |0<=n<=16|. (Input stream 16 stands for 6490an invalid stream number; in such cases the input is actually from 6491the terminal, under control of the procedure |read_toks|.) 6492 6493The |state| variable has one of three values, when we are scanning such 6494files: 6495$$\baselineskip 15pt\vbox{\halign{#\hfil\cr 64961) |state=mid_line| is the normal state.\cr 64972) |state=skip_blanks| is like |mid_line|, but blanks are ignored.\cr 64983) |state=new_line| is the state at the beginning of a line.\cr}}$$ 6499These state values are assigned numeric codes so that if we add the state 6500code to the next character's command code, we get distinct values. For 6501example, `|mid_line+spacer|' stands for the case that a blank 6502space character occurs in the middle of a line when it is not being 6503ignored; after this case is processed, the next value of |state| will 6504be |skip_blanks|. 6505 6506@d mid_line=1 {|state| code when scanning a line of characters} 6507@d skip_blanks=2+max_char_code {|state| code when ignoring blanks} 6508@d new_line=3+max_char_code+max_char_code {|state| code at start of line} 6509 6510@ Additional information about the current line is available via the 6511|index| variable, which counts how many lines of characters are present 6512in the buffer below the current level. We have |index=0| when reading 6513from the terminal and prompting the user for each line; then if the user types, 6514e.g., `\.{\\input paper}', we will have |index=1| while reading 6515the file \.{paper.tex}. However, it does not follow that |index| is the 6516same as the input stack pointer, since many of the levels on the input 6517stack may come from token lists. For example, the instruction `\.{\\input 6518paper}' might occur in a token list. 6519 6520The global variable |in_open| is equal to the |index| 6521value of the highest non-token-list level. Thus, the number of partially read 6522lines in the buffer is |in_open+1|, and we have |in_open=index| 6523when we are not reading a token list. 6524 6525If we are not currently reading from the terminal, or from an input 6526stream, we are reading from the file variable |input_file[index]|. We use 6527the notation |terminal_input| as a convenient abbreviation for |name=0|, 6528and |cur_file| as an abbreviation for |input_file[index]|. 6529 6530The global variable |line| contains the line number in the topmost 6531open file, for use in error messages. If we are not reading from 6532the terminal, |line_stack[index]| holds the line number for the 6533enclosing level, so that |line| can be restored when the current 6534file has been read. Line numbers should never be negative, since the 6535negative of the current line number is used to identify the user's output 6536routine in the |mode_line| field of the semantic nest entries. 6537 6538If more information about the input state is needed, it can be 6539included in small arrays like those shown here. For example, 6540the current page or segment number in the input file might be 6541put into a variable |@!page|, maintained for enclosing levels in 6542`\ignorespaces|@!page_stack:array[1..max_in_open] of integer|\unskip' 6543by analogy with |line_stack|. 6544@^system dependencies@> 6545 6546@d terminal_input==(name=0) {are we reading from the terminal?} 6547@d cur_file==input_file[index] {the current |alpha_file| variable} 6548 6549@<Glob...@>= 6550@!in_open : 0..max_in_open; {the number of lines in the buffer, less one} 6551@!open_parens : 0..max_in_open; {the number of open text files} 6552@!input_file : array[1..max_in_open] of alpha_file; 6553@!line : integer; {current line number in the current source file} 6554@!line_stack : array[1..max_in_open] of integer; 6555 6556@ Users of \TeX\ sometimes forget to balance left and right braces properly, 6557and one of the ways \TeX\ tries to spot such errors is by considering an 6558input file as broken into subfiles by control sequences that 6559are declared to be \.{\\outer}. 6560 6561A variable called |scanner_status| tells \TeX\ whether or not to complain 6562when a subfile ends. This variable has six possible values: 6563 6564\yskip\hang|normal|, means that a subfile can safely end here without incident. 6565 6566\yskip\hang|skipping|, means that a subfile can safely end here, but not a file, 6567because we're reading past some conditional text that was not selected. 6568 6569\yskip\hang|defining|, means that a subfile shouldn't end now because a 6570macro is being defined. 6571 6572\yskip\hang|matching|, means that a subfile shouldn't end now because a 6573macro is being used and we are searching for the end of its arguments. 6574 6575\yskip\hang|aligning|, means that a subfile shouldn't end now because we are 6576not finished with the preamble of an \.{\\halign} or \.{\\valign}. 6577 6578\yskip\hang|absorbing|, means that a subfile shouldn't end now because we are 6579reading a balanced token list for \.{\\message}, \.{\\write}, etc. 6580 6581\yskip\noindent 6582If the |scanner_status| is not |normal|, the variable |warning_index| points 6583to the |eqtb| location for the relevant control sequence name to print 6584in an error message. 6585 6586@d skipping=1 {|scanner_status| when passing conditional text} 6587@d defining=2 {|scanner_status| when reading a macro definition} 6588@d matching=3 {|scanner_status| when reading macro arguments} 6589@d aligning=4 {|scanner_status| when reading an alignment preamble} 6590@d absorbing=5 {|scanner_status| when reading a balanced text} 6591 6592@<Glob...@>= 6593@!scanner_status : normal..absorbing; {can a subfile end now?} 6594@!warning_index : pointer; {identifier relevant to non-|normal| scanner status} 6595@!def_ref : pointer; {reference count of token list being defined} 6596 6597@ Here is a procedure that uses |scanner_status| to print a warning message 6598when a subfile has ended, and at certain other crucial times: 6599 6600@<Declare the procedure called |runaway|@>= 6601procedure runaway; 6602var p:pointer; {head of runaway list} 6603begin if scanner_status>skipping then 6604 begin print_nl("Runaway "); 6605@.Runaway...@> 6606 case scanner_status of 6607 defining: begin print("definition"); p:=def_ref; 6608 end; 6609 matching: begin print("argument"); p:=temp_head; 6610 end; 6611 aligning: begin print("preamble"); p:=hold_head; 6612 end; 6613 absorbing: begin print("text"); p:=def_ref; 6614 end; 6615 end; {there are no other cases} 6616 print_char("?");print_ln; show_token_list(link(p),null,error_line-10); 6617 end; 6618end; 6619 6620@ However, all this discussion about input state really applies only to the 6621case that we are inputting from a file. There is another important case, 6622namely when we are currently getting input from a token list. In this case 6623|state=token_list|, and the conventions about the other state variables 6624are different: 6625 6626\yskip\hang|loc| is a pointer to the current node in the token list, i.e., 6627the node that will be read next. If |loc=null|, the token list has been 6628fully read. 6629 6630\yskip\hang|start| points to the first node of the token list; this node 6631may or may not contain a reference count, depending on the type of token 6632list involved. 6633 6634\yskip\hang|token_type|, which takes the place of |index| in the 6635discussion above, is a code number that explains what kind of token list 6636is being scanned. 6637 6638\yskip\hang|name| points to the |eqtb| address of the control sequence 6639being expanded, if the current token list is a macro. 6640 6641\yskip\hang|param_start|, which takes the place of |limit|, tells where 6642the parameters of the current macro begin in the |param_stack|, if the 6643current token list is a macro. 6644 6645\yskip\noindent The |token_type| can take several values, depending on 6646where the current token list came from: 6647 6648\yskip\hang|parameter|, if a parameter is being scanned; 6649 6650\hang|u_template|, if the \<u_j> part of an alignment 6651template is being scanned; 6652 6653\hang|v_template|, if the \<v_j> part of an alignment 6654template is being scanned; 6655 6656\hang|backed_up|, if the token list being scanned has been inserted as 6657`to be read again'. 6658 6659\hang|inserted|, if the token list being scanned has been inserted as 6660the text expansion of a \.{\\count} or similar variable; 6661 6662\hang|macro|, if a user-defined control sequence is being scanned; 6663 6664\hang|output_text|, if an \.{\\output} routine is being scanned; 6665 6666\hang|every_par_text|, if the text of \.{\\everypar} is being scanned; 6667 6668\hang|every_math_text|, if the text of \.{\\everymath} is being scanned; 6669 6670\hang|every_display_text|, if the text of \.{\\everydisplay} is being scanned; 6671 6672\hang|every_hbox_text|, if the text of \.{\\everyhbox} is being scanned; 6673 6674\hang|every_vbox_text|, if the text of \.{\\everyvbox} is being scanned; 6675 6676\hang|every_job_text|, if the text of \.{\\everyjob} is being scanned; 6677 6678\hang|every_cr_text|, if the text of \.{\\everycr} is being scanned; 6679 6680\hang|mark_text|, if the text of a \.{\\mark} is being scanned; 6681 6682\hang|write_text|, if the text of a \.{\\write} is being scanned. 6683 6684\yskip\noindent 6685The codes for |output_text|, |every_par_text|, etc., are equal to a constant 6686plus the corresponding codes for token list parameters |output_routine_loc|, 6687|every_par_loc|, etc. The token list begins with a reference count if and 6688only if |token_type>=macro|. 6689@^reference counts@> 6690 6691@d token_list=0 {|state| code when scanning a token list} 6692@d token_type==index {type of current token list} 6693@d param_start==limit {base of macro parameters in |param_stack|} 6694@d parameter=0 {|token_type| code for parameter} 6695@d u_template=1 {|token_type| code for \<u_j> template} 6696@d v_template=2 {|token_type| code for \<v_j> template} 6697@d backed_up=3 {|token_type| code for text to be reread} 6698@d inserted=4 {|token_type| code for inserted texts} 6699@d macro=5 {|token_type| code for defined control sequences} 6700@d output_text=6 {|token_type| code for output routines} 6701@d every_par_text=7 {|token_type| code for \.{\\everypar}} 6702@d every_math_text=8 {|token_type| code for \.{\\everymath}} 6703@d every_display_text=9 {|token_type| code for \.{\\everydisplay}} 6704@d every_hbox_text=10 {|token_type| code for \.{\\everyhbox}} 6705@d every_vbox_text=11 {|token_type| code for \.{\\everyvbox}} 6706@d every_job_text=12 {|token_type| code for \.{\\everyjob}} 6707@d every_cr_text=13 {|token_type| code for \.{\\everycr}} 6708@d mark_text=14 {|token_type| code for \.{\\topmark}, etc.} 6709@d write_text=15 {|token_type| code for \.{\\write}} 6710 6711@ The |param_stack| is an auxiliary array used to hold pointers to the token 6712lists for parameters at the current level and subsidiary levels of input. 6713This stack is maintained with convention (2), and it grows at a different 6714rate from the others. 6715 6716@<Glob...@>= 6717@!param_stack:array [0..param_size] of pointer; 6718 {token list pointers for parameters} 6719@!param_ptr:0..param_size; {first unused entry in |param_stack|} 6720@!max_param_stack:integer; 6721 {largest value of |param_ptr|, will be |<=param_size+9|} 6722 6723@ The input routines must also interact with the processing of 6724\.{\\halign} and \.{\\valign}, since the appearance of tab marks and 6725\.{\\cr} in certain places is supposed to trigger the beginning of special 6726\<v_j> template text in the scanner. This magic is accomplished by an 6727|align_state| variable that is increased by~1 when a `\.{\char'173}' is 6728scanned and decreased by~1 when a `\.{\char'175}' is scanned. The |align_state| 6729is nonzero during the \<u_j> template, after which it is set to zero; the 6730\<v_j> template begins when a tab mark or \.{\\cr} occurs at a time that 6731|align_state=0|. 6732 6733@<Glob...@>= 6734@!align_state:integer; {group level with respect to current alignment} 6735 6736@ Thus, the ``current input state'' can be very complicated indeed; there 6737can be many levels and each level can arise in a variety of ways. The 6738|show_context| procedure, which is used by \TeX's error-reporting routine to 6739print out the current input state on all levels down to the most recent 6740line of characters from an input file, illustrates most of these conventions. 6741The global variable |base_ptr| contains the lowest level that was 6742displayed by this procedure. 6743 6744@<Glob...@>= 6745@!base_ptr:0..stack_size; {shallowest level shown by |show_context|} 6746 6747@ The status at each level is indicated by printing two lines, where the first 6748line indicates what was read so far and the second line shows what remains 6749to be read. The context is cropped, if necessary, so that the first line 6750contains at most |half_error_line| characters, and the second contains 6751at most |error_line|. Non-current input levels whose |token_type| is 6752`|backed_up|' are shown only if they have not been fully read. 6753 6754@p procedure show_context; {prints where the scanner is} 6755label done; 6756var old_setting:0..max_selector; {saved |selector| setting} 6757@!nn:integer; {number of contexts shown so far, less one} 6758@!bottom_line:boolean; {have we reached the final context to be shown?} 6759@<Local variables for formatting calculations@>@/ 6760begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input; 6761 {store current state} 6762nn:=-1; bottom_line:=false; 6763loop@+begin cur_input:=input_stack[base_ptr]; {enter into the context} 6764 if (state<>token_list) then 6765 if (name>17) or (base_ptr=0) then bottom_line:=true; 6766 if (base_ptr=input_ptr)or bottom_line or(nn<error_context_lines) then 6767 @<Display the current context@> 6768 else if nn=error_context_lines then 6769 begin print_nl("..."); incr(nn); {omitted if |error_context_lines<0|} 6770 end; 6771 if bottom_line then goto done; 6772 decr(base_ptr); 6773 end; 6774done: cur_input:=input_stack[input_ptr]; {restore original state} 6775end; 6776 6777@ @<Display the current context@>= 6778begin if (base_ptr=input_ptr) or (state<>token_list) or 6779 (token_type<>backed_up) or (loc<>null) then 6780 {we omit backed-up token lists that have already been read} 6781 begin tally:=0; {get ready to count characters} 6782 old_setting:=selector; 6783 if state<>token_list then 6784 begin @<Print location of current line@>; 6785 @<Pseudoprint the line@>; 6786 end 6787 else begin @<Print type of token list@>; 6788 @<Pseudoprint the token list@>; 6789 end; 6790 selector:=old_setting; {stop pseudoprinting} 6791 @<Print two lines using the tricky pseudoprinted information@>; 6792 incr(nn); 6793 end; 6794end 6795 6796@ This routine should be changed, if necessary, to give the best possible 6797indication of where the current line resides in the input file. 6798For example, on some systems it is best to print both a page and line number. 6799@^system dependencies@> 6800 6801@<Print location of current line@>= 6802if name<=17 then 6803 if terminal_input then 6804 if base_ptr=0 then print_nl("<*>") else print_nl("<insert> ") 6805 else begin print_nl("<read "); 6806 if name=17 then print_char("*")@+else print_int(name-1); 6807@.*\relax@> 6808 print_char(">"); 6809 end 6810else begin print_nl("l."); print_int(line); 6811 end; 6812print_char(" ") 6813 6814@ @<Print type of token list@>= 6815case token_type of 6816parameter: print_nl("<argument> "); 6817u_template,v_template: print_nl("<template> "); 6818backed_up: if loc=null then print_nl("<recently read> ") 6819 else print_nl("<to be read again> "); 6820inserted: print_nl("<inserted text> "); 6821macro: begin print_ln; print_cs(name); 6822 end; 6823output_text: print_nl("<output> "); 6824every_par_text: print_nl("<everypar> "); 6825every_math_text: print_nl("<everymath> "); 6826every_display_text: print_nl("<everydisplay> "); 6827every_hbox_text: print_nl("<everyhbox> "); 6828every_vbox_text: print_nl("<everyvbox> "); 6829every_job_text: print_nl("<everyjob> "); 6830every_cr_text: print_nl("<everycr> "); 6831mark_text: print_nl("<mark> "); 6832write_text: print_nl("<write> "); 6833othercases print_nl("?") {this should never happen} 6834endcases 6835 6836@ Here it is necessary to explain a little trick. We don't want to store a long 6837string that corresponds to a token list, because that string might take up 6838lots of memory; and we are printing during a time when an error message is 6839being given, so we dare not do anything that might overflow one of \TeX's 6840tables. So `pseudoprinting' is the answer: We enter a mode of printing 6841that stores characters into a buffer of length |error_line|, where character 6842$k+1$ is placed into \hbox{|trick_buf[k mod error_line]|} if 6843|k<trick_count|, otherwise character |k| is dropped. Initially we set 6844|tally:=0| and |trick_count:=1000000|; then when we reach the 6845point where transition from line 1 to line 2 should occur, we 6846set |first_count:=tally| and |trick_count:=@tmax@>(error_line, 6847tally+1+error_line-half_error_line)|. At the end of the 6848pseudoprinting, the values of |first_count|, |tally|, and 6849|trick_count| give us all the information we need to print the two lines, 6850and all of the necessary text is in |trick_buf|. 6851 6852Namely, let |l| be the length of the descriptive information that appears 6853on the first line. The length of the context information gathered for that 6854line is |k=first_count|, and the length of the context information 6855gathered for line~2 is $m=\min(|tally|, |trick_count|)-k$. If |l+k<=h|, 6856where |h=half_error_line|, we print |trick_buf[0..k-1]| after the 6857descriptive information on line~1, and set |n:=l+k|; here |n| is the 6858length of line~1. If $l+k>h$, some cropping is necessary, so we set |n:=h| 6859and print `\.{...}' followed by 6860$$\hbox{|trick_buf[(l+k-h+3)..k-1]|,}$$ 6861where subscripts of |trick_buf| are circular modulo |error_line|. The 6862second line consists of |n|~spaces followed by |trick_buf[k..(k+m-1)]|, 6863unless |n+m>error_line|; in the latter case, further cropping is done. 6864This is easier to program than to explain. 6865 6866@<Local variables for formatting...@>= 6867@!i:0..buf_size; {index into |buffer|} 6868@!j:0..buf_size; {end of current line in |buffer|} 6869@!l:0..half_error_line; {length of descriptive information on line 1} 6870@!m:integer; {context information gathered for line 2} 6871@!n:0..error_line; {length of line 1} 6872@!p: integer; {starting or ending place in |trick_buf|} 6873@!q: integer; {temporary index} 6874 6875@ The following code sets up the print routines so that they will gather 6876the desired information. 6877 6878@d begin_pseudoprint== 6879 begin l:=tally; tally:=0; selector:=pseudo; 6880 trick_count:=1000000; 6881 end 6882@d set_trick_count== 6883 begin first_count:=tally; 6884 trick_count:=tally+1+error_line-half_error_line; 6885 if trick_count<error_line then trick_count:=error_line; 6886 end 6887 6888@ And the following code uses the information after it has been gathered. 6889 6890@<Print two lines using the tricky pseudoprinted information@>= 6891if trick_count=1000000 then set_trick_count; 6892 {|set_trick_count| must be performed} 6893if tally<trick_count then m:=tally-first_count 6894else m:=trick_count-first_count; {context on line 2} 6895if l+first_count<=half_error_line then 6896 begin p:=0; n:=l+first_count; 6897 end 6898else begin print("..."); p:=l+first_count-half_error_line+3; 6899 n:=half_error_line; 6900 end; 6901for q:=p to first_count-1 do print_char(trick_buf[q mod error_line]); 6902print_ln; 6903for q:=1 to n do print_char(" "); {print |n| spaces to begin line~2} 6904if m+n<=error_line then p:=first_count+m else p:=first_count+(error_line-n-3); 6905for q:=first_count to p-1 do print_char(trick_buf[q mod error_line]); 6906if m+n>error_line then print("...") 6907 6908@ But the trick is distracting us from our current goal, which is to 6909understand the input state. So let's concentrate on the data structures that 6910are being pseudoprinted as we finish up the |show_context| procedure. 6911 6912@<Pseudoprint the line@>= 6913begin_pseudoprint; 6914if buffer[limit]=end_line_char then j:=limit 6915else j:=limit+1; {determine the effective end of the line} 6916if j>0 then for i:=start to j-1 do 6917 begin if i=loc then set_trick_count; 6918 print(buffer[i]); 6919 end 6920 6921@ @<Pseudoprint the token list@>= 6922begin_pseudoprint; 6923if token_type<macro then show_token_list(start,loc,100000) 6924else show_token_list(link(start),loc,100000) {avoid reference count} 6925 6926@ Here is the missing piece of |show_token_list| that is activated when the 6927token beginning line~2 is about to be shown: 6928 6929@<Do magic computation@>=set_trick_count 6930 6931@* \[23] Maintaining the input stacks. 6932The following subroutines change the input status in commonly needed ways. 6933 6934First comes |push_input|, which stores the current state and creates a 6935new level (having, initially, the same properties as the old). 6936 6937@d push_input==@t@> {enter a new input level, save the old} 6938 begin if input_ptr>max_in_stack then 6939 begin max_in_stack:=input_ptr; 6940 if input_ptr=stack_size then overflow("input stack size",stack_size); 6941@:TeX capacity exceeded input stack size}{\quad input stack size@> 6942 end; 6943 input_stack[input_ptr]:=cur_input; {stack the record} 6944 incr(input_ptr); 6945 end 6946 6947@ And of course what goes up must come down. 6948 6949@d pop_input==@t@> {leave an input level, re-enter the old} 6950 begin decr(input_ptr); cur_input:=input_stack[input_ptr]; 6951 end 6952 6953@ Here is a procedure that starts a new level of token-list input, given 6954a token list |p| and its type |t|. If |t=macro|, the calling routine should 6955set |name| and |loc|. 6956 6957@d back_list(#)==begin_token_list(#,backed_up) {backs up a simple token list} 6958@d ins_list(#)==begin_token_list(#,inserted) {inserts a simple token list} 6959 6960@p procedure begin_token_list(@!p:pointer;@!t:quarterword); 6961begin push_input; state:=token_list; start:=p; token_type:=t; 6962if t>=macro then {the token list starts with a reference count} 6963 begin add_token_ref(p); 6964 if t=macro then param_start:=param_ptr 6965 else begin loc:=link(p); 6966 if tracing_macros>1 then 6967 begin begin_diagnostic; print_nl(""); 6968 case t of 6969 mark_text:print_esc("mark"); 6970 write_text:print_esc("write"); 6971 othercases print_cmd_chr(assign_toks,t-output_text+output_routine_loc) 6972 endcases;@/ 6973 print("->"); token_show(p); end_diagnostic(false); 6974 end; 6975 end; 6976 end 6977else loc:=p; 6978end; 6979 6980@ When a token list has been fully scanned, the following computations 6981should be done as we leave that level of input. The |token_type| tends 6982to be equal to either |backed_up| or |inserted| about 2/3 of the time. 6983@^inner loop@> 6984 6985@p procedure end_token_list; {leave a token-list input level} 6986begin if token_type>=backed_up then {token list to be deleted} 6987 begin if token_type<=inserted then flush_list(start) 6988 else begin delete_token_ref(start); {update reference count} 6989 if token_type=macro then {parameters must be flushed} 6990 while param_ptr>param_start do 6991 begin decr(param_ptr); 6992 flush_list(param_stack[param_ptr]); 6993 end; 6994 end; 6995 end 6996else if token_type=u_template then 6997 if align_state>500000 then align_state:=0 6998 else fatal_error("(interwoven alignment preambles are not allowed)"); 6999@.interwoven alignment preambles...@> 7000pop_input; 7001check_interrupt; 7002end; 7003 7004@ Sometimes \TeX\ has read too far and wants to ``unscan'' what it has 7005seen. The |back_input| procedure takes care of this by putting the token 7006just scanned back into the input stream, ready to be read again. This 7007procedure can be used only if |cur_tok| represents the token to be 7008replaced. Some applications of \TeX\ use this procedure a lot, 7009so it has been slightly optimized for speed. 7010@^inner loop@> 7011 7012@p procedure back_input; {undoes one token of input} 7013var p:pointer; {a token list of length one} 7014begin while (state=token_list)and(loc=null)and(token_type<>v_template) do 7015 end_token_list; {conserve stack space} 7016p:=get_avail; info(p):=cur_tok; 7017if cur_tok<right_brace_limit then 7018 if cur_tok<left_brace_limit then decr(align_state) 7019 else incr(align_state); 7020push_input; state:=token_list; start:=p; token_type:=backed_up; 7021loc:=p; {that was |back_list(p)|, without procedure overhead} 7022end; 7023 7024@ @<Insert token |p| into \TeX's input@>= 7025begin t:=cur_tok; cur_tok:=p; back_input; cur_tok:=t; 7026end 7027 7028@ The |back_error| routine is used when we want to replace an offending token 7029just before issuing an error message. This routine, like |back_input|, 7030requires that |cur_tok| has been set. We disable interrupts during the 7031call of |back_input| so that the help message won't be lost. 7032 7033@p procedure back_error; {back up one token and call |error|} 7034begin OK_to_interrupt:=false; back_input; OK_to_interrupt:=true; error; 7035end; 7036@# 7037procedure ins_error; {back up one inserted token and call |error|} 7038begin OK_to_interrupt:=false; back_input; token_type:=inserted; 7039OK_to_interrupt:=true; error; 7040end; 7041 7042@ The |begin_file_reading| procedure starts a new level of input for lines 7043of characters to be read from a file, or as an insertion from the 7044terminal. It does not take care of opening the file, nor does it set |loc| 7045or |limit| or |line|. 7046@^system dependencies@> 7047 7048@p procedure begin_file_reading; 7049begin if in_open=max_in_open then overflow("text input levels",max_in_open); 7050@:TeX capacity exceeded text input levels}{\quad text input levels@> 7051if first=buf_size then overflow("buffer size",buf_size); 7052@:TeX capacity exceeded buffer size}{\quad buffer size@> 7053incr(in_open); push_input; index:=in_open; 7054line_stack[index]:=line; start:=first; state:=mid_line; 7055name:=0; {|terminal_input| is now |true|} 7056end; 7057 7058@ Conversely, the variables must be downdated when such a level of input 7059is finished: 7060 7061@p procedure end_file_reading; 7062begin first:=start; line:=line_stack[index]; 7063if name>17 then a_close(cur_file); {forget it} 7064pop_input; decr(in_open); 7065end; 7066 7067@ In order to keep the stack from overflowing during a long sequence of 7068inserted `\.{\\show}' commands, the following routine removes completed 7069error-inserted lines from memory. 7070 7071@p procedure clear_for_error_prompt; 7072begin while (state<>token_list)and terminal_input and@| 7073 (input_ptr>0)and(loc>limit) do end_file_reading; 7074print_ln; clear_terminal; 7075end; 7076 7077@ To get \TeX's whole input mechanism going, we perform the following 7078actions. 7079 7080@<Initialize the input routines@>= 7081begin input_ptr:=0; max_in_stack:=0; 7082in_open:=0; open_parens:=0; max_buf_stack:=0; 7083param_ptr:=0; max_param_stack:=0; 7084first:=buf_size; repeat buffer[first]:=0; decr(first); until first=0; 7085scanner_status:=normal; warning_index:=null; first:=1; 7086state:=new_line; start:=1; index:=0; line:=0; name:=0; 7087force_eof:=false; 7088align_state:=1000000;@/ 7089if not init_terminal then goto final_end; 7090limit:=last; first:=last+1; {|init_terminal| has set |loc| and |last|} 7091end 7092 7093@* \[24] Getting the next token. 7094The heart of \TeX's input mechanism is the |get_next| procedure, which 7095we shall develop in the next few sections of the program. Perhaps we 7096shouldn't actually call it the ``heart,'' however, because it really acts 7097as \TeX's eyes and mouth, reading the source files and gobbling them up. 7098And it also helps \TeX\ to regurgitate stored token lists that are to be 7099processed again. 7100@^eyes and mouth@> 7101 7102The main duty of |get_next| is to input one token and to set |cur_cmd| 7103and |cur_chr| to that token's command code and modifier. Furthermore, if 7104the input token is a control sequence, the |eqtb| location of that control 7105sequence is stored in |cur_cs|; otherwise |cur_cs| is set to zero. 7106 7107Underlying this simple description is a certain amount of complexity 7108because of all the cases that need to be handled. 7109However, the inner loop of |get_next| is reasonably short and fast. 7110 7111When |get_next| is asked to get the next token of a \.{\\read} line, 7112it sets |cur_cmd=cur_chr=cur_cs=0| in the case that no more tokens 7113appear on that line. (There might not be any tokens at all, if the 7114|end_line_char| has |ignore| as its catcode.) 7115 7116@ The value of |par_loc| is the |eqtb| address of `\.{\\par}'. This quantity 7117is needed because a blank line of input is supposed to be exactly equivalent 7118to the appearance of \.{\\par}; we must set |cur_cs:=par_loc| 7119when detecting a blank line. 7120 7121@<Glob...@>= 7122@!par_loc:pointer; {location of `\.{\\par}' in |eqtb|} 7123@!par_token:halfword; {token representing `\.{\\par}'} 7124 7125@ @<Put each...@>= 7126primitive("par",par_end,256); {cf.\ |scan_file_name|} 7127@!@:par_}{\.{\\par} primitive@> 7128par_loc:=cur_val; par_token:=cs_token_flag+par_loc; 7129 7130@ @<Cases of |print_cmd_chr|...@>= 7131par_end:print_esc("par"); 7132 7133@ Before getting into |get_next|, let's consider the subroutine that 7134is called when an `\.{\\outer}' control sequence has been scanned or 7135when the end of a file has been reached. These two cases are distinguished 7136by |cur_cs|, which is zero at the end of a file. 7137 7138@p procedure check_outer_validity; 7139var p:pointer; {points to inserted token list} 7140@!q:pointer; {auxiliary pointer} 7141begin if scanner_status<>normal then 7142 begin deletions_allowed:=false; 7143 @<Back up an outer control sequence so that it can be reread@>; 7144 if scanner_status>skipping then 7145 @<Tell the user what has run away and try to recover@> 7146 else begin print_err("Incomplete "); print_cmd_chr(if_test,cur_if); 7147@.Incomplete \\if...@> 7148 print("; all text was ignored after line "); print_int(skip_line); 7149 help3("A forbidden control sequence occurred in skipped text.")@/ 7150 ("This kind of error happens when you say `\if...' and forget")@/ 7151 ("the matching `\fi'. I've inserted a `\fi'; this might work."); 7152 if cur_cs<>0 then cur_cs:=0 7153 else help_line[2]:=@| 7154 "The file ended while I was skipping conditional text."; 7155 cur_tok:=cs_token_flag+frozen_fi; ins_error; 7156 end; 7157 deletions_allowed:=true; 7158 end; 7159end; 7160 7161@ An outer control sequence that occurs in a \.{\\read} will not be reread, 7162since the error recovery for \.{\\read} is not very powerful. 7163 7164@<Back up an outer control sequence so that it can be reread@>= 7165if cur_cs<>0 then 7166 begin if (state=token_list)or(name<1)or(name>17) then 7167 begin p:=get_avail; info(p):=cs_token_flag+cur_cs; 7168 back_list(p); {prepare to read the control sequence again} 7169 end; 7170 cur_cmd:=spacer; cur_chr:=" "; {replace it by a space} 7171 end 7172 7173@ @<Tell the user what has run away...@>= 7174begin runaway; {print a definition, argument, or preamble} 7175if cur_cs=0 then print_err("File ended") 7176@.File ended while scanning...@> 7177else begin cur_cs:=0; print_err("Forbidden control sequence found"); 7178@.Forbidden control sequence...@> 7179 end; 7180print(" while scanning "); 7181@<Print either `\.{definition}' or `\.{use}' or `\.{preamble}' or `\.{text}', 7182 and insert tokens that should lead to recovery@>; 7183print(" of "); sprint_cs(warning_index); 7184help4("I suspect you have forgotten a `}', causing me")@/ 7185("to read past where you wanted me to stop.")@/ 7186("I'll try to recover; but if the error is serious,")@/ 7187("you'd better type `E' or `X' now and fix your file.");@/ 7188error; 7189end 7190 7191@ The recovery procedure can't be fully understood without knowing more 7192about the \TeX\ routines that should be aborted, but we can sketch the 7193ideas here: For a runaway definition we will insert a right brace; for a 7194runaway preamble, we will insert a special \.{\\cr} token and a right 7195brace; and for a runaway argument, we will set |long_state| to 7196|outer_call| and insert \.{\\par}. 7197 7198@<Print either `\.{definition}' or ...@>= 7199p:=get_avail; 7200case scanner_status of 7201defining:begin print("definition"); info(p):=right_brace_token+"}"; 7202 end; 7203matching:begin print("use"); info(p):=par_token; long_state:=outer_call; 7204 end; 7205aligning:begin print("preamble"); info(p):=right_brace_token+"}"; q:=p; 7206 p:=get_avail; link(p):=q; info(p):=cs_token_flag+frozen_cr; 7207 align_state:=-1000000; 7208 end; 7209absorbing:begin print("text"); info(p):=right_brace_token+"}"; 7210 end; 7211end; {there are no other cases} 7212ins_list(p) 7213 7214@ We need to mention a procedure here that may be called by |get_next|. 7215 7216@p procedure@?firm_up_the_line; forward; 7217 7218@ Now we're ready to take the plunge into |get_next| itself. Parts of 7219this routine are executed more often than any other instructions of \TeX. 7220@^mastication@>@^inner loop@> 7221 7222@d switch=25 {a label in |get_next|} 7223@d start_cs=26 {another} 7224 7225@p procedure get_next; {sets |cur_cmd|, |cur_chr|, |cur_cs| to next token} 7226label restart, {go here to get the next input token} 7227 switch, {go here to eat the next character from a file} 7228 reswitch, {go here to digest it again} 7229 start_cs, {go here to start looking for a control sequence} 7230 found, {go here when a control sequence has been found} 7231 exit; {go here when the next input token has been got} 7232var k:0..buf_size; {an index into |buffer|} 7233@!t:halfword; {a token} 7234@!cat:0..max_char_code; {|cat_code(cur_chr)|, usually} 7235@!c,@!cc:ASCII_code; {constituents of a possible expanded code} 7236@!d:2..3; {number of excess characters in an expanded code} 7237begin restart: cur_cs:=0; 7238if state<>token_list then 7239@<Input from external file, |goto restart| if no input found@> 7240else @<Input from token list, |goto restart| if end of list or 7241 if a parameter needs to be expanded@>; 7242@<If an alignment entry has just ended, take appropriate action@>; 7243exit:end; 7244 7245@ An alignment entry ends when a tab or \.{\\cr} occurs, provided that the 7246current level of braces is the same as the level that was present at the 7247beginning of that alignment entry; i.e., provided that |align_state| has 7248returned to the value it had after the \<u_j> template for that entry. 7249@^inner loop@> 7250 7251@<If an alignment entry has just ended, take appropriate action@>= 7252if cur_cmd<=car_ret then if cur_cmd>=tab_mark then if align_state=0 then 7253 @<Insert the \(v)\<v_j> template and |goto restart|@> 7254 7255@ @<Input from external file, |goto restart| if no input found@>= 7256@^inner loop@> 7257begin switch: if loc<=limit then {current line not yet finished} 7258 begin cur_chr:=buffer[loc]; incr(loc); 7259 reswitch: cur_cmd:=cat_code(cur_chr); 7260 @<Change state if necessary, and |goto switch| if the 7261 current character should be ignored, 7262 or |goto reswitch| if the current character 7263 changes to another@>; 7264 end 7265else begin state:=new_line;@/ 7266 @<Move to next line of file, 7267 or |goto restart| if there is no next line, 7268 or |return| if a \.{\\read} line has finished@>; 7269 check_interrupt; 7270 goto switch; 7271 end; 7272end 7273 7274@ The following 48-way switch accomplishes the scanning quickly, assuming 7275that a decent \PASCAL\ compiler has translated the code. Note that the numeric 7276values for |mid_line|, |skip_blanks|, and |new_line| are spaced 7277apart from each other by |max_char_code+1|, so we can add a character's 7278command code to the state to get a single number that characterizes both. 7279 7280@d any_state_plus(#) == mid_line+#,skip_blanks+#,new_line+# 7281 7282@<Change state if necessary...@>= 7283case state+cur_cmd of 7284@<Cases where character is ignored@>: goto switch; 7285any_state_plus(escape): @<Scan a control sequence 7286 and set |state:=skip_blanks| or |mid_line|@>; 7287any_state_plus(active_char): @<Process an active-character control sequence 7288 and set |state:=mid_line|@>; 7289any_state_plus(sup_mark): @<If this |sup_mark| starts an expanded character 7290 like~\.{\^\^A} or~\.{\^\^df}, then |goto reswitch|, 7291 otherwise set |state:=mid_line|@>; 7292any_state_plus(invalid_char): @<Decry the invalid character and 7293 |goto restart|@>; 7294@t\4@>@<Handle situations involving spaces, braces, changes of state@>@; 7295othercases do_nothing 7296endcases 7297 7298@ @<Cases where character is ignored@>= 7299any_state_plus(ignore),skip_blanks+spacer,new_line+spacer 7300 7301@ We go to |restart| instead of to |switch|, because |state| might equal 7302|token_list| after the error has been dealt with 7303(cf.\ |clear_for_error_prompt|). 7304 7305@<Decry the invalid...@>= 7306begin print_err("Text line contains an invalid character"); 7307@.Text line contains...@> 7308help2("A funny symbol that I can't read has just been input.")@/ 7309("Continue, and I'll forget that it ever happened.");@/ 7310deletions_allowed:=false; error; deletions_allowed:=true; 7311goto restart; 7312end 7313 7314@ @d add_delims_to(#)==#+math_shift,#+tab_mark,#+mac_param, 7315 #+sub_mark,#+letter,#+other_char 7316 7317@<Handle situations involving spaces, braces, changes of state@>= 7318mid_line+spacer:@<Enter |skip_blanks| state, emit a space@>; 7319mid_line+car_ret:@<Finish line, emit a space@>; 7320skip_blanks+car_ret,any_state_plus(comment): 7321 @<Finish line, |goto switch|@>; 7322new_line+car_ret:@<Finish line, emit a \.{\\par}@>; 7323mid_line+left_brace: incr(align_state); 7324skip_blanks+left_brace,new_line+left_brace: begin 7325 state:=mid_line; incr(align_state); 7326 end; 7327mid_line+right_brace: decr(align_state); 7328skip_blanks+right_brace,new_line+right_brace: begin 7329 state:=mid_line; decr(align_state); 7330 end; 7331add_delims_to(skip_blanks),add_delims_to(new_line): state:=mid_line; 7332 7333@ When a character of type |spacer| gets through, its character code is 7334changed to $\.{"\ "}=@'40$. This means that the ASCII codes for tab and space, 7335and for the space inserted at the end of a line, will 7336be treated alike when macro parameters are being matched. We do this 7337since such characters are indistinguishable on most computer terminal displays. 7338 7339@<Finish line, emit a space@>= 7340begin loc:=limit+1; cur_cmd:=spacer; cur_chr:=" "; 7341end 7342 7343@ The following code is performed only when |cur_cmd=spacer|. 7344 7345@<Enter |skip_blanks| state, emit a space@>= 7346begin state:=skip_blanks; cur_chr:=" "; 7347end 7348 7349@ @<Finish line, |goto switch|@>= 7350begin loc:=limit+1; goto switch; 7351end 7352 7353@ @<Finish line, emit a \.{\\par}@>= 7354begin loc:=limit+1; cur_cs:=par_loc; cur_cmd:=eq_type(cur_cs); 7355cur_chr:=equiv(cur_cs); 7356if cur_cmd>=outer_call then check_outer_validity; 7357end 7358 7359@ Notice that a code like \.{\^\^8} becomes \.x if not followed by a hex digit. 7360 7361@d is_hex(#)==(((#>="0")and(#<="9"))or((#>="a")and(#<="f"))) 7362@d hex_to_cur_chr== 7363 if c<="9" then cur_chr:=c-"0" @+else cur_chr:=c-"a"+10; 7364 if cc<="9" then cur_chr:=16*cur_chr+cc-"0" 7365 else cur_chr:=16*cur_chr+cc-"a"+10 7366 7367@<If this |sup_mark| starts an expanded character...@>= 7368begin if cur_chr=buffer[loc] then if loc<limit then 7369 begin c:=buffer[loc+1]; @+if c<@'200 then {yes we have an expanded char} 7370 begin loc:=loc+2; 7371 if is_hex(c) then if loc<=limit then 7372 begin cc:=buffer[loc]; @+if is_hex(cc) then 7373 begin incr(loc); hex_to_cur_chr; goto reswitch; 7374 end; 7375 end; 7376 if c<@'100 then cur_chr:=c+@'100 @+else cur_chr:=c-@'100; 7377 goto reswitch; 7378 end; 7379 end; 7380state:=mid_line; 7381end 7382 7383@ @<Process an active-character...@>= 7384begin cur_cs:=cur_chr+active_base; 7385cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); state:=mid_line; 7386if cur_cmd>=outer_call then check_outer_validity; 7387end 7388 7389@ Control sequence names are scanned only when they appear in some line of 7390a file; once they have been scanned the first time, their |eqtb| location 7391serves as a unique identification, so \TeX\ doesn't need to refer to the 7392original name any more except when it prints the equivalent in symbolic form. 7393 7394The program that scans a control sequence has been written carefully 7395in order to avoid the blowups that might otherwise occur if a malicious 7396user tried something like `\.{\\catcode\'15=0}'. The algorithm might 7397look at |buffer[limit+1]|, but it never looks at |buffer[limit+2]|. 7398 7399If expanded characters like `\.{\^\^A}' or `\.{\^\^df}' 7400appear in or just following 7401a control sequence name, they are converted to single characters in the 7402buffer and the process is repeated, slowly but surely. 7403 7404@<Scan a control...@>= 7405begin if loc>limit then cur_cs:=null_cs {|state| is irrelevant in this case} 7406else begin start_cs: k:=loc; cur_chr:=buffer[k]; cat:=cat_code(cur_chr); 7407 incr(k); 7408 if cat=letter then state:=skip_blanks 7409 else if cat=spacer then state:=skip_blanks 7410 else state:=mid_line; 7411 if (cat=letter)and(k<=limit) then 7412 @<Scan ahead in the buffer until finding a nonletter; 7413 if an expanded code is encountered, reduce it 7414 and |goto start_cs|; otherwise if a multiletter control 7415 sequence is found, adjust |cur_cs| and |loc|, and 7416 |goto found|@> 7417 else @<If an expanded code is present, reduce it and |goto start_cs|@>; 7418 cur_cs:=single_base+buffer[loc]; incr(loc); 7419 end; 7420found: cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 7421if cur_cmd>=outer_call then check_outer_validity; 7422end 7423 7424@ Whenever we reach the following piece of code, we will have 7425|cur_chr=buffer[k-1]| and |k<=limit+1| and |cat=cat_code(cur_chr)|. If an 7426expanded code like \.{\^\^A} or \.{\^\^df} appears in |buffer[(k-1)..(k+1)]| 7427or |buffer[(k-1)..(k+2)]|, we 7428will store the corresponding code in |buffer[k-1]| and shift the rest of 7429the buffer left two or three places. 7430 7431@<If an expanded...@>= 7432begin if buffer[k]=cur_chr then @+if cat=sup_mark then @+if k<limit then 7433 begin c:=buffer[k+1]; @+if c<@'200 then {yes, one is indeed present} 7434 begin d:=2; 7435 if is_hex(c) then @+if k+2<=limit then 7436 begin cc:=buffer[k+2]; @+if is_hex(cc) then incr(d); 7437 end; 7438 if d>2 then 7439 begin hex_to_cur_chr; buffer[k-1]:=cur_chr; 7440 end 7441 else if c<@'100 then buffer[k-1]:=c+@'100 7442 else buffer[k-1]:=c-@'100; 7443 limit:=limit-d; first:=first-d; 7444 while k<=limit do 7445 begin buffer[k]:=buffer[k+d]; incr(k); 7446 end; 7447 goto start_cs; 7448 end; 7449 end; 7450end 7451 7452@ @<Scan ahead in the buffer...@>= 7453begin repeat cur_chr:=buffer[k]; cat:=cat_code(cur_chr); incr(k); 7454until (cat<>letter)or(k>limit); 7455@<If an expanded...@>; 7456if cat<>letter then decr(k); 7457 {now |k| points to first nonletter} 7458if k>loc+1 then {multiletter control sequence has been scanned} 7459 begin cur_cs:=id_lookup(loc,k-loc); loc:=k; goto found; 7460 end; 7461end 7462 7463@ Let's consider now what happens when |get_next| is looking at a token list. 7464 7465@<Input from token list, |goto restart| if end of list or 7466 if a parameter needs to be expanded@>= 7467if loc<>null then {list not exhausted} 7468@^inner loop@> 7469 begin t:=info(loc); loc:=link(loc); {move to next} 7470 if t>=cs_token_flag then {a control sequence token} 7471 begin cur_cs:=t-cs_token_flag; 7472 cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 7473 if cur_cmd>=outer_call then 7474 if cur_cmd=dont_expand then 7475 @<Get the next token, suppressing expansion@> 7476 else check_outer_validity; 7477 end 7478 else begin cur_cmd:=t div @'400; cur_chr:=t mod @'400; 7479 case cur_cmd of 7480 left_brace: incr(align_state); 7481 right_brace: decr(align_state); 7482 out_param: @<Insert macro parameter and |goto restart|@>; 7483 othercases do_nothing 7484 endcases; 7485 end; 7486 end 7487else begin {we are done with this token list} 7488 end_token_list; goto restart; {resume previous level} 7489 end 7490 7491@ The present point in the program is reached only when the |expand| 7492routine has inserted a special marker into the input. In this special 7493case, |info(loc)| is known to be a control sequence token, and |link(loc)=null|. 7494 7495@d no_expand_flag=257 {this characterizes a special variant of |relax|} 7496 7497@<Get the next token, suppressing expansion@>= 7498begin cur_cs:=info(loc)-cs_token_flag; loc:=null;@/ 7499cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 7500if cur_cmd>max_command then 7501 begin cur_cmd:=relax; cur_chr:=no_expand_flag; 7502 end; 7503end 7504 7505@ @<Insert macro parameter...@>= 7506begin begin_token_list(param_stack[param_start+cur_chr-1],parameter); 7507goto restart; 7508end 7509 7510@ All of the easy branches of |get_next| have now been taken care of. 7511There is one more branch. 7512 7513@d end_line_char_inactive == (end_line_char<0)or(end_line_char>255) 7514 7515@<Move to next line of file, or |goto restart|...@>= 7516if name>17 then @<Read next line of file into |buffer|, or 7517 |goto restart| if the file has ended@> 7518else begin if not terminal_input then {\.{\\read} line has ended} 7519 begin cur_cmd:=0; cur_chr:=0; return; 7520 end; 7521 if input_ptr>0 then {text was inserted during error recovery} 7522 begin end_file_reading; goto restart; {resume previous level} 7523 end; 7524 if selector<log_only then open_log_file; 7525 if interaction>nonstop_mode then 7526 begin if end_line_char_inactive then incr(limit); 7527 if limit=start then {previous line was empty} 7528 print_nl("(Please type a command or say `\end')"); 7529@.Please type...@> 7530 print_ln; first:=start; 7531 prompt_input("*"); {input on-line into |buffer|} 7532@.*\relax@> 7533 limit:=last; 7534 if end_line_char_inactive then decr(limit) 7535 else buffer[limit]:=end_line_char; 7536 first:=limit+1; 7537 loc:=start; 7538 end 7539 else fatal_error("*** (job aborted, no legal \end found)"); 7540@.job aborted@> 7541 {nonstop mode, which is intended for overnight batch processing, 7542 never waits for on-line input} 7543 end 7544 7545@ The global variable |force_eof| is normally |false|; it is set |true| 7546by an \.{\\endinput} command. 7547 7548@<Glob...@>= 7549@!force_eof:boolean; {should the next \.{\\input} be aborted early?} 7550 7551@ @<Read next line of file into |buffer|, or 7552 |goto restart| if the file has ended@>= 7553begin incr(line); first:=start; 7554if not force_eof then 7555 begin if input_ln(cur_file,true) then {not end of file} 7556 firm_up_the_line {this sets |limit|} 7557 else force_eof:=true; 7558 end; 7559if force_eof then 7560 begin print_char(")"); decr(open_parens); 7561 update_terminal; {show user that file has been read} 7562 force_eof:=false; 7563 end_file_reading; {resume previous level} 7564 check_outer_validity; goto restart; 7565 end; 7566if end_line_char_inactive then decr(limit) 7567else buffer[limit]:=end_line_char; 7568first:=limit+1; loc:=start; {ready to read} 7569end 7570 7571@ If the user has set the |pausing| parameter to some positive value, 7572and if nonstop mode has not been selected, each line of input is displayed 7573on the terminal and the transcript file, followed by `\.{=>}'. 7574\TeX\ waits for a response. If the response is simply |carriage_return|, the 7575line is accepted as it stands, otherwise the line typed is 7576used instead of the line in the file. 7577 7578@p procedure firm_up_the_line; 7579var k:0..buf_size; {an index into |buffer|} 7580begin limit:=last; 7581if pausing>0 then if interaction>nonstop_mode then 7582 begin wake_up_terminal; print_ln; 7583 if start<limit then for k:=start to limit-1 do print(buffer[k]); 7584 first:=limit; prompt_input("=>"); {wait for user response} 7585@.=>@> 7586 if last>first then 7587 begin for k:=first to last-1 do {move line down in buffer} 7588 buffer[k+start-first]:=buffer[k]; 7589 limit:=start+last-first; 7590 end; 7591 end; 7592end; 7593 7594@ Since |get_next| is used so frequently in \TeX, it is convenient 7595to define three related procedures that do a little more: 7596 7597\yskip\hang|get_token| not only sets |cur_cmd| and |cur_chr|, it 7598also sets |cur_tok|, a packed halfword version of the current token. 7599 7600\yskip\hang|get_x_token|, meaning ``get an expanded token,'' is like 7601|get_token|, but if the current token turns out to be a user-defined 7602control sequence (i.e., a macro call), or a conditional, 7603or something like \.{\\topmark} or \.{\\expandafter} or \.{\\csname}, 7604it is eliminated from the input by beginning the expansion of the macro 7605or the evaluation of the conditional. 7606 7607\yskip\hang|x_token| is like |get_x_token| except that it assumes that 7608|get_next| has already been called. 7609 7610\yskip\noindent 7611In fact, these three procedures account for almost every use of |get_next|. 7612 7613@ No new control sequences will be defined except during a call of 7614|get_token|, or when \.{\\csname} compresses a token list, because 7615|no_new_control_sequence| is always |true| at other times. 7616 7617@p procedure get_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|} 7618begin no_new_control_sequence:=false; get_next; no_new_control_sequence:=true; 7619@^inner loop@> 7620if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr 7621else cur_tok:=cs_token_flag+cur_cs; 7622end; 7623 7624@* \[25] Expanding the next token. 7625Only a dozen or so command codes |>max_command| can possibly be returned by 7626|get_next|; in increasing order, they are |undefined_cs|, |expand_after|, 7627|no_expand|, |input|, |if_test|, |fi_or_else|, |cs_name|, |convert|, |the|, 7628|top_bot_mark|, |call|, |long_call|, |outer_call|, |long_outer_call|, and 7629|end_template|.{\emergencystretch=40pt\par} 7630 7631The |expand| subroutine is used when |cur_cmd>max_command|. It removes a 7632``call'' or a conditional or one of the other special operations just 7633listed. It follows that |expand| might invoke itself recursively. In all 7634cases, |expand| destroys the current token, but it sets things up so that 7635the next |get_next| will deliver the appropriate next token. The value of 7636|cur_tok| need not be known when |expand| is called. 7637 7638Since several of the basic scanning routines communicate via global variables, 7639their values are saved as local variables of |expand| so that 7640recursive calls don't invalidate them. 7641@^recursion@> 7642 7643@p@t\4@>@<Declare the procedure called |macro_call|@>@;@/ 7644@t\4@>@<Declare the procedure called |insert_relax|@>@;@/ 7645procedure@?pass_text; forward;@t\2@> 7646procedure@?start_input; forward;@t\2@> 7647procedure@?conditional; forward;@t\2@> 7648procedure@?get_x_token; forward;@t\2@> 7649procedure@?conv_toks; forward;@t\2@> 7650procedure@?ins_the_toks; forward;@t\2@> 7651procedure expand; 7652var t:halfword; {token that is being ``expanded after''} 7653@!p,@!q,@!r:pointer; {for list manipulation} 7654@!j:0..buf_size; {index into |buffer|} 7655@!cv_backup:integer; {to save the global quantity |cur_val|} 7656@!cvl_backup,@!radix_backup,@!co_backup:small_number; 7657 {to save |cur_val_level|, etc.} 7658@!backup_backup:pointer; {to save |link(backup_head)|} 7659@!save_scanner_status:small_number; {temporary storage of |scanner_status|} 7660begin cv_backup:=cur_val; cvl_backup:=cur_val_level; radix_backup:=radix; 7661co_backup:=cur_order; backup_backup:=link(backup_head); 7662if cur_cmd<call then @<Expand a nonmacro@> 7663else if cur_cmd<end_template then macro_call 7664else @<Insert a token containing |frozen_endv|@>; 7665cur_val:=cv_backup; cur_val_level:=cvl_backup; radix:=radix_backup; 7666cur_order:=co_backup; link(backup_head):=backup_backup; 7667end; 7668 7669@ @<Expand a nonmacro@>= 7670begin if tracing_commands>1 then show_cur_cmd_chr; 7671case cur_cmd of 7672top_bot_mark:@<Insert the \(a)appropriate mark text into the scanner@>; 7673expand_after:@<Expand the token after the next token@>; 7674no_expand:@<Suppress expansion of the next token@>; 7675cs_name:@<Manufacture a control sequence name@>; 7676convert:conv_toks; {this procedure is discussed in Part 27 below} 7677the:ins_the_toks; {this procedure is discussed in Part 27 below} 7678if_test:conditional; {this procedure is discussed in Part 28 below} 7679fi_or_else:@<Terminate the current conditional and skip to \.{\\fi}@>; 7680input:@<Initiate or terminate input from a file@>; 7681othercases @<Complain about an undefined macro@> 7682endcases; 7683end 7684 7685@ It takes only a little shuffling to do what \TeX\ calls \.{\\expandafter}. 7686 7687@<Expand the token after...@>= 7688begin get_token; t:=cur_tok; get_token; 7689if cur_cmd>max_command then expand@+else back_input; 7690cur_tok:=t; back_input; 7691end 7692 7693@ The implementation of \.{\\noexpand} is a bit trickier, because it is 7694necessary to insert a special `|dont_expand|' marker into \TeX's reading 7695mechanism. This special marker is processed by |get_next|, but it does 7696not slow down the inner loop. 7697 7698Since \.{\\outer} macros might arise here, we must also 7699clear the |scanner_status| temporarily. 7700 7701@<Suppress expansion...@>= 7702begin save_scanner_status:=scanner_status; scanner_status:=normal; 7703get_token; scanner_status:=save_scanner_status; t:=cur_tok; 7704back_input; {now |start| and |loc| point to the backed-up token |t|} 7705if t>=cs_token_flag then 7706 begin p:=get_avail; info(p):=cs_token_flag+frozen_dont_expand; 7707 link(p):=loc; start:=p; loc:=p; 7708 end; 7709end 7710 7711@ @<Complain about an undefined macro@>= 7712begin print_err("Undefined control sequence"); 7713@.Undefined control sequence@> 7714help5("The control sequence at the end of the top line")@/ 7715("of your error message was never \def'ed. If you have")@/ 7716("misspelled it (e.g., `\hobx'), type `I' and the correct")@/ 7717("spelling (e.g., `I\hbox'). Otherwise just continue,")@/ 7718("and I'll forget about whatever was undefined."); 7719error; 7720end 7721 7722@ The |expand| procedure and some other routines that construct token 7723lists find it convenient to use the following macros, which are valid only if 7724the variables |p| and |q| are reserved for token-list building. 7725 7726@d store_new_token(#)==begin q:=get_avail; link(p):=q; info(q):=#; 7727 p:=q; {|link(p)| is |null|} 7728 end 7729@d fast_store_new_token(#)==begin fast_get_avail(q); link(p):=q; info(q):=#; 7730 p:=q; {|link(p)| is |null|} 7731 end 7732 7733@ @<Manufacture a control...@>= 7734begin r:=get_avail; p:=r; {head of the list of characters} 7735repeat get_x_token; 7736if cur_cs=0 then store_new_token(cur_tok); 7737until cur_cs<>0; 7738if cur_cmd<>end_cs_name then @<Complain about missing \.{\\endcsname}@>; 7739@<Look up the characters of list |r| in the hash table, and set |cur_cs|@>; 7740flush_list(r); 7741if eq_type(cur_cs)=undefined_cs then 7742 begin eq_define(cur_cs,relax,256); {N.B.: The |save_stack| might change} 7743 end; {the control sequence will now match `\.{\\relax}'} 7744cur_tok:=cur_cs+cs_token_flag; back_input; 7745end 7746 7747@ @<Complain about missing \.{\\endcsname}@>= 7748begin print_err("Missing "); print_esc("endcsname"); print(" inserted"); 7749@.Missing \\endcsname...@> 7750help2("The control sequence marked <to be read again> should")@/ 7751 ("not appear between \csname and \endcsname."); 7752back_error; 7753end 7754 7755@ @<Look up the characters of list |r| in the hash table...@>= 7756j:=first; p:=link(r); 7757while p<>null do 7758 begin if j>=max_buf_stack then 7759 begin max_buf_stack:=j+1; 7760 if max_buf_stack=buf_size then 7761 overflow("buffer size",buf_size); 7762@:TeX capacity exceeded buffer size}{\quad buffer size@> 7763 end; 7764 buffer[j]:=info(p) mod @'400; incr(j); p:=link(p); 7765 end; 7766if j>first+1 then 7767 begin no_new_control_sequence:=false; cur_cs:=id_lookup(first,j-first); 7768 no_new_control_sequence:=true; 7769 end 7770else if j=first then cur_cs:=null_cs {the list is empty} 7771else cur_cs:=single_base+buffer[first] {the list has length one} 7772 7773@ An |end_template| command is effectively changed to an |endv| command 7774by the following code. (The reason for this is discussed below; the 7775|frozen_end_template| at the end of the template has passed the 7776|check_outer_validity| test, so its mission of error detection has been 7777accomplished.) 7778 7779@<Insert a token containing |frozen_endv|@>= 7780begin cur_tok:=cs_token_flag+frozen_endv; back_input; 7781end 7782 7783@ The processing of \.{\\input} involves the |start_input| subroutine, 7784which will be declared later; the processing of \.{\\endinput} is trivial. 7785 7786@<Put each...@>= 7787primitive("input",input,0);@/ 7788@!@:input_}{\.{\\input} primitive@> 7789primitive("endinput",input,1);@/ 7790@!@:end_input_}{\.{\\endinput} primitive@> 7791 7792@ @<Cases of |print_cmd_chr|...@>= 7793input: if chr_code=0 then print_esc("input")@+else print_esc("endinput"); 7794 7795@ @<Initiate or terminate input...@>= 7796if cur_chr>0 then force_eof:=true 7797else if name_in_progress then insert_relax 7798else start_input 7799 7800@ Sometimes the expansion looks too far ahead, so we want to insert 7801a harmless \.{\\relax} into the user's input. 7802 7803@<Declare the procedure called |insert_relax|@>= 7804procedure insert_relax; 7805begin cur_tok:=cs_token_flag+cur_cs; back_input; 7806cur_tok:=cs_token_flag+frozen_relax; back_input; token_type:=inserted; 7807end; 7808 7809@ Here is a recursive procedure that is \TeX's usual way to get the 7810next token of input. It has been slightly optimized to take account of 7811common cases. 7812 7813@p procedure get_x_token; {sets |cur_cmd|, |cur_chr|, |cur_tok|, 7814 and expands macros} 7815label restart,done; 7816begin restart: get_next; 7817@^inner loop@> 7818if cur_cmd<=max_command then goto done; 7819if cur_cmd>=call then 7820 if cur_cmd<end_template then macro_call 7821 else begin cur_cs:=frozen_endv; cur_cmd:=endv; 7822 goto done; {|cur_chr=null_list|} 7823 end 7824else expand; 7825goto restart; 7826done: if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr 7827else cur_tok:=cs_token_flag+cur_cs; 7828end; 7829 7830@ The |get_x_token| procedure is equivalent to two consecutive 7831procedure calls: |get_next; x_token|. 7832 7833@p procedure x_token; {|get_x_token| without the initial |get_next|} 7834begin while cur_cmd>max_command do 7835 begin expand; 7836 get_next; 7837 end; 7838if cur_cs=0 then cur_tok:=(cur_cmd*@'400)+cur_chr 7839else cur_tok:=cs_token_flag+cur_cs; 7840end; 7841 7842@ A control sequence that has been \.{\\def}'ed by the user is expanded by 7843\TeX's |macro_call| procedure. 7844 7845Before we get into the details of |macro_call|, however, let's consider the 7846treatment of primitives like \.{\\topmark}, since they are essentially 7847macros without parameters. The token lists for such marks are kept in a 7848global array of five pointers; we refer to the individual entries of this 7849array by symbolic names |top_mark|, etc. The value of |top_mark| is either 7850|null| or a pointer to the reference count of a token list. 7851 7852@d top_mark_code=0 {the mark in effect at the previous page break} 7853@d first_mark_code=1 {the first mark between |top_mark| and |bot_mark|} 7854@d bot_mark_code=2 {the mark in effect at the current page break} 7855@d split_first_mark_code=3 {the first mark found by \.{\\vsplit}} 7856@d split_bot_mark_code=4 {the last mark found by \.{\\vsplit}} 7857@d top_mark==cur_mark[top_mark_code] 7858@d first_mark==cur_mark[first_mark_code] 7859@d bot_mark==cur_mark[bot_mark_code] 7860@d split_first_mark==cur_mark[split_first_mark_code] 7861@d split_bot_mark==cur_mark[split_bot_mark_code] 7862 7863@<Glob...@>= 7864@!cur_mark:array[top_mark_code..split_bot_mark_code] of pointer; 7865 {token lists for marks} 7866 7867@ @<Set init...@>= 7868top_mark:=null; first_mark:=null; bot_mark:=null; 7869split_first_mark:=null; split_bot_mark:=null; 7870 7871@ @<Put each...@>= 7872primitive("topmark",top_bot_mark,top_mark_code); 7873@!@:top_mark_}{\.{\\topmark} primitive@> 7874primitive("firstmark",top_bot_mark,first_mark_code); 7875@!@:first_mark_}{\.{\\firstmark} primitive@> 7876primitive("botmark",top_bot_mark,bot_mark_code); 7877@!@:bot_mark_}{\.{\\botmark} primitive@> 7878primitive("splitfirstmark",top_bot_mark,split_first_mark_code); 7879@!@:split_first_mark_}{\.{\\splitfirstmark} primitive@> 7880primitive("splitbotmark",top_bot_mark,split_bot_mark_code); 7881@!@:split_bot_mark_}{\.{\\splitbotmark} primitive@> 7882 7883@ @<Cases of |print_cmd_chr|...@>= 7884top_bot_mark: case chr_code of 7885 first_mark_code: print_esc("firstmark"); 7886 bot_mark_code: print_esc("botmark"); 7887 split_first_mark_code: print_esc("splitfirstmark"); 7888 split_bot_mark_code: print_esc("splitbotmark"); 7889 othercases print_esc("topmark") 7890 endcases; 7891 7892@ The following code is activated when |cur_cmd=top_bot_mark| and 7893when |cur_chr| is a code like |top_mark_code|. 7894 7895@<Insert the \(a)appropriate mark text into the scanner@>= 7896begin if cur_mark[cur_chr]<>null then 7897 begin_token_list(cur_mark[cur_chr],mark_text); 7898end 7899 7900@ Now let's consider |macro_call| itself, which is invoked when \TeX\ is 7901scanning a control sequence whose |cur_cmd| is either |call|, |long_call|, 7902|outer_call|, or |long_outer_call|. The control sequence definition 7903appears in the token list whose reference count is in location |cur_chr| 7904of |mem|. 7905 7906The global variable |long_state| will be set to |call| or to |long_call|, 7907depending on whether or not the control sequence disallows \.{\\par} 7908in its parameters. The |get_next| routine will set |long_state| to 7909|outer_call| and emit \.{\\par}, if a file ends or if an \.{\\outer} 7910control sequence occurs in the midst of an argument. 7911 7912@<Glob...@>= 7913@!long_state:call..long_outer_call; {governs the acceptance of \.{\\par}} 7914 7915@ The parameters, if any, must be scanned before the macro is expanded. 7916Parameters are token lists without reference counts. They are placed on 7917an auxiliary stack called |pstack| while they are being scanned, since 7918the |param_stack| may be losing entries during the matching process. 7919(Note that |param_stack| can't be gaining entries, since |macro_call| is 7920the only routine that puts anything onto |param_stack|, and it 7921is not recursive.) 7922 7923@<Glob...@>= 7924@!pstack:array[0..8] of pointer; {arguments supplied to a macro} 7925 7926@ After parameter scanning is complete, the parameters are moved to the 7927|param_stack|. Then the macro body is fed to the scanner; in other words, 7928|macro_call| places the defined text of the control sequence at the 7929top of\/ \TeX's input stack, so that |get_next| will proceed to read it 7930next. 7931 7932The global variable |cur_cs| contains the |eqtb| address of the control sequence 7933being expanded, when |macro_call| begins. If this control sequence has not been 7934declared \.{\\long}, i.e., if its command code in the |eq_type| field is 7935not |long_call| or |long_outer_call|, its parameters are not allowed to contain 7936the control sequence \.{\\par}. If an illegal \.{\\par} appears, the macro 7937call is aborted, and the \.{\\par} will be rescanned. 7938 7939@<Declare the procedure called |macro_call|@>= 7940procedure macro_call; {invokes a user-defined control sequence} 7941label exit, continue, done, done1, found; 7942var r:pointer; {current node in the macro's token list} 7943@!p:pointer; {current node in parameter token list being built} 7944@!q:pointer; {new node being put into the token list} 7945@!s:pointer; {backup pointer for parameter matching} 7946@!t:pointer; {cycle pointer for backup recovery} 7947@!u,@!v:pointer; {auxiliary pointers for backup recovery} 7948@!rbrace_ptr:pointer; {one step before the last |right_brace| token} 7949@!n:small_number; {the number of parameters scanned} 7950@!unbalance:halfword; {unmatched left braces in current parameter} 7951@!m:halfword; {the number of tokens or groups (usually)} 7952@!ref_count:pointer; {start of the token list} 7953@!save_scanner_status:small_number; {|scanner_status| upon entry} 7954@!save_warning_index:pointer; {|warning_index| upon entry} 7955@!match_chr:ASCII_code; {character used in parameter} 7956begin save_scanner_status:=scanner_status; save_warning_index:=warning_index; 7957warning_index:=cur_cs; ref_count:=cur_chr; r:=link(ref_count); n:=0; 7958if tracing_macros>0 then @<Show the text of the macro being expanded@>; 7959if info(r)<>end_match_token then 7960 @<Scan the parameters and make |link(r)| point to the macro body; but 7961 |return| if an illegal \.{\\par} is detected@>; 7962@<Feed the macro body and its parameters to the scanner@>; 7963exit:scanner_status:=save_scanner_status; warning_index:=save_warning_index; 7964end; 7965 7966@ Before we put a new token list on the input stack, it is wise to clean off 7967all token lists that have recently been depleted. Then a user macro that ends 7968with a call to itself will not require unbounded stack space. 7969 7970@<Feed the macro body and its parameters to the scanner@>= 7971while (state=token_list)and(loc=null)and(token_type<>v_template) do 7972 end_token_list; {conserve stack space} 7973begin_token_list(ref_count,macro); name:=warning_index; loc:=link(r); 7974if n>0 then 7975 begin if param_ptr+n>max_param_stack then 7976 begin max_param_stack:=param_ptr+n; 7977 if max_param_stack>param_size then 7978 overflow("parameter stack size",param_size); 7979@:TeX capacity exceeded parameter stack size}{\quad parameter stack size@> 7980 end; 7981 for m:=0 to n-1 do param_stack[param_ptr+m]:=pstack[m]; 7982 param_ptr:=param_ptr+n; 7983 end 7984 7985@ At this point, the reader will find it advisable to review the explanation 7986of token list format that was presented earlier, since many aspects of that 7987format are of importance chiefly in the |macro_call| routine. 7988 7989The token list might begin with a string of compulsory tokens before the 7990first |match| or |end_match|. In that case the macro name is supposed to be 7991followed by those tokens; the following program will set |s=null| to 7992represent this restriction. Otherwise |s| will be set to the first token of 7993a string that will delimit the next parameter. 7994 7995@<Scan the parameters and make |link(r)| point to the macro body...@>= 7996begin scanner_status:=matching; unbalance:=0; 7997long_state:=eq_type(cur_cs); 7998if long_state>=outer_call then long_state:=long_state-2; 7999repeat link(temp_head):=null; 8000if (info(r)>match_token+255)or(info(r)<match_token) then s:=null 8001else begin match_chr:=info(r)-match_token; s:=link(r); r:=s; 8002 p:=temp_head; m:=0; 8003 end; 8004@<Scan a parameter until its delimiter string has been found; or, if |s=null|, 8005 simply scan the delimiter string@>;@/ 8006{now |info(r)| is a token whose command code is either |match| or |end_match|} 8007until info(r)=end_match_token; 8008end 8009 8010@ If |info(r)| is a |match| or |end_match| command, it cannot be equal to 8011any token found by |get_token|. Therefore an undelimited parameter---i.e., 8012a |match| that is immediately followed by |match| or |end_match|---will 8013always fail the test `|cur_tok=info(r)|' in the following algorithm. 8014 8015@<Scan a parameter until its delimiter string has been found; or, ...@>= 8016continue: get_token; {set |cur_tok| to the next token of input} 8017if cur_tok=info(r) then 8018 @<Advance \(r)|r|; |goto found| if the parameter delimiter has been 8019 fully matched, otherwise |goto continue|@>; 8020@<Contribute the recently matched tokens to the current parameter, and 8021 |goto continue| if a partial match is still in effect; 8022 but abort if |s=null|@>; 8023if cur_tok=par_token then if long_state<>long_call then 8024 @<Report a runaway argument and abort@>; 8025if cur_tok<right_brace_limit then 8026 if cur_tok<left_brace_limit then 8027 @<Contribute an entire group to the current parameter@> 8028 else @<Report an extra right brace and |goto continue|@> 8029else @<Store the current token, but |goto continue| if it is 8030 a blank space that would become an undelimited parameter@>; 8031incr(m); 8032if info(r)>end_match_token then goto continue; 8033if info(r)<match_token then goto continue; 8034found: if s<>null then @<Tidy up the parameter just scanned, and tuck it away@> 8035 8036@ @<Store the current token, but |goto continue| if it is...@>= 8037begin if cur_tok=space_token then 8038 if info(r)<=end_match_token then 8039 if info(r)>=match_token then goto continue; 8040store_new_token(cur_tok); 8041end 8042 8043@ A slightly subtle point arises here: When the parameter delimiter ends 8044with `\.{\#\{}', the token list will have a left brace both before and 8045after the |end_match|\kern-.4pt. Only one of these should affect the 8046|align_state|, but both will be scanned, so we must make a correction. 8047 8048@<Advance \(r)|r|; |goto found| if the parameter delimiter has been fully...@>= 8049begin r:=link(r); 8050if (info(r)>=match_token)and(info(r)<=end_match_token) then 8051 begin if cur_tok<left_brace_limit then decr(align_state); 8052 goto found; 8053 end 8054else goto continue; 8055end 8056 8057@ @<Report an extra right brace and |goto continue|@>= 8058begin back_input; print_err("Argument of "); sprint_cs(warning_index); 8059@.Argument of \\x has...@> 8060print(" has an extra }"); 8061help6("I've run across a `}' that doesn't seem to match anything.")@/ 8062 ("For example, `\def\a#1{...}' and `\a}' would produce")@/ 8063 ("this error. If you simply proceed now, the `\par' that")@/ 8064 ("I've just inserted will cause me to report a runaway")@/ 8065 ("argument that might be the root of the problem. But if")@/ 8066 ("your `}' was spurious, just type `2' and it will go away."); 8067incr(align_state); long_state:=call; cur_tok:=par_token; ins_error; 8068goto continue; 8069end {a white lie; the \.{\\par} won't always trigger a runaway} 8070 8071@ If |long_state=outer_call|, a runaway argument has already been reported. 8072 8073@<Report a runaway argument and abort@>= 8074begin if long_state=call then 8075 begin runaway; print_err("Paragraph ended before "); 8076@.Paragraph ended before...@> 8077 sprint_cs(warning_index); print(" was complete"); 8078 help3("I suspect you've forgotten a `}', causing me to apply this")@/ 8079 ("control sequence to too much text. How can we recover?")@/ 8080 ("My plan is to forget the whole thing and hope for the best."); 8081 back_error; 8082 end; 8083pstack[n]:=link(temp_head); align_state:=align_state-unbalance; 8084for m:=0 to n do flush_list(pstack[m]); 8085return; 8086end 8087 8088@ When the following code becomes active, we have matched tokens from |s| to 8089the predecessor of |r|, and we have found that |cur_tok<>info(r)|. An 8090interesting situation now presents itself: If the parameter is to be 8091delimited by a string such as `\.{ab}', and if we have scanned `\.{aa}', 8092we want to contribute one `\.a' to the current parameter and resume 8093looking for a `\.b'. The program must account for such partial matches and 8094for others that can be quite complex. But most of the time we have |s=r| 8095and nothing needs to be done. 8096 8097Incidentally, it is possible for \.{\\par} tokens to sneak in to certain 8098parameters of non-\.{\\long} macros. For example, consider a case like 8099`\.{\\def\\a\#1\\par!\{...\}}' where the first \.{\\par} is not followed 8100by an exclamation point. In such situations it does not seem appropriate 8101to prohibit the \.{\\par}, so \TeX\ keeps quiet about this bending of 8102the rules. 8103 8104@<Contribute the recently matched tokens to the current parameter...@>= 8105if s<>r then 8106 if s=null then @<Report an improper use of the macro and abort@> 8107 else begin t:=s; 8108 repeat store_new_token(info(t)); incr(m); u:=link(t); v:=s; 8109 loop@+ begin if u=r then 8110 if cur_tok<>info(v) then goto done 8111 else begin r:=link(v); goto continue; 8112 end; 8113 if info(u)<>info(v) then goto done; 8114 u:=link(u); v:=link(v); 8115 end; 8116 done: t:=link(t); 8117 until t=r; 8118 r:=s; {at this point, no tokens are recently matched} 8119 end 8120 8121@ @<Report an improper use...@>= 8122begin print_err("Use of "); sprint_cs(warning_index); 8123@.Use of x doesn't match...@> 8124print(" doesn't match its definition"); 8125help4("If you say, e.g., `\def\a1{...}', then you must always")@/ 8126 ("put `1' after `\a', since control sequence names are")@/ 8127 ("made up of letters only. The macro here has not been")@/ 8128 ("followed by the required stuff, so I'm ignoring it."); 8129error; return; 8130end 8131 8132@ @<Contribute an entire group to the current parameter@>= 8133begin unbalance:=1; 8134@^inner loop@> 8135loop@+ begin fast_store_new_token(cur_tok); get_token; 8136 if cur_tok=par_token then if long_state<>long_call then 8137 @<Report a runaway argument and abort@>; 8138 if cur_tok<right_brace_limit then 8139 if cur_tok<left_brace_limit then incr(unbalance) 8140 else begin decr(unbalance); 8141 if unbalance=0 then goto done1; 8142 end; 8143 end; 8144done1: rbrace_ptr:=p; store_new_token(cur_tok); 8145end 8146 8147@ If the parameter consists of a single group enclosed in braces, we must 8148strip off the enclosing braces. That's why |rbrace_ptr| was introduced. 8149 8150@<Tidy up the parameter just scanned, and tuck it away@>= 8151begin if (m=1)and(info(p)<right_brace_limit)and(p<>temp_head) then 8152 begin link(rbrace_ptr):=null; free_avail(p); 8153 p:=link(temp_head); pstack[n]:=link(p); free_avail(p); 8154 end 8155else pstack[n]:=link(temp_head); 8156incr(n); 8157if tracing_macros>0 then 8158 begin begin_diagnostic; print_nl(match_chr); print_int(n); 8159 print("<-"); show_token_list(pstack[n-1],null,1000); 8160 end_diagnostic(false); 8161 end; 8162end 8163 8164@ @<Show the text of the macro being expanded@>= 8165begin begin_diagnostic; print_ln; print_cs(warning_index); 8166token_show(ref_count); end_diagnostic(false); 8167end 8168 8169@* \[26] Basic scanning subroutines. 8170Let's turn now to some procedures that \TeX\ calls upon frequently to digest 8171certain kinds of patterns in the input. Most of these are quite simple; 8172some are quite elaborate. Almost all of the routines call |get_x_token|, 8173which can cause them to be invoked recursively. 8174@^stomach@> 8175@^recursion@> 8176 8177@ The |scan_left_brace| routine is called when a left brace is supposed to be 8178the next non-blank token. (The term ``left brace'' means, more precisely, 8179a character whose catcode is |left_brace|.) \TeX\ allows \.{\\relax} to 8180appear before the |left_brace|. 8181 8182@p procedure scan_left_brace; {reads a mandatory |left_brace|} 8183begin @<Get the next non-blank non-relax non-call token@>; 8184if cur_cmd<>left_brace then 8185 begin print_err("Missing { inserted"); 8186@.Missing \{ inserted@> 8187 help4("A left brace was mandatory here, so I've put one in.")@/ 8188 ("You might want to delete and/or insert some corrections")@/ 8189 ("so that I will find a matching right brace soon.")@/ 8190 ("(If you're confused by all this, try typing `I}' now.)"); 8191 back_error; cur_tok:=left_brace_token+"{"; cur_cmd:=left_brace; 8192 cur_chr:="{"; incr(align_state); 8193 end; 8194end; 8195 8196@ @<Get the next non-blank non-relax non-call token@>= 8197repeat get_x_token; 8198until (cur_cmd<>spacer)and(cur_cmd<>relax) 8199 8200@ The |scan_optional_equals| routine looks for an optional `\.=' sign preceded 8201by optional spaces; `\.{\\relax}' is not ignored here. 8202 8203@p procedure scan_optional_equals; 8204begin @<Get the next non-blank non-call token@>; 8205if cur_tok<>other_token+"=" then back_input; 8206end; 8207 8208@ @<Get the next non-blank non-call token@>= 8209repeat get_x_token; 8210until cur_cmd<>spacer 8211 8212@ In case you are getting bored, here is a slightly less trivial routine: 8213Given a string of lowercase letters, like `\.{pt}' or `\.{plus}' or 8214`\.{width}', the |scan_keyword| routine checks to see whether the next 8215tokens of input match this string. The match must be exact, except that 8216uppercase letters will match their lowercase counterparts; uppercase 8217equivalents are determined by subtracting |"a"-"A"|, rather than using the 8218|uc_code| table, since \TeX\ uses this routine only for its own limited 8219set of keywords. 8220 8221If a match is found, the characters are effectively removed from the input 8222and |true| is returned. Otherwise |false| is returned, and the input 8223is left essentially unchanged (except for the fact that some macros 8224may have been expanded, etc.). 8225@^inner loop@> 8226 8227@p function scan_keyword(@!s:str_number):boolean; {look for a given string} 8228label exit; 8229var p:pointer; {tail of the backup list} 8230@!q:pointer; {new node being added to the token list via |store_new_token|} 8231@!k:pool_pointer; {index into |str_pool|} 8232begin p:=backup_head; link(p):=null; k:=str_start[s]; 8233while k<str_start[s+1] do 8234 begin get_x_token; {recursion is possible here} 8235@^recursion@> 8236 if (cur_cs=0)and@| 8237 ((cur_chr=so(str_pool[k]))or(cur_chr=so(str_pool[k])-"a"+"A")) then 8238 begin store_new_token(cur_tok); incr(k); 8239 end 8240 else if (cur_cmd<>spacer)or(p<>backup_head) then 8241 begin back_input; 8242 if p<>backup_head then back_list(link(backup_head)); 8243 scan_keyword:=false; return; 8244 end; 8245 end; 8246flush_list(link(backup_head)); scan_keyword:=true; 8247exit:end; 8248 8249@ Here is a procedure that sounds an alarm when mu and non-mu units 8250are being switched. 8251 8252@p procedure mu_error; 8253begin print_err("Incompatible glue units"); 8254@.Incompatible glue units@> 8255help1("I'm going to assume that 1mu=1pt when they're mixed."); 8256error; 8257end; 8258 8259@ The next routine `|scan_something_internal|' is used to fetch internal 8260numeric quantities like `\.{\\hsize}', and also to handle the `\.{\\the}' 8261when expanding constructions like `\.{\\the\\toks0}' and 8262`\.{\\the\\baselineskip}'. Soon we will be considering the |scan_int| 8263procedure, which calls |scan_something_internal|; on the other hand, 8264|scan_something_internal| also calls |scan_int|, for constructions like 8265`\.{\\catcode\`\\\$}' or `\.{\\fontdimen} \.3 \.{\\ff}'. So we 8266have to declare |scan_int| as a |forward| procedure. A few other 8267procedures are also declared at this point. 8268 8269@p procedure@?scan_int; forward; {scans an integer value} 8270@t\4\4@>@<Declare procedures that scan restricted classes of integers@>@; 8271@t\4\4@>@<Declare procedures that scan font-related stuff@> 8272 8273@ \TeX\ doesn't know exactly what to expect when |scan_something_internal| 8274begins. For example, an integer or dimension or glue value could occur 8275immediately after `\.{\\hskip}'; and one can even say \.{\\the} with 8276respect to token lists in constructions like 8277`\.{\\xdef\\o\{\\the\\output\}}'. On the other hand, only integers are 8278allowed after a construction like `\.{\\count}'. To handle the various 8279possibilities, |scan_something_internal| has a |level| parameter, which 8280tells the ``highest'' kind of quantity that |scan_something_internal| is 8281allowed to produce. Six levels are distinguished, namely |int_val|, 8282|dimen_val|, |glue_val|, |mu_val|, |ident_val|, and |tok_val|. 8283 8284The output of |scan_something_internal| (and of the other routines 8285|scan_int|, |scan_dimen|, and |scan_glue| below) is put into the global 8286variable |cur_val|, and its level is put into |cur_val_level|. The highest 8287values of |cur_val_level| are special: |mu_val| is used only when 8288|cur_val| points to something in a ``muskip'' register, or to one of the 8289three parameters \.{\\thinmuskip}, \.{\\medmuskip}, \.{\\thickmuskip}; 8290|ident_val| is used only when |cur_val| points to a font identifier; 8291|tok_val| is used only when |cur_val| points to |null| or to the reference 8292count of a token list. The last two cases are allowed only when 8293|scan_something_internal| is called with |level=tok_val|. 8294 8295If the output is glue, |cur_val| will point to a glue specification, and 8296the reference count of that glue will have been updated to reflect this 8297reference; if the output is a nonempty token list, |cur_val| will point to 8298its reference count, but in this case the count will not have been updated. 8299Otherwise |cur_val| will contain the integer or scaled value in question. 8300 8301@d int_val=0 {integer values} 8302@d dimen_val=1 {dimension values} 8303@d glue_val=2 {glue specifications} 8304@d mu_val=3 {math glue specifications} 8305@d ident_val=4 {font identifier} 8306@d tok_val=5 {token lists} 8307 8308@<Glob...@>= 8309@!cur_val:integer; {value returned by numeric scanners} 8310@!cur_val_level:int_val..tok_val; {the ``level'' of this value} 8311 8312@ The hash table is initialized with `\.{\\count}', `\.{\\dimen}', `\.{\\skip}', 8313and `\.{\\muskip}' all having |register| as their command code; they are 8314distinguished by the |chr_code|, which is either |int_val|, |dimen_val|, 8315|glue_val|, or |mu_val|. 8316 8317@<Put each...@>= 8318primitive("count",register,int_val); 8319@!@:count_}{\.{\\count} primitive@> 8320primitive("dimen",register,dimen_val); 8321@!@:dimen_}{\.{\\dimen} primitive@> 8322primitive("skip",register,glue_val); 8323@!@:skip_}{\.{\\skip} primitive@> 8324primitive("muskip",register,mu_val); 8325@!@:mu_skip_}{\.{\\muskip} primitive@> 8326 8327@ @<Cases of |print_cmd_chr|...@>= 8328register: if chr_code=int_val then print_esc("count") 8329 else if chr_code=dimen_val then print_esc("dimen") 8330 else if chr_code=glue_val then print_esc("skip") 8331 else print_esc("muskip"); 8332 8333@ OK, we're ready for |scan_something_internal| itself. A second parameter, 8334|negative|, is set |true| if the value that is found should be negated. 8335It is assumed that |cur_cmd| and |cur_chr| represent the first token of 8336the internal quantity to be scanned; an error will be signalled if 8337|cur_cmd<min_internal| or |cur_cmd>max_internal|. 8338 8339@d scanned_result_end(#)==cur_val_level:=#;@+end 8340@d scanned_result(#)==@+begin cur_val:=#;scanned_result_end 8341 8342@p procedure scan_something_internal(@!level:small_number;@!negative:boolean); 8343 {fetch an internal parameter} 8344var m:halfword; {|chr_code| part of the operand token} 8345@!p:0..nest_size; {index into |nest|} 8346begin m:=cur_chr; 8347case cur_cmd of 8348def_code: @<Fetch a character code from some table@>; 8349toks_register,assign_toks,def_family,set_font,def_font: @<Fetch a token list or 8350 font identifier, provided that |level=tok_val|@>; 8351assign_int: scanned_result(eqtb[m].int)(int_val); 8352assign_dimen: scanned_result(eqtb[m].sc)(dimen_val); 8353assign_glue: scanned_result(equiv(m))(glue_val); 8354assign_mu_glue: scanned_result(equiv(m))(mu_val); 8355set_aux: @<Fetch the |space_factor| or the |prev_depth|@>; 8356set_prev_graf: @<Fetch the |prev_graf|@>; 8357set_page_int:@<Fetch the |dead_cycles| or the |insert_penalties|@>; 8358set_page_dimen: @<Fetch something on the |page_so_far|@>; 8359set_shape: @<Fetch the |par_shape| size@>; 8360set_box_dimen: @<Fetch a box dimension@>; 8361char_given,math_given: scanned_result(cur_chr)(int_val); 8362assign_font_dimen: @<Fetch a font dimension@>; 8363assign_font_int: @<Fetch a font integer@>; 8364register: @<Fetch a register@>; 8365last_item: @<Fetch an item in the current node, if appropriate@>; 8366othercases @<Complain that \.{\\the} can't do this; give zero result@> 8367endcases;@/ 8368while cur_val_level>level do @<Convert \(c)|cur_val| to a lower level@>; 8369@<Fix the reference count, if any, and negate |cur_val| if |negative|@>; 8370end; 8371 8372@ @<Fetch a character code from some table@>= 8373begin scan_char_num; 8374if m=math_code_base then scanned_result(ho(math_code(cur_val)))(int_val) 8375else if m<math_code_base then scanned_result(equiv(m+cur_val))(int_val) 8376else scanned_result(eqtb[m+cur_val].int)(int_val); 8377end 8378 8379@ @<Fetch a token list...@>= 8380if level<>tok_val then 8381 begin print_err("Missing number, treated as zero"); 8382@.Missing number...@> 8383 help3("A number should have been here; I inserted `0'.")@/ 8384 ("(If you can't figure out why I needed to see a number,")@/ 8385 ("look up `weird error' in the index to The TeXbook.)"); 8386@:TeXbook}{\sl The \TeX book@> 8387 back_error; scanned_result(0)(dimen_val); 8388 end 8389else if cur_cmd<=assign_toks then 8390 begin if cur_cmd<assign_toks then {|cur_cmd=toks_register|} 8391 begin scan_eight_bit_int; m:=toks_base+cur_val; 8392 end; 8393 scanned_result(equiv(m))(tok_val); 8394 end 8395else begin back_input; scan_font_ident; 8396 scanned_result(font_id_base+cur_val)(ident_val); 8397 end 8398 8399@ Users refer to `\.{\\the\\spacefactor}' only in horizontal 8400mode, and to `\.{\\the\\prevdepth}' only in vertical mode; so we put the 8401associated mode in the modifier part of the |set_aux| command. 8402The |set_page_int| command has modifier 0 or 1, for `\.{\\deadcycles}' and 8403`\.{\\insertpenalties}', respectively. The |set_box_dimen| command is 8404modified by either |width_offset|, |height_offset|, or |depth_offset|. 8405And the |last_item| command is modified by either |int_val|, |dimen_val|, 8406|glue_val|, |input_line_no_code|, or |badness_code|. 8407 8408@d input_line_no_code=glue_val+1 {code for \.{\\inputlineno}} 8409@d badness_code=glue_val+2 {code for \.{\\badness}} 8410 8411@<Put each...@>= 8412primitive("spacefactor",set_aux,hmode); 8413@!@:space_factor_}{\.{\\spacefactor} primitive@> 8414primitive("prevdepth",set_aux,vmode);@/ 8415@!@:prev_depth_}{\.{\\prevdepth} primitive@> 8416primitive("deadcycles",set_page_int,0); 8417@!@:dead_cycles_}{\.{\\deadcycles} primitive@> 8418primitive("insertpenalties",set_page_int,1); 8419@!@:insert_penalties_}{\.{\\insertpenalties} primitive@> 8420primitive("wd",set_box_dimen,width_offset); 8421@!@:wd_}{\.{\\wd} primitive@> 8422primitive("ht",set_box_dimen,height_offset); 8423@!@:ht_}{\.{\\ht} primitive@> 8424primitive("dp",set_box_dimen,depth_offset); 8425@!@:dp_}{\.{\\dp} primitive@> 8426primitive("lastpenalty",last_item,int_val); 8427@!@:last_penalty_}{\.{\\lastpenalty} primitive@> 8428primitive("lastkern",last_item,dimen_val); 8429@!@:last_kern_}{\.{\\lastkern} primitive@> 8430primitive("lastskip",last_item,glue_val); 8431@!@:last_skip_}{\.{\\lastskip} primitive@> 8432primitive("inputlineno",last_item,input_line_no_code); 8433@!@:input_line_no_}{\.{\\inputlineno} primitive@> 8434primitive("badness",last_item,badness_code); 8435@!@:badness_}{\.{\\badness} primitive@> 8436 8437@ @<Cases of |print_cmd_chr|...@>= 8438set_aux: if chr_code=vmode then print_esc("prevdepth") 8439@+else print_esc("spacefactor"); 8440set_page_int: if chr_code=0 then print_esc("deadcycles") 8441@+else print_esc("insertpenalties"); 8442set_box_dimen: if chr_code=width_offset then print_esc("wd") 8443else if chr_code=height_offset then print_esc("ht") 8444else print_esc("dp"); 8445last_item: case chr_code of 8446 int_val: print_esc("lastpenalty"); 8447 dimen_val: print_esc("lastkern"); 8448 glue_val: print_esc("lastskip"); 8449 input_line_no_code: print_esc("inputlineno"); 8450 othercases print_esc("badness") 8451 endcases; 8452 8453@ @<Fetch the |space_factor| or the |prev_depth|@>= 8454if abs(mode)<>m then 8455 begin print_err("Improper "); print_cmd_chr(set_aux,m); 8456@.Improper \\spacefactor@> 8457@.Improper \\prevdepth@> 8458 help4("You can refer to \spacefactor only in horizontal mode;")@/ 8459 ("you can refer to \prevdepth only in vertical mode; and")@/ 8460 ("neither of these is meaningful inside \write. So")@/ 8461 ("I'm forgetting what you said and using zero instead."); 8462 error; 8463 if level<>tok_val then scanned_result(0)(dimen_val) 8464 else scanned_result(0)(int_val); 8465 end 8466else if m=vmode then scanned_result(prev_depth)(dimen_val) 8467else scanned_result(space_factor)(int_val) 8468 8469@ @<Fetch the |dead_cycles| or the |insert_penalties|@>= 8470begin if m=0 then cur_val:=dead_cycles@+else cur_val:=insert_penalties; 8471cur_val_level:=int_val; 8472end 8473 8474@ @<Fetch a box dimension@>= 8475begin scan_eight_bit_int; 8476if box(cur_val)=null then cur_val:=0 @+else cur_val:=mem[box(cur_val)+m].sc; 8477cur_val_level:=dimen_val; 8478end 8479 8480@ Inside an \.{\\output} routine, a user may wish to look at the page totals 8481that were present at the moment when output was triggered. 8482 8483@d max_dimen==@'7777777777 {$2^{30}-1$} 8484 8485@<Fetch something on the |page_so_far|@>= 8486begin if (page_contents=empty) and (not output_active) then 8487 if m=0 then cur_val:=max_dimen@+else cur_val:=0 8488else cur_val:=page_so_far[m]; 8489cur_val_level:=dimen_val; 8490end 8491 8492@ @<Fetch the |prev_graf|@>= 8493if mode=0 then scanned_result(0)(int_val) {|prev_graf=0| within \.{\\write}} 8494else begin nest[nest_ptr]:=cur_list; p:=nest_ptr; 8495 while abs(nest[p].mode_field)<>vmode do decr(p); 8496 scanned_result(nest[p].pg_field)(int_val); 8497 end 8498 8499@ @<Fetch the |par_shape| size@>= 8500begin if par_shape_ptr=null then cur_val:=0 8501else cur_val:=info(par_shape_ptr); 8502cur_val_level:=int_val; 8503end 8504 8505@ Here is where \.{\\lastpenalty}, \.{\\lastkern}, and \.{\\lastskip} are 8506implemented. The reference count for \.{\\lastskip} will be updated later. 8507 8508We also handle \.{\\inputlineno} and \.{\\badness} here, because they are 8509legal in similar contexts. 8510 8511@<Fetch an item in the current node...@>= 8512if cur_chr>glue_val then 8513 begin if cur_chr=input_line_no_code then cur_val:=line 8514 else cur_val:=last_badness; {|cur_chr=badness_code|} 8515 cur_val_level:=int_val; 8516 end 8517else begin if cur_chr=glue_val then cur_val:=zero_glue@+else cur_val:=0; 8518 cur_val_level:=cur_chr; 8519 if not is_char_node(tail)and(mode<>0) then 8520 case cur_chr of 8521 int_val: if type(tail)=penalty_node then cur_val:=penalty(tail); 8522 dimen_val: if type(tail)=kern_node then cur_val:=width(tail); 8523 glue_val: if type(tail)=glue_node then 8524 begin cur_val:=glue_ptr(tail); 8525 if subtype(tail)=mu_glue then cur_val_level:=mu_val; 8526 end; 8527 end {there are no other cases} 8528 else if (mode=vmode)and(tail=head) then 8529 case cur_chr of 8530 int_val: cur_val:=last_penalty; 8531 dimen_val: cur_val:=last_kern; 8532 glue_val: if last_glue<>max_halfword then cur_val:=last_glue; 8533 end; {there are no other cases} 8534 end 8535 8536@ @<Fetch a font dimension@>= 8537begin find_font_dimen(false); font_info[fmem_ptr].sc:=0; 8538scanned_result(font_info[cur_val].sc)(dimen_val); 8539end 8540 8541@ @<Fetch a font integer@>= 8542begin scan_font_ident; 8543if m=0 then scanned_result(hyphen_char[cur_val])(int_val) 8544else scanned_result(skew_char[cur_val])(int_val); 8545end 8546 8547@ @<Fetch a register@>= 8548begin scan_eight_bit_int; 8549case m of 8550int_val:cur_val:=count(cur_val); 8551dimen_val:cur_val:=dimen(cur_val); 8552glue_val: cur_val:=skip(cur_val); 8553mu_val: cur_val:=mu_skip(cur_val); 8554end; {there are no other cases} 8555cur_val_level:=m; 8556end 8557 8558@ @<Complain that \.{\\the} can't do this; give zero result@>= 8559begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr); 8560@.You can't use x after ...@> 8561print("' after "); print_esc("the"); 8562help1("I'm forgetting what you said and using zero instead."); 8563error; 8564if level<>tok_val then scanned_result(0)(dimen_val) 8565else scanned_result(0)(int_val); 8566end 8567 8568@ When a |glue_val| changes to a |dimen_val|, we use the width component 8569of the glue; there is no need to decrease the reference count, since it 8570has not yet been increased. When a |dimen_val| changes to an |int_val|, 8571we use scaled points so that the value doesn't actually change. And when a 8572|mu_val| changes to a |glue_val|, the value doesn't change either. 8573 8574@<Convert \(c)|cur_val| to a lower level@>= 8575begin if cur_val_level=glue_val then cur_val:=width(cur_val) 8576else if cur_val_level=mu_val then mu_error; 8577decr(cur_val_level); 8578end 8579 8580@ If |cur_val| points to a glue specification at this point, the reference 8581count for the glue does not yet include the reference by |cur_val|. 8582If |negative| is |true|, |cur_val_level| is known to be |<=mu_val|. 8583 8584@<Fix the reference count, if any, ...@>= 8585if negative then 8586 if cur_val_level>=glue_val then 8587 begin cur_val:=new_spec(cur_val); 8588 @<Negate all three glue components of |cur_val|@>; 8589 end 8590 else negate(cur_val) 8591else if (cur_val_level>=glue_val)and(cur_val_level<=mu_val) then 8592 add_glue_ref(cur_val) 8593 8594@ @<Negate all three...@>= 8595begin negate(width(cur_val)); 8596negate(stretch(cur_val)); 8597negate(shrink(cur_val)); 8598end 8599 8600@ Our next goal is to write the |scan_int| procedure, which scans anything that 8601\TeX\ treats as an integer. But first we might as well look at some simple 8602applications of |scan_int| that have already been made inside of 8603|scan_something_internal|. 8604 8605@ @<Declare procedures that scan restricted classes of integers@>= 8606procedure scan_eight_bit_int; 8607begin scan_int; 8608if (cur_val<0)or(cur_val>255) then 8609 begin print_err("Bad register code"); 8610@.Bad register code@> 8611 help2("A register number must be between 0 and 255.")@/ 8612 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8613 end; 8614end; 8615 8616@ @<Declare procedures that scan restricted classes of integers@>= 8617procedure scan_char_num; 8618begin scan_int; 8619if (cur_val<0)or(cur_val>255) then 8620 begin print_err("Bad character code"); 8621@.Bad character code@> 8622 help2("A character number must be between 0 and 255.")@/ 8623 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8624 end; 8625end; 8626 8627@ While we're at it, we might as well deal with similar routines that 8628will be needed later. 8629 8630@<Declare procedures that scan restricted classes of integers@>= 8631procedure scan_four_bit_int; 8632begin scan_int; 8633if (cur_val<0)or(cur_val>15) then 8634 begin print_err("Bad number"); 8635@.Bad number@> 8636 help2("Since I expected to read a number between 0 and 15,")@/ 8637 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8638 end; 8639end; 8640 8641@ @<Declare procedures that scan restricted classes of integers@>= 8642procedure scan_fifteen_bit_int; 8643begin scan_int; 8644if (cur_val<0)or(cur_val>@'77777) then 8645 begin print_err("Bad mathchar"); 8646@.Bad mathchar@> 8647 help2("A mathchar number must be between 0 and 32767.")@/ 8648 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8649 end; 8650end; 8651 8652@ @<Declare procedures that scan restricted classes of integers@>= 8653procedure scan_twenty_seven_bit_int; 8654begin scan_int; 8655if (cur_val<0)or(cur_val>@'777777777) then 8656 begin print_err("Bad delimiter code"); 8657@.Bad delimiter code@> 8658 help2("A numeric delimiter code must be between 0 and 2^{27}-1.")@/ 8659 ("I changed this one to zero."); int_error(cur_val); cur_val:=0; 8660 end; 8661end; 8662 8663@ An integer number can be preceded by any number of spaces and `\.+' or 8664`\.-' signs. Then comes either a decimal constant (i.e., radix 10), an 8665octal constant (i.e., radix 8, preceded by~\.\'), a hexadecimal constant 8666(radix 16, preceded by~\."), an alphabetic constant (preceded by~\.\`), or 8667an internal variable. After scanning is complete, 8668|cur_val| will contain the answer, which must be at most 8669$2^{31}-1=2147483647$ in absolute value. The value of |radix| is set to 867010, 8, or 16 in the cases of decimal, octal, or hexadecimal constants, 8671otherwise |radix| is set to zero. An optional space follows a constant. 8672 8673@d octal_token=other_token+"'" {apostrophe, indicates an octal constant} 8674@d hex_token=other_token+"""" {double quote, indicates a hex constant} 8675@d alpha_token=other_token+"`" {reverse apostrophe, precedes alpha constants} 8676@d point_token=other_token+"." {decimal point} 8677@d continental_point_token=other_token+"," {decimal point, Eurostyle} 8678 8679@<Glob...@>= 8680@!radix:small_number; {|scan_int| sets this to 8, 10, 16, or zero} 8681 8682@ We initialize the following global variables just in case |expand| 8683comes into action before any of the basic scanning routines has assigned 8684them a value. 8685 8686@<Set init...@>= 8687cur_val:=0; cur_val_level:=int_val; radix:=0; cur_order:=normal; 8688 8689@ The |scan_int| routine is used also to scan the integer part of a 8690fraction; for example, the `\.3' in `\.{3.14159}' will be found by 8691|scan_int|. The |scan_dimen| routine assumes that |cur_tok=point_token| 8692after the integer part of such a fraction has been scanned by |scan_int|, 8693and that the decimal point has been backed up to be scanned again. 8694 8695@p procedure scan_int; {sets |cur_val| to an integer} 8696label done; 8697var negative:boolean; {should the answer be negated?} 8698@!m:integer; {|@t$2^{31}$@> div radix|, the threshold of danger} 8699@!d:small_number; {the digit just scanned} 8700@!vacuous:boolean; {have no digits appeared?} 8701@!OK_so_far:boolean; {has an error message been issued?} 8702begin radix:=0; OK_so_far:=true;@/ 8703@<Get the next non-blank non-sign token; set |negative| appropriately@>; 8704if cur_tok=alpha_token then @<Scan an alphabetic character code into |cur_val|@> 8705else if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then 8706 scan_something_internal(int_val,false) 8707else @<Scan a numeric constant@>; 8708if negative then negate(cur_val); 8709end; 8710 8711@ @<Get the next non-blank non-sign token...@>= 8712negative:=false; 8713repeat @<Get the next non-blank non-call token@>; 8714if cur_tok=other_token+"-" then 8715 begin negative := not negative; cur_tok:=other_token+"+"; 8716 end; 8717until cur_tok<>other_token+"+" 8718 8719@ A space is ignored after an alphabetic character constant, so that 8720such constants behave like numeric ones. 8721 8722@<Scan an alphabetic character code into |cur_val|@>= 8723begin get_token; {suppress macro expansion} 8724if cur_tok<cs_token_flag then 8725 begin cur_val:=cur_chr; 8726 if cur_cmd<=right_brace then 8727 if cur_cmd=right_brace then incr(align_state) 8728 else decr(align_state); 8729 end 8730else if cur_tok<cs_token_flag+single_base then 8731 cur_val:=cur_tok-cs_token_flag-active_base 8732else cur_val:=cur_tok-cs_token_flag-single_base; 8733if cur_val>255 then 8734 begin print_err("Improper alphabetic constant"); 8735@.Improper alphabetic constant@> 8736 help2("A one-character control sequence belongs after a ` mark.")@/ 8737 ("So I'm essentially inserting \0 here."); 8738 cur_val:="0"; back_error; 8739 end 8740else @<Scan an optional space@>; 8741end 8742 8743@ @<Scan an optional space@>= 8744begin get_x_token; if cur_cmd<>spacer then back_input; 8745end 8746 8747@ @<Scan a numeric constant@>= 8748begin radix:=10; m:=214748364; 8749if cur_tok=octal_token then 8750 begin radix:=8; m:=@'2000000000; get_x_token; 8751 end 8752else if cur_tok=hex_token then 8753 begin radix:=16; m:=@'1000000000; get_x_token; 8754 end; 8755vacuous:=true; cur_val:=0;@/ 8756@<Accumulate the constant until |cur_tok| is not a suitable digit@>; 8757if vacuous then @<Express astonishment that no number was here@> 8758else if cur_cmd<>spacer then back_input; 8759end 8760 8761@ @d infinity==@'17777777777 {the largest positive value that \TeX\ knows} 8762@d zero_token=other_token+"0" {zero, the smallest digit} 8763@d A_token=letter_token+"A" {the smallest special hex digit} 8764@d other_A_token=other_token+"A" {special hex digit of type |other_char|} 8765 8766@<Accumulate the constant...@>= 8767loop@+ begin if (cur_tok<zero_token+radix)and(cur_tok>=zero_token)and 8768 (cur_tok<=zero_token+9) then d:=cur_tok-zero_token 8769 else if radix=16 then 8770 if (cur_tok<=A_token+5)and(cur_tok>=A_token) then d:=cur_tok-A_token+10 8771 else if (cur_tok<=other_A_token+5)and(cur_tok>=other_A_token) then 8772 d:=cur_tok-other_A_token+10 8773 else goto done 8774 else goto done; 8775 vacuous:=false; 8776 if (cur_val>=m)and((cur_val>m)or(d>7)or(radix<>10)) then 8777 begin if OK_so_far then 8778 begin print_err("Number too big"); 8779@.Number too big@> 8780 help2("I can only go up to 2147483647='17777777777=""7FFFFFFF,")@/ 8781 ("so I'm using that number instead of yours."); 8782 error; cur_val:=infinity; OK_so_far:=false; 8783 end; 8784 end 8785 else cur_val:=cur_val*radix+d; 8786 get_x_token; 8787 end; 8788done: 8789 8790@ @<Express astonishment...@>= 8791begin print_err("Missing number, treated as zero"); 8792@.Missing number...@> 8793help3("A number should have been here; I inserted `0'.")@/ 8794 ("(If you can't figure out why I needed to see a number,")@/ 8795 ("look up `weird error' in the index to The TeXbook.)"); 8796@:TeXbook}{\sl The \TeX book@> 8797back_error; 8798end 8799 8800@ The |scan_dimen| routine is similar to |scan_int|, but it sets |cur_val| to 8801a |scaled| value, i.e., an integral number of sp. One of its main tasks 8802is therefore to interpret the abbreviations for various kinds of units and 8803to convert measurements to scaled points. 8804 8805There are three parameters: |mu| is |true| if the finite units must be 8806`\.{mu}', while |mu| is |false| if `\.{mu}' units are disallowed; 8807|inf| is |true| if the infinite units `\.{fil}', `\.{fill}', `\.{filll}' 8808are permitted; and |shortcut| is |true| if |cur_val| already contains 8809an integer and only the units need to be considered. 8810 8811The order of infinity that was found in the case of infinite glue is returned 8812in the global variable |cur_order|. 8813 8814@<Glob...@>= 8815@!cur_order:glue_ord; {order of infinity found by |scan_dimen|} 8816 8817@ Constructions like `\.{-\'77 pt}' are legal dimensions, so |scan_dimen| 8818may begin with |scan_int|. This explains why it is convenient to use 8819|scan_int| also for the integer part of a decimal fraction. 8820 8821Several branches of |scan_dimen| work with |cur_val| as an integer and 8822with an auxiliary fraction |f|, so that the actual quantity of interest is 8823$|cur_val|+|f|/2^{16}$. At the end of the routine, this ``unpacked'' 8824representation is put into the single word |cur_val|, which suddenly 8825switches significance from |integer| to |scaled|. 8826 8827@d attach_fraction=88 {go here to pack |cur_val| and |f| into |cur_val|} 8828@d attach_sign=89 {go here when |cur_val| is correct except perhaps for sign} 8829@d scan_normal_dimen==scan_dimen(false,false,false) 8830 8831@p procedure scan_dimen(@!mu,@!inf,@!shortcut:boolean); 8832 {sets |cur_val| to a dimension} 8833label done, done1, done2, found, not_found, attach_fraction, attach_sign; 8834var negative:boolean; {should the answer be negated?} 8835@!f:integer; {numerator of a fraction whose denominator is $2^{16}$} 8836@<Local variables for dimension calculations@>@; 8837begin f:=0; arith_error:=false; cur_order:=normal; negative:=false; 8838if not shortcut then 8839 begin @<Get the next non-blank non-sign...@>; 8840 if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then 8841 @<Fetch an internal dimension and |goto attach_sign|, 8842 or fetch an internal integer@> 8843 else begin back_input; 8844 if cur_tok=continental_point_token then cur_tok:=point_token; 8845 if cur_tok<>point_token then scan_int 8846 else begin radix:=10; cur_val:=0; 8847 end; 8848 if cur_tok=continental_point_token then cur_tok:=point_token; 8849 if (radix=10)and(cur_tok=point_token) then @<Scan decimal fraction@>; 8850 end; 8851 end; 8852if cur_val<0 then {in this case |f=0|} 8853 begin negative := not negative; negate(cur_val); 8854 end; 8855@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$, where there 8856 are |x| sp per unit; |goto attach_sign| if the units are internal@>; 8857@<Scan an optional space@>; 8858attach_sign: if arith_error or(abs(cur_val)>=@'10000000000) then 8859 @<Report that this dimension is out of range@>; 8860if negative then negate(cur_val); 8861end; 8862 8863@ @<Fetch an internal dimension and |goto attach_sign|...@>= 8864if mu then 8865 begin scan_something_internal(mu_val,false); 8866 @<Coerce glue to a dimension@>; 8867 if cur_val_level=mu_val then goto attach_sign; 8868 if cur_val_level<>int_val then mu_error; 8869 end 8870else begin scan_something_internal(dimen_val,false); 8871 if cur_val_level=dimen_val then goto attach_sign; 8872 end 8873 8874@ @<Local variables for dimension calculations@>= 8875@!num,@!denom:1..65536; {conversion ratio for the scanned units} 8876@!k,@!kk:small_number; {number of digits in a decimal fraction} 8877@!p,@!q:pointer; {top of decimal digit stack} 8878@!v:scaled; {an internal dimension} 8879@!save_cur_val:integer; {temporary storage of |cur_val|} 8880 8881@ The following code is executed when |scan_something_internal| was 8882called asking for |mu_val|, when we really wanted a ``mudimen'' instead 8883of ``muglue.'' 8884 8885@<Coerce glue to a dimension@>= 8886if cur_val_level>=glue_val then 8887 begin v:=width(cur_val); delete_glue_ref(cur_val); cur_val:=v; 8888 end 8889 8890@ When the following code is executed, we have |cur_tok=point_token|, but this 8891token has been backed up using |back_input|; we must first discard it. 8892 8893It turns out that a decimal point all by itself is equivalent to `\.{0.0}'. 8894Let's hope people don't use that fact. 8895 8896@<Scan decimal fraction@>= 8897begin k:=0; p:=null; get_token; {|point_token| is being re-scanned} 8898loop@+ begin get_x_token; 8899 if (cur_tok>zero_token+9)or(cur_tok<zero_token) then goto done1; 8900 if k<17 then {digits for |k>=17| cannot affect the result} 8901 begin q:=get_avail; link(q):=p; info(q):=cur_tok-zero_token; 8902 p:=q; incr(k); 8903 end; 8904 end; 8905done1: for kk:=k downto 1 do 8906 begin dig[kk-1]:=info(p); q:=p; p:=link(p); free_avail(q); 8907 end; 8908f:=round_decimals(k); 8909if cur_cmd<>spacer then back_input; 8910end 8911 8912@ Now comes the harder part: At this point in the program, |cur_val| is a 8913nonnegative integer and $f/2^{16}$ is a nonnegative fraction less than 1; 8914we want to multiply the sum of these two quantities by the appropriate 8915factor, based on the specified units, in order to produce a |scaled| 8916result, and we want to do the calculation with fixed point arithmetic that 8917does not overflow. 8918 8919@<Scan units and set |cur_val| to $x\cdot(|cur_val|+f/2^{16})$...@>= 8920if inf then @<Scan for \(f)\.{fil} units; |goto attach_fraction| if found@>; 8921@<Scan for \(u)units that are internal dimensions; 8922 |goto attach_sign| with |cur_val| set if found@>; 8923if mu then @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>; 8924if scan_keyword("true") then @<Adjust \(f)for the magnification ratio@>; 8925@.true@> 8926if scan_keyword("pt") then goto attach_fraction; {the easy case} 8927@.pt@> 8928@<Scan for \(a)all other units and adjust |cur_val| and |f| accordingly; 8929 |goto done| in the case of scaled points@>; 8930attach_fraction: if cur_val>=@'40000 then arith_error:=true 8931else cur_val:=cur_val*unity+f; 8932done: 8933 8934@ A specification like `\.{filllll}' or `\.{fill L L L}' will lead to two 8935error messages (one for each additional keyword \.{"l"}). 8936 8937@<Scan for \(f)\.{fil} units...@>= 8938if scan_keyword("fil") then 8939@.fil@> 8940 begin cur_order:=fil; 8941 while scan_keyword("l") do 8942 begin if cur_order=filll then 8943 begin print_err("Illegal unit of measure ("); 8944@.Illegal unit of measure@> 8945 print("replaced by filll)"); 8946 help1("I dddon't go any higher than filll."); error; 8947 end 8948 else incr(cur_order); 8949 end; 8950 goto attach_fraction; 8951 end 8952 8953@ @<Scan for \(u)units that are internal dimensions...@>= 8954save_cur_val:=cur_val; 8955@<Get the next non-blank non-call...@>; 8956if (cur_cmd<min_internal)or(cur_cmd>max_internal) then back_input 8957else begin if mu then 8958 begin scan_something_internal(mu_val,false); @<Coerce glue...@>; 8959 if cur_val_level<>mu_val then mu_error; 8960 end 8961 else scan_something_internal(dimen_val,false); 8962 v:=cur_val; goto found; 8963 end; 8964if mu then goto not_found; 8965if scan_keyword("em") then v:=(@<The em width for |cur_font|@>) 8966@.em@> 8967else if scan_keyword("ex") then v:=(@<The x-height for |cur_font|@>) 8968@.ex@> 8969else goto not_found; 8970@<Scan an optional space@>; 8971found:cur_val:=nx_plus_y(save_cur_val,v,xn_over_d(v,f,@'200000)); 8972goto attach_sign; 8973not_found: 8974 8975@ @<Scan for \(m)\.{mu} units and |goto attach_fraction|@>= 8976if scan_keyword("mu") then goto attach_fraction 8977@.mu@> 8978else begin print_err("Illegal unit of measure ("); print("mu inserted)"); 8979@.Illegal unit of measure@> 8980 help4("The unit of measurement in math glue must be mu.")@/ 8981 ("To recover gracefully from this error, it's best to")@/ 8982 ("delete the erroneous units; e.g., type `2' to delete")@/ 8983 ("two letters. (See Chapter 27 of The TeXbook.)"); 8984@:TeXbook}{\sl The \TeX book@> 8985 error; goto attach_fraction; 8986 end 8987 8988@ @<Adjust \(f)for the magnification ratio@>= 8989begin prepare_mag; 8990if mag<>1000 then 8991 begin cur_val:=xn_over_d(cur_val,1000,mag); 8992 f:=(1000*f+@'200000*remainder) div mag; 8993 cur_val:=cur_val+(f div @'200000); f:=f mod @'200000; 8994 end; 8995end 8996 8997@ The necessary conversion factors can all be specified exactly as 8998fractions whose numerator and denominator sum to 32768 or less. 8999According to the definitions here, $\rm2660\,dd\approx1000.33297\,mm$; 9000this agrees well with the value $\rm1000.333\,mm$ cited by Bosshard 9001@^Bosshard, Hans Rudolf@> 9002in {\sl Technische Grundlagen zur Satzherstellung\/} (Bern, 1980). 9003 9004@d set_conversion_end(#)== denom:=#; end 9005@d set_conversion(#)==@+begin num:=#; set_conversion_end 9006 9007@<Scan for \(a)all other units and adjust |cur_val| and |f|...@>= 9008if scan_keyword("in") then set_conversion(7227)(100) 9009@.in@> 9010else if scan_keyword("pc") then set_conversion(12)(1) 9011@.pc@> 9012else if scan_keyword("cm") then set_conversion(7227)(254) 9013@.cm@> 9014else if scan_keyword("mm") then set_conversion(7227)(2540) 9015@.mm@> 9016else if scan_keyword("bp") then set_conversion(7227)(7200) 9017@.bp@> 9018else if scan_keyword("dd") then set_conversion(1238)(1157) 9019@.dd@> 9020else if scan_keyword("cc") then set_conversion(14856)(1157) 9021@.cc@> 9022else if scan_keyword("sp") then goto done 9023@.sp@> 9024else @<Complain about unknown unit and |goto done2|@>; 9025cur_val:=xn_over_d(cur_val,num,denom); 9026f:=(num*f+@'200000*remainder) div denom;@/ 9027cur_val:=cur_val+(f div @'200000); f:=f mod @'200000; 9028done2: 9029 9030@ @<Complain about unknown unit...@>= 9031begin print_err("Illegal unit of measure ("); print("pt inserted)"); 9032@.Illegal unit of measure@> 9033help6("Dimensions can be in units of em, ex, in, pt, pc,")@/ 9034 ("cm, mm, dd, cc, bp, or sp; but yours is a new one!")@/ 9035 ("I'll assume that you meant to say pt, for printer's points.")@/ 9036 ("To recover gracefully from this error, it's best to")@/ 9037 ("delete the erroneous units; e.g., type `2' to delete")@/ 9038 ("two letters. (See Chapter 27 of The TeXbook.)"); 9039@:TeXbook}{\sl The \TeX book@> 9040error; goto done2; 9041end 9042 9043 9044@ @<Report that this dimension is out of range@>= 9045begin print_err("Dimension too large"); 9046@.Dimension too large@> 9047help2("I can't work with sizes bigger than about 19 feet.")@/ 9048 ("Continue and I'll use the largest value I can.");@/ 9049error; cur_val:=max_dimen; arith_error:=false; 9050end 9051 9052@ The final member of \TeX's value-scanning trio is |scan_glue|, which 9053makes |cur_val| point to a glue specification. The reference count of that 9054glue spec will take account of the fact that |cur_val| is pointing to~it. 9055 9056The |level| parameter should be either |glue_val| or |mu_val|. 9057 9058Since |scan_dimen| was so much more complex than |scan_int|, we might expect 9059|scan_glue| to be even worse. But fortunately, it is very simple, since 9060most of the work has already been done. 9061 9062@p procedure scan_glue(@!level:small_number); 9063 {sets |cur_val| to a glue spec pointer} 9064label exit; 9065var negative:boolean; {should the answer be negated?} 9066@!q:pointer; {new glue specification} 9067@!mu:boolean; {does |level=mu_val|?} 9068begin mu:=(level=mu_val); @<Get the next non-blank non-sign...@>; 9069if (cur_cmd>=min_internal)and(cur_cmd<=max_internal) then 9070 begin scan_something_internal(level,negative); 9071 if cur_val_level>=glue_val then 9072 begin if cur_val_level<>level then mu_error; 9073 return; 9074 end; 9075 if cur_val_level=int_val then scan_dimen(mu,false,true) 9076 else if level=mu_val then mu_error; 9077 end 9078else begin back_input; scan_dimen(mu,false,false); 9079 if negative then negate(cur_val); 9080 end; 9081@<Create a new glue specification whose width is |cur_val|; scan for its 9082 stretch and shrink components@>; 9083exit:end; 9084 9085@ @<Create a new glue specification whose width is |cur_val|...@>= 9086q:=new_spec(zero_glue); width(q):=cur_val; 9087if scan_keyword("plus") then 9088@.plus@> 9089 begin scan_dimen(mu,true,false); 9090 stretch(q):=cur_val; stretch_order(q):=cur_order; 9091 end; 9092if scan_keyword("minus") then 9093@.minus@> 9094 begin scan_dimen(mu,true,false); 9095 shrink(q):=cur_val; shrink_order(q):=cur_order; 9096 end; 9097cur_val:=q 9098 9099@ Here's a similar procedure that returns a pointer to a rule node. This 9100routine is called just after \TeX\ has seen \.{\\hrule} or \.{\\vrule}; 9101therefore |cur_cmd| will be either |hrule| or |vrule|. The idea is to store 9102the default rule dimensions in the node, then to override them if 9103`\.{height}' or `\.{width}' or `\.{depth}' specifications are 9104found (in any order). 9105 9106@d default_rule=26214 {0.4\thinspace pt} 9107 9108@p function scan_rule_spec:pointer; 9109label reswitch; 9110var q:pointer; {the rule node being created} 9111begin q:=new_rule; {|width|, |depth|, and |height| all equal |null_flag| now} 9112if cur_cmd=vrule then width(q):=default_rule 9113else begin height(q):=default_rule; depth(q):=0; 9114 end; 9115reswitch: if scan_keyword("width") then 9116@.width@> 9117 begin scan_normal_dimen; width(q):=cur_val; goto reswitch; 9118 end; 9119if scan_keyword("height") then 9120@.height@> 9121 begin scan_normal_dimen; height(q):=cur_val; goto reswitch; 9122 end; 9123if scan_keyword("depth") then 9124@.depth@> 9125 begin scan_normal_dimen; depth(q):=cur_val; goto reswitch; 9126 end; 9127scan_rule_spec:=q; 9128end; 9129 9130@* \[27] Building token lists. 9131The token lists for macros and for other things like \.{\\mark} and \.{\\output} 9132and \.{\\write} are produced by a procedure called |scan_toks|. 9133 9134Before we get into the details of |scan_toks|, let's consider a much 9135simpler task, that of converting the current string into a token list. 9136The |str_toks| function does this; it classifies spaces as type |spacer| 9137and everything else as type |other_char|. 9138 9139The token list created by |str_toks| begins at |link(temp_head)| and ends 9140at the value |p| that is returned. (If |p=temp_head|, the list is empty.) 9141 9142@p function str_toks(@!b:pool_pointer):pointer; 9143 {changes the string |str_pool[b..pool_ptr]| to a token list} 9144var p:pointer; {tail of the token list} 9145@!q:pointer; {new node being added to the token list via |store_new_token|} 9146@!t:halfword; {token being appended} 9147@!k:pool_pointer; {index into |str_pool|} 9148begin str_room(1); 9149p:=temp_head; link(p):=null; k:=b; 9150while k<pool_ptr do 9151 begin t:=so(str_pool[k]); 9152 if t=" " then t:=space_token 9153 else t:=other_token+t; 9154 fast_store_new_token(t); 9155 incr(k); 9156 end; 9157pool_ptr:=b; str_toks:=p; 9158end; 9159 9160@ The main reason for wanting |str_toks| is the next function, 9161|the_toks|, which has similar input/output characteristics. 9162 9163This procedure is supposed to scan something like `\.{\\skip\\count12}', 9164i.e., whatever can follow `\.{\\the}', and it constructs a token list 9165containing something like `\.{-3.0pt minus 0.5fill}'. 9166 9167@p function the_toks:pointer; 9168var old_setting:0..max_selector; {holds |selector| setting} 9169@!p,@!q,@!r:pointer; {used for copying a token list} 9170@!b:pool_pointer; {base of temporary string} 9171begin get_x_token; scan_something_internal(tok_val,false); 9172if cur_val_level>=ident_val then @<Copy the token list@> 9173else begin old_setting:=selector; selector:=new_string; b:=pool_ptr; 9174 case cur_val_level of 9175 int_val:print_int(cur_val); 9176 dimen_val:begin print_scaled(cur_val); print("pt"); 9177 end; 9178 glue_val: begin print_spec(cur_val,"pt"); delete_glue_ref(cur_val); 9179 end; 9180 mu_val: begin print_spec(cur_val,"mu"); delete_glue_ref(cur_val); 9181 end; 9182 end; {there are no other cases} 9183 selector:=old_setting; the_toks:=str_toks(b); 9184 end; 9185end; 9186 9187@ @<Copy the token list@>= 9188begin p:=temp_head; link(p):=null; 9189if cur_val_level=ident_val then store_new_token(cs_token_flag+cur_val) 9190else if cur_val<>null then 9191 begin r:=link(cur_val); {do not copy the reference count} 9192 while r<>null do 9193 begin fast_store_new_token(info(r)); r:=link(r); 9194 end; 9195 end; 9196the_toks:=p; 9197end 9198 9199@ Here's part of the |expand| subroutine that we are now ready to complete: 9200 9201@p procedure ins_the_toks; 9202begin link(garbage):=the_toks; ins_list(link(temp_head)); 9203end; 9204 9205@ The primitives \.{\\number}, \.{\\romannumeral}, \.{\\string}, \.{\\meaning}, 9206\.{\\fontname}, and \.{\\jobname} are defined as follows. 9207 9208@d number_code=0 {command code for \.{\\number}} 9209@d roman_numeral_code=1 {command code for \.{\\romannumeral}} 9210@d string_code=2 {command code for \.{\\string}} 9211@d meaning_code=3 {command code for \.{\\meaning}} 9212@d font_name_code=4 {command code for \.{\\fontname}} 9213@d job_name_code=5 {command code for \.{\\jobname}} 9214 9215@<Put each...@>= 9216primitive("number",convert,number_code);@/ 9217@!@:number_}{\.{\\number} primitive@> 9218primitive("romannumeral",convert,roman_numeral_code);@/ 9219@!@:roman_numeral_}{\.{\\romannumeral} primitive@> 9220primitive("string",convert,string_code);@/ 9221@!@:string_}{\.{\\string} primitive@> 9222primitive("meaning",convert,meaning_code);@/ 9223@!@:meaning_}{\.{\\meaning} primitive@> 9224primitive("fontname",convert,font_name_code);@/ 9225@!@:font_name_}{\.{\\fontname} primitive@> 9226primitive("jobname",convert,job_name_code);@/ 9227@!@:job_name_}{\.{\\jobname} primitive@> 9228 9229@ @<Cases of |print_cmd_chr|...@>= 9230convert: case chr_code of 9231 number_code: print_esc("number"); 9232 roman_numeral_code: print_esc("romannumeral"); 9233 string_code: print_esc("string"); 9234 meaning_code: print_esc("meaning"); 9235 font_name_code: print_esc("fontname"); 9236 othercases print_esc("jobname") 9237 endcases; 9238 9239@ The procedure |conv_toks| uses |str_toks| to insert the token list 9240for |convert| functions into the scanner; `\.{\\outer}' control sequences 9241are allowed to follow `\.{\\string}' and `\.{\\meaning}'. 9242 9243@p procedure conv_toks; 9244var old_setting:0..max_selector; {holds |selector| setting} 9245@!c:number_code..job_name_code; {desired type of conversion} 9246@!save_scanner_status:small_number; {|scanner_status| upon entry} 9247@!b:pool_pointer; {base of temporary string} 9248begin c:=cur_chr; @<Scan the argument for command |c|@>; 9249old_setting:=selector; selector:=new_string; b:=pool_ptr; 9250@<Print the result of command |c|@>; 9251selector:=old_setting; link(garbage):=str_toks(b); ins_list(link(temp_head)); 9252end; 9253 9254@ @<Scan the argument for command |c|@>= 9255case c of 9256number_code,roman_numeral_code: scan_int; 9257string_code, meaning_code: begin save_scanner_status:=scanner_status; 9258 scanner_status:=normal; get_token; scanner_status:=save_scanner_status; 9259 end; 9260font_name_code: scan_font_ident; 9261job_name_code: if job_name=0 then open_log_file; 9262end {there are no other cases} 9263 9264@ @<Print the result of command |c|@>= 9265case c of 9266number_code: print_int(cur_val); 9267roman_numeral_code: print_roman_int(cur_val); 9268string_code:if cur_cs<>0 then sprint_cs(cur_cs) 9269 else print_char(cur_chr); 9270meaning_code: print_meaning; 9271font_name_code: begin print(font_name[cur_val]); 9272 if font_size[cur_val]<>font_dsize[cur_val] then 9273 begin print(" at "); print_scaled(font_size[cur_val]); 9274 print("pt"); 9275 end; 9276 end; 9277job_name_code: print(job_name); 9278end {there are no other cases} 9279 9280@ Now we can't postpone the difficulties any longer; we must bravely tackle 9281|scan_toks|. This function returns a pointer to the tail of a new token 9282list, and it also makes |def_ref| point to the reference count at the 9283head of that list. 9284 9285There are two boolean parameters, |macro_def| and |xpand|. If |macro_def| 9286is true, the goal is to create the token list for a macro definition; 9287otherwise the goal is to create the token list for some other \TeX\ 9288primitive: \.{\\mark}, \.{\\output}, \.{\\everypar}, \.{\\lowercase}, 9289\.{\\uppercase}, \.{\\message}, \.{\\errmessage}, \.{\\write}, or 9290\.{\\special}. In the latter cases a left brace must be scanned next; this 9291left brace will not be part of the token list, nor will the matching right 9292brace that comes at the end. If |xpand| is false, the token list will 9293simply be copied from the input using |get_token|. Otherwise all expandable 9294tokens will be expanded until unexpandable tokens are left, except that 9295the results of expanding `\.{\\the}' are not expanded further. 9296If both |macro_def| and |xpand| are true, the expansion applies 9297only to the macro body (i.e., to the material following the first 9298|left_brace| character). 9299 9300The value of |cur_cs| when |scan_toks| begins should be the |eqtb| 9301address of the control sequence to display in ``runaway'' error 9302messages. 9303 9304@p function scan_toks(@!macro_def,@!xpand:boolean):pointer; 9305label found,done,done1,done2; 9306var t:halfword; {token representing the highest parameter number} 9307@!s:halfword; {saved token} 9308@!p:pointer; {tail of the token list being built} 9309@!q:pointer; {new node being added to the token list via |store_new_token|} 9310@!unbalance:halfword; {number of unmatched left braces} 9311@!hash_brace:halfword; {possible `\.{\#\{}' token} 9312begin if macro_def then scanner_status:=defining 9313@+else scanner_status:=absorbing; 9314warning_index:=cur_cs; def_ref:=get_avail; token_ref_count(def_ref):=null; 9315p:=def_ref; hash_brace:=0; t:=zero_token; 9316if macro_def then @<Scan and build the parameter part of the macro definition@> 9317else scan_left_brace; {remove the compulsory left brace} 9318@<Scan and build the body of the token list; |goto found| when finished@>; 9319found: scanner_status:=normal; 9320if hash_brace<>0 then store_new_token(hash_brace); 9321scan_toks:=p; 9322end; 9323 9324@ @<Scan and build the parameter part...@>= 9325begin loop begin get_token; {set |cur_cmd|, |cur_chr|, |cur_tok|} 9326 if cur_tok<right_brace_limit then goto done1; 9327 if cur_cmd=mac_param then 9328 @<If the next character is a parameter number, make |cur_tok| 9329 a |match| token; but if it is a left brace, store 9330 `|left_brace|, |end_match|', set |hash_brace|, and |goto done|@>; 9331 store_new_token(cur_tok); 9332 end; 9333done1: store_new_token(end_match_token); 9334if cur_cmd=right_brace then 9335 @<Express shock at the missing left brace; |goto found|@>; 9336done: end 9337 9338@ @<Express shock...@>= 9339begin print_err("Missing { inserted"); incr(align_state); 9340@.Missing \{ inserted@> 9341help2("Where was the left brace? You said something like `\def\a}',")@/ 9342 ("which I'm going to interpret as `\def\a{}'."); error; goto found; 9343end 9344 9345@ @<If the next character is a parameter number...@>= 9346begin s:=match_token+cur_chr; get_token; 9347if cur_cmd=left_brace then 9348 begin hash_brace:=cur_tok; 9349 store_new_token(cur_tok); store_new_token(end_match_token); 9350 goto done; 9351 end; 9352if t=zero_token+9 then 9353 begin print_err("You already have nine parameters"); 9354@.You already have nine...@> 9355 help1("I'm going to ignore the # sign you just used."); error; 9356 end 9357else begin incr(t); 9358 if cur_tok<>t then 9359 begin print_err("Parameters must be numbered consecutively"); 9360@.Parameters...consecutively@> 9361 help2("I've inserted the digit you should have used after the #.")@/ 9362 ("Type `1' to delete what you did use."); back_error; 9363 end; 9364 cur_tok:=s; 9365 end; 9366end 9367 9368@ @<Scan and build the body of the token list; |goto found| when finished@>= 9369unbalance:=1; 9370loop@+ begin if xpand then @<Expand the next part of the input@> 9371 else get_token; 9372 if cur_tok<right_brace_limit then 9373 if cur_cmd<right_brace then incr(unbalance) 9374 else begin decr(unbalance); 9375 if unbalance=0 then goto found; 9376 end 9377 else if cur_cmd=mac_param then 9378 if macro_def then @<Look for parameter number or \.{\#\#}@>; 9379 store_new_token(cur_tok); 9380 end 9381 9382@ Here we insert an entire token list created by |the_toks| without 9383expanding it further. 9384 9385@<Expand the next part of the input@>= 9386begin loop begin get_next; 9387 if cur_cmd<=max_command then goto done2; 9388 if cur_cmd<>the then expand 9389 else begin q:=the_toks; 9390 if link(temp_head)<>null then 9391 begin link(p):=link(temp_head); p:=q; 9392 end; 9393 end; 9394 end; 9395done2: x_token 9396end 9397 9398@ @<Look for parameter number...@>= 9399begin s:=cur_tok; 9400if xpand then get_x_token else get_token; 9401if cur_cmd<>mac_param then 9402 if (cur_tok<=zero_token)or(cur_tok>t) then 9403 begin print_err("Illegal parameter number in definition of "); 9404@.Illegal parameter number...@> 9405 sprint_cs(warning_index); 9406 help3("You meant to type ## instead of #, right?")@/ 9407 ("Or maybe a } was forgotten somewhere earlier, and things")@/ 9408 ("are all screwed up? I'm going to assume that you meant ##."); 9409 back_error; cur_tok:=s; 9410 end 9411 else cur_tok:=out_param_token-"0"+cur_chr; 9412end 9413 9414@ Another way to create a token list is via the \.{\\read} command. The 9415sixteen files potentially usable for reading appear in the following 9416global variables. The value of |read_open[n]| will be |closed| if 9417stream number |n| has not been opened or if it has been fully read; 9418|just_open| if an \.{\\openin} but not a \.{\\read} has been done; 9419and |normal| if it is open and ready to read the next line. 9420 9421@d closed=2 {not open, or at end of file} 9422@d just_open=1 {newly opened, first line not yet read} 9423 9424@<Glob...@>= 9425@!read_file:array[0..15] of alpha_file; {used for \.{\\read}} 9426@!read_open:array[0..16] of normal..closed; {state of |read_file[n]|} 9427 9428@ @<Set init...@>= 9429for k:=0 to 16 do read_open[k]:=closed; 9430 9431@ The |read_toks| procedure constructs a token list like that for any 9432macro definition, and makes |cur_val| point to it. Parameter |r| points 9433to the control sequence that will receive this token list. 9434 9435@p procedure read_toks(@!n:integer;@!r:pointer); 9436label done; 9437var p:pointer; {tail of the token list} 9438@!q:pointer; {new node being added to the token list via |store_new_token|} 9439@!s:integer; {saved value of |align_state|} 9440@!m:small_number; {stream number} 9441begin scanner_status:=defining; warning_index:=r; 9442def_ref:=get_avail; token_ref_count(def_ref):=null; 9443p:=def_ref; {the reference count} 9444store_new_token(end_match_token); 9445if (n<0)or(n>15) then m:=16@+else m:=n; 9446s:=align_state; align_state:=1000000; {disable tab marks, etc.} 9447repeat @<Input and store tokens from the next line of the file@>; 9448until align_state=1000000; 9449cur_val:=def_ref; scanner_status:=normal; align_state:=s; 9450end; 9451 9452@ @<Input and store tokens from the next line of the file@>= 9453begin_file_reading; name:=m+1; 9454if read_open[m]=closed then @<Input for \.{\\read} from the terminal@> 9455else if read_open[m]=just_open then @<Input the first line of |read_file[m]|@> 9456else @<Input the next line of |read_file[m]|@>; 9457limit:=last; 9458if end_line_char_inactive then decr(limit) 9459else buffer[limit]:=end_line_char; 9460first:=limit+1; loc:=start; state:=new_line;@/ 9461loop@+ begin get_token; 9462 if cur_tok=0 then goto done; 9463 {|cur_cmd=cur_chr=0| will occur at the end of the line} 9464 if align_state<1000000 then {unmatched `\.\}' aborts the line} 9465 begin repeat get_token; until cur_tok=0; 9466 align_state:=1000000; goto done; 9467 end; 9468 store_new_token(cur_tok); 9469 end; 9470done: end_file_reading 9471 9472@ Here we input on-line into the |buffer| array, prompting the user explicitly 9473if |n>=0|. The value of |n| is set negative so that additional prompts 9474will not be given in the case of multi-line input. 9475 9476@<Input for \.{\\read} from the terminal@>= 9477if interaction>nonstop_mode then 9478 if n<0 then prompt_input("") 9479 else begin wake_up_terminal; 9480 print_ln; sprint_cs(r); prompt_input("="); n:=-1; 9481 end 9482else fatal_error("*** (cannot \read from terminal in nonstop modes)") 9483@.cannot \\read@> 9484 9485@ The first line of a file must be treated specially, since |input_ln| 9486must be told not to start with |get|. 9487@^system dependencies@> 9488 9489@<Input the first line of |read_file[m]|@>= 9490if input_ln(read_file[m],false) then read_open[m]:=normal 9491else begin a_close(read_file[m]); read_open[m]:=closed; 9492 end 9493 9494@ An empty line is appended at the end of a |read_file|. 9495@^empty line at end of file@> 9496 9497@<Input the next line of |read_file[m]|@>= 9498begin if not input_ln(read_file[m],true) then 9499 begin a_close(read_file[m]); read_open[m]:=closed; 9500 if align_state<>1000000 then 9501 begin runaway; 9502 print_err("File ended within "); print_esc("read"); 9503@.File ended within \\read@> 9504 help1("This \read has unbalanced braces."); 9505 align_state:=1000000; error; 9506 end; 9507 end; 9508end 9509 9510@* \[28] Conditional processing. 9511We consider now the way \TeX\ handles various kinds of \.{\\if} commands. 9512 9513@d if_char_code=0 { `\.{\\if}' } 9514@d if_cat_code=1 { `\.{\\ifcat}' } 9515@d if_int_code=2 { `\.{\\ifnum}' } 9516@d if_dim_code=3 { `\.{\\ifdim}' } 9517@d if_odd_code=4 { `\.{\\ifodd}' } 9518@d if_vmode_code=5 { `\.{\\ifvmode}' } 9519@d if_hmode_code=6 { `\.{\\ifhmode}' } 9520@d if_mmode_code=7 { `\.{\\ifmmode}' } 9521@d if_inner_code=8 { `\.{\\ifinner}' } 9522@d if_void_code=9 { `\.{\\ifvoid}' } 9523@d if_hbox_code=10 { `\.{\\ifhbox}' } 9524@d if_vbox_code=11 { `\.{\\ifvbox}' } 9525@d ifx_code=12 { `\.{\\ifx}' } 9526@d if_eof_code=13 { `\.{\\ifeof}' } 9527@d if_true_code=14 { `\.{\\iftrue}' } 9528@d if_false_code=15 { `\.{\\iffalse}' } 9529@d if_case_code=16 { `\.{\\ifcase}' } 9530 9531@<Put each...@>= 9532primitive("if",if_test,if_char_code); 9533@!@:if_char_}{\.{\\if} primitive@> 9534primitive("ifcat",if_test,if_cat_code); 9535@!@:if_cat_code_}{\.{\\ifcat} primitive@> 9536primitive("ifnum",if_test,if_int_code); 9537@!@:if_int_}{\.{\\ifnum} primitive@> 9538primitive("ifdim",if_test,if_dim_code); 9539@!@:if_dim_}{\.{\\ifdim} primitive@> 9540primitive("ifodd",if_test,if_odd_code); 9541@!@:if_odd_}{\.{\\ifodd} primitive@> 9542primitive("ifvmode",if_test,if_vmode_code); 9543@!@:if_vmode_}{\.{\\ifvmode} primitive@> 9544primitive("ifhmode",if_test,if_hmode_code); 9545@!@:if_hmode_}{\.{\\ifhmode} primitive@> 9546primitive("ifmmode",if_test,if_mmode_code); 9547@!@:if_mmode_}{\.{\\ifmmode} primitive@> 9548primitive("ifinner",if_test,if_inner_code); 9549@!@:if_inner_}{\.{\\ifinner} primitive@> 9550primitive("ifvoid",if_test,if_void_code); 9551@!@:if_void_}{\.{\\ifvoid} primitive@> 9552primitive("ifhbox",if_test,if_hbox_code); 9553@!@:if_hbox_}{\.{\\ifhbox} primitive@> 9554primitive("ifvbox",if_test,if_vbox_code); 9555@!@:if_vbox_}{\.{\\ifvbox} primitive@> 9556primitive("ifx",if_test,ifx_code); 9557@!@:ifx_}{\.{\\ifx} primitive@> 9558primitive("ifeof",if_test,if_eof_code); 9559@!@:if_eof_}{\.{\\ifeof} primitive@> 9560primitive("iftrue",if_test,if_true_code); 9561@!@:if_true_}{\.{\\iftrue} primitive@> 9562primitive("iffalse",if_test,if_false_code); 9563@!@:if_false_}{\.{\\iffalse} primitive@> 9564primitive("ifcase",if_test,if_case_code); 9565@!@:if_case_}{\.{\\ifcase} primitive@> 9566 9567@ @<Cases of |print_cmd_chr|...@>= 9568if_test: case chr_code of 9569 if_cat_code:print_esc("ifcat"); 9570 if_int_code:print_esc("ifnum"); 9571 if_dim_code:print_esc("ifdim"); 9572 if_odd_code:print_esc("ifodd"); 9573 if_vmode_code:print_esc("ifvmode"); 9574 if_hmode_code:print_esc("ifhmode"); 9575 if_mmode_code:print_esc("ifmmode"); 9576 if_inner_code:print_esc("ifinner"); 9577 if_void_code:print_esc("ifvoid"); 9578 if_hbox_code:print_esc("ifhbox"); 9579 if_vbox_code:print_esc("ifvbox"); 9580 ifx_code:print_esc("ifx"); 9581 if_eof_code:print_esc("ifeof"); 9582 if_true_code:print_esc("iftrue"); 9583 if_false_code:print_esc("iffalse"); 9584 if_case_code:print_esc("ifcase"); 9585 othercases print_esc("if") 9586 endcases; 9587 9588@ Conditions can be inside conditions, and this nesting has a stack 9589that is independent of the |save_stack|. 9590 9591Four global variables represent the top of the condition stack: 9592|cond_ptr| points to pushed-down entries, if any; |if_limit| specifies 9593the largest code of a |fi_or_else| command that is syntactically legal; 9594|cur_if| is the name of the current type of conditional; and |if_line| 9595is the line number at which it began. 9596 9597If no conditions are currently in progress, the condition stack has the 9598special state |cond_ptr=null|, |if_limit=normal|, |cur_if=0|, |if_line=0|. 9599Otherwise |cond_ptr| points to a two-word node; the |type|, |subtype|, and 9600|link| fields of the first word contain |if_limit|, |cur_if|, and 9601|cond_ptr| at the next level, and the second word contains the 9602corresponding |if_line|. 9603 9604@d if_node_size=2 {number of words in stack entry for conditionals} 9605@d if_line_field(#)==mem[#+1].int 9606@d if_code=1 {code for \.{\\if...} being evaluated} 9607@d fi_code=2 {code for \.{\\fi}} 9608@d else_code=3 {code for \.{\\else}} 9609@d or_code=4 {code for \.{\\or}} 9610 9611@<Glob...@>= 9612@!cond_ptr:pointer; {top of the condition stack} 9613@!if_limit:normal..or_code; {upper bound on |fi_or_else| codes} 9614@!cur_if:small_number; {type of conditional being worked on} 9615@!if_line:integer; {line where that conditional began} 9616 9617@ @<Set init...@>= 9618cond_ptr:=null; if_limit:=normal; cur_if:=0; if_line:=0; 9619 9620@ @<Put each...@>= 9621primitive("fi",fi_or_else,fi_code); 9622@!@:fi_}{\.{\\fi} primitive@> 9623text(frozen_fi):="fi"; eqtb[frozen_fi]:=eqtb[cur_val]; 9624primitive("or",fi_or_else,or_code); 9625@!@:or_}{\.{\\or} primitive@> 9626primitive("else",fi_or_else,else_code); 9627@!@:else_}{\.{\\else} primitive@> 9628 9629@ @<Cases of |print_cmd_chr|...@>= 9630fi_or_else: if chr_code=fi_code then print_esc("fi") 9631 else if chr_code=or_code then print_esc("or") 9632 else print_esc("else"); 9633 9634@ When we skip conditional text, we keep track of the line number 9635where skipping began, for use in error messages. 9636 9637@<Glob...@>= 9638@!skip_line:integer; {skipping began here} 9639 9640@ Here is a procedure that ignores text until coming to an \.{\\or}, 9641\.{\\else}, or \.{\\fi} at level zero of $\.{\\if}\ldots\.{\\fi}$ 9642nesting. After it has acted, |cur_chr| will indicate the token that 9643was found, but |cur_tok| will not be set (because this makes the 9644procedure run faster). 9645 9646@p procedure pass_text; 9647label done; 9648var l:integer; {level of $\.{\\if}\ldots\.{\\fi}$ nesting} 9649@!save_scanner_status:small_number; {|scanner_status| upon entry} 9650begin save_scanner_status:=scanner_status; scanner_status:=skipping; l:=0; 9651skip_line:=line; 9652loop@+ begin get_next; 9653 if cur_cmd=fi_or_else then 9654 begin if l=0 then goto done; 9655 if cur_chr=fi_code then decr(l); 9656 end 9657 else if cur_cmd=if_test then incr(l); 9658 end; 9659done: scanner_status:=save_scanner_status; 9660end; 9661 9662@ When we begin to process a new \.{\\if}, we set |if_limit:=if_code|; then 9663if\/ \.{\\or} or \.{\\else} or \.{\\fi} occurs before the current \.{\\if} 9664condition has been evaluated, \.{\\relax} will be inserted. 9665For example, a sequence of commands like `\.{\\ifvoid1\\else...\\fi}' 9666would otherwise require something after the `\.1'. 9667 9668@<Push the condition stack@>= 9669begin p:=get_node(if_node_size); link(p):=cond_ptr; type(p):=if_limit; 9670subtype(p):=cur_if; if_line_field(p):=if_line; 9671cond_ptr:=p; cur_if:=cur_chr; if_limit:=if_code; if_line:=line; 9672end 9673 9674@ @<Pop the condition stack@>= 9675begin p:=cond_ptr; if_line:=if_line_field(p); 9676cur_if:=subtype(p); if_limit:=type(p); cond_ptr:=link(p); 9677free_node(p,if_node_size); 9678end 9679 9680@ Here's a procedure that changes the |if_limit| code corresponding to 9681a given value of |cond_ptr|. 9682 9683@p procedure change_if_limit(@!l:small_number;@!p:pointer); 9684label exit; 9685var q:pointer; 9686begin if p=cond_ptr then if_limit:=l {that's the easy case} 9687else begin q:=cond_ptr; 9688 loop@+ begin if q=null then confusion("if"); 9689@:this can't happen if}{\quad if@> 9690 if link(q)=p then 9691 begin type(q):=l; return; 9692 end; 9693 q:=link(q); 9694 end; 9695 end; 9696exit:end; 9697 9698@ A condition is started when the |expand| procedure encounters 9699an |if_test| command; in that case |expand| reduces to |conditional|, 9700which is a recursive procedure. 9701@^recursion@> 9702 9703@p procedure conditional; 9704label exit,common_ending; 9705var b:boolean; {is the condition true?} 9706@!r:"<"..">"; {relation to be evaluated} 9707@!m,@!n:integer; {to be tested against the second operand} 9708@!p,@!q:pointer; {for traversing token lists in \.{\\ifx} tests} 9709@!save_scanner_status:small_number; {|scanner_status| upon entry} 9710@!save_cond_ptr:pointer; {|cond_ptr| corresponding to this conditional} 9711@!this_if:small_number; {type of this conditional} 9712begin @<Push the condition stack@>;@+save_cond_ptr:=cond_ptr;this_if:=cur_chr;@/ 9713@<Either process \.{\\ifcase} or set |b| to the value of a boolean condition@>; 9714if tracing_commands>1 then @<Display the value of |b|@>; 9715if b then 9716 begin change_if_limit(else_code,save_cond_ptr); 9717 return; {wait for \.{\\else} or \.{\\fi}} 9718 end; 9719@<Skip to \.{\\else} or \.{\\fi}, then |goto common_ending|@>; 9720common_ending: if cur_chr=fi_code then @<Pop the condition stack@> 9721else if_limit:=fi_code; {wait for \.{\\fi}} 9722exit:end; 9723 9724@ In a construction like `\.{\\if\\iftrue abc\\else d\\fi}', the first 9725\.{\\else} that we come to after learning that the \.{\\if} is false is 9726not the \.{\\else} we're looking for. Hence the following curious 9727logic is needed. 9728 9729@ @<Skip to \.{\\else} or \.{\\fi}...@>= 9730loop@+ begin pass_text; 9731 if cond_ptr=save_cond_ptr then 9732 begin if cur_chr<>or_code then goto common_ending; 9733 print_err("Extra "); print_esc("or"); 9734@.Extra \\or@> 9735 help1("I'm ignoring this; it doesn't match any \if."); 9736 error; 9737 end 9738 else if cur_chr=fi_code then @<Pop the condition stack@>; 9739 end 9740 9741@ @<Either process \.{\\ifcase} or set |b|...@>= 9742case this_if of 9743if_char_code, if_cat_code: @<Test if two characters match@>; 9744if_int_code, if_dim_code: @<Test relation between integers or dimensions@>; 9745if_odd_code: @<Test if an integer is odd@>; 9746if_vmode_code: b:=(abs(mode)=vmode); 9747if_hmode_code: b:=(abs(mode)=hmode); 9748if_mmode_code: b:=(abs(mode)=mmode); 9749if_inner_code: b:=(mode<0); 9750if_void_code, if_hbox_code, if_vbox_code: @<Test box register status@>; 9751ifx_code: @<Test if two tokens match@>; 9752if_eof_code: begin scan_four_bit_int; b:=(read_open[cur_val]=closed); 9753 end; 9754if_true_code: b:=true; 9755if_false_code: b:=false; 9756if_case_code: @<Select the appropriate case 9757 and |return| or |goto common_ending|@>; 9758end {there are no other cases} 9759 9760@ @<Display the value of |b|@>= 9761begin begin_diagnostic; 9762if b then print("{true}")@+else print("{false}"); 9763end_diagnostic(false); 9764end 9765 9766@ Here we use the fact that |"<"|, |"="|, and |">"| are consecutive ASCII 9767codes. 9768@^ASCII code@> 9769 9770@<Test relation between integers or dimensions@>= 9771begin if this_if=if_int_code then scan_int@+else scan_normal_dimen; 9772n:=cur_val; @<Get the next non-blank non-call...@>; 9773if (cur_tok>=other_token+"<")and(cur_tok<=other_token+">") then 9774 r:=cur_tok-other_token 9775else begin print_err("Missing = inserted for "); 9776@.Missing = inserted@> 9777 print_cmd_chr(if_test,this_if); 9778 help1("I was expecting to see `<', `=', or `>'. Didn't."); 9779 back_error; r:="="; 9780 end; 9781if this_if=if_int_code then scan_int@+else scan_normal_dimen; 9782case r of 9783"<": b:=(n<cur_val); 9784"=": b:=(n=cur_val); 9785">": b:=(n>cur_val); 9786end; 9787end 9788 9789@ @<Test if an integer is odd@>= 9790begin scan_int; b:=odd(cur_val); 9791end 9792 9793@ @<Test box register status@>= 9794begin scan_eight_bit_int; p:=box(cur_val); 9795if this_if=if_void_code then b:=(p=null) 9796else if p=null then b:=false 9797else if this_if=if_hbox_code then b:=(type(p)=hlist_node) 9798else b:=(type(p)=vlist_node); 9799end 9800 9801@ An active character will be treated as category 13 following 9802\.{\\if\\noexpand} or following \.{\\ifcat\\noexpand}. We use the fact that 9803active characters have the smallest tokens, among all control sequences. 9804 9805@d get_x_token_or_active_char==@t@>@; 9806 begin get_x_token; 9807 if cur_cmd=relax then if cur_chr=no_expand_flag then 9808 begin cur_cmd:=active_char; 9809 cur_chr:=cur_tok-cs_token_flag-active_base; 9810 end; 9811 end 9812 9813@<Test if two characters match@>= 9814begin get_x_token_or_active_char; 9815if (cur_cmd>active_char)or(cur_chr>255) then {not a character} 9816 begin m:=relax; n:=256; 9817 end 9818else begin m:=cur_cmd; n:=cur_chr; 9819 end; 9820get_x_token_or_active_char; 9821if (cur_cmd>active_char)or(cur_chr>255) then 9822 begin cur_cmd:=relax; cur_chr:=256; 9823 end; 9824if this_if=if_char_code then b:=(n=cur_chr)@+else b:=(m=cur_cmd); 9825end 9826 9827@ Note that `\.{\\ifx}' will declare two macros different if one is \\{long} 9828or \\{outer} and the other isn't, even though the texts of the macros are 9829the same. 9830 9831We need to reset |scanner_status|, since \.{\\outer} control sequences 9832are allowed, but we might be scanning a macro definition or preamble. 9833 9834@<Test if two tokens match@>= 9835begin save_scanner_status:=scanner_status; scanner_status:=normal; 9836get_next; n:=cur_cs; p:=cur_cmd; q:=cur_chr; 9837get_next; if cur_cmd<>p then b:=false 9838else if cur_cmd<call then b:=(cur_chr=q) 9839else @<Test if two macro texts match@>; 9840scanner_status:=save_scanner_status; 9841end 9842 9843@ Note also that `\.{\\ifx}' decides that macros \.{\\a} and \.{\\b} are 9844different in examples like this: 9845$$\vbox{\halign{\.{#}\hfil&\qquad\.{#}\hfil\cr 9846 {}\\def\\a\{\\c\}& 9847 {}\\def\\c\{\}\cr 9848 {}\\def\\b\{\\d\}& 9849 {}\\def\\d\{\}\cr}}$$ 9850 9851@<Test if two macro texts match@>= 9852begin p:=link(cur_chr); q:=link(equiv(n)); {omit reference counts} 9853if p=q then b:=true 9854else begin while (p<>null)and(q<>null) do 9855 if info(p)<>info(q) then p:=null 9856 else begin p:=link(p); q:=link(q); 9857 end; 9858 b:=((p=null)and(q=null)); 9859 end; 9860end 9861 9862@ @<Select the appropriate case and |return| or |goto common_ending|@>= 9863begin scan_int; n:=cur_val; {|n| is the number of cases to pass} 9864if tracing_commands>1 then 9865 begin begin_diagnostic; print("{case "); print_int(n); print_char("}"); 9866 end_diagnostic(false); 9867 end; 9868while n<>0 do 9869 begin pass_text; 9870 if cond_ptr=save_cond_ptr then 9871 if cur_chr=or_code then decr(n) 9872 else goto common_ending 9873 else if cur_chr=fi_code then @<Pop the condition stack@>; 9874 end; 9875change_if_limit(or_code,save_cond_ptr); 9876return; {wait for \.{\\or}, \.{\\else}, or \.{\\fi}} 9877end 9878 9879@ The processing of conditionals is complete except for the following 9880code, which is actually part of |expand|. It comes into play when 9881\.{\\or}, \.{\\else}, or \.{\\fi} is scanned. 9882 9883@<Terminate the current conditional and skip to \.{\\fi}@>= 9884if cur_chr>if_limit then 9885 if if_limit=if_code then insert_relax {condition not yet evaluated} 9886 else begin print_err("Extra "); print_cmd_chr(fi_or_else,cur_chr); 9887@.Extra \\or@> 9888@.Extra \\else@> 9889@.Extra \\fi@> 9890 help1("I'm ignoring this; it doesn't match any \if."); 9891 error; 9892 end 9893else begin while cur_chr<>fi_code do pass_text; {skip to \.{\\fi}} 9894 @<Pop the condition stack@>; 9895 end 9896 9897@* \[29] File names. 9898It's time now to fret about file names. Besides the fact that different 9899operating systems treat files in different ways, we must cope with the 9900fact that completely different naming conventions are used by different 9901groups of people. The following programs show what is required for one 9902particular operating system; similar routines for other systems are not 9903difficult to devise. 9904@^fingers@> 9905@^system dependencies@> 9906 9907\TeX\ assumes that a file name has three parts: the name proper; its 9908``extension''; and a ``file area'' where it is found in an external file 9909system. The extension of an input file or a write file is assumed to be 9910`\.{.tex}' unless otherwise specified; it is `\.{.log}' on the 9911transcript file that records each run of \TeX; it is `\.{.tfm}' on the font 9912metric files that describe characters in the fonts \TeX\ uses; it is 9913`\.{.dvi}' on the output files that specify typesetting information; and it 9914is `\.{.fmt}' on the format files written by \.{INITEX} to initialize \TeX. 9915The file area can be arbitrary on input files, but files are usually 9916output to the user's current area. If an input file cannot be 9917found on the specified area, \TeX\ will look for it on a special system 9918area; this special area is intended for commonly used input files like 9919\.{webmac.tex}. 9920 9921Simple uses of \TeX\ refer only to file names that have no explicit 9922extension or area. For example, a person usually says `\.{\\input} \.{paper}' 9923or `\.{\\font\\tenrm} \.= \.{helvetica}' instead of `\.{\\input} 9924\.{paper.new}' or `\.{\\font\\tenrm} \.= \.{<csd.knuth>test}'. Simple file 9925names are best, because they make the \TeX\ source files portable; 9926whenever a file name consists entirely of letters and digits, it should be 9927treated in the same way by all implementations of \TeX. However, users 9928need the ability to refer to other files in their environment, especially 9929when responding to error messages concerning unopenable files; therefore 9930we want to let them use the syntax that appears in their favorite 9931operating system. 9932 9933The following procedures don't allow spaces to be part of 9934file names; but some users seem to like names that are spaced-out. 9935System-dependent changes to allow such things should probably 9936be made with reluctance, and only when an entire file name that 9937includes spaces is ``quoted'' somehow. 9938 9939@ In order to isolate the system-dependent aspects of file names, the 9940@^system dependencies@> 9941system-independent parts of \TeX\ are expressed in terms 9942of three system-dependent 9943procedures called |begin_name|, |more_name|, and |end_name|. In 9944essence, if the user-specified characters of the file name are $c_1\ldots c_n$, 9945the system-independent driver program does the operations 9946$$|begin_name|;\,|more_name|(c_1);\,\ldots\,;\,|more_name|(c_n); 9947\,|end_name|.$$ 9948These three procedures communicate with each other via global variables. 9949Afterwards the file name will appear in the string pool as three strings 9950called |cur_name|\penalty10000\hskip-.05em, 9951|cur_area|, and |cur_ext|; the latter two are null (i.e., 9952|""|), unless they were explicitly specified by the user. 9953 9954Actually the situation is slightly more complicated, because \TeX\ needs 9955to know when the file name ends. The |more_name| routine is a function 9956(with side effects) that returns |true| on the calls |more_name|$(c_1)$, 9957\dots, |more_name|$(c_{n-1})$. The final call |more_name|$(c_n)$ 9958returns |false|; or, it returns |true| and the token following $c_n$ is 9959something like `\.{\\hbox}' (i.e., not a character). In other words, 9960|more_name| is supposed to return |true| unless it is sure that the 9961file name has been completely scanned; and |end_name| is supposed to be able 9962to finish the assembly of |cur_name|, |cur_area|, and |cur_ext| regardless of 9963whether $|more_name|(c_n)$ returned |true| or |false|. 9964 9965@<Glob...@>= 9966@!cur_name:str_number; {name of file just scanned} 9967@!cur_area:str_number; {file area just scanned, or \.{""}} 9968@!cur_ext:str_number; {file extension just scanned, or \.{""}} 9969 9970@ The file names we shall deal with for illustrative purposes have the 9971following structure: If the name contains `\.>' or `\.:', the file area 9972consists of all characters up to and including the final such character; 9973otherwise the file area is null. If the remaining file name contains 9974`\..', the file extension consists of all such characters from the first 9975remaining `\..' to the end, otherwise the file extension is null. 9976@^system dependencies@> 9977 9978We can scan such file names easily by using two global variables that keep track 9979of the occurrences of area and extension delimiters: 9980 9981@<Glob...@>= 9982@!area_delimiter:pool_pointer; {the most recent `\.>' or `\.:', if any} 9983@!ext_delimiter:pool_pointer; {the relevant `\..', if any} 9984 9985@ Input files that can't be found in the user's area may appear in a standard 9986system area called |TEX_area|. Font metric files whose areas are not given 9987explicitly are assumed to appear in a standard system area called 9988|TEX_font_area|. These system area names will, of course, vary from place 9989to place. 9990@^system dependencies@> 9991 9992@d TEX_area=="TeXinputs:" 9993@.TeXinputs@> 9994@d TEX_font_area=="TeXfonts:" 9995@.TeXfonts@> 9996 9997@ Here now is the first of the system-dependent routines for file name scanning. 9998@^system dependencies@> 9999 10000@p procedure begin_name; 10001begin area_delimiter:=0; ext_delimiter:=0; 10002end; 10003 10004@ And here's the second. The string pool might change as the file name is 10005being scanned, since a new \.{\\csname} might be entered; therefore we keep 10006|area_delimiter| and |ext_delimiter| relative to the beginning of the current 10007string, instead of assigning an absolute address like |pool_ptr| to them. 10008@^system dependencies@> 10009 10010@p function more_name(@!c:ASCII_code):boolean; 10011begin if c=" " then more_name:=false 10012else begin str_room(1); append_char(c); {contribute |c| to the current string} 10013 if (c=">")or(c=":") then 10014 begin area_delimiter:=cur_length; ext_delimiter:=0; 10015 end 10016 else if (c=".")and(ext_delimiter=0) then ext_delimiter:=cur_length; 10017 more_name:=true; 10018 end; 10019end; 10020 10021@ The third. 10022@^system dependencies@> 10023 10024@p procedure end_name; 10025begin if str_ptr+3>max_strings then 10026 overflow("number of strings",max_strings-init_str_ptr); 10027@:TeX capacity exceeded number of strings}{\quad number of strings@> 10028if area_delimiter=0 then cur_area:="" 10029else begin cur_area:=str_ptr; 10030 str_start[str_ptr+1]:=str_start[str_ptr]+area_delimiter; incr(str_ptr); 10031 end; 10032if ext_delimiter=0 then 10033 begin cur_ext:=""; cur_name:=make_string; 10034 end 10035else begin cur_name:=str_ptr; 10036 str_start[str_ptr+1]:=str_start[str_ptr]+ext_delimiter-area_delimiter-1; 10037 incr(str_ptr); cur_ext:=make_string; 10038 end; 10039end; 10040 10041@ Conversely, here is a routine that takes three strings and prints a file 10042name that might have produced them. (The routine is system dependent, because 10043some operating systems put the file area last instead of first.) 10044@^system dependencies@> 10045 10046@<Basic printing...@>= 10047procedure print_file_name(@!n,@!a,@!e:integer); 10048begin slow_print(a); slow_print(n); slow_print(e); 10049end; 10050 10051@ Another system-dependent routine is needed to convert three internal 10052\TeX\ strings 10053into the |name_of_file| value that is used to open files. The present code 10054allows both lowercase and uppercase letters in the file name. 10055@^system dependencies@> 10056 10057@d append_to_name(#)==begin c:=#; incr(k); 10058 if k<=file_name_size then name_of_file[k]:=xchr[c]; 10059 end 10060 10061@p procedure pack_file_name(@!n,@!a,@!e:str_number); 10062var k:integer; {number of positions filled in |name_of_file|} 10063@!c: ASCII_code; {character being packed} 10064@!j:pool_pointer; {index into |str_pool|} 10065begin k:=0; 10066for j:=str_start[a] to str_start[a+1]-1 do append_to_name(so(str_pool[j])); 10067for j:=str_start[n] to str_start[n+1]-1 do append_to_name(so(str_pool[j])); 10068for j:=str_start[e] to str_start[e+1]-1 do append_to_name(so(str_pool[j])); 10069if k<=file_name_size then name_length:=k@+else name_length:=file_name_size; 10070for k:=name_length+1 to file_name_size do name_of_file[k]:=' '; 10071end; 10072 10073@ A messier routine is also needed, since format file names must be scanned 10074before \TeX's string mechanism has been initialized. We shall use the 10075global variable |TEX_format_default| to supply the text for default system areas 10076and extensions related to format files. 10077@^system dependencies@> 10078 10079@d format_default_length=20 {length of the |TEX_format_default| string} 10080@d format_area_length=11 {length of its area part} 10081@d format_ext_length=4 {length of its `\.{.fmt}' part} 10082@d format_extension=".fmt" {the extension, as a \.{WEB} constant} 10083 10084@<Glob...@>= 10085@!TEX_format_default:packed array[1..format_default_length] of char; 10086 10087@ @<Set init...@>= 10088TEX_format_default:='TeXformats:plain.fmt'; 10089@.TeXformats@> 10090@.plain@> 10091@^system dependencies@> 10092 10093@ @<Check the ``constant'' values for consistency@>= 10094if format_default_length>file_name_size then bad:=31; 10095 10096@ Here is the messy routine that was just mentioned. It sets |name_of_file| 10097from the first |n| characters of |TEX_format_default|, followed by 10098|buffer[a..b]|, followed by the last |format_ext_length| characters of 10099|TEX_format_default|. 10100 10101We dare not give error messages here, since \TeX\ calls this routine before 10102the |error| routine is ready to roll. Instead, we simply drop excess characters, 10103since the error will be detected in another way when a strange file name 10104isn't found. 10105@^system dependencies@> 10106 10107@p procedure pack_buffered_name(@!n:small_number;@!a,@!b:integer); 10108var k:integer; {number of positions filled in |name_of_file|} 10109@!c: ASCII_code; {character being packed} 10110@!j:integer; {index into |buffer| or |TEX_format_default|} 10111begin if n+b-a+1+format_ext_length>file_name_size then 10112 b:=a+file_name_size-n-1-format_ext_length; 10113k:=0; 10114for j:=1 to n do append_to_name(xord[TEX_format_default[j]]); 10115for j:=a to b do append_to_name(buffer[j]); 10116for j:=format_default_length-format_ext_length+1 to format_default_length do 10117 append_to_name(xord[TEX_format_default[j]]); 10118if k<=file_name_size then name_length:=k@+else name_length:=file_name_size; 10119for k:=name_length+1 to file_name_size do name_of_file[k]:=' '; 10120end; 10121 10122@ Here is the only place we use |pack_buffered_name|. This part of the program 10123becomes active when a ``virgin'' \TeX\ is trying to get going, just after 10124the preliminary initialization, or when the user is substituting another 10125format file by typing `\.\&' after the initial `\.{**}' prompt. The buffer 10126contains the first line of input in |buffer[loc..(last-1)]|, where 10127|loc<last| and |buffer[loc]<>" "|. 10128 10129@<Declare the function called |open_fmt_file|@>= 10130function open_fmt_file:boolean; 10131label found,exit; 10132var j:0..buf_size; {the first space after the format file name} 10133begin j:=loc; 10134if buffer[loc]="&" then 10135 begin incr(loc); j:=loc; buffer[last]:=" "; 10136 while buffer[j]<>" " do incr(j); 10137 pack_buffered_name(0,loc,j-1); {try first without the system file area} 10138 if w_open_in(fmt_file) then goto found; 10139 pack_buffered_name(format_area_length,loc,j-1); 10140 {now try the system format file area} 10141 if w_open_in(fmt_file) then goto found; 10142 wake_up_terminal; 10143 wterm_ln('Sorry, I can''t find that format;',' will try PLAIN.'); 10144@.Sorry, I can't find...@> 10145 update_terminal; 10146 end; 10147 {now pull out all the stops: try for the system \.{plain} file} 10148pack_buffered_name(format_default_length-format_ext_length,1,0); 10149if not w_open_in(fmt_file) then 10150 begin wake_up_terminal; 10151 wterm_ln('I can''t find the PLAIN format file!'); 10152@.I can't find PLAIN...@> 10153@.plain@> 10154 open_fmt_file:=false; return; 10155 end; 10156found:loc:=j; open_fmt_file:=true; 10157exit:end; 10158 10159@ Operating systems often make it possible to determine the exact name (and 10160possible version number) of a file that has been opened. The following routine, 10161which simply makes a \TeX\ string from the value of |name_of_file|, should 10162ideally be changed to deduce the full name of file~|f|, which is the file 10163most recently opened, if it is possible to do this in a \PASCAL\ program. 10164@^system dependencies@> 10165 10166This routine might be called after string memory has overflowed, hence 10167we dare not use `|str_room|'. 10168 10169@p function make_name_string:str_number; 10170var k:1..file_name_size; {index into |name_of_file|} 10171begin if (pool_ptr+name_length>pool_size)or(str_ptr=max_strings)or 10172 (cur_length>0) then 10173 make_name_string:="?" 10174else begin for k:=1 to name_length do append_char(xord[name_of_file[k]]); 10175 make_name_string:=make_string; 10176 end; 10177end; 10178function a_make_name_string(var f:alpha_file):str_number; 10179begin a_make_name_string:=make_name_string; 10180end; 10181function b_make_name_string(var f:byte_file):str_number; 10182begin b_make_name_string:=make_name_string; 10183end; 10184function w_make_name_string(var f:word_file):str_number; 10185begin w_make_name_string:=make_name_string; 10186end; 10187 10188@ Now let's consider the ``driver'' 10189routines by which \TeX\ deals with file names 10190in a system-independent manner. First comes a procedure that looks for a 10191file name in the input by calling |get_x_token| for the information. 10192 10193@p procedure scan_file_name; 10194label done; 10195begin name_in_progress:=true; begin_name; 10196@<Get the next non-blank non-call...@>; 10197loop@+begin if (cur_cmd>other_char)or(cur_chr>255) then {not a character} 10198 begin back_input; goto done; 10199 end; 10200 if not more_name(cur_chr) then goto done; 10201 get_x_token; 10202 end; 10203done: end_name; name_in_progress:=false; 10204end; 10205 10206@ The global variable |name_in_progress| is used to prevent recursive 10207use of |scan_file_name|, since the |begin_name| and other procedures 10208communicate via global variables. Recursion would arise only by 10209devious tricks like `\.{\\input\\input f}'; such attempts at sabotage 10210must be thwarted. Furthermore, |name_in_progress| prevents \.{\\input} 10211@^recursion@> 10212from being initiated when a font size specification is being scanned. 10213 10214Another global variable, |job_name|, contains the file name that was first 10215\.{\\input} by the user. This name is extended by `\.{.log}' and `\.{.dvi}' 10216and `\.{.fmt}' in the names of \TeX's output files. 10217 10218@<Glob...@>= 10219@!name_in_progress:boolean; {is a file name being scanned?} 10220@!job_name:str_number; {principal file name} 10221@!log_opened:boolean; {has the transcript file been opened?} 10222 10223@ Initially |job_name=0|; it becomes nonzero as soon as the true name is known. 10224We have |job_name=0| if and only if the `\.{log}' file has not been opened, 10225except of course for a short time just after |job_name| has become nonzero. 10226 10227@<Initialize the output...@>= 10228job_name:=0; name_in_progress:=false; log_opened:=false; 10229 10230@ Here is a routine that manufactures the output file names, assuming that 10231|job_name<>0|. It ignores and changes the current settings of |cur_area| 10232and |cur_ext|. 10233 10234@d pack_cur_name==pack_file_name(cur_name,cur_area,cur_ext) 10235 10236@p procedure pack_job_name(@!s:str_number); {|s = ".log"|, |".dvi"|, or 10237 |format_extension|} 10238begin cur_area:=""; cur_ext:=s; 10239cur_name:=job_name; pack_cur_name; 10240end; 10241 10242@ If some trouble arises when \TeX\ tries to open a file, the following 10243routine calls upon the user to supply another file name. Parameter~|s| 10244is used in the error message to identify the type of file; parameter~|e| 10245is the default extension if none is given. Upon exit from the routine, 10246variables |cur_name|, |cur_area|, |cur_ext|, and |name_of_file| are 10247ready for another attempt at file opening. 10248 10249@p procedure prompt_file_name(@!s,@!e:str_number); 10250label done; 10251var k:0..buf_size; {index into |buffer|} 10252begin if interaction=scroll_mode then wake_up_terminal; 10253if s="input file name" then print_err("I can't find file `") 10254@.I can't find file x@> 10255else print_err("I can't write on file `"); 10256@.I can't write on file x@> 10257print_file_name(cur_name,cur_area,cur_ext); print("'."); 10258if e=".tex" then show_context; 10259print_nl("Please type another "); print(s); 10260@.Please type...@> 10261if interaction<scroll_mode then 10262 fatal_error("*** (job aborted, file error in nonstop mode)"); 10263@.job aborted, file error...@> 10264clear_terminal; prompt_input(": "); @<Scan file name in the buffer@>; 10265if cur_ext="" then cur_ext:=e; 10266pack_cur_name; 10267end; 10268 10269@ @<Scan file name in the buffer@>= 10270begin begin_name; k:=first; 10271while (buffer[k]=" ")and(k<last) do incr(k); 10272loop@+ begin if k=last then goto done; 10273 if not more_name(buffer[k]) then goto done; 10274 incr(k); 10275 end; 10276done:end_name; 10277end 10278 10279@ Here's an example of how these conventions are used. Whenever it is time to 10280ship out a box of stuff, we shall use the macro |ensure_dvi_open|. 10281 10282@d ensure_dvi_open==if output_file_name=0 then 10283 begin if job_name=0 then open_log_file; 10284 pack_job_name(".dvi"); 10285 while not b_open_out(dvi_file) do 10286 prompt_file_name("file name for output",".dvi"); 10287 output_file_name:=b_make_name_string(dvi_file); 10288 end 10289 10290@<Glob...@>= 10291@!dvi_file: byte_file; {the device-independent output goes here} 10292@!output_file_name: str_number; {full name of the output file} 10293@!log_name:str_number; {full name of the log file} 10294 10295@ @<Initialize the output...@>=output_file_name:=0; 10296 10297@ The |open_log_file| routine is used to open the transcript file and to help 10298it catch up to what has previously been printed on the terminal. 10299 10300@p procedure open_log_file; 10301var old_setting:0..max_selector; {previous |selector| setting} 10302@!k:0..buf_size; {index into |months| and |buffer|} 10303@!l:0..buf_size; {end of first input line} 10304@!months:packed array [1..36] of char; {abbreviations of month names} 10305begin old_setting:=selector; 10306if job_name=0 then job_name:="texput"; 10307@.texput@> 10308pack_job_name(".log"); 10309while not a_open_out(log_file) do @<Try to get a different log file name@>; 10310log_name:=a_make_name_string(log_file); 10311selector:=log_only; log_opened:=true; 10312@<Print the banner line, including the date and time@>; 10313input_stack[input_ptr]:=cur_input; {make sure bottom level is in memory} 10314print_nl("**"); 10315@.**@> 10316l:=input_stack[0].limit_field; {last position of first line} 10317if buffer[l]=end_line_char then decr(l); 10318for k:=1 to l do print(buffer[k]); 10319print_ln; {now the transcript file contains the first line of input} 10320selector:=old_setting+2; {|log_only| or |term_and_log|} 10321end; 10322 10323@ Sometimes |open_log_file| is called at awkward moments when \TeX\ is 10324unable to print error messages or even to |show_context|. 10325The |prompt_file_name| routine can result in a |fatal_error|, but the |error| 10326routine will not be invoked because |log_opened| will be false. 10327 10328The normal idea of |batch_mode| is that nothing at all should be written 10329on the terminal. However, in the unusual case that 10330no log file could be opened, we make an exception and allow 10331an explanatory message to be seen. 10332 10333Incidentally, the program always refers to the log file as a `\.{transcript 10334file}', because some systems cannot use the extension `\.{.log}' for 10335this file. 10336 10337@<Try to get a different log file name@>= 10338begin selector:=term_only; 10339prompt_file_name("transcript file name",".log"); 10340end 10341 10342@ @<Print the banner...@>= 10343begin wlog(banner); 10344slow_print(format_ident); print(" "); 10345print_int(day); print_char(" "); 10346months:='JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC'; 10347for k:=3*month-2 to 3*month do wlog(months[k]); 10348print_char(" "); print_int(year); print_char(" "); 10349print_two(time div 60); print_char(":"); print_two(time mod 60); 10350end 10351 10352@ Let's turn now to the procedure that is used to initiate file reading 10353when an `\.{\\input}' command is being processed. 10354 10355@p procedure start_input; {\TeX\ will \.{\\input} something} 10356label done; 10357begin scan_file_name; {set |cur_name| to desired file name} 10358if cur_ext="" then cur_ext:=".tex"; 10359pack_cur_name; 10360loop@+ begin begin_file_reading; {set up |cur_file| and new level of input} 10361 if a_open_in(cur_file) then goto done; 10362 if cur_area="" then 10363 begin pack_file_name(cur_name,TEX_area,cur_ext); 10364 if a_open_in(cur_file) then goto done; 10365 end; 10366 end_file_reading; {remove the level that didn't work} 10367 prompt_file_name("input file name",".tex"); 10368 end; 10369done: name:=a_make_name_string(cur_file); 10370if job_name=0 then 10371 begin job_name:=cur_name; open_log_file; 10372 end; {|open_log_file| doesn't |show_context|, so |limit| 10373 and |loc| needn't be set to meaningful values yet} 10374if term_offset+length(name)>max_print_line-2 then print_ln 10375else if (term_offset>0)or(file_offset>0) then print_char(" "); 10376print_char("("); incr(open_parens); slow_print(name); update_terminal; 10377state:=new_line; 10378if name=str_ptr-1 then {we can conserve string pool space now} 10379 begin flush_string; name:=cur_name; 10380 end; 10381@<Read the first line of the new file@>; 10382end; 10383 10384@ Here we have to remember to tell the |input_ln| routine not to 10385start with a |get|. If the file is empty, it is considered to 10386contain a single blank line. 10387@^system dependencies@> 10388@^empty line at end of file@> 10389 10390@<Read the first line...@>= 10391begin line:=1; 10392if input_ln(cur_file,false) then do_nothing; 10393firm_up_the_line; 10394if end_line_char_inactive then decr(limit) 10395else buffer[limit]:=end_line_char; 10396first:=limit+1; loc:=start; 10397end 10398 10399@* \[30] Font metric data. 10400\TeX\ gets its knowledge about fonts from font metric files, also called 10401\.{TFM} files; the `\.T' in `\.{TFM}' stands for \TeX, 10402but other programs know about them too. 10403@:TFM files}{\.{TFM} files@> 10404@^font metric files@> 10405 10406The information in a \.{TFM} file appears in a sequence of 8-bit bytes. 10407Since the number of bytes is always a multiple of 4, we could 10408also regard the file as a sequence of 32-bit words, but \TeX\ uses the 10409byte interpretation. The format of \.{TFM} files was designed by 10410Lyle Ramshaw in 1980. The intent is to convey a lot of different kinds 10411@^Ramshaw, Lyle Harold@> 10412of information in a compact but useful form. 10413 10414@<Glob...@>= 10415@!tfm_file:byte_file; 10416 10417@ The first 24 bytes (6 words) of a \.{TFM} file contain twelve 16-bit 10418integers that give the lengths of the various subsequent portions 10419of the file. These twelve integers are, in order: 10420$$\vbox{\halign{\hfil#&$\null=\null$#\hfil\cr 10421|lf|&length of the entire file, in words;\cr 10422|lh|&length of the header data, in words;\cr 10423|bc|&smallest character code in the font;\cr 10424|ec|&largest character code in the font;\cr 10425|nw|&number of words in the width table;\cr 10426|nh|&number of words in the height table;\cr 10427|nd|&number of words in the depth table;\cr 10428|ni|&number of words in the italic correction table;\cr 10429|nl|&number of words in the lig/kern table;\cr 10430|nk|&number of words in the kern table;\cr 10431|ne|&number of words in the extensible character table;\cr 10432|np|&number of font parameter words.\cr}}$$ 10433They are all nonnegative and less than $2^{15}$. We must have |bc-1<=ec<=255|, 10434and 10435$$\hbox{|lf=6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np|.}$$ 10436Note that a font may contain as many as 256 characters (if |bc=0| and |ec=255|), 10437and as few as 0 characters (if |bc=ec+1|). 10438 10439Incidentally, when two or more 8-bit bytes are combined to form an integer of 1044016 or more bits, the most significant bytes appear first in the file. 10441This is called BigEndian order. 10442@!@^BigEndian order@> 10443 10444@ The rest of the \.{TFM} file may be regarded as a sequence of ten data 10445arrays having the informal specification 10446$$\def\arr$[#1]#2${\&{array} $[#1]$ \&{of} #2} 10447\vbox{\halign{\hfil\\{#}&$\,:\,$\arr#\hfil\cr 10448header&|[0..lh-1]@t\\{stuff}@>|\cr 10449char\_info&|[bc..ec]char_info_word|\cr 10450width&|[0..nw-1]fix_word|\cr 10451height&|[0..nh-1]fix_word|\cr 10452depth&|[0..nd-1]fix_word|\cr 10453italic&|[0..ni-1]fix_word|\cr 10454lig\_kern&|[0..nl-1]lig_kern_command|\cr 10455kern&|[0..nk-1]fix_word|\cr 10456exten&|[0..ne-1]extensible_recipe|\cr 10457param&|[1..np]fix_word|\cr}}$$ 10458The most important data type used here is a |@!fix_word|, which is 10459a 32-bit representation of a binary fraction. A |fix_word| is a signed 10460quantity, with the two's complement of the entire word used to represent 10461negation. Of the 32 bits in a |fix_word|, exactly 12 are to the left of the 10462binary point; thus, the largest |fix_word| value is $2048-2^{-20}$, and 10463the smallest is $-2048$. We will see below, however, that all but two of 10464the |fix_word| values must lie between $-16$ and $+16$. 10465 10466@ The first data array is a block of header information, which contains 10467general facts about the font. The header must contain at least two words, 10468|header[0]| and |header[1]|, whose meaning is explained below. 10469Additional header information of use to other software routines might 10470also be included, but \TeX82 does not need to know about such details. 10471For example, 16 more words of header information are in use at the Xerox 10472Palo Alto Research Center; the first ten specify the character coding 10473scheme used (e.g., `\.{XEROX text}' or `\.{TeX math symbols}'), the next five 10474give the font identifier (e.g., `\.{HELVETICA}' or `\.{CMSY}'), and the 10475last gives the ``face byte.'' The program that converts \.{DVI} files 10476to Xerox printing format gets this information by looking at the \.{TFM} 10477file, which it needs to read anyway because of other information that 10478is not explicitly repeated in \.{DVI}~format. 10479 10480\yskip\hang|header[0]| is a 32-bit check sum that \TeX\ will copy into 10481the \.{DVI} output file. Later on when the \.{DVI} file is printed, 10482possibly on another computer, the actual font that gets used is supposed 10483to have a check sum that agrees with the one in the \.{TFM} file used by 10484\TeX. In this way, users will be warned about potential incompatibilities. 10485(However, if the check sum is zero in either the font file or the \.{TFM} 10486file, no check is made.) The actual relation between this check sum and 10487the rest of the \.{TFM} file is not important; the check sum is simply an 10488identification number with the property that incompatible fonts almost 10489always have distinct check sums. 10490@^check sum@> 10491 10492\yskip\hang|header[1]| is a |fix_word| containing the design size of 10493the font, in units of \TeX\ points. This number must be at least 1.0; it is 10494fairly arbitrary, but usually the design size is 10.0 for a ``10 point'' 10495font, i.e., a font that was designed to look best at a 10-point size, 10496whatever that really means. When a \TeX\ user asks for a font 10497`\.{at} $\delta$ \.{pt}', the effect is to override the design size 10498and replace it by $\delta$, and to multiply the $x$ and~$y$ coordinates 10499of the points in the font image by a factor of $\delta$ divided by the 10500design size. {\sl All other dimensions in the\/ \.{TFM} file are 10501|fix_word|\kern-1pt\ numbers in design-size units}, with the exception of 10502|param[1]| (which denotes the slant ratio). Thus, for example, the value 10503of |param[6]|, which defines the \.{em} unit, is often the |fix_word| value 10504$2^{20}=1.0$, since many fonts have a design size equal to one em. 10505The other dimensions must be less than 16 design-size units in absolute 10506value; thus, |header[1]| and |param[1]| are the only |fix_word| 10507entries in the whole \.{TFM} file whose first byte might be something 10508besides 0 or 255. 10509 10510@ Next comes the |char_info| array, which contains one |@!char_info_word| 10511per character. Each word in this part of the file contains six fields 10512packed into four bytes as follows. 10513 10514\yskip\hang first byte: |@!width_index| (8 bits)\par 10515\hang second byte: |@!height_index| (4 bits) times 16, plus |@!depth_index| 10516 (4~bits)\par 10517\hang third byte: |@!italic_index| (6 bits) times 4, plus |@!tag| 10518 (2~bits)\par 10519\hang fourth byte: |@!remainder| (8 bits)\par 10520\yskip\noindent 10521The actual width of a character is \\{width}|[width_index]|, in design-size 10522units; this is a device for compressing information, since many characters 10523have the same width. Since it is quite common for many characters 10524to have the same height, depth, or italic correction, the \.{TFM} format 10525imposes a limit of 16 different heights, 16 different depths, and 1052664 different italic corrections. 10527 10528@!@^italic correction@> 10529The italic correction of a character has two different uses. 10530(a)~In ordinary text, the italic correction is added to the width only if 10531the \TeX\ user specifies `\.{\\/}' after the character. 10532(b)~In math formulas, the italic correction is always added to the width, 10533except with respect to the positioning of subscripts. 10534 10535Incidentally, the relation $\\{width}[0]=\\{height}[0]=\\{depth}[0]= 10536\\{italic}[0]=0$ should always hold, so that an index of zero implies a 10537value of zero. The |width_index| should never be zero unless the 10538character does not exist in the font, since a character is valid if and 10539only if it lies between |bc| and |ec| and has a nonzero |width_index|. 10540 10541@ The |tag| field in a |char_info_word| has four values that explain how to 10542interpret the |remainder| field. 10543 10544\yskip\hangg|tag=0| (|no_tag|) means that |remainder| is unused.\par 10545\hangg|tag=1| (|lig_tag|) means that this character has a ligature/kerning 10546program starting at position |remainder| in the |lig_kern| array.\par 10547\hangg|tag=2| (|list_tag|) means that this character is part of a chain of 10548characters of ascending sizes, and not the largest in the chain. The 10549|remainder| field gives the character code of the next larger character.\par 10550\hangg|tag=3| (|ext_tag|) means that this character code represents an 10551extensible character, i.e., a character that is built up of smaller pieces 10552so that it can be made arbitrarily large. The pieces are specified in 10553|@!exten[remainder]|.\par 10554\yskip\noindent 10555Characters with |tag=2| and |tag=3| are treated as characters with |tag=0| 10556unless they are used in special circumstances in math formulas. For example, 10557the \.{\\sum} operation looks for a |list_tag|, and the \.{\\left} 10558operation looks for both |list_tag| and |ext_tag|. 10559 10560@d no_tag=0 {vanilla character} 10561@d lig_tag=1 {character has a ligature/kerning program} 10562@d list_tag=2 {character has a successor in a charlist} 10563@d ext_tag=3 {character is extensible} 10564 10565@ The |lig_kern| array contains instructions in a simple programming language 10566that explains what to do for special letter pairs. Each word in this array is a 10567|@!lig_kern_command| of four bytes. 10568 10569\yskip\hang first byte: |skip_byte|, indicates that this is the final program 10570 step if the byte is 128 or more, otherwise the next step is obtained by 10571 skipping this number of intervening steps.\par 10572\hang second byte: |next_char|, ``if |next_char| follows the current character, 10573 then perform the operation and stop, otherwise continue.''\par 10574\hang third byte: |op_byte|, indicates a ligature step if less than~128, 10575 a kern step otherwise.\par 10576\hang fourth byte: |remainder|.\par 10577\yskip\noindent 10578In a kern step, an 10579additional space equal to |kern[256*(op_byte-128)+remainder]| is inserted 10580between the current character and |next_char|. This amount is 10581often negative, so that the characters are brought closer together 10582by kerning; but it might be positive. 10583 10584There are eight kinds of ligature steps, having |op_byte| codes $4a+2b+c$ where 10585$0\le a\le b+c$ and $0\le b,c\le1$. The character whose code is 10586|remainder| is inserted between the current character and |next_char|; 10587then the current character is deleted if $b=0$, and |next_char| is 10588deleted if $c=0$; then we pass over $a$~characters to reach the next 10589current character (which may have a ligature/kerning program of its own). 10590 10591If the very first instruction of the |lig_kern| array has |skip_byte=255|, 10592the |next_char| byte is the so-called right boundary character of this font; 10593the value of |next_char| need not lie between |bc| and~|ec|. 10594If the very last instruction of the |lig_kern| array has |skip_byte=255|, 10595there is a special ligature/kerning program for a left boundary character, 10596beginning at location |256*op_byte+remainder|. 10597The interpretation is that \TeX\ puts implicit boundary characters 10598before and after each consecutive string of characters from the same font. 10599These implicit characters do not appear in the output, but they can affect 10600ligatures and kerning. 10601 10602If the very first instruction of a character's |lig_kern| program has 10603|skip_byte>128|, the program actually begins in location 10604|256*op_byte+remainder|. This feature allows access to large |lig_kern| 10605arrays, because the first instruction must otherwise 10606appear in a location |<=255|. 10607 10608Any instruction with |skip_byte>128| in the |lig_kern| array must satisfy 10609the condition 10610$$\hbox{|256*op_byte+remainder<nl|.}$$ 10611If such an instruction is encountered during 10612normal program execution, it denotes an unconditional halt; no ligature 10613or kerning command is performed. 10614 10615@d stop_flag==qi(128) {value indicating `\.{STOP}' in a lig/kern program} 10616@d kern_flag==qi(128) {op code for a kern step} 10617@d skip_byte(#)==#.b0 10618@d next_char(#)==#.b1 10619@d op_byte(#)==#.b2 10620@d rem_byte(#)==#.b3 10621 10622@ Extensible characters are specified by an |@!extensible_recipe|, which 10623consists of four bytes called |@!top|, |@!mid|, |@!bot|, and |@!rep| (in this 10624order). These bytes are the character codes of individual pieces used to 10625build up a large symbol. If |top|, |mid|, or |bot| are zero, they are not 10626present in the built-up result. For example, an extensible vertical line is 10627like an extensible bracket, except that the top and bottom pieces are missing. 10628 10629Let $T$, $M$, $B$, and $R$ denote the respective pieces, or an empty box 10630if the piece isn't present. Then the extensible characters have the form 10631$TR^kMR^kB$ from top to bottom, for some |k>=0|, unless $M$ is absent; 10632in the latter case we can have $TR^kB$ for both even and odd values of~|k|. 10633The width of the extensible character is the width of $R$; and the 10634height-plus-depth is the sum of the individual height-plus-depths of the 10635components used, since the pieces are butted together in a vertical list. 10636 10637@d ext_top(#)==#.b0 {|top| piece in a recipe} 10638@d ext_mid(#)==#.b1 {|mid| piece in a recipe} 10639@d ext_bot(#)==#.b2 {|bot| piece in a recipe} 10640@d ext_rep(#)==#.b3 {|rep| piece in a recipe} 10641 10642@ The final portion of a \.{TFM} file is the |param| array, which is another 10643sequence of |fix_word| values. 10644 10645\yskip\hang|param[1]=slant| is the amount of italic slant, which is used 10646to help position accents. For example, |slant=.25| means that when you go 10647up one unit, you also go .25 units to the right. The |slant| is a pure 10648number; it's the only |fix_word| other than the design size itself that is 10649not scaled by the design size. 10650 10651\hang|param[2]=space| is the normal spacing between words in text. 10652Note that character |" "| in the font need not have anything to do with 10653blank spaces. 10654 10655\hang|param[3]=space_stretch| is the amount of glue stretching between words. 10656 10657\hang|param[4]=space_shrink| is the amount of glue shrinking between words. 10658 10659\hang|param[5]=x_height| is the size of one ex in the font; it is also 10660the height of letters for which accents don't have to be raised or lowered. 10661 10662\hang|param[6]=quad| is the size of one em in the font. 10663 10664\hang|param[7]=extra_space| is the amount added to |param[2]| at the 10665ends of sentences. 10666 10667\yskip\noindent 10668If fewer than seven parameters are present, \TeX\ sets the missing parameters 10669to zero. Fonts used for math symbols are required to have 10670additional parameter information, which is explained later. 10671 10672@d slant_code=1 10673@d space_code=2 10674@d space_stretch_code=3 10675@d space_shrink_code=4 10676@d x_height_code=5 10677@d quad_code=6 10678@d extra_space_code=7 10679 10680@ So that is what \.{TFM} files hold. Since \TeX\ has to absorb such information 10681about lots of fonts, it stores most of the data in a large array called 10682|font_info|. Each item of |font_info| is a |memory_word|; the |fix_word| 10683data gets converted into |scaled| entries, while everything else goes into 10684words of type |four_quarters|. 10685 10686When the user defines \.{\\font\\f}, say, \TeX\ assigns an internal number 10687to the user's font~\.{\\f}. Adding this number to |font_id_base| gives the 10688|eqtb| location of a ``frozen'' control sequence that will always select 10689the font. 10690 10691@<Types...@>= 10692@!internal_font_number=font_base..font_max; {|font| in a |char_node|} 10693@!font_index=0..font_mem_size; {index into |font_info|} 10694 10695@ Here now is the (rather formidable) array of font arrays. 10696 10697@d non_char==qi(256) {a |halfword| code that can't match a real character} 10698@d non_address=0 {a spurious |bchar_label|} 10699 10700@<Glob...@>= 10701@!font_info:array[font_index] of memory_word; 10702 {the big collection of font data} 10703@!fmem_ptr:font_index; {first unused word of |font_info|} 10704@!font_ptr:internal_font_number; {largest internal font number in use} 10705@!font_check:array[internal_font_number] of four_quarters; {check sum} 10706@!font_size:array[internal_font_number] of scaled; {``at'' size} 10707@!font_dsize:array[internal_font_number] of scaled; {``design'' size} 10708@!font_params:array[internal_font_number] of font_index; {how many font 10709 parameters are present} 10710@!font_name:array[internal_font_number] of str_number; {name of the font} 10711@!font_area:array[internal_font_number] of str_number; {area of the font} 10712@!font_bc:array[internal_font_number] of eight_bits; 10713 {beginning (smallest) character code} 10714@!font_ec:array[internal_font_number] of eight_bits; 10715 {ending (largest) character code} 10716@!font_glue:array[internal_font_number] of pointer; 10717 {glue specification for interword space, |null| if not allocated} 10718@!font_used:array[internal_font_number] of boolean; 10719 {has a character from this font actually appeared in the output?} 10720@!hyphen_char:array[internal_font_number] of integer; 10721 {current \.{\\hyphenchar} values} 10722@!skew_char:array[internal_font_number] of integer; 10723 {current \.{\\skewchar} values} 10724@!bchar_label:array[internal_font_number] of font_index; 10725 {start of |lig_kern| program for left boundary character, 10726 |non_address| if there is none} 10727@!font_bchar:array[internal_font_number] of min_quarterword..non_char; 10728 {right boundary character, |non_char| if there is none} 10729@!font_false_bchar:array[internal_font_number] of min_quarterword..non_char; 10730 {|font_bchar| if it doesn't exist in the font, otherwise |non_char|} 10731 10732@ Besides the arrays just enumerated, we have directory arrays that make it 10733easy to get at the individual entries in |font_info|. For example, the 10734|char_info| data for character |c| in font |f| will be in 10735|font_info[char_base[f]+c].qqqq|; and if |w| is the |width_index| 10736part of this word (the |b0| field), the width of the character is 10737|font_info[width_base[f]+w].sc|. (These formulas assume that 10738|min_quarterword| has already been added to |c| and to |w|, since \TeX\ 10739stores its quarterwords that way.) 10740 10741@<Glob...@>= 10742@!char_base:array[internal_font_number] of integer; 10743 {base addresses for |char_info|} 10744@!width_base:array[internal_font_number] of integer; 10745 {base addresses for widths} 10746@!height_base:array[internal_font_number] of integer; 10747 {base addresses for heights} 10748@!depth_base:array[internal_font_number] of integer; 10749 {base addresses for depths} 10750@!italic_base:array[internal_font_number] of integer; 10751 {base addresses for italic corrections} 10752@!lig_kern_base:array[internal_font_number] of integer; 10753 {base addresses for ligature/kerning programs} 10754@!kern_base:array[internal_font_number] of integer; 10755 {base addresses for kerns} 10756@!exten_base:array[internal_font_number] of integer; 10757 {base addresses for extensible recipes} 10758@!param_base:array[internal_font_number] of integer; 10759 {base addresses for font parameters} 10760 10761@ @<Set init...@>= 10762for k:=font_base to font_max do font_used[k]:=false; 10763 10764@ \TeX\ always knows at least one font, namely the null font. It has no 10765characters, and its seven parameters are all equal to zero. 10766 10767@<Initialize table...@>= 10768font_ptr:=null_font; fmem_ptr:=7; 10769font_name[null_font]:="nullfont"; font_area[null_font]:=""; 10770hyphen_char[null_font]:="-"; skew_char[null_font]:=-1; 10771bchar_label[null_font]:=non_address; 10772font_bchar[null_font]:=non_char; font_false_bchar[null_font]:=non_char; 10773font_bc[null_font]:=1; font_ec[null_font]:=0; 10774font_size[null_font]:=0; font_dsize[null_font]:=0; 10775char_base[null_font]:=0; width_base[null_font]:=0; 10776height_base[null_font]:=0; depth_base[null_font]:=0; 10777italic_base[null_font]:=0; lig_kern_base[null_font]:=0; 10778kern_base[null_font]:=0; exten_base[null_font]:=0; 10779font_glue[null_font]:=null; font_params[null_font]:=7; 10780param_base[null_font]:=-1; 10781for k:=0 to 6 do font_info[k].sc:=0; 10782 10783@ @<Put each...@>= 10784primitive("nullfont",set_font,null_font); 10785@!@:null_font_}{\.{\\nullfont} primitive@> 10786text(frozen_null_font):="nullfont"; eqtb[frozen_null_font]:=eqtb[cur_val]; 10787 10788@ Of course we want to define macros that suppress the detail of how font 10789information is actually packed, so that we don't have to write things like 10790$$\hbox{|font_info[width_base[f]+font_info[char_base[f]+c].qqqq.b0].sc|}$$ 10791too often. The \.{WEB} definitions here make |char_info(f)(c)| the 10792|four_quarters| word of font information corresponding to character 10793|c| of font |f|. If |q| is such a word, |char_width(f)(q)| will be 10794the character's width; hence the long formula above is at least 10795abbreviated to 10796$$\hbox{|char_width(f)(char_info(f)(c))|.}$$ 10797Usually, of course, we will fetch |q| first and look at several of its 10798fields at the same time. 10799 10800The italic correction of a character will be denoted by 10801|char_italic(f)(q)|, so it is analogous to |char_width|. But we will get 10802at the height and depth in a slightly different way, since we usually want 10803to compute both height and depth if we want either one. The value of 10804|height_depth(q)| will be the 8-bit quantity 10805$$b=|height_index|\times16+|depth_index|,$$ and if |b| is such a byte we 10806will write |char_height(f)(b)| and |char_depth(f)(b)| for the height and 10807depth of the character |c| for which |q=char_info(f)(c)|. Got that? 10808 10809The tag field will be called |char_tag(q)|; the remainder byte will be 10810called |rem_byte(q)|, using a macro that we have already defined above. 10811 10812Access to a character's |width|, |height|, |depth|, and |tag| fields is 10813part of \TeX's inner loop, so we want these macros to produce code that is 10814as fast as possible under the circumstances. 10815@^inner loop@> 10816 10817@d char_info_end(#)==#].qqqq 10818@d char_info(#)==font_info[char_base[#]+char_info_end 10819@d char_width_end(#)==#.b0].sc 10820@d char_width(#)==font_info[width_base[#]+char_width_end 10821@d char_exists(#)==(#.b0>min_quarterword) 10822@d char_italic_end(#)==(qo(#.b2)) div 4].sc 10823@d char_italic(#)==font_info[italic_base[#]+char_italic_end 10824@d height_depth(#)==qo(#.b1) 10825@d char_height_end(#)==(#) div 16].sc 10826@d char_height(#)==font_info[height_base[#]+char_height_end 10827@d char_depth_end(#)==(#) mod 16].sc 10828@d char_depth(#)==font_info[depth_base[#]+char_depth_end 10829@d char_tag(#)==((qo(#.b2)) mod 4) 10830 10831@ The global variable |null_character| is set up to be a word of 10832|char_info| for a character that doesn't exist. Such a word provides a 10833convenient way to deal with erroneous situations. 10834 10835@<Glob...@>= 10836@!null_character:four_quarters; {nonexistent character information} 10837 10838@ @<Set init...@>= 10839null_character.b0:=min_quarterword; null_character.b1:=min_quarterword; 10840null_character.b2:=min_quarterword; null_character.b3:=min_quarterword; 10841 10842@ Here are some macros that help process ligatures and kerns. 10843We write |char_kern(f)(j)| to find the amount of kerning specified by 10844kerning command~|j| in font~|f|. If |j| is the |char_info| for a character 10845with a ligature/kern program, the first instruction of that program is either 10846|i=font_info[lig_kern_start(f)(j)]| or |font_info[lig_kern_restart(f)(i)]|, 10847depending on whether or not |skip_byte(i)<=stop_flag|. 10848 10849The constant |kern_base_offset| should be simplified, for \PASCAL\ compilers 10850that do not do local optimization. 10851@^system dependencies@> 10852 10853@d char_kern_end(#)==256*op_byte(#)+rem_byte(#)].sc 10854@d char_kern(#)==font_info[kern_base[#]+char_kern_end 10855@d kern_base_offset==256*(128+min_quarterword) 10856@d lig_kern_start(#)==lig_kern_base[#]+rem_byte {beginning of lig/kern program} 10857@d lig_kern_restart_end(#)==256*op_byte(#)+rem_byte(#)+32768-kern_base_offset 10858@d lig_kern_restart(#)==lig_kern_base[#]+lig_kern_restart_end 10859 10860@ Font parameters are referred to as |slant(f)|, |space(f)|, etc. 10861 10862@d param_end(#)==param_base[#]].sc 10863@d param(#)==font_info[#+param_end 10864@d slant==param(slant_code) {slant to the right, per unit distance upward} 10865@d space==param(space_code) {normal space between words} 10866@d space_stretch==param(space_stretch_code) {stretch between words} 10867@d space_shrink==param(space_shrink_code) {shrink between words} 10868@d x_height==param(x_height_code) {one ex} 10869@d quad==param(quad_code) {one em} 10870@d extra_space==param(extra_space_code) {additional space at end of sentence} 10871 10872@<The em width for |cur_font|@>=quad(cur_font) 10873 10874@ @<The x-height for |cur_font|@>=x_height(cur_font) 10875 10876@ \TeX\ checks the information of a \.{TFM} file for validity as the 10877file is being read in, so that no further checks will be needed when 10878typesetting is going on. The somewhat tedious subroutine that does this 10879is called |read_font_info|. It has four parameters: the user font 10880identifier~|u|, the file name and area strings |nom| and |aire|, and the 10881``at'' size~|s|. If |s|~is negative, it's the negative of a scale factor 10882to be applied to the design size; |s=-1000| is the normal case. 10883Otherwise |s| will be substituted for the design size; in this 10884case, |s| must be positive and less than $2048\rm\,pt$ 10885(i.e., it must be less than $2^{27}$ when considered as an integer). 10886 10887The subroutine opens and closes a global file variable called |tfm_file|. 10888It returns the value of the internal font number that was just loaded. 10889If an error is detected, an error message is issued and no font 10890information is stored; |null_font| is returned in this case. 10891 10892@d bad_tfm=11 {label for |read_font_info|} 10893@d abort==goto bad_tfm {do this when the \.{TFM} data is wrong} 10894 10895@p function read_font_info(@!u:pointer;@!nom,@!aire:str_number; 10896 @!s:scaled):internal_font_number; {input a \.{TFM} file} 10897label done,bad_tfm,not_found; 10898var k:font_index; {index into |font_info|} 10899@!file_opened:boolean; {was |tfm_file| successfully opened?} 10900@!lf,@!lh,@!bc,@!ec,@!nw,@!nh,@!nd,@!ni,@!nl,@!nk,@!ne,@!np:halfword; 10901 {sizes of subfiles} 10902@!f:internal_font_number; {the new font's number} 10903@!g:internal_font_number; {the number to return} 10904@!a,@!b,@!c,@!d:eight_bits; {byte variables} 10905@!qw:four_quarters;@!sw:scaled; {accumulators} 10906@!bch_label:integer; {left boundary start location, or infinity} 10907@!bchar:0..256; {right boundary character, or 256} 10908@!z:scaled; {the design size or the ``at'' size} 10909@!alpha:integer;@!beta:1..16; 10910 {auxiliary quantities used in fixed-point multiplication} 10911begin g:=null_font;@/ 10912@<Read and check the font data; |abort| if the \.{TFM} file is 10913 malformed; if there's no room for this font, say so and |goto 10914 done|; otherwise |incr(font_ptr)| and |goto done|@>; 10915bad_tfm: @<Report that the font won't be loaded@>; 10916done: if file_opened then b_close(tfm_file); 10917read_font_info:=g; 10918end; 10919 10920@ There are programs called \.{TFtoPL} and \.{PLtoTF} that convert 10921between the \.{TFM} format and a symbolic property-list format 10922that can be easily edited. These programs contain extensive 10923diagnostic information, so \TeX\ does not have to bother giving 10924precise details about why it rejects a particular \.{TFM} file. 10925@.TFtoPL@> @.PLtoTF@> 10926 10927@d start_font_error_message==print_err("Font "); sprint_cs(u); 10928 print_char("="); print_file_name(nom,aire,""); 10929 if s>=0 then 10930 begin print(" at "); print_scaled(s); print("pt"); 10931 end 10932 else if s<>-1000 then 10933 begin print(" scaled "); print_int(-s); 10934 end 10935 10936@<Report that the font won't be loaded@>= 10937start_font_error_message; 10938@.Font x=xx not loadable...@> 10939if file_opened then print(" not loadable: Bad metric (TFM) file") 10940else print(" not loadable: Metric (TFM) file not found"); 10941help5("I wasn't able to read the size data for this font,")@/ 10942("so I will ignore the font specification.")@/ 10943("[Wizards can fix TFM files using TFtoPL/PLtoTF.]")@/ 10944("You might try inserting a different font spec;")@/ 10945("e.g., type `I\font<same font id>=<substitute font name>'."); 10946error 10947 10948@ @<Read and check...@>= 10949@<Open |tfm_file| for input@>; 10950@<Read the {\.{TFM}} size fields@>; 10951@<Use size fields to allocate font information@>; 10952@<Read the {\.{TFM}} header@>; 10953@<Read character data@>; 10954@<Read box dimensions@>; 10955@<Read ligature/kern program@>; 10956@<Read extensible character recipes@>; 10957@<Read font parameters@>; 10958@<Make final adjustments and |goto done|@> 10959 10960@ @<Open |tfm_file| for input@>= 10961file_opened:=false; 10962if aire="" then pack_file_name(nom,TEX_font_area,".tfm") 10963else pack_file_name(nom,aire,".tfm"); 10964if not b_open_in(tfm_file) then abort; 10965file_opened:=true 10966 10967@ Note: A malformed \.{TFM} file might be shorter than it claims to be; 10968thus |eof(tfm_file)| might be true when |read_font_info| refers to 10969|tfm_file^| or when it says |get(tfm_file)|. If such circumstances 10970cause system error messages, you will have to defeat them somehow, 10971for example by defining |fget| to be `\ignorespaces|begin get(tfm_file);| 10972|if eof(tfm_file) then abort; end|\unskip'. 10973@^system dependencies@> 10974 10975@d fget==get(tfm_file) 10976@d fbyte==tfm_file^ 10977@d read_sixteen(#)==begin #:=fbyte; 10978 if #>127 then abort; 10979 fget; #:=#*@'400+fbyte; 10980 end 10981@d store_four_quarters(#)==begin fget; a:=fbyte; qw.b0:=qi(a); 10982 fget; b:=fbyte; qw.b1:=qi(b); 10983 fget; c:=fbyte; qw.b2:=qi(c); 10984 fget; d:=fbyte; qw.b3:=qi(d); 10985 #:=qw; 10986 end 10987 10988@ @<Read the {\.{TFM}} size fields@>= 10989begin read_sixteen(lf); 10990fget; read_sixteen(lh); 10991fget; read_sixteen(bc); 10992fget; read_sixteen(ec); 10993if (bc>ec+1)or(ec>255) then abort; 10994if bc>255 then {|bc=256| and |ec=255|} 10995 begin bc:=1; ec:=0; 10996 end; 10997fget; read_sixteen(nw); 10998fget; read_sixteen(nh); 10999fget; read_sixteen(nd); 11000fget; read_sixteen(ni); 11001fget; read_sixteen(nl); 11002fget; read_sixteen(nk); 11003fget; read_sixteen(ne); 11004fget; read_sixteen(np); 11005if lf<>6+lh+(ec-bc+1)+nw+nh+nd+ni+nl+nk+ne+np then abort; 11006if (nw=0)or(nh=0)or(nd=0)or(ni=0) then abort; 11007end 11008 11009@ The preliminary settings of the index-offset variables |char_base|, 11010|width_base|, |lig_kern_base|, |kern_base|, and |exten_base| will be 11011corrected later by subtracting |min_quarterword| from them; and we will 11012subtract 1 from |param_base| too. It's best to forget about such anomalies 11013until later. 11014 11015@<Use size fields to allocate font information@>= 11016lf:=lf-6-lh; {|lf| words should be loaded into |font_info|} 11017if np<7 then lf:=lf+7-np; {at least seven parameters will appear} 11018if (font_ptr=font_max)or(fmem_ptr+lf>font_mem_size) then 11019 @<Apologize for not loading the font, |goto done|@>; 11020f:=font_ptr+1; 11021char_base[f]:=fmem_ptr-bc; 11022width_base[f]:=char_base[f]+ec+1; 11023height_base[f]:=width_base[f]+nw; 11024depth_base[f]:=height_base[f]+nh; 11025italic_base[f]:=depth_base[f]+nd; 11026lig_kern_base[f]:=italic_base[f]+ni; 11027kern_base[f]:=lig_kern_base[f]+nl-kern_base_offset; 11028exten_base[f]:=kern_base[f]+kern_base_offset+nk; 11029param_base[f]:=exten_base[f]+ne 11030 11031@ @<Apologize for not loading...@>= 11032begin start_font_error_message; 11033print(" not loaded: Not enough room left"); 11034@.Font x=xx not loaded...@> 11035help4("I'm afraid I won't be able to make use of this font,")@/ 11036("because my memory for character-size data is too small.")@/ 11037("If you're really stuck, ask a wizard to enlarge me.")@/ 11038("Or maybe try `I\font<same font id>=<name of loaded font>'."); 11039error; goto done; 11040end 11041 11042@ Only the first two words of the header are needed by \TeX82. 11043 11044@<Read the {\.{TFM}} header@>= 11045begin if lh<2 then abort; 11046store_four_quarters(font_check[f]); 11047fget; read_sixteen(z); {this rejects a negative design size} 11048fget; z:=z*@'400+fbyte; fget; z:=(z*@'20)+(fbyte div@'20); 11049if z<unity then abort; 11050while lh>2 do 11051 begin fget;fget;fget;fget;decr(lh); {ignore the rest of the header} 11052 end; 11053font_dsize[f]:=z; 11054if s<>-1000 then 11055 if s>=0 then z:=s 11056 else z:=xn_over_d(z,-s,1000); 11057font_size[f]:=z; 11058end 11059 11060@ @<Read character data@>= 11061for k:=fmem_ptr to width_base[f]-1 do 11062 begin store_four_quarters(font_info[k].qqqq); 11063 if (a>=nw)or(b div @'20>=nh)or(b mod @'20>=nd)or 11064 (c div 4>=ni) then abort; 11065 case c mod 4 of 11066 lig_tag: if d>=nl then abort; 11067 ext_tag: if d>=ne then abort; 11068 list_tag: @<Check for charlist cycle@>; 11069 othercases do_nothing {|no_tag|} 11070 endcases; 11071 end 11072 11073@ We want to make sure that there is no cycle of characters linked together 11074by |list_tag| entries, since such a cycle would get \TeX\ into an endless 11075loop. If such a cycle exists, the routine here detects it when processing 11076the largest character code in the cycle. 11077 11078@d check_byte_range(#)==begin if (#<bc)or(#>ec) then abort@+end 11079@d current_character_being_worked_on==k+bc-fmem_ptr 11080 11081@<Check for charlist cycle@>= 11082begin check_byte_range(d); 11083while d<current_character_being_worked_on do 11084 begin qw:=char_info(f)(d); 11085 {N.B.: not |qi(d)|, since |char_base[f]| hasn't been adjusted yet} 11086 if char_tag(qw)<>list_tag then goto not_found; 11087 d:=qo(rem_byte(qw)); {next character on the list} 11088 end; 11089if d=current_character_being_worked_on then abort; {yes, there's a cycle} 11090not_found:end 11091 11092@ A |fix_word| whose four bytes are $(a,b,c,d)$ from left to right represents 11093the number 11094$$x=\left\{\vcenter{\halign{$#$,\hfil\qquad&if $#$\hfil\cr 11095b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=0;\cr 11096-16+b\cdot2^{-4}+c\cdot2^{-12}+d\cdot2^{-20}&a=255.\cr}}\right.$$ 11097(No other choices of |a| are allowed, since the magnitude of a number in 11098design-size units must be less than 16.) We want to multiply this 11099quantity by the integer~|z|, which is known to be less than $2^{27}$. 11100If $|z|<2^{23}$, the individual multiplications $b\cdot z$, 11101$c\cdot z$, $d\cdot z$ cannot overflow; otherwise we will divide |z| by 2, 111024, 8, or 16, to obtain a multiplier less than $2^{23}$, and we can 11103compensate for this later. If |z| has thereby been replaced by 11104$|z|^\prime=|z|/2^e$, let $\beta=2^{4-e}$; we shall compute 11105$$\lfloor(b+c\cdot2^{-8}+d\cdot2^{-16})\,z^\prime/\beta\rfloor$$ 11106if $a=0$, or the same quantity minus $\alpha=2^{4+e}z^\prime$ if $a=255$. 11107This calculation must be done exactly, in order to guarantee portability 11108of \TeX\ between computers. 11109 11110@d store_scaled(#)==begin fget; a:=fbyte; fget; b:=fbyte; 11111 fget; c:=fbyte; fget; d:=fbyte;@/ 11112 sw:=(((((d*z)div@'400)+(c*z))div@'400)+(b*z))div beta; 11113 if a=0 then #:=sw@+else if a=255 then #:=sw-alpha@+else abort; 11114 end 11115 11116@<Read box dimensions@>= 11117begin @<Replace |z| by $|z|^\prime$ and compute $\alpha,\beta$@>; 11118for k:=width_base[f] to lig_kern_base[f]-1 do 11119 store_scaled(font_info[k].sc); 11120if font_info[width_base[f]].sc<>0 then abort; {\\{width}[0] must be zero} 11121if font_info[height_base[f]].sc<>0 then abort; {\\{height}[0] must be zero} 11122if font_info[depth_base[f]].sc<>0 then abort; {\\{depth}[0] must be zero} 11123if font_info[italic_base[f]].sc<>0 then abort; {\\{italic}[0] must be zero} 11124end 11125 11126@ @<Replace |z|...@>= 11127begin alpha:=16; 11128while z>=@'40000000 do 11129 begin z:=z div 2; alpha:=alpha+alpha; 11130 end; 11131beta:=256 div alpha; alpha:=alpha*z; 11132end 11133 11134@ @d check_existence(#)==@t@>@;@/ 11135 begin check_byte_range(#); 11136 qw:=char_info(f)(#); {N.B.: not |qi(#)|} 11137 if not char_exists(qw) then abort; 11138 end 11139 11140@<Read ligature/kern program@>= 11141bch_label:=@'77777; bchar:=256; 11142if nl>0 then 11143 begin for k:=lig_kern_base[f] to kern_base[f]+kern_base_offset-1 do 11144 begin store_four_quarters(font_info[k].qqqq); 11145 if a>128 then 11146 begin if 256*c+d>=nl then abort; 11147 if a=255 then if k=lig_kern_base[f] then bchar:=b; 11148 end 11149 else begin if b<>bchar then check_existence(b); 11150 if c<128 then check_existence(d) {check ligature} 11151 else if 256*(c-128)+d>=nk then abort; {check kern} 11152 if a<128 then if k-lig_kern_base[f]+a+1>=nl then abort; 11153 end; 11154 end; 11155 if a=255 then bch_label:=256*c+d; 11156 end; 11157for k:=kern_base[f]+kern_base_offset to exten_base[f]-1 do 11158 store_scaled(font_info[k].sc); 11159 11160@ @<Read extensible character recipes@>= 11161for k:=exten_base[f] to param_base[f]-1 do 11162 begin store_four_quarters(font_info[k].qqqq); 11163 if a<>0 then check_existence(a); 11164 if b<>0 then check_existence(b); 11165 if c<>0 then check_existence(c); 11166 check_existence(d); 11167 end 11168 11169@ We check to see that the \.{TFM} file doesn't end prematurely; but 11170no error message is given for files having more than |lf| words. 11171 11172@<Read font parameters@>= 11173begin for k:=1 to np do 11174 if k=1 then {the |slant| parameter is a pure number} 11175 begin fget; sw:=fbyte; if sw>127 then sw:=sw-256; 11176 fget; sw:=sw*@'400+fbyte; fget; sw:=sw*@'400+fbyte; 11177 fget; font_info[param_base[f]].sc:= 11178 (sw*@'20)+(fbyte div@'20); 11179 end 11180 else store_scaled(font_info[param_base[f]+k-1].sc); 11181if eof(tfm_file) then abort; 11182for k:=np+1 to 7 do font_info[param_base[f]+k-1].sc:=0; 11183end 11184 11185@ Now to wrap it up, we have checked all the necessary things about the \.{TFM} 11186file, and all we need to do is put the finishing touches on the data for 11187the new font. 11188 11189@d adjust(#)==#[f]:=qo(#[f]) 11190 {correct for the excess |min_quarterword| that was added} 11191 11192@<Make final adjustments...@>= 11193if np>=7 then font_params[f]:=np@+else font_params[f]:=7; 11194hyphen_char[f]:=default_hyphen_char; skew_char[f]:=default_skew_char; 11195if bch_label<nl then bchar_label[f]:=bch_label+lig_kern_base[f] 11196else bchar_label[f]:=non_address; 11197font_bchar[f]:=qi(bchar); 11198font_false_bchar[f]:=qi(bchar); 11199if bchar<=ec then if bchar>=bc then 11200 begin qw:=char_info(f)(bchar); {N.B.: not |qi(bchar)|} 11201 if char_exists(qw) then font_false_bchar[f]:=non_char; 11202 end; 11203font_name[f]:=nom; 11204font_area[f]:=aire; 11205font_bc[f]:=bc; font_ec[f]:=ec; font_glue[f]:=null; 11206adjust(char_base); adjust(width_base); adjust(lig_kern_base); 11207adjust(kern_base); adjust(exten_base); 11208decr(param_base[f]); 11209fmem_ptr:=fmem_ptr+lf; font_ptr:=f; g:=f; goto done 11210 11211@ Before we forget about the format of these tables, let's deal with two 11212of \TeX's basic scanning routines related to font information. 11213 11214@<Declare procedures that scan font-related stuff@>= 11215procedure scan_font_ident; 11216var f:internal_font_number; 11217@!m:halfword; 11218begin @<Get the next non-blank non-call...@>; 11219if cur_cmd=def_font then f:=cur_font 11220else if cur_cmd=set_font then f:=cur_chr 11221else if cur_cmd=def_family then 11222 begin m:=cur_chr; scan_four_bit_int; f:=equiv(m+cur_val); 11223 end 11224else begin print_err("Missing font identifier"); 11225@.Missing font identifier@> 11226 help2("I was looking for a control sequence whose")@/ 11227 ("current meaning has been defined by \font."); 11228 back_error; f:=null_font; 11229 end; 11230cur_val:=f; 11231end; 11232 11233@ The following routine is used to implement `\.{\\fontdimen} |n| |f|'. 11234The boolean parameter |writing| is set |true| if the calling program 11235intends to change the parameter value. 11236 11237@<Declare procedures that scan font-related stuff@>= 11238procedure find_font_dimen(@!writing:boolean); 11239 {sets |cur_val| to |font_info| location} 11240var f:internal_font_number; 11241@!n:integer; {the parameter number} 11242begin scan_int; n:=cur_val; scan_font_ident; f:=cur_val; 11243if n<=0 then cur_val:=fmem_ptr 11244else begin if writing and(n<=space_shrink_code)and@| 11245 (n>=space_code)and(font_glue[f]<>null) then 11246 begin delete_glue_ref(font_glue[f]); 11247 font_glue[f]:=null; 11248 end; 11249 if n>font_params[f] then 11250 if f<font_ptr then cur_val:=fmem_ptr 11251 else @<Increase the number of parameters in the last font@> 11252 else cur_val:=n+param_base[f]; 11253 end; 11254@<Issue an error message if |cur_val=fmem_ptr|@>; 11255end; 11256 11257@ @<Issue an error message if |cur_val=fmem_ptr|@>= 11258if cur_val=fmem_ptr then 11259 begin print_err("Font "); print_esc(font_id_text(f)); 11260 print(" has only "); print_int(font_params[f]); 11261 print(" fontdimen parameters"); 11262@.Font x has only...@> 11263 help2("To increase the number of font parameters, you must")@/ 11264 ("use \fontdimen immediately after the \font is loaded."); 11265 error; 11266 end 11267 11268@ @<Increase the number of parameters...@>= 11269begin repeat if fmem_ptr=font_mem_size then 11270 overflow("font memory",font_mem_size); 11271@:TeX capacity exceeded font memory}{\quad font memory@> 11272font_info[fmem_ptr].sc:=0; incr(fmem_ptr); incr(font_params[f]); 11273until n=font_params[f]; 11274cur_val:=fmem_ptr-1; {this equals |param_base[f]+font_params[f]|} 11275end 11276 11277@ When \TeX\ wants to typeset a character that doesn't exist, the 11278character node is not created; thus the output routine can assume 11279that characters exist when it sees them. The following procedure 11280prints a warning message unless the user has suppressed it. 11281 11282@p procedure char_warning(@!f:internal_font_number;@!c:eight_bits); 11283begin if tracing_lost_chars>0 then 11284 begin begin_diagnostic; 11285 print_nl("Missing character: There is no "); 11286@.Missing character@> 11287 print_ASCII(c); print(" in font "); 11288 slow_print(font_name[f]); print_char("!"); end_diagnostic(false); 11289 end; 11290end; 11291 11292@ Here is a function that returns a pointer to a character node for a 11293given character in a given font. If that character doesn't exist, 11294|null| is returned instead. 11295 11296@p function new_character(@!f:internal_font_number;@!c:eight_bits):pointer; 11297label exit; 11298var p:pointer; {newly allocated node} 11299begin if font_bc[f]<=c then if font_ec[f]>=c then 11300 if char_exists(char_info(f)(qi(c))) then 11301 begin p:=get_avail; font(p):=f; character(p):=qi(c); 11302 new_character:=p; return; 11303 end; 11304char_warning(f,c); 11305new_character:=null; 11306exit:end; 11307 11308@* \[31] Device-independent file format. 11309The most important output produced by a run of \TeX\ is the ``device 11310independent'' (\.{DVI}) file that specifies where characters and rules 11311are to appear on printed pages. The form of these files was designed by 11312David R. Fuchs in 1979. Almost any reasonable typesetting device can be 11313@^Fuchs, David Raymond@> 11314@:DVI_files}{\.{DVI} files@> 11315driven by a program that takes \.{DVI} files as input, and dozens of such 11316\.{DVI}-to-whatever programs have been written. Thus, it is possible to 11317print the output of \TeX\ on many different kinds of equipment, using \TeX\ 11318as a device-independent ``front end.'' 11319 11320A \.{DVI} file is a stream of 8-bit bytes, which may be regarded as a 11321series of commands in a machine-like language. The first byte of each command 11322is the operation code, and this code is followed by zero or more bytes 11323that provide parameters to the command. The parameters themselves may consist 11324of several consecutive bytes; for example, the `|set_rule|' command has two 11325parameters, each of which is four bytes long. Parameters are usually 11326regarded as nonnegative integers; but four-byte-long parameters, 11327and shorter parameters that denote distances, can be 11328either positive or negative. Such parameters are given in two's complement 11329notation. For example, a two-byte-long distance parameter has a value between 11330$-2^{15}$ and $2^{15}-1$. As in \.{TFM} files, numbers that occupy 11331more than one byte position appear in BigEndian order. 11332 11333A \.{DVI} file consists of a ``preamble,'' followed by a sequence of one 11334or more ``pages,'' followed by a ``postamble.'' The preamble is simply a 11335|pre| command, with its parameters that define the dimensions used in the 11336file; this must come first. Each ``page'' consists of a |bop| command, 11337followed by any number of other commands that tell where characters are to 11338be placed on a physical page, followed by an |eop| command. The pages 11339appear in the order that \TeX\ generated them. If we ignore |nop| commands 11340and \\{fnt\_def} commands (which are allowed between any two commands in 11341the file), each |eop| command is immediately followed by a |bop| command, 11342or by a |post| command; in the latter case, there are no more pages in the 11343file, and the remaining bytes form the postamble. Further details about 11344the postamble will be explained later. 11345 11346Some parameters in \.{DVI} commands are ``pointers.'' These are four-byte 11347quantities that give the location number of some other byte in the file; 11348the first byte is number~0, then comes number~1, and so on. For example, 11349one of the parameters of a |bop| command points to the previous |bop|; 11350this makes it feasible to read the pages in backwards order, in case the 11351results are being directed to a device that stacks its output face up. 11352Suppose the preamble of a \.{DVI} file occupies bytes 0 to 99. Now if the 11353first page occupies bytes 100 to 999, say, and if the second 11354page occupies bytes 1000 to 1999, then the |bop| that starts in byte 1000 11355points to 100 and the |bop| that starts in byte 2000 points to 1000. (The 11356very first |bop|, i.e., the one starting in byte 100, has a pointer of~$-1$.) 11357 11358@ The \.{DVI} format is intended to be both compact and easily interpreted 11359by a machine. Compactness is achieved by making most of the information 11360implicit instead of explicit. When a \.{DVI}-reading program reads the 11361commands for a page, it keeps track of several quantities: (a)~The current 11362font |f| is an integer; this value is changed only 11363by \\{fnt} and \\{fnt\_num} commands. (b)~The current position on the page 11364is given by two numbers called the horizontal and vertical coordinates, 11365|h| and |v|. Both coordinates are zero at the upper left corner of the page; 11366moving to the right corresponds to increasing the horizontal coordinate, and 11367moving down corresponds to increasing the vertical coordinate. Thus, the 11368coordinates are essentially Cartesian, except that vertical directions are 11369flipped; the Cartesian version of |(h,v)| would be |(h,-v)|. (c)~The 11370current spacing amounts are given by four numbers |w|, |x|, |y|, and |z|, 11371where |w| and~|x| are used for horizontal spacing and where |y| and~|z| 11372are used for vertical spacing. (d)~There is a stack containing 11373|(h,v,w,x,y,z)| values; the \.{DVI} commands |push| and |pop| are used to 11374change the current level of operation. Note that the current font~|f| is 11375not pushed and popped; the stack contains only information about 11376positioning. 11377 11378The values of |h|, |v|, |w|, |x|, |y|, and |z| are signed integers having up 11379to 32 bits, including the sign. Since they represent physical distances, 11380there is a small unit of measurement such that increasing |h| by~1 means 11381moving a certain tiny distance to the right. The actual unit of 11382measurement is variable, as explained below; \TeX\ sets things up so that 11383its \.{DVI} output is in sp units, i.e., scaled points, in agreement with 11384all the |scaled| dimensions in \TeX's data structures. 11385 11386@ Here is a list of all the commands that may appear in a \.{DVI} file. Each 11387command is specified by its symbolic name (e.g., |bop|), its opcode byte 11388(e.g., 139), and its parameters (if any). The parameters are followed 11389by a bracketed number telling how many bytes they occupy; for example, 11390`|p[4]|' means that parameter |p| is four bytes long. 11391 11392\yskip\hang|set_char_0| 0. Typeset character number~0 from font~|f| 11393such that the reference point of the character is at |(h,v)|. Then 11394increase |h| by the width of that character. Note that a character may 11395have zero or negative width, so one cannot be sure that |h| will advance 11396after this command; but |h| usually does increase. 11397 11398\yskip\hang\\{set\_char\_1} through \\{set\_char\_127} (opcodes 1 to 127). 11399Do the operations of |set_char_0|; but use the character whose number 11400matches the opcode, instead of character~0. 11401 11402\yskip\hang|set1| 128 |c[1]|. Same as |set_char_0|, except that character 11403number~|c| is typeset. \TeX82 uses this command for characters in the 11404range |128<=c<256|. 11405 11406\yskip\hang|@!set2| 129 |c[2]|. Same as |set1|, except that |c|~is two 11407bytes long, so it is in the range |0<=c<65536|. \TeX82 never uses this 11408command, but it should come in handy for extensions of \TeX\ that deal 11409with oriental languages. 11410@^oriental characters@>@^Chinese characters@>@^Japanese characters@> 11411 11412\yskip\hang|@!set3| 130 |c[3]|. Same as |set1|, except that |c|~is three 11413bytes long, so it can be as large as $2^{24}-1$. Not even the Chinese 11414language has this many characters, but this command might prove useful 11415in some yet unforeseen extension. 11416 11417\yskip\hang|@!set4| 131 |c[4]|. Same as |set1|, except that |c|~is four 11418bytes long. Imagine that. 11419 11420\yskip\hang|set_rule| 132 |a[4]| |b[4]|. Typeset a solid black rectangle 11421of height~|a| and width~|b|, with its bottom left corner at |(h,v)|. Then 11422set |h:=h+b|. If either |a<=0| or |b<=0|, nothing should be typeset. Note 11423that if |b<0|, the value of |h| will decrease even though nothing else happens. 11424See below for details about how to typeset rules so that consistency with 11425\MF\ is guaranteed. 11426 11427\yskip\hang|@!put1| 133 |c[1]|. Typeset character number~|c| from font~|f| 11428such that the reference point of the character is at |(h,v)|. (The `put' 11429commands are exactly like the `set' commands, except that they simply put out a 11430character or a rule without moving the reference point afterwards.) 11431 11432\yskip\hang|@!put2| 134 |c[2]|. Same as |set2|, except that |h| is not changed. 11433 11434\yskip\hang|@!put3| 135 |c[3]|. Same as |set3|, except that |h| is not changed. 11435 11436\yskip\hang|@!put4| 136 |c[4]|. Same as |set4|, except that |h| is not changed. 11437 11438\yskip\hang|put_rule| 137 |a[4]| |b[4]|. Same as |set_rule|, except that 11439|h| is not changed. 11440 11441\yskip\hang|nop| 138. No operation, do nothing. Any number of |nop|'s 11442may occur between \.{DVI} commands, but a |nop| cannot be inserted between 11443a command and its parameters or between two parameters. 11444 11445\yskip\hang|bop| 139 $c_0[4]$ $c_1[4]$ $\ldots$ $c_9[4]$ $p[4]$. Beginning 11446of a page: Set |(h,v,w,x,y,z):=(0,0,0,0,0,0)| and set the stack empty. Set 11447the current font |f| to an undefined value. The ten $c_i$ parameters hold 11448the values of \.{\\count0} $\ldots$ \.{\\count9} in \TeX\ at the time 11449\.{\\shipout} was invoked for this page; they can be used to identify 11450pages, if a user wants to print only part of a \.{DVI} file. The parameter 11451|p| points to the previous |bop| in the file; the first 11452|bop| has $p=-1$. 11453 11454\yskip\hang|eop| 140. End of page: Print what you have read since the 11455previous |bop|. At this point the stack should be empty. (The \.{DVI}-reading 11456programs that drive most output devices will have kept a buffer of the 11457material that appears on the page that has just ended. This material is 11458largely, but not entirely, in order by |v| coordinate and (for fixed |v|) by 11459|h|~coordinate; so it usually needs to be sorted into some order that is 11460appropriate for the device in question.) 11461 11462\yskip\hang|push| 141. Push the current values of |(h,v,w,x,y,z)| onto the 11463top of the stack; do not change any of these values. Note that |f| is 11464not pushed. 11465 11466\yskip\hang|pop| 142. Pop the top six values off of the stack and assign 11467them respectively to |(h,v,w,x,y,z)|. The number of pops should never 11468exceed the number of pushes, since it would be highly embarrassing if the 11469stack were empty at the time of a |pop| command. 11470 11471\yskip\hang|right1| 143 |b[1]|. Set |h:=h+b|, i.e., move right |b| units. 11472The parameter is a signed number in two's complement notation, |-128<=b<128|; 11473if |b<0|, the reference point moves left. 11474 11475\yskip\hang|right2| 144 |b[2]|. Same as |right1|, except that |b| is a 11476two-byte quantity in the range |-32768<=b<32768|. 11477 11478\yskip\hang|right3| 145 |b[3]|. Same as |right1|, except that |b| is a 11479three-byte quantity in the range |@t$-2^{23}$@><=b<@t$2^{23}$@>|. 11480 11481\yskip\hang|right4| 146 |b[4]|. Same as |right1|, except that |b| is a 11482four-byte quantity in the range |@t$-2^{31}$@><=b<@t$2^{31}$@>|. 11483 11484\yskip\hang|w0| 147. Set |h:=h+w|; i.e., move right |w| units. With luck, 11485this parameterless command will usually suffice, because the same kind of motion 11486will occur several times in succession; the following commands explain how 11487|w| gets particular values. 11488 11489\yskip\hang|w1| 148 |b[1]|. Set |w:=b| and |h:=h+b|. The value of |b| is a 11490signed quantity in two's complement notation, |-128<=b<128|. This command 11491changes the current |w|~spacing and moves right by |b|. 11492 11493\yskip\hang|@!w2| 149 |b[2]|. Same as |w1|, but |b| is two bytes long, 11494|-32768<=b<32768|. 11495 11496\yskip\hang|@!w3| 150 |b[3]|. Same as |w1|, but |b| is three bytes long, 11497|@t$-2^{23}$@><=b<@t$2^{23}$@>|. 11498 11499\yskip\hang|@!w4| 151 |b[4]|. Same as |w1|, but |b| is four bytes long, 11500|@t$-2^{31}$@><=b<@t$2^{31}$@>|. 11501 11502\yskip\hang|x0| 152. Set |h:=h+x|; i.e., move right |x| units. The `|x|' 11503commands are like the `|w|' commands except that they involve |x| instead 11504of |w|. 11505 11506\yskip\hang|x1| 153 |b[1]|. Set |x:=b| and |h:=h+b|. The value of |b| is a 11507signed quantity in two's complement notation, |-128<=b<128|. This command 11508changes the current |x|~spacing and moves right by |b|. 11509 11510\yskip\hang|@!x2| 154 |b[2]|. Same as |x1|, but |b| is two bytes long, 11511|-32768<=b<32768|. 11512 11513\yskip\hang|@!x3| 155 |b[3]|. Same as |x1|, but |b| is three bytes long, 11514|@t$-2^{23}$@><=b<@t$2^{23}$@>|. 11515 11516\yskip\hang|@!x4| 156 |b[4]|. Same as |x1|, but |b| is four bytes long, 11517|@t$-2^{31}$@><=b<@t$2^{31}$@>|. 11518 11519\yskip\hang|down1| 157 |a[1]|. Set |v:=v+a|, i.e., move down |a| units. 11520The parameter is a signed number in two's complement notation, |-128<=a<128|; 11521if |a<0|, the reference point moves up. 11522 11523\yskip\hang|@!down2| 158 |a[2]|. Same as |down1|, except that |a| is a 11524two-byte quantity in the range |-32768<=a<32768|. 11525 11526\yskip\hang|@!down3| 159 |a[3]|. Same as |down1|, except that |a| is a 11527three-byte quantity in the range |@t$-2^{23}$@><=a<@t$2^{23}$@>|. 11528 11529\yskip\hang|@!down4| 160 |a[4]|. Same as |down1|, except that |a| is a 11530four-byte quantity in the range |@t$-2^{31}$@><=a<@t$2^{31}$@>|. 11531 11532\yskip\hang|y0| 161. Set |v:=v+y|; i.e., move down |y| units. With luck, 11533this parameterless command will usually suffice, because the same kind of motion 11534will occur several times in succession; the following commands explain how 11535|y| gets particular values. 11536 11537\yskip\hang|y1| 162 |a[1]|. Set |y:=a| and |v:=v+a|. The value of |a| is a 11538signed quantity in two's complement notation, |-128<=a<128|. This command 11539changes the current |y|~spacing and moves down by |a|. 11540 11541\yskip\hang|@!y2| 163 |a[2]|. Same as |y1|, but |a| is two bytes long, 11542|-32768<=a<32768|. 11543 11544\yskip\hang|@!y3| 164 |a[3]|. Same as |y1|, but |a| is three bytes long, 11545|@t$-2^{23}$@><=a<@t$2^{23}$@>|. 11546 11547\yskip\hang|@!y4| 165 |a[4]|. Same as |y1|, but |a| is four bytes long, 11548|@t$-2^{31}$@><=a<@t$2^{31}$@>|. 11549 11550\yskip\hang|z0| 166. Set |v:=v+z|; i.e., move down |z| units. The `|z|' commands 11551are like the `|y|' commands except that they involve |z| instead of |y|. 11552 11553\yskip\hang|z1| 167 |a[1]|. Set |z:=a| and |v:=v+a|. The value of |a| is a 11554signed quantity in two's complement notation, |-128<=a<128|. This command 11555changes the current |z|~spacing and moves down by |a|. 11556 11557\yskip\hang|@!z2| 168 |a[2]|. Same as |z1|, but |a| is two bytes long, 11558|-32768<=a<32768|. 11559 11560\yskip\hang|@!z3| 169 |a[3]|. Same as |z1|, but |a| is three bytes long, 11561|@t$-2^{23}$@><=a<@t$2^{23}$@>|. 11562 11563\yskip\hang|@!z4| 170 |a[4]|. Same as |z1|, but |a| is four bytes long, 11564|@t$-2^{31}$@><=a<@t$2^{31}$@>|. 11565 11566\yskip\hang|fnt_num_0| 171. Set |f:=0|. Font 0 must previously have been 11567defined by a \\{fnt\_def} instruction, as explained below. 11568 11569\yskip\hang\\{fnt\_num\_1} through \\{fnt\_num\_63} (opcodes 172 to 234). Set 11570|f:=1|, \dots, \hbox{|f:=63|}, respectively. 11571 11572\yskip\hang|fnt1| 235 |k[1]|. Set |f:=k|. \TeX82 uses this command for font 11573numbers in the range |64<=k<256|. 11574 11575\yskip\hang|@!fnt2| 236 |k[2]|. Same as |fnt1|, except that |k|~is two 11576bytes long, so it is in the range |0<=k<65536|. \TeX82 never generates this 11577command, but large font numbers may prove useful for specifications of 11578color or texture, or they may be used for special fonts that have fixed 11579numbers in some external coding scheme. 11580 11581\yskip\hang|@!fnt3| 237 |k[3]|. Same as |fnt1|, except that |k|~is three 11582bytes long, so it can be as large as $2^{24}-1$. 11583 11584\yskip\hang|@!fnt4| 238 |k[4]|. Same as |fnt1|, except that |k|~is four 11585bytes long; this is for the really big font numbers (and for the negative ones). 11586 11587\yskip\hang|xxx1| 239 |k[1]| |x[k]|. This command is undefined in 11588general; it functions as a $(k+2)$-byte |nop| unless special \.{DVI}-reading 11589programs are being used. \TeX82 generates |xxx1| when a short enough 11590\.{\\special} appears, setting |k| to the number of bytes being sent. It 11591is recommended that |x| be a string having the form of a keyword followed 11592by possible parameters relevant to that keyword. 11593 11594\yskip\hang|@!xxx2| 240 |k[2]| |x[k]|. Like |xxx1|, but |0<=k<65536|. 11595 11596\yskip\hang|@!xxx3| 241 |k[3]| |x[k]|. Like |xxx1|, but |0<=k<@t$2^{24}$@>|. 11597 11598\yskip\hang|xxx4| 242 |k[4]| |x[k]|. Like |xxx1|, but |k| can be ridiculously 11599large. \TeX82 uses |xxx4| when sending a string of length 256 or more. 11600 11601\yskip\hang|fnt_def1| 243 |k[1]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11602Define font |k|, where |0<=k<256|; font definitions will be explained shortly. 11603 11604\yskip\hang|@!fnt_def2| 244 |k[2]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11605Define font |k|, where |0<=k<65536|. 11606 11607\yskip\hang|@!fnt_def3| 245 |k[3]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11608Define font |k|, where |0<=k<@t$2^{24}$@>|. 11609 11610\yskip\hang|@!fnt_def4| 246 |k[4]| |c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|. 11611Define font |k|, where |@t$-2^{31}$@><=k<@t$2^{31}$@>|. 11612 11613\yskip\hang|pre| 247 |i[1]| |num[4]| |den[4]| |mag[4]| |k[1]| |x[k]|. 11614Beginning of the preamble; this must come at the very beginning of the 11615file. Parameters |i|, |num|, |den|, |mag|, |k|, and |x| are explained below. 11616 11617\yskip\hang|post| 248. Beginning of the postamble, see below. 11618 11619\yskip\hang|post_post| 249. Ending of the postamble, see below. 11620 11621\yskip\noindent Commands 250--255 are undefined at the present time. 11622 11623@ @d set_char_0=0 {typeset character 0 and move right} 11624@d set1=128 {typeset a character and move right} 11625@d set_rule=132 {typeset a rule and move right} 11626@d put_rule=137 {typeset a rule} 11627@d nop=138 {no operation} 11628@d bop=139 {beginning of page} 11629@d eop=140 {ending of page} 11630@d push=141 {save the current positions} 11631@d pop=142 {restore previous positions} 11632@d right1=143 {move right} 11633@d w0=147 {move right by |w|} 11634@d w1=148 {move right and set |w|} 11635@d x0=152 {move right by |x|} 11636@d x1=153 {move right and set |x|} 11637@d down1=157 {move down} 11638@d y0=161 {move down by |y|} 11639@d y1=162 {move down and set |y|} 11640@d z0=166 {move down by |z|} 11641@d z1=167 {move down and set |z|} 11642@d fnt_num_0=171 {set current font to 0} 11643@d fnt1=235 {set current font} 11644@d xxx1=239 {extension to \.{DVI} primitives} 11645@d xxx4=242 {potentially long extension to \.{DVI} primitives} 11646@d fnt_def1=243 {define the meaning of a font number} 11647@d pre=247 {preamble} 11648@d post=248 {postamble beginning} 11649@d post_post=249 {postamble ending} 11650 11651@ The preamble contains basic information about the file as a whole. As 11652stated above, there are six parameters: 11653$$\hbox{|@!i[1]| |@!num[4]| |@!den[4]| |@!mag[4]| |@!k[1]| |@!x[k]|.}$$ 11654The |i| byte identifies \.{DVI} format; currently this byte is always set 11655to~2. (The value |i=3| is currently used for an extended format that 11656allows a mixture of right-to-left and left-to-right typesetting. 11657Some day we will set |i=4|, when \.{DVI} format makes another 11658incompatible change---perhaps in the year 2048.) 11659 11660The next two parameters, |num| and |den|, are positive integers that define 11661the units of measurement; they are the numerator and denominator of a 11662fraction by which all dimensions in the \.{DVI} file could be multiplied 11663in order to get lengths in units of $10^{-7}$ meters. Since $\rm 7227{pt} = 11664254{cm}$, and since \TeX\ works with scaled points where there are $2^{16}$ 11665sp in a point, \TeX\ sets 11666$|num|/|den|=(254\cdot10^5)/(7227\cdot2^{16})=25400000/473628672$. 11667@^sp@> 11668 11669The |mag| parameter is what \TeX\ calls \.{\\mag}, i.e., 1000 times the 11670desired magnification. The actual fraction by which dimensions are 11671multiplied is therefore $|mag|\cdot|num|/1000|den|$. Note that if a \TeX\ 11672source document does not call for any `\.{true}' dimensions, and if you 11673change it only by specifying a different \.{\\mag} setting, the \.{DVI} 11674file that \TeX\ creates will be completely unchanged except for the value 11675of |mag| in the preamble and postamble. (Fancy \.{DVI}-reading programs allow 11676users to override the |mag|~setting when a \.{DVI} file is being printed.) 11677 11678Finally, |k| and |x| allow the \.{DVI} writer to include a comment, which is not 11679interpreted further. The length of comment |x| is |k|, where |0<=k<256|. 11680 11681@d id_byte=2 {identifies the kind of \.{DVI} files described here} 11682 11683@ Font definitions for a given font number |k| contain further parameters 11684$$\hbox{|c[4]| |s[4]| |d[4]| |a[1]| |l[1]| |n[a+l]|.}$$ 11685The four-byte value |c| is the check sum that \TeX\ found in the \.{TFM} 11686file for this font; |c| should match the check sum of the font found by 11687programs that read this \.{DVI} file. 11688@^check sum@> 11689 11690Parameter |s| contains a fixed-point scale factor that is applied to 11691the character widths in font |k|; font dimensions in \.{TFM} files and 11692other font files are relative to this quantity, which is called the 11693``at size'' elsewhere in this documentation. The value of |s| is 11694always positive and less than $2^{27}$. It is given in the same units 11695as the other \.{DVI} dimensions, i.e., in sp when \TeX82 has made the 11696file. Parameter |d| is similar to |s|; it is the ``design size,'' and 11697(like~|s|) it is given in \.{DVI} units. Thus, font |k| is to be used 11698at $|mag|\cdot s/1000d$ times its normal size. 11699 11700The remaining part of a font definition gives the external name of the font, 11701which is an ASCII string of length |a+l|. The number |a| is the length 11702of the ``area'' or directory, and |l| is the length of the font name itself; 11703the standard local system font area is supposed to be used when |a=0|. 11704The |n| field contains the area in its first |a| bytes. 11705 11706Font definitions must appear before the first use of a particular font number. 11707Once font |k| is defined, it must not be defined again; however, we 11708shall see below that font definitions appear in the postamble as well as 11709in the pages, so in this sense each font number is defined exactly twice, 11710if at all. Like |nop| commands, font definitions can 11711appear before the first |bop|, or between an |eop| and a |bop|. 11712 11713@ Sometimes it is desirable to make horizontal or vertical rules line up 11714precisely with certain features in characters of a font. It is possible to 11715guarantee the correct matching between \.{DVI} output and the characters 11716generated by \MF\ by adhering to the following principles: (1)~The \MF\ 11717characters should be positioned so that a bottom edge or left edge that is 11718supposed to line up with the bottom or left edge of a rule appears at the 11719reference point, i.e., in row~0 and column~0 of the \MF\ raster. This 11720ensures that the position of the rule will not be rounded differently when 11721the pixel size is not a perfect multiple of the units of measurement in 11722the \.{DVI} file. (2)~A typeset rule of height $a>0$ and width $b>0$ 11723should be equivalent to a \MF-generated character having black pixels in 11724precisely those raster positions whose \MF\ coordinates satisfy 11725|0<=x<@t$\alpha$@>b| and |0<=y<@t$\alpha$@>a|, where $\alpha$ is the number 11726of pixels per \.{DVI} unit. 11727@:METAFONT}{\MF@> 11728@^alignment of rules with characters@> 11729@^rules aligning with characters@> 11730 11731@ The last page in a \.{DVI} file is followed by `|post|'; this command 11732introduces the postamble, which summarizes important facts that \TeX\ has 11733accumulated about the file, making it possible to print subsets of the data 11734with reasonable efficiency. The postamble has the form 11735$$\vbox{\halign{\hbox{#\hfil}\cr 11736 |post| |p[4]| |num[4]| |den[4]| |mag[4]| |l[4]| |u[4]| |s[2]| |t[2]|\cr 11737 $\langle\,$font definitions$\,\rangle$\cr 11738 |post_post| |q[4]| |i[1]| 223's$[{\G}4]$\cr}}$$ 11739Here |p| is a pointer to the final |bop| in the file. The next three 11740parameters, |num|, |den|, and |mag|, are duplicates of the quantities that 11741appeared in the preamble. 11742 11743Parameters |l| and |u| give respectively the height-plus-depth of the tallest 11744page and the width of the widest page, in the same units as other dimensions 11745of the file. These numbers might be used by a \.{DVI}-reading program to 11746position individual ``pages'' on large sheets of film or paper; however, 11747the standard convention for output on normal size paper is to position each 11748page so that the upper left-hand corner is exactly one inch from the left 11749and the top. Experience has shown that it is unwise to design \.{DVI}-to-printer 11750software that attempts cleverly to center the output; a fixed position of 11751the upper left corner is easiest for users to understand and to work with. 11752Therefore |l| and~|u| are often ignored. 11753 11754Parameter |s| is the maximum stack depth (i.e., the largest excess of 11755|push| commands over |pop| commands) needed to process this file. Then 11756comes |t|, the total number of pages (|bop| commands) present. 11757 11758The postamble continues with font definitions, which are any number of 11759\\{fnt\_def} commands as described above, possibly interspersed with |nop| 11760commands. Each font number that is used in the \.{DVI} file must be defined 11761exactly twice: Once before it is first selected by a \\{fnt} command, and once 11762in the postamble. 11763 11764@ The last part of the postamble, following the |post_post| byte that 11765signifies the end of the font definitions, contains |q|, a pointer to the 11766|post| command that started the postamble. An identification byte, |i|, 11767comes next; this currently equals~2, as in the preamble. 11768 11769The |i| byte is followed by four or more bytes that are all equal to 11770the decimal number 223 (i.e., @'337 in octal). \TeX\ puts out four to seven of 11771these trailing bytes, until the total length of the file is a multiple of 11772four bytes, since this works out best on machines that pack four bytes per 11773word; but any number of 223's is allowed, as long as there are at least four 11774of them. In effect, 223 is a sort of signature that is added at the very end. 11775@^Fuchs, David Raymond@> 11776 11777This curious way to finish off a \.{DVI} file makes it feasible for 11778\.{DVI}-reading programs to find the postamble first, on most computers, 11779even though \TeX\ wants to write the postamble last. Most operating 11780systems permit random access to individual words or bytes of a file, so 11781the \.{DVI} reader can start at the end and skip backwards over the 223's 11782until finding the identification byte. Then it can back up four bytes, read 11783|q|, and move to byte |q| of the file. This byte should, of course, 11784contain the value 248 (|post|); now the postamble can be read, so the 11785\.{DVI} reader can discover all the information needed for typesetting the 11786pages. Note that it is also possible to skip through the \.{DVI} file at 11787reasonably high speed to locate a particular page, if that proves 11788desirable. This saves a lot of time, since \.{DVI} files used in production 11789jobs tend to be large. 11790 11791Unfortunately, however, standard \PASCAL\ does not include the ability to 11792@^system dependencies@> 11793access a random position in a file, or even to determine the length of a file. 11794Almost all systems nowadays provide the necessary capabilities, so \.{DVI} 11795format has been designed to work most efficiently with modern operating systems. 11796But if \.{DVI} files have to be processed under the restrictions of standard 11797\PASCAL, one can simply read them from front to back, since the necessary 11798header information is present in the preamble and in the font definitions. 11799(The |l| and |u| and |s| and |t| parameters, which appear only in the 11800postamble, are ``frills'' that are handy but not absolutely necessary.) 11801 11802@* \[32] Shipping pages out. 11803After considering \TeX's eyes and stomach, we come now to the bowels. 11804@^bowels@> 11805 11806The |ship_out| procedure is given a pointer to a box; its mission is 11807to describe that box in \.{DVI} form, outputting a ``page'' to |dvi_file|. 11808The \.{DVI} coordinates $(h,v)=(0,0)$ should correspond to the upper left 11809corner of the box being shipped. 11810 11811Since boxes can be inside of boxes inside of boxes, the main work of 11812|ship_out| is done by two mutually recursive routines, |hlist_out| 11813and |vlist_out|, which traverse the hlists and vlists inside of horizontal 11814and vertical boxes. 11815 11816As individual pages are being processed, we need to accumulate 11817information about the entire set of pages, since such statistics must be 11818reported in the postamble. The global variables |total_pages|, |max_v|, 11819|max_h|, |max_push|, and |last_bop| are used to record this information. 11820 11821The variable |doing_leaders| is |true| while leaders are being output. 11822The variable |dead_cycles| contains the number of times an output routine 11823has been initiated since the last |ship_out|. 11824 11825A few additional global variables are also defined here for use in 11826|vlist_out| and |hlist_out|. They could have been local variables, but 11827that would waste stack space when boxes are deeply nested, since the 11828values of these variables are not needed during recursive calls. 11829@^recursion@> 11830 11831@<Glob...@>= 11832@!total_pages:integer; {the number of pages that have been shipped out} 11833@!max_v:scaled; {maximum height-plus-depth of pages shipped so far} 11834@!max_h:scaled; {maximum width of pages shipped so far} 11835@!max_push:integer; {deepest nesting of |push| commands encountered so far} 11836@!last_bop:integer; {location of previous |bop| in the \.{DVI} output} 11837@!dead_cycles:integer; {recent outputs that didn't ship anything out} 11838@!doing_leaders:boolean; {are we inside a leader box?} 11839@# 11840@!c,@!f:quarterword; {character and font in current |char_node|} 11841@!rule_ht,@!rule_dp,@!rule_wd:scaled; {size of current rule being output} 11842@!g:pointer; {current glue specification} 11843@!lq,@!lr:integer; {quantities used in calculations for leaders} 11844 11845@ @<Set init...@>= 11846total_pages:=0; max_v:=0; max_h:=0; max_push:=0; last_bop:=-1; 11847doing_leaders:=false; dead_cycles:=0; cur_s:=-1; 11848 11849@ The \.{DVI} bytes are output to a buffer instead of being written directly 11850to the output file. This makes it possible to reduce the overhead of 11851subroutine calls, thereby measurably speeding up the computation, since 11852output of \.{DVI} bytes is part of \TeX's inner loop. And it has another 11853advantage as well, since we can change instructions in the buffer in order to 11854make the output more compact. For example, a `|down2|' command can be 11855changed to a `|y2|', thereby making a subsequent `|y0|' command possible, 11856saving two bytes. 11857 11858The output buffer is divided into two parts of equal size; the bytes found 11859in |dvi_buf[0..half_buf-1]| constitute the first half, and those in 11860|dvi_buf[half_buf..dvi_buf_size-1]| constitute the second. The global 11861variable |dvi_ptr| points to the position that will receive the next 11862output byte. When |dvi_ptr| reaches |dvi_limit|, which is always equal 11863to one of the two values |half_buf| or |dvi_buf_size|, the half buffer that 11864is about to be invaded next is sent to the output and |dvi_limit| is 11865changed to its other value. Thus, there is always at least a half buffer's 11866worth of information present, except at the very beginning of the job. 11867 11868Bytes of the \.{DVI} file are numbered sequentially starting with 0; 11869the next byte to be generated will be number |dvi_offset+dvi_ptr|. 11870A byte is present in the buffer only if its number is |>=dvi_gone|. 11871 11872@<Types...@>= 11873@!dvi_index=0..dvi_buf_size; {an index into the output buffer} 11874 11875@ Some systems may find it more efficient to make |dvi_buf| a |packed| 11876array, since output of four bytes at once may be facilitated. 11877@^system dependencies@> 11878 11879@<Glob...@>= 11880@!dvi_buf:array[dvi_index] of eight_bits; {buffer for \.{DVI} output} 11881@!half_buf:dvi_index; {half of |dvi_buf_size|} 11882@!dvi_limit:dvi_index; {end of the current half buffer} 11883@!dvi_ptr:dvi_index; {the next available buffer address} 11884@!dvi_offset:integer; {|dvi_buf_size| times the number of times the 11885 output buffer has been fully emptied} 11886@!dvi_gone:integer; {the number of bytes already output to |dvi_file|} 11887 11888@ Initially the buffer is all in one piece; we will output half of it only 11889after it first fills up. 11890 11891@<Set init...@>= 11892half_buf:=dvi_buf_size div 2; dvi_limit:=dvi_buf_size; dvi_ptr:=0; 11893dvi_offset:=0; dvi_gone:=0; 11894 11895@ The actual output of |dvi_buf[a..b]| to |dvi_file| is performed by calling 11896|write_dvi(a,b)|. For best results, this procedure should be optimized to 11897run as fast as possible on each particular system, since it is part of 11898\TeX's inner loop. It is safe to assume that |a| and |b+1| will both be 11899multiples of 4 when |write_dvi(a,b)| is called; therefore it is possible on 11900many machines to use efficient methods to pack four bytes per word and to 11901output an array of words with one system call. 11902@^system dependencies@> 11903@^inner loop@> 11904@^defecation@> 11905 11906@p procedure write_dvi(@!a,@!b:dvi_index); 11907var k:dvi_index; 11908begin for k:=a to b do write(dvi_file,dvi_buf[k]); 11909end; 11910 11911@ To put a byte in the buffer without paying the cost of invoking a procedure 11912each time, we use the macro |dvi_out|. 11913 11914@d dvi_out(#)==@+begin dvi_buf[dvi_ptr]:=#; incr(dvi_ptr); 11915 if dvi_ptr=dvi_limit then dvi_swap; 11916 end 11917 11918@p procedure dvi_swap; {outputs half of the buffer} 11919begin if dvi_limit=dvi_buf_size then 11920 begin write_dvi(0,half_buf-1); dvi_limit:=half_buf; 11921 dvi_offset:=dvi_offset+dvi_buf_size; dvi_ptr:=0; 11922 end 11923else begin write_dvi(half_buf,dvi_buf_size-1); dvi_limit:=dvi_buf_size; 11924 end; 11925dvi_gone:=dvi_gone+half_buf; 11926end; 11927 11928@ Here is how we clean out the buffer when \TeX\ is all through; |dvi_ptr| 11929will be a multiple of~4. 11930 11931@<Empty the last bytes out of |dvi_buf|@>= 11932if dvi_limit=half_buf then write_dvi(half_buf,dvi_buf_size-1); 11933if dvi_ptr>0 then write_dvi(0,dvi_ptr-1) 11934 11935@ The |dvi_four| procedure outputs four bytes in two's complement notation, 11936without risking arithmetic overflow. 11937 11938@p procedure dvi_four(@!x:integer); 11939begin if x>=0 then dvi_out(x div @'100000000) 11940else begin x:=x+@'10000000000; 11941 x:=x+@'10000000000; 11942 dvi_out((x div @'100000000) + 128); 11943 end; 11944x:=x mod @'100000000; dvi_out(x div @'200000); 11945x:=x mod @'200000; dvi_out(x div @'400); 11946dvi_out(x mod @'400); 11947end; 11948 11949@ A mild optimization of the output is performed by the |dvi_pop| 11950routine, which issues a |pop| unless it is possible to cancel a 11951`|push| |pop|' pair. The parameter to |dvi_pop| is the byte address 11952following the old |push| that matches the new |pop|. 11953 11954@p procedure dvi_pop(@!l:integer); 11955begin if (l=dvi_offset+dvi_ptr)and(dvi_ptr>0) then decr(dvi_ptr) 11956else dvi_out(pop); 11957end; 11958 11959@ Here's a procedure that outputs a font definition. Since \TeX82 uses at 11960most 256 different fonts per job, |fnt_def1| is always used as the command code. 11961 11962@p procedure dvi_font_def(@!f:internal_font_number); 11963var k:pool_pointer; {index into |str_pool|} 11964begin dvi_out(fnt_def1); 11965dvi_out(f-font_base-1);@/ 11966dvi_out(qo(font_check[f].b0)); 11967dvi_out(qo(font_check[f].b1)); 11968dvi_out(qo(font_check[f].b2)); 11969dvi_out(qo(font_check[f].b3));@/ 11970dvi_four(font_size[f]); 11971dvi_four(font_dsize[f]);@/ 11972dvi_out(length(font_area[f])); 11973dvi_out(length(font_name[f])); 11974@<Output the font name whose internal number is |f|@>; 11975end; 11976 11977@ @<Output the font name whose internal number is |f|@>= 11978for k:=str_start[font_area[f]] to str_start[font_area[f]+1]-1 do 11979 dvi_out(so(str_pool[k])); 11980for k:=str_start[font_name[f]] to str_start[font_name[f]+1]-1 do 11981 dvi_out(so(str_pool[k])) 11982 11983@ Versions of \TeX\ intended for small computers might well choose to omit 11984the ideas in the next few parts of this program, since it is not really 11985necessary to optimize the \.{DVI} code by making use of the |w0|, |x0|, 11986|y0|, and |z0| commands. Furthermore, the algorithm that we are about to 11987describe does not pretend to give an optimum reduction in the length 11988of the \.{DVI} code; after all, speed is more important than compactness. 11989But the method is surprisingly effective, and it takes comparatively little 11990time. 11991 11992We can best understand the basic idea by first considering a simpler problem 11993that has the same essential characteristics. Given a sequence of digits, 11994say $3\,1\,4\,1\,5\,9\,2\,6\,5\,3\,5\,8\,9$, we want to assign subscripts 11995$d$, $y$, or $z$ to each digit so as to maximize the number of ``$y$-hits'' 11996and ``$z$-hits''; a $y$-hit is an instance of two appearances of the same 11997digit with the subscript $y$, where no $y$'s intervene between the two 11998appearances, and a $z$-hit is defined similarly. For example, the sequence 11999above could be decorated with subscripts as follows: 12000$$3_z\,1_y\,4_d\,1_y\,5_y\,9_d\,2_d\,6_d\,5_y\,3_z\,5_y\,8_d\,9_d.$$ 12001There are three $y$-hits ($1_y\ldots1_y$ and $5_y\ldots5_y\ldots5_y$) and 12002one $z$-hit ($3_z\ldots3_z$); there are no $d$-hits, since the two appearances 12003of $9_d$ have $d$'s between them, but we don't count $d$-hits so it doesn't 12004matter how many there are. These subscripts are analogous to the \.{DVI} 12005commands called \\{down}, $y$, and $z$, and the digits are analogous to 12006different amounts of vertical motion; a $y$-hit or $z$-hit corresponds to 12007the opportunity to use the one-byte commands |y0| or |z0| in a \.{DVI} file. 12008 12009\TeX's method of assigning subscripts works like this: Append a new digit, 12010say $\delta$, to the right of the sequence. Now look back through the 12011sequence until one of the following things happens: (a)~You see 12012$\delta_y$ or $\delta_z$, and this was the first time you encountered a 12013$y$ or $z$ subscript, respectively. Then assign $y$ or $z$ to the new 12014$\delta$; you have scored a hit. (b)~You see $\delta_d$, and no $y$ 12015subscripts have been encountered so far during this search. Then change 12016the previous $\delta_d$ to $\delta_y$ (this corresponds to changing a 12017command in the output buffer), and assign $y$ to the new $\delta$; it's 12018another hit. (c)~You see $\delta_d$, and a $y$ subscript has been seen 12019but not a $z$. Change the previous $\delta_d$ to $\delta_z$ and assign 12020$z$ to the new $\delta$. (d)~You encounter both $y$ and $z$ subscripts 12021before encountering a suitable $\delta$, or you scan all the way to the 12022front of the sequence. Assign $d$ to the new $\delta$; this assignment may 12023be changed later. 12024 12025The subscripts $3_z\,1_y\,4_d\ldots\,$ in the example above were, in fact, 12026produced by this procedure, as the reader can verify. (Go ahead and try it.) 12027 12028@ In order to implement such an idea, \TeX\ maintains a stack of pointers 12029to the \\{down}, $y$, and $z$ commands that have been generated for the 12030current page. And there is a similar stack for \\{right}, |w|, and |x| 12031commands. These stacks are called the down stack and right stack, and their 12032top elements are maintained in the variables |down_ptr| and |right_ptr|. 12033 12034Each entry in these stacks contains four fields: The |width| field is 12035the amount of motion down or to the right; the |location| field is the 12036byte number of the \.{DVI} command in question (including the appropriate 12037|dvi_offset|); the |link| field points to the next item below this one 12038on the stack; and the |info| field encodes the options for possible change 12039in the \.{DVI} command. 12040 12041@d movement_node_size=3 {number of words per entry in the down and right stacks} 12042@d location(#)==mem[#+2].int {\.{DVI} byte number for a movement command} 12043 12044@<Glob...@>= 12045@!down_ptr,@!right_ptr:pointer; {heads of the down and right stacks} 12046 12047@ @<Set init...@>= 12048down_ptr:=null; right_ptr:=null; 12049 12050@ Here is a subroutine that produces a \.{DVI} command for some specified 12051downward or rightward motion. It has two parameters: |w| is the amount 12052of motion, and |o| is either |down1| or |right1|. We use the fact that 12053the command codes have convenient arithmetic properties: |y1-down1=w1-right1| 12054and |z1-down1=x1-right1|. 12055 12056@p procedure movement(@!w:scaled;@!o:eight_bits); 12057label exit,found,not_found,2,1; 12058var mstate:small_number; {have we seen a |y| or |z|?} 12059@!p,@!q:pointer; {current and top nodes on the stack} 12060@!k:integer; {index into |dvi_buf|, modulo |dvi_buf_size|} 12061begin q:=get_node(movement_node_size); {new node for the top of the stack} 12062width(q):=w; location(q):=dvi_offset+dvi_ptr; 12063if o=down1 then 12064 begin link(q):=down_ptr; down_ptr:=q; 12065 end 12066else begin link(q):=right_ptr; right_ptr:=q; 12067 end; 12068@<Look at the other stack entries until deciding what sort of \.{DVI} command 12069 to generate; |goto found| if node |p| is a ``hit''@>; 12070@<Generate a |down| or |right| command for |w| and |return|@>; 12071found: @<Generate a |y0| or |z0| command in order to reuse a previous 12072 appearance of~|w|@>; 12073exit:end; 12074 12075@ The |info| fields in the entries of the down stack or the right stack 12076have six possible settings: |y_here| or |z_here| mean that the \.{DVI} 12077command refers to |y| or |z|, respectively (or to |w| or |x|, in the 12078case of horizontal motion); |yz_OK| means that the \.{DVI} command is 12079\\{down} (or \\{right}) but can be changed to either |y| or |z| (or 12080to either |w| or |x|); |y_OK| means that it is \\{down} and can be changed 12081to |y| but not |z|; |z_OK| is similar; and |d_fixed| means it must stay 12082\\{down}. 12083 12084The four settings |yz_OK|, |y_OK|, |z_OK|, |d_fixed| would not need to 12085be distinguished from each other if we were simply solving the 12086digit-subscripting problem mentioned above. But in \TeX's case there is 12087a complication because of the nested structure of |push| and |pop| 12088commands. Suppose we add parentheses to the digit-subscripting problem, 12089redefining hits so that $\delta_y\ldots \delta_y$ is a hit if all $y$'s between 12090the $\delta$'s are enclosed in properly nested parentheses, and if the 12091parenthesis level of the right-hand $\delta_y$ is deeper than or equal to 12092that of the left-hand one. Thus, `(' and `)' correspond to `|push|' 12093and `|pop|'. Now if we want to assign a subscript to the final 1 in the 12094sequence 12095$$2_y\,7_d\,1_d\,(\,8_z\,2_y\,8_z\,)\,1$$ 12096we cannot change the previous $1_d$ to $1_y$, since that would invalidate 12097the $2_y\ldots2_y$ hit. But we can change it to $1_z$, scoring a hit 12098since the intervening $8_z$'s are enclosed in parentheses. 12099 12100The program below removes movement nodes that are introduced after a |push|, 12101before it outputs the corresponding |pop|. 12102 12103@d y_here=1 {|info| when the movement entry points to a |y| command} 12104@d z_here=2 {|info| when the movement entry points to a |z| command} 12105@d yz_OK=3 {|info| corresponding to an unconstrained \\{down} command} 12106@d y_OK=4 {|info| corresponding to a \\{down} that can't become a |z|} 12107@d z_OK=5 {|info| corresponding to a \\{down} that can't become a |y|} 12108@d d_fixed=6 {|info| corresponding to a \\{down} that can't change} 12109 12110@ When the |movement| procedure gets to the label |found|, the value of 12111|info(p)| will be either |y_here| or |z_here|. If it is, say, |y_here|, 12112the procedure generates a |y0| command (or a |w0| command), and marks 12113all |info| fields between |q| and |p| so that |y| is not OK in that range. 12114 12115@<Generate a |y0| or |z0| command...@>= 12116info(q):=info(p); 12117if info(q)=y_here then 12118 begin dvi_out(o+y0-down1); {|y0| or |w0|} 12119 while link(q)<>p do 12120 begin q:=link(q); 12121 case info(q) of 12122 yz_OK: info(q):=z_OK; 12123 y_OK: info(q):=d_fixed; 12124 othercases do_nothing 12125 endcases; 12126 end; 12127 end 12128else begin dvi_out(o+z0-down1); {|z0| or |x0|} 12129 while link(q)<>p do 12130 begin q:=link(q); 12131 case info(q) of 12132 yz_OK: info(q):=y_OK; 12133 z_OK: info(q):=d_fixed; 12134 othercases do_nothing 12135 endcases; 12136 end; 12137 end 12138 12139@ @<Generate a |down| or |right|...@>= 12140info(q):=yz_OK; 12141if abs(w)>=@'40000000 then 12142 begin dvi_out(o+3); {|down4| or |right4|} 12143 dvi_four(w); return; 12144 end; 12145if abs(w)>=@'100000 then 12146 begin dvi_out(o+2); {|down3| or |right3|} 12147 if w<0 then w:=w+@'100000000; 12148 dvi_out(w div @'200000); w:=w mod @'200000; goto 2; 12149 end; 12150if abs(w)>=@'200 then 12151 begin dvi_out(o+1); {|down2| or |right2|} 12152 if w<0 then w:=w+@'200000; 12153 goto 2; 12154 end; 12155dvi_out(o); {|down1| or |right1|} 12156if w<0 then w:=w+@'400; 12157goto 1; 121582: dvi_out(w div @'400); 121591: dvi_out(w mod @'400); return 12160 12161@ As we search through the stack, we are in one of three states, 12162|y_seen|, |z_seen|, or |none_seen|, depending on whether we have 12163encountered |y_here| or |z_here| nodes. These states are encoded as 12164multiples of 6, so that they can be added to the |info| fields for quick 12165decision-making. 12166@^inner loop@> 12167 12168@d none_seen=0 {no |y_here| or |z_here| nodes have been encountered yet} 12169@d y_seen=6 {we have seen |y_here| but not |z_here|} 12170@d z_seen=12 {we have seen |z_here| but not |y_here|} 12171 12172@<Look at the other stack entries until deciding...@>= 12173p:=link(q); mstate:=none_seen; 12174while p<>null do 12175 begin if width(p)=w then @<Consider a node with matching width; 12176 |goto found| if it's a hit@> 12177 else case mstate+info(p) of 12178 none_seen+y_here: mstate:=y_seen; 12179 none_seen+z_here: mstate:=z_seen; 12180 y_seen+z_here,z_seen+y_here: goto not_found; 12181 othercases do_nothing 12182 endcases; 12183 p:=link(p); 12184 end; 12185not_found: 12186 12187@ We might find a valid hit in a |y| or |z| byte that is already gone 12188from the buffer. But we can't change bytes that are gone forever; ``the 12189moving finger writes, $\ldots\,\,$.'' 12190 12191@<Consider a node with matching width...@>= 12192case mstate+info(p) of 12193none_seen+yz_OK,none_seen+y_OK,z_seen+yz_OK,z_seen+y_OK:@t@>@;@/ 12194 if location(p)<dvi_gone then goto not_found 12195 else @<Change buffered instruction to |y| or |w| and |goto found|@>; 12196none_seen+z_OK,y_seen+yz_OK,y_seen+z_OK:@t@>@;@/ 12197 if location(p)<dvi_gone then goto not_found 12198 else @<Change buffered instruction to |z| or |x| and |goto found|@>; 12199none_seen+y_here,none_seen+z_here,y_seen+z_here,z_seen+y_here: goto found; 12200othercases do_nothing 12201endcases 12202 12203@ @<Change buffered instruction to |y| or |w| and |goto found|@>= 12204begin k:=location(p)-dvi_offset; 12205if k<0 then k:=k+dvi_buf_size; 12206dvi_buf[k]:=dvi_buf[k]+y1-down1; 12207info(p):=y_here; goto found; 12208end 12209 12210@ @<Change buffered instruction to |z| or |x| and |goto found|@>= 12211begin k:=location(p)-dvi_offset; 12212if k<0 then k:=k+dvi_buf_size; 12213dvi_buf[k]:=dvi_buf[k]+z1-down1; 12214info(p):=z_here; goto found; 12215end 12216 12217@ In case you are wondering when all the movement nodes are removed from 12218\TeX's memory, the answer is that they are recycled just before 12219|hlist_out| and |vlist_out| finish outputting a box. This restores the 12220down and right stacks to the state they were in before the box was output, 12221except that some |info|'s may have become more restrictive. 12222 12223@p procedure prune_movements(@!l:integer); 12224 {delete movement nodes with |location>=l|} 12225label done,exit; 12226var p:pointer; {node being deleted} 12227begin while down_ptr<>null do 12228 begin if location(down_ptr)<l then goto done; 12229 p:=down_ptr; down_ptr:=link(p); free_node(p,movement_node_size); 12230 end; 12231done: while right_ptr<>null do 12232 begin if location(right_ptr)<l then return; 12233 p:=right_ptr; right_ptr:=link(p); free_node(p,movement_node_size); 12234 end; 12235exit:end; 12236 12237@ The actual distances by which we want to move might be computed as the 12238sum of several separate movements. For example, there might be several 12239glue nodes in succession, or we might want to move right by the width of 12240some box plus some amount of glue. More importantly, the baselineskip 12241distances are computed in terms of glue together with the depth and 12242height of adjacent boxes, and we want the \.{DVI} file to lump these 12243three quantities together into a single motion. 12244 12245Therefore, \TeX\ maintains two pairs of global variables: |dvi_h| and |dvi_v| 12246are the |h| and |v| coordinates corresponding to the commands actually 12247output to the \.{DVI} file, while |cur_h| and |cur_v| are the coordinates 12248corresponding to the current state of the output routines. Coordinate 12249changes will accumulate in |cur_h| and |cur_v| without being reflected 12250in the output, until such a change becomes necessary or desirable; we 12251can call the |movement| procedure whenever we want to make |dvi_h=cur_h| 12252or |dvi_v=cur_v|. 12253 12254The current font reflected in the \.{DVI} output is called |dvi_f|; 12255there is no need for a `\\{cur\_f}' variable. 12256 12257The depth of nesting of |hlist_out| and |vlist_out| is called |cur_s|; 12258this is essentially the depth of |push| commands in the \.{DVI} output. 12259 12260@d synch_h==if cur_h<>dvi_h then 12261 begin movement(cur_h-dvi_h,right1); dvi_h:=cur_h; 12262 end 12263@d synch_v==if cur_v<>dvi_v then 12264 begin movement(cur_v-dvi_v,down1); dvi_v:=cur_v; 12265 end 12266 12267@<Glob...@>= 12268@!dvi_h,@!dvi_v:scaled; {a \.{DVI} reader program thinks we are here} 12269@!cur_h,@!cur_v:scaled; {\TeX\ thinks we are here} 12270@!dvi_f:internal_font_number; {the current font} 12271@!cur_s:integer; {current depth of output box nesting, initially $-1$} 12272 12273@ @<Initialize variables as |ship_out| begins@>= 12274dvi_h:=0; dvi_v:=0; cur_h:=h_offset; dvi_f:=null_font; 12275ensure_dvi_open; 12276if total_pages=0 then 12277 begin dvi_out(pre); dvi_out(id_byte); {output the preamble} 12278@^preamble of \.{DVI} file@> 12279 dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp} 12280 prepare_mag; dvi_four(mag); {magnification factor is frozen} 12281 old_setting:=selector; selector:=new_string; 12282 print(" TeX output "); print_int(year); print_char("."); 12283 print_two(month); print_char("."); print_two(day); 12284 print_char(":"); print_two(time div 60); 12285 print_two(time mod 60); 12286 selector:=old_setting; dvi_out(cur_length); 12287 for s:=str_start[str_ptr] to pool_ptr-1 do dvi_out(so(str_pool[s])); 12288 pool_ptr:=str_start[str_ptr]; {flush the current string} 12289 end 12290 12291@ When |hlist_out| is called, its duty is to output the box represented 12292by the |hlist_node| pointed to by |temp_ptr|. The reference point of that 12293box has coordinates |(cur_h,cur_v)|. 12294 12295Similarly, when |vlist_out| is called, its duty is to output the box represented 12296by the |vlist_node| pointed to by |temp_ptr|. The reference point of that 12297box has coordinates |(cur_h,cur_v)|. 12298@^recursion@> 12299 12300@p procedure@?vlist_out; forward; {|hlist_out| and |vlist_out| are mutually 12301 recursive} 12302 12303@ The recursive procedures |hlist_out| and |vlist_out| each have local variables 12304|save_h| and |save_v| to hold the values of |dvi_h| and |dvi_v| just before 12305entering a new level of recursion. In effect, the values of |save_h| and 12306|save_v| on \TeX's run-time stack correspond to the values of |h| and |v| 12307that a \.{DVI}-reading program will push onto its coordinate stack. 12308 12309@d move_past=13 {go to this label when advancing past glue or a rule} 12310@d fin_rule=14 {go to this label to finish processing a rule} 12311@d next_p=15 {go to this label when finished with node |p|} 12312 12313@p @t\4@>@<Declare procedures needed in |hlist_out|, |vlist_out|@>@t@>@/ 12314procedure hlist_out; {output an |hlist_node| box} 12315label reswitch, move_past, fin_rule, next_p; 12316var base_line: scaled; {the baseline coordinate for this box} 12317@!left_edge: scaled; {the left coordinate for this box} 12318@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to} 12319@!this_box: pointer; {pointer to containing box} 12320@!g_order: glue_ord; {applicable order of infinity for glue} 12321@!g_sign: normal..shrinking; {selects type of glue} 12322@!p:pointer; {current position in the hlist} 12323@!save_loc:integer; {\.{DVI} byte location upon entry} 12324@!leader_box:pointer; {the leader box being replicated} 12325@!leader_wd:scaled; {width of leader box being replicated} 12326@!lx:scaled; {extra space between leader boxes} 12327@!outer_doing_leaders:boolean; {were we doing leaders?} 12328@!edge:scaled; {left edge of sub-box, or right edge of leader space} 12329@!glue_temp:real; {glue value before rounding} 12330@!cur_glue:real; {glue seen so far} 12331@!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio} 12332begin cur_g:=0; cur_glue:=float_constant(0); 12333this_box:=temp_ptr; g_order:=glue_order(this_box); 12334g_sign:=glue_sign(this_box); p:=list_ptr(this_box); 12335incr(cur_s); 12336if cur_s>0 then dvi_out(push); 12337if cur_s>max_push then max_push:=cur_s; 12338save_loc:=dvi_offset+dvi_ptr; base_line:=cur_v; left_edge:=cur_h; 12339while p<>null do @<Output node |p| for |hlist_out| and move to the next node, 12340 maintaining the condition |cur_v=base_line|@>; 12341prune_movements(save_loc); 12342if cur_s>0 then dvi_pop(save_loc); 12343decr(cur_s); 12344end; 12345 12346@ We ought to give special care to the efficiency of one part of |hlist_out|, 12347since it belongs to \TeX's inner loop. When a |char_node| is encountered, 12348we save a little time by processing several nodes in succession until 12349reaching a non-|char_node|. The program uses the fact that |set_char_0=0|. 12350@^inner loop@> 12351 12352@<Output node |p| for |hlist_out|...@>= 12353reswitch: if is_char_node(p) then 12354 begin synch_h; synch_v; 12355 repeat f:=font(p); c:=character(p); 12356 if f<>dvi_f then @<Change font |dvi_f| to |f|@>; 12357 if c>=qi(128) then dvi_out(set1); 12358 dvi_out(qo(c));@/ 12359 cur_h:=cur_h+char_width(f)(char_info(f)(c)); 12360 p:=link(p); 12361 until not is_char_node(p); 12362 dvi_h:=cur_h; 12363 end 12364else @<Output the non-|char_node| |p| for |hlist_out| 12365 and move to the next node@> 12366 12367@ @<Change font |dvi_f| to |f|@>= 12368begin if not font_used[f] then 12369 begin dvi_font_def(f); font_used[f]:=true; 12370 end; 12371if f<=64+font_base then dvi_out(f-font_base-1+fnt_num_0) 12372else begin dvi_out(fnt1); dvi_out(f-font_base-1); 12373 end; 12374dvi_f:=f; 12375end 12376 12377@ @<Output the non-|char_node| |p| for |hlist_out|...@>= 12378begin case type(p) of 12379hlist_node,vlist_node:@<Output a box in an hlist@>; 12380rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p); 12381 goto fin_rule; 12382 end; 12383whatsit_node: @<Output the whatsit node |p| in an hlist@>; 12384glue_node: @<Move right or output leaders@>; 12385kern_node,math_node:cur_h:=cur_h+width(p); 12386ligature_node: @<Make node |p| look like a |char_node| and |goto reswitch|@>; 12387othercases do_nothing 12388endcases;@/ 12389goto next_p; 12390fin_rule: @<Output a rule in an hlist@>; 12391move_past: cur_h:=cur_h+rule_wd; 12392next_p:p:=link(p); 12393end 12394 12395@ @<Output a box in an hlist@>= 12396if list_ptr(p)=null then cur_h:=cur_h+width(p) 12397else begin save_h:=dvi_h; save_v:=dvi_v; 12398 cur_v:=base_line+shift_amount(p); {shift the box down} 12399 temp_ptr:=p; edge:=cur_h; 12400 if type(p)=vlist_node then vlist_out@+else hlist_out; 12401 dvi_h:=save_h; dvi_v:=save_v; 12402 cur_h:=edge+width(p); cur_v:=base_line; 12403 end 12404 12405@ @<Output a rule in an hlist@>= 12406if is_running(rule_ht) then rule_ht:=height(this_box); 12407if is_running(rule_dp) then rule_dp:=depth(this_box); 12408rule_ht:=rule_ht+rule_dp; {this is the rule thickness} 12409if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules} 12410 begin synch_h; cur_v:=base_line+rule_dp; synch_v; 12411 dvi_out(set_rule); dvi_four(rule_ht); dvi_four(rule_wd); 12412 cur_v:=base_line; dvi_h:=dvi_h+rule_wd; 12413 end 12414 12415@ @d billion==float_constant(1000000000) 12416@d vet_glue(#)== glue_temp:=#; 12417 if glue_temp>billion then 12418 glue_temp:=billion 12419 else if glue_temp<-billion then 12420 glue_temp:=-billion 12421 12422@<Move right or output leaders@>= 12423begin g:=glue_ptr(p); rule_wd:=width(g)-cur_g; 12424if g_sign<>normal then 12425 begin if g_sign=stretching then 12426 begin if stretch_order(g)=g_order then 12427 begin cur_glue:=cur_glue+stretch(g); 12428 vet_glue(float(glue_set(this_box))*cur_glue); 12429@^real multiplication@> 12430 cur_g:=round(glue_temp); 12431 end; 12432 end 12433 else if shrink_order(g)=g_order then 12434 begin cur_glue:=cur_glue-shrink(g); 12435 vet_glue(float(glue_set(this_box))*cur_glue); 12436 cur_g:=round(glue_temp); 12437 end; 12438 end; 12439rule_wd:=rule_wd+cur_g; 12440if subtype(p)>=a_leaders then 12441 @<Output leaders in an hlist, |goto fin_rule| if a rule 12442 or to |next_p| if done@>; 12443goto move_past; 12444end 12445 12446@ @<Output leaders in an hlist...@>= 12447begin leader_box:=leader_ptr(p); 12448if type(leader_box)=rule_node then 12449 begin rule_ht:=height(leader_box); rule_dp:=depth(leader_box); 12450 goto fin_rule; 12451 end; 12452leader_wd:=width(leader_box); 12453if (leader_wd>0)and(rule_wd>0) then 12454 begin rule_wd:=rule_wd+10; {compensate for floating-point rounding} 12455 edge:=cur_h+rule_wd; lx:=0; 12456 @<Let |cur_h| be the position of the first box, and set |leader_wd+lx| 12457 to the spacing between corresponding parts of boxes@>; 12458 while cur_h+leader_wd<=edge do 12459 @<Output a leader box at |cur_h|, 12460 then advance |cur_h| by |leader_wd+lx|@>; 12461 cur_h:=edge-10; goto next_p; 12462 end; 12463end 12464 12465@ The calculations related to leaders require a bit of care. First, in the 12466case of |a_leaders| (aligned leaders), we want to move |cur_h| to 12467|left_edge| plus the smallest multiple of |leader_wd| for which the result 12468is not less than the current value of |cur_h|; i.e., |cur_h| should become 12469$|left_edge|+|leader_wd|\times\lceil 12470(|cur_h|-|left_edge|)/|leader_wd|\rceil$. The program here should work in 12471all cases even though some implementations of \PASCAL\ give nonstandard 12472results for the |div| operation when |cur_h| is less than |left_edge|. 12473 12474In the case of |c_leaders| (centered leaders), we want to increase |cur_h| 12475by half of the excess space not occupied by the leaders; and in the 12476case of |x_leaders| (expanded leaders) we increase |cur_h| 12477by $1/(q+1)$ of this excess space, where $q$ is the number of times the 12478leader box will be replicated. Slight inaccuracies in the division might 12479accumulate; half of this rounding error is placed at each end of the leaders. 12480 12481@<Let |cur_h| be the position of the first box, ...@>= 12482if subtype(p)=a_leaders then 12483 begin save_h:=cur_h; 12484 cur_h:=left_edge+leader_wd*((cur_h-left_edge)@!div leader_wd); 12485 if cur_h<save_h then cur_h:=cur_h+leader_wd; 12486 end 12487else begin lq:=rule_wd div leader_wd; {the number of box copies} 12488 lr:=rule_wd mod leader_wd; {the remaining space} 12489 if subtype(p)=c_leaders then cur_h:=cur_h+(lr div 2) 12490 else begin lx:=lr div (lq+1); 12491 cur_h:=cur_h+((lr-(lq-1)*lx) div 2); 12492 end; 12493 end 12494 12495@ The `\\{synch}' operations here are intended to decrease the number of 12496bytes needed to specify horizontal and vertical motion in the \.{DVI} output. 12497 12498@<Output a leader box at |cur_h|, ...@>= 12499begin cur_v:=base_line+shift_amount(leader_box); synch_v; save_v:=dvi_v;@/ 12500synch_h; save_h:=dvi_h; temp_ptr:=leader_box; 12501outer_doing_leaders:=doing_leaders; doing_leaders:=true; 12502if type(leader_box)=vlist_node then vlist_out@+else hlist_out; 12503doing_leaders:=outer_doing_leaders; 12504dvi_v:=save_v; dvi_h:=save_h; cur_v:=base_line; 12505cur_h:=save_h+leader_wd+lx; 12506end 12507 12508@ The |vlist_out| routine is similar to |hlist_out|, but a bit simpler. 12509 12510@p procedure vlist_out; {output a |vlist_node| box} 12511label move_past, fin_rule, next_p; 12512var left_edge: scaled; {the left coordinate for this box} 12513@!top_edge: scaled; {the top coordinate for this box} 12514@!save_h,@!save_v: scaled; {what |dvi_h| and |dvi_v| should pop to} 12515@!this_box: pointer; {pointer to containing box} 12516@!g_order: glue_ord; {applicable order of infinity for glue} 12517@!g_sign: normal..shrinking; {selects type of glue} 12518@!p:pointer; {current position in the vlist} 12519@!save_loc:integer; {\.{DVI} byte location upon entry} 12520@!leader_box:pointer; {the leader box being replicated} 12521@!leader_ht:scaled; {height of leader box being replicated} 12522@!lx:scaled; {extra space between leader boxes} 12523@!outer_doing_leaders:boolean; {were we doing leaders?} 12524@!edge:scaled; {bottom boundary of leader space} 12525@!glue_temp:real; {glue value before rounding} 12526@!cur_glue:real; {glue seen so far} 12527@!cur_g:scaled; {rounded equivalent of |cur_glue| times the glue ratio} 12528begin cur_g:=0; cur_glue:=float_constant(0); 12529this_box:=temp_ptr; g_order:=glue_order(this_box); 12530g_sign:=glue_sign(this_box); p:=list_ptr(this_box); 12531incr(cur_s); 12532if cur_s>0 then dvi_out(push); 12533if cur_s>max_push then max_push:=cur_s; 12534save_loc:=dvi_offset+dvi_ptr; left_edge:=cur_h; cur_v:=cur_v-height(this_box); 12535top_edge:=cur_v; 12536while p<>null do @<Output node |p| for |vlist_out| and move to the next node, 12537 maintaining the condition |cur_h=left_edge|@>; 12538prune_movements(save_loc); 12539if cur_s>0 then dvi_pop(save_loc); 12540decr(cur_s); 12541end; 12542 12543@ @<Output node |p| for |vlist_out|...@>= 12544begin if is_char_node(p) then confusion("vlistout") 12545@:this can't happen vlistout}{\quad vlistout@> 12546else @<Output the non-|char_node| |p| for |vlist_out|@>; 12547next_p:p:=link(p); 12548end 12549 12550@ @<Output the non-|char_node| |p| for |vlist_out|@>= 12551begin case type(p) of 12552hlist_node,vlist_node:@<Output a box in a vlist@>; 12553rule_node: begin rule_ht:=height(p); rule_dp:=depth(p); rule_wd:=width(p); 12554 goto fin_rule; 12555 end; 12556whatsit_node: @<Output the whatsit node |p| in a vlist@>; 12557glue_node: @<Move down or output leaders@>; 12558kern_node:cur_v:=cur_v+width(p); 12559othercases do_nothing 12560endcases;@/ 12561goto next_p; 12562fin_rule: @<Output a rule in a vlist, |goto next_p|@>; 12563move_past: cur_v:=cur_v+rule_ht; 12564end 12565 12566@ The |synch_v| here allows the \.{DVI} output to use one-byte commands 12567for adjusting |v| in most cases, since the baselineskip distance will 12568usually be constant. 12569 12570@<Output a box in a vlist@>= 12571if list_ptr(p)=null then cur_v:=cur_v+height(p)+depth(p) 12572else begin cur_v:=cur_v+height(p); synch_v; 12573 save_h:=dvi_h; save_v:=dvi_v; 12574 cur_h:=left_edge+shift_amount(p); {shift the box right} 12575 temp_ptr:=p; 12576 if type(p)=vlist_node then vlist_out@+else hlist_out; 12577 dvi_h:=save_h; dvi_v:=save_v; 12578 cur_v:=save_v+depth(p); cur_h:=left_edge; 12579 end 12580 12581@ @<Output a rule in a vlist...@>= 12582if is_running(rule_wd) then rule_wd:=width(this_box); 12583rule_ht:=rule_ht+rule_dp; {this is the rule thickness} 12584cur_v:=cur_v+rule_ht; 12585if (rule_ht>0)and(rule_wd>0) then {we don't output empty rules} 12586 begin synch_h; synch_v; 12587 dvi_out(put_rule); dvi_four(rule_ht); dvi_four(rule_wd); 12588 end; 12589goto next_p 12590 12591@ @<Move down or output leaders@>= 12592begin g:=glue_ptr(p); rule_ht:=width(g)-cur_g; 12593if g_sign<>normal then 12594 begin if g_sign=stretching then 12595 begin if stretch_order(g)=g_order then 12596 begin cur_glue:=cur_glue+stretch(g); 12597 vet_glue(float(glue_set(this_box))*cur_glue); 12598@^real multiplication@> 12599 cur_g:=round(glue_temp); 12600 end; 12601 end 12602 else if shrink_order(g)=g_order then 12603 begin cur_glue:=cur_glue-shrink(g); 12604 vet_glue(float(glue_set(this_box))*cur_glue); 12605 cur_g:=round(glue_temp); 12606 end; 12607 end; 12608rule_ht:=rule_ht+cur_g; 12609if subtype(p)>=a_leaders then 12610 @<Output leaders in a vlist, |goto fin_rule| if a rule 12611 or to |next_p| if done@>; 12612goto move_past; 12613end 12614 12615@ @<Output leaders in a vlist...@>= 12616begin leader_box:=leader_ptr(p); 12617if type(leader_box)=rule_node then 12618 begin rule_wd:=width(leader_box); rule_dp:=0; 12619 goto fin_rule; 12620 end; 12621leader_ht:=height(leader_box)+depth(leader_box); 12622if (leader_ht>0)and(rule_ht>0) then 12623 begin rule_ht:=rule_ht+10; {compensate for floating-point rounding} 12624 edge:=cur_v+rule_ht; lx:=0; 12625 @<Let |cur_v| be the position of the first box, and set |leader_ht+lx| 12626 to the spacing between corresponding parts of boxes@>; 12627 while cur_v+leader_ht<=edge do 12628 @<Output a leader box at |cur_v|, 12629 then advance |cur_v| by |leader_ht+lx|@>; 12630 cur_v:=edge-10; goto next_p; 12631 end; 12632end 12633 12634@ @<Let |cur_v| be the position of the first box, ...@>= 12635if subtype(p)=a_leaders then 12636 begin save_v:=cur_v; 12637 cur_v:=top_edge+leader_ht*((cur_v-top_edge)@!div leader_ht); 12638 if cur_v<save_v then cur_v:=cur_v+leader_ht; 12639 end 12640else begin lq:=rule_ht div leader_ht; {the number of box copies} 12641 lr:=rule_ht mod leader_ht; {the remaining space} 12642 if subtype(p)=c_leaders then cur_v:=cur_v+(lr div 2) 12643 else begin lx:=lr div (lq+1); 12644 cur_v:=cur_v+((lr-(lq-1)*lx) div 2); 12645 end; 12646 end 12647 12648@ When we reach this part of the program, |cur_v| indicates the top of a 12649leader box, not its baseline. 12650 12651@<Output a leader box at |cur_v|, ...@>= 12652begin cur_h:=left_edge+shift_amount(leader_box); synch_h; save_h:=dvi_h;@/ 12653cur_v:=cur_v+height(leader_box); synch_v; save_v:=dvi_v; 12654temp_ptr:=leader_box; 12655outer_doing_leaders:=doing_leaders; doing_leaders:=true; 12656if type(leader_box)=vlist_node then vlist_out@+else hlist_out; 12657doing_leaders:=outer_doing_leaders; 12658dvi_v:=save_v; dvi_h:=save_h; cur_h:=left_edge; 12659cur_v:=save_v-height(leader_box)+leader_ht+lx; 12660end 12661 12662@ The |hlist_out| and |vlist_out| procedures are now complete, so we are 12663ready for the |ship_out| routine that gets them started in the first place. 12664 12665@p procedure ship_out(@!p:pointer); {output the box |p|} 12666label done; 12667var page_loc:integer; {location of the current |bop|} 12668@!j,@!k:0..9; {indices to first ten count registers} 12669@!s:pool_pointer; {index into |str_pool|} 12670@!old_setting:0..max_selector; {saved |selector| setting} 12671begin if tracing_output>0 then 12672 begin print_nl(""); print_ln; 12673 print("Completed box being shipped out"); 12674@.Completed box...@> 12675 end; 12676if term_offset>max_print_line-9 then print_ln 12677else if (term_offset>0)or(file_offset>0) then print_char(" "); 12678print_char("["); j:=9; 12679while (count(j)=0)and(j>0) do decr(j); 12680for k:=0 to j do 12681 begin print_int(count(k)); 12682 if k<j then print_char("."); 12683 end; 12684update_terminal; 12685if tracing_output>0 then 12686 begin print_char("]"); 12687 begin_diagnostic; show_box(p); end_diagnostic(true); 12688 end; 12689@<Ship box |p| out@>; 12690if tracing_output<=0 then print_char("]"); 12691dead_cycles:=0; 12692update_terminal; {progress report} 12693@<Flush the box from memory, showing statistics if requested@>; 12694end; 12695 12696@ @<Flush the box from memory, showing statistics if requested@>= 12697@!stat if tracing_stats>1 then 12698 begin print_nl("Memory usage before: "); 12699@.Memory usage...@> 12700 print_int(var_used); print_char("&"); 12701 print_int(dyn_used); print_char(";"); 12702 end; 12703tats@/ 12704flush_node_list(p); 12705@!stat if tracing_stats>1 then 12706 begin print(" after: "); 12707 print_int(var_used); print_char("&"); 12708 print_int(dyn_used); print("; still untouched: "); 12709 print_int(hi_mem_min-lo_mem_max-1); print_ln; 12710 end; 12711tats 12712 12713@ @<Ship box |p| out@>= 12714@<Update the values of |max_h| and |max_v|; but if the page is too large, 12715 |goto done|@>; 12716@<Initialize variables as |ship_out| begins@>; 12717page_loc:=dvi_offset+dvi_ptr; 12718dvi_out(bop); 12719for k:=0 to 9 do dvi_four(count(k)); 12720dvi_four(last_bop); last_bop:=page_loc; 12721cur_v:=height(p)+v_offset; temp_ptr:=p; 12722if type(p)=vlist_node then vlist_out@+else hlist_out; 12723dvi_out(eop); incr(total_pages); cur_s:=-1; 12724done: 12725 12726@ Sometimes the user will generate a huge page because other error messages 12727are being ignored. Such pages are not output to the \.{dvi} file, since they 12728may confuse the printing software. 12729 12730@<Update the values of |max_h| and |max_v|; but if the page is too large...@>= 12731if (height(p)>max_dimen)or@|(depth(p)>max_dimen)or@| 12732 (height(p)+depth(p)+v_offset>max_dimen)or@| 12733 (width(p)+h_offset>max_dimen) then 12734 begin print_err("Huge page cannot be shipped out"); 12735@.Huge page...@> 12736 help2("The page just created is more than 18 feet tall or")@/ 12737 ("more than 18 feet wide, so I suspect something went wrong."); 12738 error; 12739 if tracing_output<=0 then 12740 begin begin_diagnostic; 12741 print_nl("The following box has been deleted:"); 12742@.The following...deleted@> 12743 show_box(p); 12744 end_diagnostic(true); 12745 end; 12746 goto done; 12747 end; 12748if height(p)+depth(p)+v_offset>max_v then max_v:=height(p)+depth(p)+v_offset; 12749if width(p)+h_offset>max_h then max_h:=width(p)+h_offset 12750 12751@ At the end of the program, we must finish things off by writing the 12752post\-amble. If |total_pages=0|, the \.{DVI} file was never opened. 12753If |total_pages>=65536|, the \.{DVI} file will lie. And if 12754|max_push>=65536|, the user deserves whatever chaos might ensue. 12755 12756An integer variable |k| will be declared for use by this routine. 12757 12758@<Finish the \.{DVI} file@>= 12759while cur_s>-1 do 12760 begin if cur_s>0 then dvi_out(pop) 12761 else begin dvi_out(eop); incr(total_pages); 12762 end; 12763 decr(cur_s); 12764 end; 12765if total_pages=0 then print_nl("No pages of output.") 12766@.No pages of output@> 12767else begin dvi_out(post); {beginning of the postamble} 12768 dvi_four(last_bop); last_bop:=dvi_offset+dvi_ptr-5; {|post| location} 12769 dvi_four(25400000); dvi_four(473628672); {conversion ratio for sp} 12770 prepare_mag; dvi_four(mag); {magnification factor} 12771 dvi_four(max_v); dvi_four(max_h);@/ 12772 dvi_out(max_push div 256); dvi_out(max_push mod 256);@/ 12773 dvi_out((total_pages div 256) mod 256); dvi_out(total_pages mod 256);@/ 12774 @<Output the font definitions for all fonts that were used@>; 12775 dvi_out(post_post); dvi_four(last_bop); dvi_out(id_byte);@/ 12776 k:=4+((dvi_buf_size-dvi_ptr) mod 4); {the number of 223's} 12777 while k>0 do 12778 begin dvi_out(223); decr(k); 12779 end; 12780 @<Empty the last bytes out of |dvi_buf|@>; 12781 print_nl("Output written on "); slow_print(output_file_name); 12782@.Output written on x@> 12783 print(" ("); print_int(total_pages); print(" page"); 12784 if total_pages<>1 then print_char("s"); 12785 print(", "); print_int(dvi_offset+dvi_ptr); print(" bytes)."); 12786 b_close(dvi_file); 12787 end 12788 12789@ @<Output the font definitions...@>= 12790while font_ptr>font_base do 12791 begin if font_used[font_ptr] then dvi_font_def(font_ptr); 12792 decr(font_ptr); 12793 end 12794 12795@* \[33] Packaging. 12796We're essentially done with the parts of \TeX\ that are concerned with 12797the input (|get_next|) and the output (|ship_out|). So it's time to 12798get heavily into the remaining part, which does the real work of typesetting. 12799 12800After lists are constructed, \TeX\ wraps them up and puts them into boxes. 12801Two major subroutines are given the responsibility for this task: |hpack| 12802applies to horizontal lists (hlists) and |vpack| applies to vertical lists 12803(vlists). The main duty of |hpack| and |vpack| is to compute the dimensions 12804of the resulting boxes, and to adjust the glue if one of those dimensions 12805is pre-specified. The computed sizes normally enclose all of the material 12806inside the new box; but some items may stick out if negative glue is used, 12807if the box is overfull, or if a \.{\\vbox} includes other boxes that have 12808been shifted left. 12809 12810The subroutine call |hpack(p,w,m)| returns a pointer to an |hlist_node| 12811for a box containing the hlist that starts at |p|. Parameter |w| specifies 12812a width; and parameter |m| is either `|exactly|' or `|additional|'. Thus, 12813|hpack(p,w,exactly)| produces a box whose width is exactly |w|, while 12814|hpack(p,w,additional)| yields a box whose width is the natural width plus 12815|w|. It is convenient to define a macro called `|natural|' to cover the 12816most common case, so that we can say |hpack(p,natural)| to get a box that 12817has the natural width of list |p|. 12818 12819Similarly, |vpack(p,w,m)| returns a pointer to a |vlist_node| for a 12820box containing the vlist that starts at |p|. In this case |w| represents 12821a height instead of a width; the parameter |m| is interpreted as in |hpack|. 12822 12823@d exactly=0 {a box dimension is pre-specified} 12824@d additional=1 {a box dimension is increased from the natural one} 12825@d natural==0,additional {shorthand for parameters to |hpack| and |vpack|} 12826 12827@ The parameters to |hpack| and |vpack| correspond to \TeX's primitives 12828like `\.{\\hbox} \.{to} \.{300pt}', `\.{\\hbox} \.{spread} \.{10pt}'; note 12829that `\.{\\hbox}' with no dimension following it is equivalent to 12830`\.{\\hbox} \.{spread} \.{0pt}'. The |scan_spec| subroutine scans such 12831constructions in the user's input, including the mandatory left brace that 12832follows them, and it puts the specification onto |save_stack| so that the 12833desired box can later be obtained by executing the following code: 12834$$\vbox{\halign{#\hfil\cr 12835|save_ptr:=save_ptr-2;|\cr 12836|hpack(p,saved(1),saved(0)).|\cr}}$$ 12837Special care is necessary to ensure that the special |save_stack| codes 12838are placed just below the new group code, because scanning can change 12839|save_stack| when \.{\\csname} appears. 12840 12841@p procedure scan_spec(@!c:group_code;@!three_codes:boolean); 12842 {scans a box specification and left brace} 12843label found; 12844var @!s:integer; {temporarily saved value} 12845@!spec_code:exactly..additional; 12846begin if three_codes then s:=saved(0); 12847if scan_keyword("to") then spec_code:=exactly 12848@.to@> 12849else if scan_keyword("spread") then spec_code:=additional 12850@.spread@> 12851else begin spec_code:=additional; cur_val:=0; 12852 goto found; 12853 end; 12854scan_normal_dimen; 12855found: if three_codes then 12856 begin saved(0):=s; incr(save_ptr); 12857 end; 12858saved(0):=spec_code; saved(1):=cur_val; save_ptr:=save_ptr+2; 12859new_save_level(c); scan_left_brace; 12860end; 12861 12862@ To figure out the glue setting, |hpack| and |vpack| determine how much 12863stretchability and shrinkability are present, considering all four orders 12864of infinity. The highest order of infinity that has a nonzero coefficient 12865is then used as if no other orders were present. 12866 12867For example, suppose that the given list contains six glue nodes with 12868the respective stretchabilities 3pt, 8fill, 5fil, 6pt, $-3$fil, $-8$fill. 12869Then the total is essentially 2fil; and if a total additional space of 6pt 12870is to be achieved by stretching, the actual amounts of stretch will be 128710pt, 0pt, 15pt, 0pt, $-9$pt, and 0pt, since only `fil' glue will be 12872considered. (The `fill' glue is therefore not really stretching infinitely 12873with respect to `fil'; nobody would actually want that to happen.) 12874 12875The arrays |total_stretch| and |total_shrink| are used to determine how much 12876glue of each kind is present. A global variable |last_badness| is used 12877to implement \.{\\badness}. 12878 12879@<Glob...@>= 12880@!total_stretch, @!total_shrink: array[glue_ord] of scaled; 12881 {glue found by |hpack| or |vpack|} 12882@!last_badness:integer; {badness of the most recently packaged box} 12883 12884@ If the global variable |adjust_tail| is non-null, the |hpack| routine 12885also removes all occurrences of |ins_node|, |mark_node|, and |adjust_node| 12886items and appends the resulting material onto the list that ends at 12887location |adjust_tail|. 12888 12889@< Glob...@>= 12890@!adjust_tail:pointer; {tail of adjustment list} 12891 12892@ @<Set init...@>=adjust_tail:=null; last_badness:=0; 12893 12894@ Here now is |hpack|, which contains few if any surprises. 12895 12896@p function hpack(@!p:pointer;@!w:scaled;@!m:small_number):pointer; 12897label reswitch, common_ending, exit; 12898var r:pointer; {the box node that will be returned} 12899@!q:pointer; {trails behind |p|} 12900@!h,@!d,@!x:scaled; {height, depth, and natural width} 12901@!s:scaled; {shift amount} 12902@!g:pointer; {points to a glue specification} 12903@!o:glue_ord; {order of infinity} 12904@!f:internal_font_number; {the font in a |char_node|} 12905@!i:four_quarters; {font information about a |char_node|} 12906@!hd:eight_bits; {height and depth indices for a character} 12907begin last_badness:=0; r:=get_node(box_node_size); type(r):=hlist_node; 12908subtype(r):=min_quarterword; shift_amount(r):=0; 12909q:=r+list_offset; link(q):=p;@/ 12910h:=0; @<Clear dimensions to zero@>; 12911while p<>null do @<Examine node |p| in the hlist, taking account of its effect 12912 on the dimensions of the new box, or moving it to the adjustment list; 12913 then advance |p| to the next node@>; 12914if adjust_tail<>null then link(adjust_tail):=null; 12915height(r):=h; depth(r):=d;@/ 12916@<Determine the value of |width(r)| and the appropriate glue setting; 12917 then |return| or |goto common_ending|@>; 12918common_ending: @<Finish issuing a diagnostic message 12919 for an overfull or underfull hbox@>; 12920exit: hpack:=r; 12921end; 12922 12923@ @<Clear dimensions to zero@>= 12924d:=0; x:=0; 12925total_stretch[normal]:=0; total_shrink[normal]:=0; 12926total_stretch[fil]:=0; total_shrink[fil]:=0; 12927total_stretch[fill]:=0; total_shrink[fill]:=0; 12928total_stretch[filll]:=0; total_shrink[filll]:=0 12929 12930@ @<Examine node |p| in the hlist, taking account of its effect...@>= 12931@^inner loop@> 12932begin reswitch: while is_char_node(p) do 12933 @<Incorporate character dimensions into the dimensions of 12934 the hbox that will contain~it, then move to the next node@>; 12935if p<>null then 12936 begin case type(p) of 12937 hlist_node,vlist_node,rule_node,unset_node: 12938 @<Incorporate box dimensions into the dimensions of 12939 the hbox that will contain~it@>; 12940 ins_node,mark_node,adjust_node: if adjust_tail<>null then 12941 @<Transfer node |p| to the adjustment list@>; 12942 whatsit_node:@<Incorporate a whatsit node into an hbox@>; 12943 glue_node:@<Incorporate glue into the horizontal totals@>; 12944 kern_node,math_node: x:=x+width(p); 12945 ligature_node: @<Make node |p| look like a |char_node| 12946 and |goto reswitch|@>; 12947 othercases do_nothing 12948 endcases;@/ 12949 p:=link(p); 12950 end; 12951end 12952 12953 12954@ @<Make node |p| look like a |char_node| and |goto reswitch|@>= 12955begin mem[lig_trick]:=mem[lig_char(p)]; link(lig_trick):=link(p); 12956p:=lig_trick; goto reswitch; 12957end 12958 12959@ The code here implicitly uses the fact that running dimensions are 12960indicated by |null_flag|, which will be ignored in the calculations 12961because it is a highly negative number. 12962 12963@<Incorporate box dimensions into the dimensions of the hbox...@>= 12964begin x:=x+width(p); 12965if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p); 12966if height(p)-s>h then h:=height(p)-s; 12967if depth(p)+s>d then d:=depth(p)+s; 12968end 12969 12970@ The following code is part of \TeX's inner loop; i.e., adding another 12971character of text to the user's input will cause each of these instructions 12972to be exercised one more time. 12973@^inner loop@> 12974 12975@<Incorporate character dimensions into the dimensions of the hbox...@>= 12976begin f:=font(p); i:=char_info(f)(character(p)); hd:=height_depth(i); 12977x:=x+char_width(f)(i);@/ 12978s:=char_height(f)(hd);@+if s>h then h:=s; 12979s:=char_depth(f)(hd);@+if s>d then d:=s; 12980p:=link(p); 12981end 12982 12983@ Although node |q| is not necessarily the immediate predecessor of node |p|, 12984it always points to some node in the list preceding |p|. Thus, we can delete 12985nodes by moving |q| when necessary. The algorithm takes linear time, and the 12986extra computation does not intrude on the inner loop unless it is necessary 12987to make a deletion. 12988@^inner loop@> 12989 12990@<Transfer node |p| to the adjustment list@>= 12991begin while link(q)<>p do q:=link(q); 12992if type(p)=adjust_node then 12993 begin link(adjust_tail):=adjust_ptr(p); 12994 while link(adjust_tail)<>null do adjust_tail:=link(adjust_tail); 12995 p:=link(p); free_node(link(q),small_node_size); 12996 end 12997else begin link(adjust_tail):=p; adjust_tail:=p; p:=link(p); 12998 end; 12999link(q):=p; p:=q; 13000end 13001 13002@ @<Incorporate glue into the horizontal totals@>= 13003begin g:=glue_ptr(p); x:=x+width(g);@/ 13004o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g); 13005o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g); 13006if subtype(p)>=a_leaders then 13007 begin g:=leader_ptr(p); 13008 if height(g)>h then h:=height(g); 13009 if depth(g)>d then d:=depth(g); 13010 end; 13011end 13012 13013@ When we get to the present part of the program, |x| is the natural width 13014of the box being packaged. 13015 13016@<Determine the value of |width(r)| and the appropriate glue setting...@>= 13017if m=additional then w:=x+w; 13018width(r):=w; x:=w-x; {now |x| is the excess to be made up} 13019if x=0 then 13020 begin glue_sign(r):=normal; glue_order(r):=normal; 13021 set_glue_ratio_zero(glue_set(r)); 13022 return; 13023 end 13024else if x>0 then @<Determine horizontal glue stretch setting, then |return| 13025 or \hbox{|goto common_ending|}@> 13026else @<Determine horizontal glue shrink setting, then |return| 13027 or \hbox{|goto common_ending|}@> 13028 13029@ @<Determine horizontal glue stretch setting...@>= 13030begin @<Determine the stretch order@>; 13031glue_order(r):=o; glue_sign(r):=stretching; 13032if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o]) 13033@^real division@> 13034else begin glue_sign(r):=normal; 13035 set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch} 13036 end; 13037if o=normal then if list_ptr(r)<>null then 13038 @<Report an underfull hbox and |goto common_ending|, if this box 13039 is sufficiently bad@>; 13040return; 13041end 13042 13043@ @<Determine the stretch order@>= 13044if total_stretch[filll]<>0 then o:=filll 13045else if total_stretch[fill]<>0 then o:=fill 13046else if total_stretch[fil]<>0 then o:=fil 13047else o:=normal 13048 13049@ @<Report an underfull hbox and |goto common_ending|, if...@>= 13050begin last_badness:=badness(x,total_stretch[normal]); 13051if last_badness>hbadness then 13052 begin print_ln; 13053 if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose"); 13054 print(" \hbox (badness "); print_int(last_badness); 13055@.Underfull \\hbox...@> 13056@.Loose \\hbox...@> 13057 goto common_ending; 13058 end; 13059end 13060 13061@ In order to provide a decent indication of where an overfull or underfull 13062box originated, we use a global variable |pack_begin_line| that is 13063set nonzero only when |hpack| is being called by the paragraph builder 13064or the alignment finishing routine. 13065 13066@<Glob...@>= 13067@!pack_begin_line:integer; {source file line where the current paragraph 13068 or alignment began; a negative value denotes alignment} 13069 13070@ @<Set init...@>= 13071pack_begin_line:=0; 13072 13073@ @<Finish issuing a diagnostic message for an overfull or underfull hbox@>= 13074if output_active then print(") has occurred while \output is active") 13075else begin if pack_begin_line<>0 then 13076 begin if pack_begin_line>0 then print(") in paragraph at lines ") 13077 else print(") in alignment at lines "); 13078 print_int(abs(pack_begin_line)); 13079 print("--"); 13080 end 13081 else print(") detected at line "); 13082 print_int(line); 13083 end; 13084print_ln;@/ 13085font_in_short_display:=null_font; short_display(list_ptr(r)); print_ln;@/ 13086begin_diagnostic; show_box(r); end_diagnostic(true) 13087 13088@ @<Determine horizontal glue shrink setting...@>= 13089begin @<Determine the shrink order@>; 13090glue_order(r):=o; glue_sign(r):=shrinking; 13091if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o]) 13092@^real division@> 13093else begin glue_sign(r):=normal; 13094 set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink} 13095 end; 13096if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then 13097 begin last_badness:=1000000; 13098 set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage} 13099 @<Report an overfull hbox and |goto common_ending|, if this box 13100 is sufficiently bad@>; 13101 end 13102else if o=normal then if list_ptr(r)<>null then 13103 @<Report a tight hbox and |goto common_ending|, if this box 13104 is sufficiently bad@>; 13105return; 13106end 13107 13108@ @<Determine the shrink order@>= 13109if total_shrink[filll]<>0 then o:=filll 13110else if total_shrink[fill]<>0 then o:=fill 13111else if total_shrink[fil]<>0 then o:=fil 13112else o:=normal 13113 13114@ @<Report an overfull hbox and |goto common_ending|, if...@>= 13115if (-x-total_shrink[normal]>hfuzz)or(hbadness<100) then 13116 begin if (overfull_rule>0)and(-x-total_shrink[normal]>hfuzz) then 13117 begin while link(q)<>null do q:=link(q); 13118 link(q):=new_rule; 13119 width(link(q)):=overfull_rule; 13120 end; 13121 print_ln; print_nl("Overfull \hbox ("); 13122@.Overfull \\hbox...@> 13123 print_scaled(-x-total_shrink[normal]); print("pt too wide"); 13124 goto common_ending; 13125 end 13126 13127@ @<Report a tight hbox and |goto common_ending|, if...@>= 13128begin last_badness:=badness(-x,total_shrink[normal]); 13129if last_badness>hbadness then 13130 begin print_ln; print_nl("Tight \hbox (badness "); print_int(last_badness); 13131@.Tight \\hbox...@> 13132 goto common_ending; 13133 end; 13134end 13135 13136@ The |vpack| subroutine is actually a special case of a slightly more 13137general routine called |vpackage|, which has four parameters. The fourth 13138parameter, which is |max_dimen| in the case of |vpack|, specifies the 13139maximum depth of the page box that is constructed. The depth is first 13140computed by the normal rules; if it exceeds this limit, the reference 13141point is simply moved down until the limiting depth is attained. 13142 13143@d vpack(#)==vpackage(#,max_dimen) {special case of unconstrained depth} 13144 13145@p function vpackage(@!p:pointer;@!h:scaled;@!m:small_number;@!l:scaled): 13146 pointer; 13147label common_ending, exit; 13148var r:pointer; {the box node that will be returned} 13149@!w,@!d,@!x:scaled; {width, depth, and natural height} 13150@!s:scaled; {shift amount} 13151@!g:pointer; {points to a glue specification} 13152@!o:glue_ord; {order of infinity} 13153begin last_badness:=0; r:=get_node(box_node_size); type(r):=vlist_node; 13154subtype(r):=min_quarterword; shift_amount(r):=0; 13155list_ptr(r):=p;@/ 13156w:=0; @<Clear dimensions to zero@>; 13157while p<>null do @<Examine node |p| in the vlist, taking account of its effect 13158 on the dimensions of the new box; then advance |p| to the next node@>; 13159width(r):=w; 13160if d>l then 13161 begin x:=x+d-l; depth(r):=l; 13162 end 13163else depth(r):=d; 13164@<Determine the value of |height(r)| and the appropriate glue setting; 13165 then |return| or |goto common_ending|@>; 13166common_ending: @<Finish issuing a diagnostic message 13167 for an overfull or underfull vbox@>; 13168exit: vpackage:=r; 13169end; 13170 13171@ @<Examine node |p| in the vlist, taking account of its effect...@>= 13172begin if is_char_node(p) then confusion("vpack") 13173@:this can't happen vpack}{\quad vpack@> 13174else case type(p) of 13175 hlist_node,vlist_node,rule_node,unset_node: 13176 @<Incorporate box dimensions into the dimensions of 13177 the vbox that will contain~it@>; 13178 whatsit_node:@<Incorporate a whatsit node into a vbox@>; 13179 glue_node: @<Incorporate glue into the vertical totals@>; 13180 kern_node: begin x:=x+d+width(p); d:=0; 13181 end; 13182 othercases do_nothing 13183 endcases; 13184p:=link(p); 13185end 13186 13187@ @<Incorporate box dimensions into the dimensions of the vbox...@>= 13188begin x:=x+d+height(p); d:=depth(p); 13189if type(p)>=rule_node then s:=0 @+else s:=shift_amount(p); 13190if width(p)+s>w then w:=width(p)+s; 13191end 13192 13193@ @<Incorporate glue into the vertical totals@>= 13194begin x:=x+d; d:=0;@/ 13195g:=glue_ptr(p); x:=x+width(g);@/ 13196o:=stretch_order(g); total_stretch[o]:=total_stretch[o]+stretch(g); 13197o:=shrink_order(g); total_shrink[o]:=total_shrink[o]+shrink(g); 13198if subtype(p)>=a_leaders then 13199 begin g:=leader_ptr(p); 13200 if width(g)>w then w:=width(g); 13201 end; 13202end 13203 13204@ When we get to the present part of the program, |x| is the natural height 13205of the box being packaged. 13206 13207@<Determine the value of |height(r)| and the appropriate glue setting...@>= 13208if m=additional then h:=x+h; 13209height(r):=h; x:=h-x; {now |x| is the excess to be made up} 13210if x=0 then 13211 begin glue_sign(r):=normal; glue_order(r):=normal; 13212 set_glue_ratio_zero(glue_set(r)); 13213 return; 13214 end 13215else if x>0 then @<Determine vertical glue stretch setting, then |return| 13216 or \hbox{|goto common_ending|}@> 13217else @<Determine vertical glue shrink setting, then |return| 13218 or \hbox{|goto common_ending|}@> 13219 13220@ @<Determine vertical glue stretch setting...@>= 13221begin @<Determine the stretch order@>; 13222glue_order(r):=o; glue_sign(r):=stretching; 13223if total_stretch[o]<>0 then glue_set(r):=unfloat(x/total_stretch[o]) 13224@^real division@> 13225else begin glue_sign(r):=normal; 13226 set_glue_ratio_zero(glue_set(r)); {there's nothing to stretch} 13227 end; 13228if o=normal then if list_ptr(r)<>null then 13229 @<Report an underfull vbox and |goto common_ending|, if this box 13230 is sufficiently bad@>; 13231return; 13232end 13233 13234@ @<Report an underfull vbox and |goto common_ending|, if...@>= 13235begin last_badness:=badness(x,total_stretch[normal]); 13236if last_badness>vbadness then 13237 begin print_ln; 13238 if last_badness>100 then print_nl("Underfull")@+else print_nl("Loose"); 13239 print(" \vbox (badness "); print_int(last_badness); 13240@.Underfull \\vbox...@> 13241@.Loose \\vbox...@> 13242 goto common_ending; 13243 end; 13244end 13245 13246@ @<Finish issuing a diagnostic message for an overfull or underfull vbox@>= 13247if output_active then print(") has occurred while \output is active") 13248else begin if pack_begin_line<>0 then {it's actually negative} 13249 begin print(") in alignment at lines "); 13250 print_int(abs(pack_begin_line)); 13251 print("--"); 13252 end 13253 else print(") detected at line "); 13254 print_int(line); 13255 print_ln;@/ 13256 end; 13257begin_diagnostic; show_box(r); end_diagnostic(true) 13258 13259@ @<Determine vertical glue shrink setting...@>= 13260begin @<Determine the shrink order@>; 13261glue_order(r):=o; glue_sign(r):=shrinking; 13262if total_shrink[o]<>0 then glue_set(r):=unfloat((-x)/total_shrink[o]) 13263@^real division@> 13264else begin glue_sign(r):=normal; 13265 set_glue_ratio_zero(glue_set(r)); {there's nothing to shrink} 13266 end; 13267if (total_shrink[o]<-x)and(o=normal)and(list_ptr(r)<>null) then 13268 begin last_badness:=1000000; 13269 set_glue_ratio_one(glue_set(r)); {use the maximum shrinkage} 13270 @<Report an overfull vbox and |goto common_ending|, if this box 13271 is sufficiently bad@>; 13272 end 13273else if o=normal then if list_ptr(r)<>null then 13274 @<Report a tight vbox and |goto common_ending|, if this box 13275 is sufficiently bad@>; 13276return; 13277end 13278 13279@ @<Report an overfull vbox and |goto common_ending|, if...@>= 13280if (-x-total_shrink[normal]>vfuzz)or(vbadness<100) then 13281 begin print_ln; print_nl("Overfull \vbox ("); 13282@.Overfull \\vbox...@> 13283 print_scaled(-x-total_shrink[normal]); print("pt too high"); 13284 goto common_ending; 13285 end 13286 13287@ @<Report a tight vbox and |goto common_ending|, if...@>= 13288begin last_badness:=badness(-x,total_shrink[normal]); 13289if last_badness>vbadness then 13290 begin print_ln; print_nl("Tight \vbox (badness "); print_int(last_badness); 13291@.Tight \\vbox...@> 13292 goto common_ending; 13293 end; 13294end 13295 13296@ When a box is being appended to the current vertical list, the 13297baselineskip calculation is handled by the |append_to_vlist| routine. 13298 13299@p procedure append_to_vlist(@!b:pointer); 13300var d:scaled; {deficiency of space between baselines} 13301@!p:pointer; {a new glue node} 13302begin if prev_depth>ignore_depth then 13303 begin d:=width(baseline_skip)-prev_depth-height(b); 13304 if d<line_skip_limit then p:=new_param_glue(line_skip_code) 13305 else begin p:=new_skip_param(baseline_skip_code); 13306 width(temp_ptr):=d; {|temp_ptr=glue_ptr(p)|} 13307 end; 13308 link(tail):=p; tail:=p; 13309 end; 13310link(tail):=b; tail:=b; prev_depth:=depth(b); 13311end; 13312 13313@* \[34] Data structures for math mode. 13314When \TeX\ reads a formula that is enclosed between \.\$'s, it constructs an 13315{\sl mlist}, which is essentially a tree structure representing that 13316formula. An mlist is a linear sequence of items, but we can regard it as 13317a tree structure because mlists can appear within mlists. For example, many 13318of the entries can be subscripted or superscripted, and such ``scripts'' 13319are mlists in their own right. 13320 13321An entire formula is parsed into such a tree before any of the actual 13322typesetting is done, because the current style of type is usually not 13323known until the formula has been fully scanned. For example, when the 13324formula `\.{\$a+b \\over c+d\$}' is being read, there is no way to tell 13325that `\.{a+b}' will be in script size until `\.{\\over}' has appeared. 13326 13327During the scanning process, each element of the mlist being built is 13328classified as a relation, a binary operator, an open parenthesis, etc., 13329or as a construct like `\.{\\sqrt}' that must be built up. This classification 13330appears in the mlist data structure. 13331 13332After a formula has been fully scanned, the mlist is converted to an hlist 13333so that it can be incorporated into the surrounding text. This conversion is 13334controlled by a recursive procedure that decides all of the appropriate 13335styles by a ``top-down'' process starting at the outermost level and working 13336in towards the subformulas. The formula is ultimately pasted together using 13337combinations of horizontal and vertical boxes, with glue and penalty nodes 13338inserted as necessary. 13339 13340An mlist is represented internally as a linked list consisting chiefly 13341of ``noads'' (pronounced ``no-adds''), to distinguish them from the somewhat 13342similar ``nodes'' in hlists and vlists. Certain kinds of ordinary nodes are 13343allowed to appear in mlists together with the noads; \TeX\ tells the difference 13344by means of the |type| field, since a noad's |type| is always greater than 13345that of a node. An mlist does not contain character nodes, hlist nodes, vlist 13346nodes, math nodes, ligature nodes, 13347or unset nodes; in particular, each mlist item appears in the 13348variable-size part of |mem|, so the |type| field is always present. 13349 13350@ Each noad is four or more words long. The first word contains the |type| 13351and |subtype| and |link| fields that are already so familiar to us; the 13352second, third, and fourth words are called the noad's |nucleus|, |subscr|, 13353and |supscr| fields. 13354 13355Consider, for example, the simple formula `\.{\$x\^2\$}', which would be 13356parsed into an mlist containing a single element called an |ord_noad|. 13357The |nucleus| of this noad is a representation of `\.x', the |subscr| is 13358empty, and the |supscr| is a representation of `\.2'. 13359 13360The |nucleus|, |subscr|, and |supscr| fields are further broken into 13361subfields. If |p| points to a noad, and if |q| is one of its principal 13362fields (e.g., |q=subscr(p)|), there are several possibilities for the 13363subfields, depending on the |math_type| of |q|. 13364 13365\yskip\hang|math_type(q)=math_char| means that |fam(q)| refers to one of 13366the sixteen font families, and |character(q)| is the number of a character 13367within a font of that family, as in a character node. 13368 13369\yskip\hang|math_type(q)=math_text_char| is similar, but the character is 13370unsubscripted and unsuperscripted and it is followed immediately by another 13371character from the same font. (This |math_type| setting appears only 13372briefly during the processing; it is used to suppress unwanted italic 13373corrections.) 13374 13375\yskip\hang|math_type(q)=empty| indicates a field with no value (the 13376corresponding attribute of noad |p| is not present). 13377 13378\yskip\hang|math_type(q)=sub_box| means that |info(q)| points to a box 13379node (either an |hlist_node| or a |vlist_node|) that should be used as the 13380value of the field. The |shift_amount| in the subsidiary box node is the 13381amount by which that box will be shifted downward. 13382 13383\yskip\hang|math_type(q)=sub_mlist| means that |info(q)| points to 13384an mlist; the mlist must be converted to an hlist in order to obtain 13385the value of this field. 13386 13387\yskip\noindent In the latter case, we might have |info(q)=null|. This 13388is not the same as |math_type(q)=empty|; for example, `\.{\$P\_\{\}\$}' 13389and `\.{\$P\$}' produce different results (the former will not have the 13390``italic correction'' added to the width of |P|, but the ``script skip'' 13391will be added). 13392 13393The definitions of subfields given here are evidently wasteful of space, 13394since a halfword is being used for the |math_type| although only three 13395bits would be needed. However, there are hardly ever many noads present at 13396once, since they are soon converted to nodes that take up even more space, 13397so we can afford to represent them in whatever way simplifies the 13398programming. 13399 13400@d noad_size=4 {number of words in a normal noad} 13401@d nucleus(#)==#+1 {the |nucleus| field of a noad} 13402@d supscr(#)==#+2 {the |supscr| field of a noad} 13403@d subscr(#)==#+3 {the |subscr| field of a noad} 13404@d math_type==link {a |halfword| in |mem|} 13405@d fam==font {a |quarterword| in |mem|} 13406@d math_char=1 {|math_type| when the attribute is simple} 13407@d sub_box=2 {|math_type| when the attribute is a box} 13408@d sub_mlist=3 {|math_type| when the attribute is a formula} 13409@d math_text_char=4 {|math_type| when italic correction is dubious} 13410 13411@ Each portion of a formula is classified as Ord, Op, Bin, Rel, Ope, 13412Clo, Pun, or Inn, for purposes of spacing and line breaking. An 13413|ord_noad|, |op_noad|, |bin_noad|, |rel_noad|, |open_noad|, |close_noad|, 13414|punct_noad|, or |inner_noad| is used to represent portions of the various 13415types. For example, an `\.=' sign in a formula leads to the creation of a 13416|rel_noad| whose |nucleus| field is a representation of an equals sign 13417(usually |fam=0|, |character=@'75|). A formula preceded by \.{\\mathrel} 13418also results in a |rel_noad|. When a |rel_noad| is followed by an 13419|op_noad|, say, and possibly separated by one or more ordinary nodes (not 13420noads), \TeX\ will insert a penalty node (with the current |rel_penalty|) 13421just after the formula that corresponds to the |rel_noad|, unless there 13422already was a penalty immediately following; and a ``thick space'' will be 13423inserted just before the formula that corresponds to the |op_noad|. 13424 13425A noad of type |ord_noad|, |op_noad|, \dots, |inner_noad| usually 13426has a |subtype=normal|. The only exception is that an |op_noad| might 13427have |subtype=limits| or |no_limits|, if the normal positioning of 13428limits has been overridden for this operator. 13429 13430@d ord_noad=unset_node+3 {|type| of a noad classified Ord} 13431@d op_noad=ord_noad+1 {|type| of a noad classified Op} 13432@d bin_noad=ord_noad+2 {|type| of a noad classified Bin} 13433@d rel_noad=ord_noad+3 {|type| of a noad classified Rel} 13434@d open_noad=ord_noad+4 {|type| of a noad classified Ope} 13435@d close_noad=ord_noad+5 {|type| of a noad classified Clo} 13436@d punct_noad=ord_noad+6 {|type| of a noad classified Pun} 13437@d inner_noad=ord_noad+7 {|type| of a noad classified Inn} 13438@d limits=1 {|subtype| of |op_noad| whose scripts are to be above, below} 13439@d no_limits=2 {|subtype| of |op_noad| whose scripts are to be normal} 13440 13441@ A |radical_noad| is five words long; the fifth word is the |left_delimiter| 13442field, which usually represents a square root sign. 13443 13444A |fraction_noad| is six words long; it has a |right_delimiter| field 13445as well as a |left_delimiter|. 13446 13447Delimiter fields are of type |four_quarters|, and they have four subfields 13448called |small_fam|, |small_char|, |large_fam|, |large_char|. These subfields 13449represent variable-size delimiters by giving the ``small'' and ``large'' 13450starting characters, as explained in Chapter~17 of {\sl The \TeX book}. 13451@:TeXbook}{\sl The \TeX book@> 13452 13453A |fraction_noad| is actually quite different from all other noads. Not 13454only does it have six words, it has |thickness|, |denominator|, and 13455|numerator| fields instead of |nucleus|, |subscr|, and |supscr|. The 13456|thickness| is a scaled value that tells how thick to make a fraction 13457rule; however, the special value |default_code| is used to stand for the 13458|default_rule_thickness| of the current size. The |numerator| and 13459|denominator| point to mlists that define a fraction; we always have 13460$$\hbox{|math_type(numerator)=math_type(denominator)=sub_mlist|}.$$ The 13461|left_delimiter| and |right_delimiter| fields specify delimiters that will 13462be placed at the left and right of the fraction. In this way, a 13463|fraction_noad| is able to represent all of \TeX's operators \.{\\over}, 13464\.{\\atop}, \.{\\above}, \.{\\overwithdelims}, \.{\\atopwithdelims}, and 13465 \.{\\abovewithdelims}. 13466 13467@d left_delimiter(#)==#+4 {first delimiter field of a noad} 13468@d right_delimiter(#)==#+5 {second delimiter field of a fraction noad} 13469@d radical_noad=inner_noad+1 {|type| of a noad for square roots} 13470@d radical_noad_size=5 {number of |mem| words in a radical noad} 13471@d fraction_noad=radical_noad+1 {|type| of a noad for generalized fractions} 13472@d fraction_noad_size=6 {number of |mem| words in a fraction noad} 13473@d small_fam(#)==mem[#].qqqq.b0 {|fam| for ``small'' delimiter} 13474@d small_char(#)==mem[#].qqqq.b1 {|character| for ``small'' delimiter} 13475@d large_fam(#)==mem[#].qqqq.b2 {|fam| for ``large'' delimiter} 13476@d large_char(#)==mem[#].qqqq.b3 {|character| for ``large'' delimiter} 13477@d thickness==width {|thickness| field in a fraction noad} 13478@d default_code==@'10000000000 {denotes |default_rule_thickness|} 13479@d numerator==supscr {|numerator| field in a fraction noad} 13480@d denominator==subscr {|denominator| field in a fraction noad} 13481 13482@ The global variable |empty_field| is set up for initialization of empty 13483fields in new noads. Similarly, |null_delimiter| is for the initialization 13484of delimiter fields. 13485 13486@<Glob...@>= 13487@!empty_field:two_halves; 13488@!null_delimiter:four_quarters; 13489 13490@ @<Set init...@>= 13491empty_field.rh:=empty; empty_field.lh:=null;@/ 13492null_delimiter.b0:=0; null_delimiter.b1:=min_quarterword;@/ 13493null_delimiter.b2:=0; null_delimiter.b3:=min_quarterword; 13494 13495@ The |new_noad| function creates an |ord_noad| that is completely null. 13496 13497@p function new_noad:pointer; 13498var p:pointer; 13499begin p:=get_node(noad_size); 13500type(p):=ord_noad; subtype(p):=normal; 13501mem[nucleus(p)].hh:=empty_field; 13502mem[subscr(p)].hh:=empty_field; 13503mem[supscr(p)].hh:=empty_field; 13504new_noad:=p; 13505end; 13506 13507@ A few more kinds of noads will complete the set: An |under_noad| has its 13508nucleus underlined; an |over_noad| has it overlined. An |accent_noad| places 13509an accent over its nucleus; the accent character appears as 13510|fam(accent_chr(p))| and |character(accent_chr(p))|. A |vcenter_noad| 13511centers its nucleus vertically with respect to the axis of the formula; 13512in such noads we always have |math_type(nucleus(p))=sub_box|. 13513 13514And finally, we have |left_noad| and |right_noad| types, to implement 13515\TeX's \.{\\left} and \.{\\right}. The |nucleus| of such noads is 13516replaced by a |delimiter| field; thus, for example, `\.{\\left(}' produces 13517a |left_noad| such that |delimiter(p)| holds the family and character 13518codes for all left parentheses. A |left_noad| never appears in an mlist 13519except as the first element, and a |right_noad| never appears in an mlist 13520except as the last element; furthermore, we either have both a |left_noad| 13521and a |right_noad|, or neither one is present. The |subscr| and |supscr| 13522fields are always |empty| in a |left_noad| and a |right_noad|. 13523 13524@d under_noad=fraction_noad+1 {|type| of a noad for underlining} 13525@d over_noad=under_noad+1 {|type| of a noad for overlining} 13526@d accent_noad=over_noad+1 {|type| of a noad for accented subformulas} 13527@d accent_noad_size=5 {number of |mem| words in an accent noad} 13528@d accent_chr(#)==#+4 {the |accent_chr| field of an accent noad} 13529@d vcenter_noad=accent_noad+1 {|type| of a noad for \.{\\vcenter}} 13530@d left_noad=vcenter_noad+1 {|type| of a noad for \.{\\left}} 13531@d right_noad=left_noad+1 {|type| of a noad for \.{\\right}} 13532@d delimiter==nucleus {|delimiter| field in left and right noads} 13533@d scripts_allowed(#)==(type(#)>=ord_noad)and(type(#)<left_noad) 13534 13535@ Math formulas can also contain instructions like \.{\\textstyle} that 13536override \TeX's normal style rules. A |style_node| is inserted into the 13537data structure to record such instructions; it is three words long, so it 13538is considered a node instead of a noad. The |subtype| is either |display_style| 13539or |text_style| or |script_style| or |script_script_style|. The 13540second and third words of a |style_node| are not used, but they are 13541present because a |choice_node| is converted to a |style_node|. 13542 13543\TeX\ uses even numbers 0, 2, 4, 6 to encode the basic styles 13544|display_style|, \dots, |script_script_style|, and adds~1 to get the 13545``cramped'' versions of these styles. This gives a numerical order that 13546is backwards from the convention of Appendix~G in {\sl The \TeX book\/}; 13547i.e., a smaller style has a larger numerical value. 13548@:TeXbook}{\sl The \TeX book@> 13549 13550@d style_node=unset_node+1 {|type| of a style node} 13551@d style_node_size=3 {number of words in a style node} 13552@d display_style=0 {|subtype| for \.{\\displaystyle}} 13553@d text_style=2 {|subtype| for \.{\\textstyle}} 13554@d script_style=4 {|subtype| for \.{\\scriptstyle}} 13555@d script_script_style=6 {|subtype| for \.{\\scriptscriptstyle}} 13556@d cramped=1 {add this to an uncramped style if you want to cramp it} 13557 13558@p function new_style(@!s:small_number):pointer; {create a style node} 13559var p:pointer; {the new node} 13560begin p:=get_node(style_node_size); type(p):=style_node; 13561subtype(p):=s; width(p):=0; depth(p):=0; {the |width| and |depth| are not used} 13562new_style:=p; 13563end; 13564 13565@ Finally, the \.{\\mathchoice} primitive creates a |choice_node|, which 13566has special subfields |display_mlist|, |text_mlist|, |script_mlist|, 13567and |script_script_mlist| pointing to the mlists for each style. 13568 13569@d choice_node=unset_node+2 {|type| of a choice node} 13570@d display_mlist(#)==info(#+1) {mlist to be used in display style} 13571@d text_mlist(#)==link(#+1) {mlist to be used in text style} 13572@d script_mlist(#)==info(#+2) {mlist to be used in script style} 13573@d script_script_mlist(#)==link(#+2) {mlist to be used in scriptscript style} 13574 13575@p function new_choice:pointer; {create a choice node} 13576var p:pointer; {the new node} 13577begin p:=get_node(style_node_size); type(p):=choice_node; 13578subtype(p):=0; {the |subtype| is not used} 13579display_mlist(p):=null; text_mlist(p):=null; script_mlist(p):=null; 13580script_script_mlist(p):=null; 13581new_choice:=p; 13582end; 13583 13584@ Let's consider now the previously unwritten part of |show_node_list| 13585that displays the things that can only be present in mlists; this 13586program illustrates how to access the data structures just defined. 13587 13588In the context of the following program, |p| points to a node or noad that 13589should be displayed, and the current string contains the ``recursion history'' 13590that leads to this point. The recursion history consists of a dot for each 13591outer level in which |p| is subsidiary to some node, or in which |p| is 13592subsidiary to the |nucleus| field of some noad; the dot is replaced by 13593`\.\_' or `\.\^' or `\./' or `\.\\' if |p| is descended from the |subscr| 13594or |supscr| or |denominator| or |numerator| fields of noads. For example, 13595the current string would be `\.{.\^.\_/}' if |p| points to the |ord_noad| for 13596|x| in the (ridiculous) formula 13597`\.{\$\\sqrt\{a\^\{\\mathinner\{b\_\{c\\over x+y\}\}\}\}\$}'. 13598 13599@<Cases of |show_node_list| that arise...@>= 13600style_node:print_style(subtype(p)); 13601choice_node:@<Display choice node |p|@>; 13602ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad, 13603 radical_noad,over_noad,under_noad,vcenter_noad,accent_noad, 13604 left_noad,right_noad:@<Display normal noad |p|@>; 13605fraction_noad:@<Display fraction noad |p|@>; 13606 13607@ Here are some simple routines used in the display of noads. 13608 13609@<Declare procedures needed for displaying the elements of mlists@>= 13610procedure print_fam_and_char(@!p:pointer); {prints family and character} 13611begin print_esc("fam"); print_int(fam(p)); print_char(" "); 13612print_ASCII(qo(character(p))); 13613end; 13614@# 13615procedure print_delimiter(@!p:pointer); {prints a delimiter as 24-bit hex value} 13616var a:integer; {accumulator} 13617begin a:=small_fam(p)*256+qo(small_char(p)); 13618a:=a*@"1000+large_fam(p)*256+qo(large_char(p)); 13619if a<0 then print_int(a) {this should never happen} 13620else print_hex(a); 13621end; 13622 13623@ The next subroutine will descend to another level of recursion when a 13624subsidiary mlist needs to be displayed. The parameter |c| indicates what 13625character is to become part of the recursion history. An empty mlist is 13626distinguished from a field with |math_type(p)=empty|, because these are 13627not equivalent (as explained above). 13628@^recursion@> 13629 13630@<Declare procedures needed for displaying...@>= 13631procedure@?show_info; forward;@t\2@>@?{|show_node_list(info(temp_ptr))|} 13632procedure print_subsidiary_data(@!p:pointer;@!c:ASCII_code); 13633 {display a noad field} 13634begin if cur_length>=depth_threshold then 13635 begin if math_type(p)<>empty then print(" []"); 13636 end 13637else begin append_char(c); {include |c| in the recursion history} 13638 temp_ptr:=p; {prepare for |show_info| if recursion is needed} 13639 case math_type(p) of 13640 math_char: begin print_ln; print_current_string; print_fam_and_char(p); 13641 end; 13642 sub_box: show_info; {recursive call} 13643 sub_mlist: if info(p)=null then 13644 begin print_ln; print_current_string; print("{}"); 13645 end 13646 else show_info; {recursive call} 13647 othercases do_nothing {|empty|} 13648 endcases;@/ 13649 flush_char; {remove |c| from the recursion history} 13650 end; 13651end; 13652 13653@ The inelegant introduction of |show_info| in the code above seems better 13654than the alternative of using \PASCAL's strange |forward| declaration for a 13655procedure with parameters. The \PASCAL\ convention about dropping parameters 13656from a post-|forward| procedure is, frankly, so intolerable to the author 13657of \TeX\ that he would rather stoop to communication via a global temporary 13658variable. (A similar stoopidity occurred with respect to |hlist_out| and 13659|vlist_out| above, and it will occur with respect to |mlist_to_hlist| below.) 13660@^Knuth, Donald Ervin@> 13661@:PASCAL}{\PASCAL@> 13662 13663@p procedure show_info; {the reader will kindly forgive this} 13664begin show_node_list(info(temp_ptr)); 13665end; 13666 13667@ @<Declare procedures needed for displaying...@>= 13668procedure print_style(@!c:integer); 13669begin case c div 2 of 136700: print_esc("displaystyle"); {|display_style=0|} 136711: print_esc("textstyle"); {|text_style=2|} 136722: print_esc("scriptstyle"); {|script_style=4|} 136733: print_esc("scriptscriptstyle"); {|script_script_style=6|} 13674othercases print("Unknown style!") 13675endcases; 13676end; 13677 13678@ @<Display choice node |p|@>= 13679begin print_esc("mathchoice"); 13680append_char("D"); show_node_list(display_mlist(p)); flush_char; 13681append_char("T"); show_node_list(text_mlist(p)); flush_char; 13682append_char("S"); show_node_list(script_mlist(p)); flush_char; 13683append_char("s"); show_node_list(script_script_mlist(p)); flush_char; 13684end 13685 13686@ @<Display normal noad |p|@>= 13687begin case type(p) of 13688ord_noad: print_esc("mathord"); 13689op_noad: print_esc("mathop"); 13690bin_noad: print_esc("mathbin"); 13691rel_noad: print_esc("mathrel"); 13692open_noad: print_esc("mathopen"); 13693close_noad: print_esc("mathclose"); 13694punct_noad: print_esc("mathpunct"); 13695inner_noad: print_esc("mathinner"); 13696over_noad: print_esc("overline"); 13697under_noad: print_esc("underline"); 13698vcenter_noad: print_esc("vcenter"); 13699radical_noad: begin print_esc("radical"); print_delimiter(left_delimiter(p)); 13700 end; 13701accent_noad: begin print_esc("accent"); print_fam_and_char(accent_chr(p)); 13702 end; 13703left_noad: begin print_esc("left"); print_delimiter(delimiter(p)); 13704 end; 13705right_noad: begin print_esc("right"); print_delimiter(delimiter(p)); 13706 end; 13707end; 13708if subtype(p)<>normal then 13709 if subtype(p)=limits then print_esc("limits") 13710 else print_esc("nolimits"); 13711if type(p)<left_noad then print_subsidiary_data(nucleus(p),"."); 13712print_subsidiary_data(supscr(p),"^"); 13713print_subsidiary_data(subscr(p),"_"); 13714end 13715 13716@ @<Display fraction noad |p|@>= 13717begin print_esc("fraction, thickness "); 13718if thickness(p)=default_code then print("= default") 13719else print_scaled(thickness(p)); 13720if (small_fam(left_delimiter(p))<>0)or@+ 13721 (small_char(left_delimiter(p))<>min_quarterword)or@| 13722 (large_fam(left_delimiter(p))<>0)or@| 13723 (large_char(left_delimiter(p))<>min_quarterword) then 13724 begin print(", left-delimiter "); print_delimiter(left_delimiter(p)); 13725 end; 13726if (small_fam(right_delimiter(p))<>0)or@| 13727 (small_char(right_delimiter(p))<>min_quarterword)or@| 13728 (large_fam(right_delimiter(p))<>0)or@| 13729 (large_char(right_delimiter(p))<>min_quarterword) then 13730 begin print(", right-delimiter "); print_delimiter(right_delimiter(p)); 13731 end; 13732print_subsidiary_data(numerator(p),"\"); 13733print_subsidiary_data(denominator(p),"/"); 13734end 13735 13736@ That which can be displayed can also be destroyed. 13737 13738@<Cases of |flush_node_list| that arise...@>= 13739style_node: begin free_node(p,style_node_size); goto done; 13740 end; 13741choice_node:begin flush_node_list(display_mlist(p)); 13742 flush_node_list(text_mlist(p)); 13743 flush_node_list(script_mlist(p)); 13744 flush_node_list(script_script_mlist(p)); 13745 free_node(p,style_node_size); goto done; 13746 end; 13747ord_noad,op_noad,bin_noad,rel_noad,open_noad,close_noad,punct_noad,inner_noad, 13748 radical_noad,over_noad,under_noad,vcenter_noad,accent_noad:@t@>@;@/ 13749 begin if math_type(nucleus(p))>=sub_box then 13750 flush_node_list(info(nucleus(p))); 13751 if math_type(supscr(p))>=sub_box then 13752 flush_node_list(info(supscr(p))); 13753 if math_type(subscr(p))>=sub_box then 13754 flush_node_list(info(subscr(p))); 13755 if type(p)=radical_noad then free_node(p,radical_noad_size) 13756 else if type(p)=accent_noad then free_node(p,accent_noad_size) 13757 else free_node(p,noad_size); 13758 goto done; 13759 end; 13760left_noad,right_noad: begin free_node(p,noad_size); goto done; 13761 end; 13762fraction_noad: begin flush_node_list(info(numerator(p))); 13763 flush_node_list(info(denominator(p))); 13764 free_node(p,fraction_noad_size); goto done; 13765 end; 13766 13767@* \[35] Subroutines for math mode. 13768In order to convert mlists to hlists, i.e., noads to nodes, we need several 13769subroutines that are conveniently dealt with now. 13770 13771Let us first introduce the macros that make it easy to get at the parameters and 13772other font information. A size code, which is a multiple of 16, is added to a 13773family number to get an index into the table of internal font numbers 13774for each combination of family and size. (Be alert: Size codes get 13775larger as the type gets smaller.) 13776 13777@d text_size=0 {size code for the largest size in a family} 13778@d script_size=16 {size code for the medium size in a family} 13779@d script_script_size=32 {size code for the smallest size in a family} 13780 13781@<Basic printing procedures@>= 13782procedure print_size(@!s:integer); 13783begin if s=text_size then print_esc("textfont") 13784else if s=script_size then print_esc("scriptfont") 13785else print_esc("scriptscriptfont"); 13786end; 13787 13788@ Before an mlist is converted to an hlist, \TeX\ makes sure that 13789the fonts in family~2 have enough parameters to be math-symbol 13790fonts, and that the fonts in family~3 have enough parameters to be 13791math-extension fonts. The math-symbol parameters are referred to by using the 13792following macros, which take a size code as their parameter; for example, 13793|num1(cur_size)| gives the value of the |num1| parameter for the current size. 13794@^parameters for symbols@> 13795@^font parameters@> 13796 13797@d mathsy_end(#)==fam_fnt(2+#)]].sc 13798@d mathsy(#)==font_info[#+param_base[mathsy_end 13799@d math_x_height==mathsy(5) {height of `\.x'} 13800@d math_quad==mathsy(6) {\.{18mu}} 13801@d num1==mathsy(8) {numerator shift-up in display styles} 13802@d num2==mathsy(9) {numerator shift-up in non-display, non-\.{\\atop}} 13803@d num3==mathsy(10) {numerator shift-up in non-display \.{\\atop}} 13804@d denom1==mathsy(11) {denominator shift-down in display styles} 13805@d denom2==mathsy(12) {denominator shift-down in non-display styles} 13806@d sup1==mathsy(13) {superscript shift-up in uncramped display style} 13807@d sup2==mathsy(14) {superscript shift-up in uncramped non-display} 13808@d sup3==mathsy(15) {superscript shift-up in cramped styles} 13809@d sub1==mathsy(16) {subscript shift-down if superscript is absent} 13810@d sub2==mathsy(17) {subscript shift-down if superscript is present} 13811@d sup_drop==mathsy(18) {superscript baseline below top of large box} 13812@d sub_drop==mathsy(19) {subscript baseline below bottom of large box} 13813@d delim1==mathsy(20) {size of \.{\\atopwithdelims} delimiters 13814 in display styles} 13815@d delim2==mathsy(21) {size of \.{\\atopwithdelims} delimiters in non-displays} 13816@d axis_height==mathsy(22) {height of fraction lines above the baseline} 13817@d total_mathsy_params=22 13818 13819@ The math-extension parameters have similar macros, but the size code is 13820omitted (since it is always |cur_size| when we refer to such parameters). 13821@^parameters for symbols@> 13822@^font parameters@> 13823 13824@d mathex(#)==font_info[#+param_base[fam_fnt(3+cur_size)]].sc 13825@d default_rule_thickness==mathex(8) {thickness of \.{\\over} bars} 13826@d big_op_spacing1==mathex(9) {minimum clearance above a displayed op} 13827@d big_op_spacing2==mathex(10) {minimum clearance below a displayed op} 13828@d big_op_spacing3==mathex(11) {minimum baselineskip above displayed op} 13829@d big_op_spacing4==mathex(12) {minimum baselineskip below displayed op} 13830@d big_op_spacing5==mathex(13) {padding above and below displayed limits} 13831@d total_mathex_params=13 13832 13833@ We also need to compute the change in style between mlists and their 13834subsidiaries. The following macros define the subsidiary style for 13835an overlined nucleus (|cramped_style|), for a subscript or a superscript 13836(|sub_style| or |sup_style|), or for a numerator or denominator (|num_style| 13837or |denom_style|). 13838 13839@d cramped_style(#)==2*(# div 2)+cramped {cramp the style} 13840@d sub_style(#)==2*(# div 4)+script_style+cramped {smaller and cramped} 13841@d sup_style(#)==2*(# div 4)+script_style+(# mod 2) {smaller} 13842@d num_style(#)==#+2-2*(# div 6) {smaller unless already script-script} 13843@d denom_style(#)==2*(# div 2)+cramped+2-2*(# div 6) {smaller, cramped} 13844 13845@ When the style changes, the following piece of program computes associated 13846information: 13847 13848@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>= 13849begin if cur_style<script_style then cur_size:=text_size 13850else cur_size:=16*((cur_style-text_style) div 2); 13851cur_mu:=x_over_n(math_quad(cur_size),18); 13852end 13853 13854@ Here is a function that returns a pointer to a rule node having a given 13855thickness |t|. The rule will extend horizontally to the boundary of the vlist 13856that eventually contains it. 13857 13858@p function fraction_rule(@!t:scaled):pointer; 13859 {construct the bar for a fraction} 13860var p:pointer; {the new node} 13861begin p:=new_rule; height(p):=t; depth(p):=0; fraction_rule:=p; 13862end; 13863 13864@ The |overbar| function returns a pointer to a vlist box that consists of 13865a given box |b|, above which has been placed a kern of height |k| under a 13866fraction rule of thickness |t| under additional space of height |t|. 13867 13868@p function overbar(@!b:pointer;@!k,@!t:scaled):pointer; 13869var p,@!q:pointer; {nodes being constructed} 13870begin p:=new_kern(k); link(p):=b; q:=fraction_rule(t); link(q):=p; 13871p:=new_kern(t); link(p):=q; overbar:=vpack(p,natural); 13872end; 13873 13874@ The |var_delimiter| function, which finds or constructs a sufficiently 13875large delimiter, is the most interesting of the auxiliary functions that 13876currently concern us. Given a pointer |d| to a delimiter field in some noad, 13877together with a size code |s| and a vertical distance |v|, this function 13878returns a pointer to a box that contains the smallest variant of |d| whose 13879height plus depth is |v| or more. (And if no variant is large enough, it 13880returns the largest available variant.) In particular, this routine will 13881construct arbitrarily large delimiters from extensible components, if 13882|d| leads to such characters. 13883 13884The value returned is a box whose |shift_amount| has been set so that 13885the box is vertically centered with respect to the axis in the given size. 13886If a built-up symbol is returned, the height of the box before shifting 13887will be the height of its topmost component. 13888 13889@p@t\4@>@<Declare subprocedures for |var_delimiter|@> 13890function var_delimiter(@!d:pointer;@!s:small_number;@!v:scaled):pointer; 13891label found,continue; 13892var b:pointer; {the box that will be constructed} 13893@!f,@!g: internal_font_number; {best-so-far and tentative font codes} 13894@!c,@!x,@!y: quarterword; {best-so-far and tentative character codes} 13895@!m,@!n: integer; {the number of extensible pieces} 13896@!u: scaled; {height-plus-depth of a tentative character} 13897@!w: scaled; {largest height-plus-depth so far} 13898@!q: four_quarters; {character info} 13899@!hd: eight_bits; {height-depth byte} 13900@!r: four_quarters; {extensible pieces} 13901@!z: small_number; {runs through font family members} 13902@!large_attempt: boolean; {are we trying the ``large'' variant?} 13903begin f:=null_font; w:=0; large_attempt:=false; 13904z:=small_fam(d); x:=small_char(d); 13905loop@+ begin @<Look at the variants of |(z,x)|; set |f| and |c| whenever 13906 a better character is found; |goto found| as soon as a 13907 large enough variant is encountered@>; 13908 if large_attempt then goto found; {there were none large enough} 13909 large_attempt:=true; z:=large_fam(d); x:=large_char(d); 13910 end; 13911found: if f<>null_font then 13912 @<Make variable |b| point to a box for |(f,c)|@> 13913else begin b:=new_null_box; 13914 width(b):=null_delimiter_space; {use this width if no delimiter was found} 13915 end; 13916shift_amount(b):=half(height(b)-depth(b)) - axis_height(s); 13917var_delimiter:=b; 13918end; 13919 13920@ The search process is complicated slightly by the facts that some of the 13921characters might not be present in some of the fonts, and they might not 13922be probed in increasing order of height. 13923 13924@<Look at the variants of |(z,x)|; set |f| and |c|...@>= 13925if (z<>0)or(x<>min_quarterword) then 13926 begin z:=z+s+16; 13927 repeat z:=z-16; g:=fam_fnt(z); 13928 if g<>null_font then 13929 @<Look at the list of characters starting with |x| in 13930 font |g|; set |f| and |c| whenever 13931 a better character is found; |goto found| as soon as a 13932 large enough variant is encountered@>; 13933 until z<16; 13934 end 13935 13936@ @<Look at the list of characters starting with |x|...@>= 13937begin y:=x; 13938if (qo(y)>=font_bc[g])and(qo(y)<=font_ec[g]) then 13939 begin continue: q:=char_info(g)(y); 13940 if char_exists(q) then 13941 begin if char_tag(q)=ext_tag then 13942 begin f:=g; c:=y; goto found; 13943 end; 13944 hd:=height_depth(q); 13945 u:=char_height(g)(hd)+char_depth(g)(hd); 13946 if u>w then 13947 begin f:=g; c:=y; w:=u; 13948 if u>=v then goto found; 13949 end; 13950 if char_tag(q)=list_tag then 13951 begin y:=rem_byte(q); goto continue; 13952 end; 13953 end; 13954 end; 13955end 13956 13957@ Here is a subroutine that creates a new box, whose list contains a 13958single character, and whose width includes the italic correction for 13959that character. The height or depth of the box will be negative, if 13960the height or depth of the character is negative; thus, this routine 13961may deliver a slightly different result than |hpack| would produce. 13962 13963@<Declare subprocedures for |var_delimiter|@>= 13964function char_box(@!f:internal_font_number;@!c:quarterword):pointer; 13965var q:four_quarters; 13966@!hd:eight_bits; {|height_depth| byte} 13967@!b,@!p:pointer; {the new box and its character node} 13968begin q:=char_info(f)(c); hd:=height_depth(q); 13969b:=new_null_box; width(b):=char_width(f)(q)+char_italic(f)(q); 13970height(b):=char_height(f)(hd); depth(b):=char_depth(f)(hd); 13971p:=get_avail; character(p):=c; font(p):=f; list_ptr(b):=p; char_box:=b; 13972end; 13973 13974@ When the following code is executed, |char_tag(q)| will be equal to 13975|ext_tag| if and only if a built-up symbol is supposed to be returned. 13976 13977@<Make variable |b| point to a box for |(f,c)|@>= 13978if char_tag(q)=ext_tag then 13979 @<Construct an extensible character in a new box |b|, 13980 using recipe |rem_byte(q)| and font |f|@> 13981else b:=char_box(f,c) 13982 13983@ When we build an extensible character, it's handy to have the 13984following subroutine, which puts a given character on top 13985of the characters already in box |b|: 13986 13987@<Declare subprocedures for |var_delimiter|@>= 13988procedure stack_into_box(@!b:pointer;@!f:internal_font_number; 13989 @!c:quarterword); 13990var p:pointer; {new node placed into |b|} 13991begin p:=char_box(f,c); link(p):=list_ptr(b); list_ptr(b):=p; 13992height(b):=height(p); 13993end; 13994 13995@ Another handy subroutine computes the height plus depth of 13996a given character: 13997 13998@<Declare subprocedures for |var_delimiter|@>= 13999function height_plus_depth(@!f:internal_font_number;@!c:quarterword):scaled; 14000var q:four_quarters; 14001@!hd:eight_bits; {|height_depth| byte} 14002begin q:=char_info(f)(c); hd:=height_depth(q); 14003height_plus_depth:=char_height(f)(hd)+char_depth(f)(hd); 14004end; 14005 14006@ @<Construct an extensible...@>= 14007begin b:=new_null_box; 14008type(b):=vlist_node; 14009r:=font_info[exten_base[f]+rem_byte(q)].qqqq;@/ 14010@<Compute the minimum suitable height, |w|, and the corresponding 14011 number of extension steps, |n|; also set |width(b)|@>; 14012c:=ext_bot(r); 14013if c<>min_quarterword then stack_into_box(b,f,c); 14014c:=ext_rep(r); 14015for m:=1 to n do stack_into_box(b,f,c); 14016c:=ext_mid(r); 14017if c<>min_quarterword then 14018 begin stack_into_box(b,f,c); c:=ext_rep(r); 14019 for m:=1 to n do stack_into_box(b,f,c); 14020 end; 14021c:=ext_top(r); 14022if c<>min_quarterword then stack_into_box(b,f,c); 14023depth(b):=w-height(b); 14024end 14025 14026@ The width of an extensible character is the width of the repeatable 14027module. If this module does not have positive height plus depth, 14028we don't use any copies of it, otherwise we use as few as possible 14029(in groups of two if there is a middle part). 14030 14031@<Compute the minimum suitable height, |w|, and...@>= 14032c:=ext_rep(r); u:=height_plus_depth(f,c); 14033w:=0; q:=char_info(f)(c); width(b):=char_width(f)(q)+char_italic(f)(q);@/ 14034c:=ext_bot(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c); 14035c:=ext_mid(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c); 14036c:=ext_top(r);@+if c<>min_quarterword then w:=w+height_plus_depth(f,c); 14037n:=0; 14038if u>0 then while w<v do 14039 begin w:=w+u; incr(n); 14040 if ext_mid(r)<>min_quarterword then w:=w+u; 14041 end 14042 14043@ The next subroutine is much simpler; it is used for numerators and 14044denominators of fractions as well as for displayed operators and 14045their limits above and below. It takes a given box~|b| and 14046changes it so that the new box is centered in a box of width~|w|. 14047The centering is done by putting \.{\\hss} glue at the left and right 14048of the list inside |b|, then packaging the new box; thus, the 14049actual box might not really be centered, if it already contains 14050infinite glue. 14051 14052The given box might contain a single character whose italic correction 14053has been added to the width of the box; in this case a compensating 14054kern is inserted. 14055 14056@p function rebox(@!b:pointer;@!w:scaled):pointer; 14057var p:pointer; {temporary register for list manipulation} 14058@!f:internal_font_number; {font in a one-character box} 14059@!v:scaled; {width of a character without italic correction} 14060begin if (width(b)<>w)and(list_ptr(b)<>null) then 14061 begin if type(b)=vlist_node then b:=hpack(b,natural); 14062 p:=list_ptr(b); 14063 if (is_char_node(p))and(link(p)=null) then 14064 begin f:=font(p); v:=char_width(f)(char_info(f)(character(p))); 14065 if v<>width(b) then link(p):=new_kern(width(b)-v); 14066 end; 14067 free_node(b,box_node_size); 14068 b:=new_glue(ss_glue); link(b):=p; 14069 while link(p)<>null do p:=link(p); 14070 link(p):=new_glue(ss_glue); 14071 rebox:=hpack(b,w,exactly); 14072 end 14073else begin width(b):=w; rebox:=b; 14074 end; 14075end; 14076 14077@ Here is a subroutine that creates a new glue specification from another 14078one that is expressed in `\.{mu}', given the value of the math unit. 14079 14080@d mu_mult(#)==nx_plus_y(n,#,xn_over_d(#,f,@'200000)) 14081 14082@p function math_glue(@!g:pointer;@!m:scaled):pointer; 14083var p:pointer; {the new glue specification} 14084@!n:integer; {integer part of |m|} 14085@!f:scaled; {fraction part of |m|} 14086begin n:=x_over_n(m,@'200000); f:=remainder;@/ 14087if f<0 then 14088 begin decr(n); f:=f+@'200000; 14089 end; 14090p:=get_node(glue_spec_size); 14091width(p):=mu_mult(width(g)); {convert \.{mu} to \.{pt}} 14092stretch_order(p):=stretch_order(g); 14093if stretch_order(p)=normal then stretch(p):=mu_mult(stretch(g)) 14094else stretch(p):=stretch(g); 14095shrink_order(p):=shrink_order(g); 14096if shrink_order(p)=normal then shrink(p):=mu_mult(shrink(g)) 14097else shrink(p):=shrink(g); 14098math_glue:=p; 14099end; 14100 14101@ The |math_kern| subroutine removes |mu_glue| from a kern node, given 14102the value of the math unit. 14103 14104@p procedure math_kern(@!p:pointer;@!m:scaled); 14105var @!n:integer; {integer part of |m|} 14106@!f:scaled; {fraction part of |m|} 14107begin if subtype(p)=mu_glue then 14108 begin n:=x_over_n(m,@'200000); f:=remainder;@/ 14109 if f<0 then 14110 begin decr(n); f:=f+@'200000; 14111 end; 14112 width(p):=mu_mult(width(p)); subtype(p):=explicit; 14113 end; 14114end; 14115 14116@ Sometimes it is necessary to destroy an mlist. The following 14117subroutine empties the current list, assuming that |abs(mode)=mmode|. 14118 14119@p procedure flush_math; 14120begin flush_node_list(link(head)); flush_node_list(incompleat_noad); 14121link(head):=null; tail:=head; incompleat_noad:=null; 14122end; 14123 14124@* \[36] Typesetting math formulas. 14125\TeX's most important routine for dealing with formulas is called 14126|mlist_to_hlist|. After a formula has been scanned and represented as an 14127mlist, this routine converts it to an hlist that can be placed into a box 14128or incorporated into the text of a paragraph. There are three implicit 14129parameters, passed in global variables: |cur_mlist| points to the first 14130node or noad in the given mlist (and it might be |null|); |cur_style| is a 14131style code; and |mlist_penalties| is |true| if penalty nodes for potential 14132line breaks are to be inserted into the resulting hlist. After 14133|mlist_to_hlist| has acted, |link(temp_head)| points to the translated hlist. 14134 14135Since mlists can be inside mlists, the procedure is recursive. And since this 14136is not part of \TeX's inner loop, the program has been written in a manner 14137that stresses compactness over efficiency. 14138@^recursion@> 14139 14140@<Glob...@>= 14141@!cur_mlist:pointer; {beginning of mlist to be translated} 14142@!cur_style:small_number; {style code at current place in the list} 14143@!cur_size:small_number; {size code corresponding to |cur_style|} 14144@!cur_mu:scaled; {the math unit width corresponding to |cur_size|} 14145@!mlist_penalties:boolean; {should |mlist_to_hlist| insert penalties?} 14146 14147@ The recursion in |mlist_to_hlist| is due primarily to a subroutine 14148called |clean_box| that puts a given noad field into a box using a given 14149math style; |mlist_to_hlist| can call |clean_box|, which can call 14150|mlist_to_hlist|. 14151@^recursion@> 14152 14153The box returned by |clean_box| is ``clean'' in the 14154sense that its |shift_amount| is zero. 14155 14156@p procedure@?mlist_to_hlist; forward;@t\2@>@/ 14157function clean_box(@!p:pointer;@!s:small_number):pointer; 14158label found; 14159var q:pointer; {beginning of a list to be boxed} 14160@!save_style:small_number; {|cur_style| to be restored} 14161@!x:pointer; {box to be returned} 14162@!r:pointer; {temporary pointer} 14163begin case math_type(p) of 14164math_char: begin cur_mlist:=new_noad; mem[nucleus(cur_mlist)]:=mem[p]; 14165 end; 14166sub_box: begin q:=info(p); goto found; 14167 end; 14168sub_mlist: cur_mlist:=info(p); 14169othercases begin q:=new_null_box; goto found; 14170 end 14171endcases;@/ 14172save_style:=cur_style; cur_style:=s; mlist_penalties:=false;@/ 14173mlist_to_hlist; q:=link(temp_head); {recursive call} 14174cur_style:=save_style; {restore the style} 14175@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14176found: if is_char_node(q)or(q=null) then x:=hpack(q,natural) 14177 else if (link(q)=null)and(type(q)<=vlist_node)and(shift_amount(q)=0) then 14178 x:=q {it's already clean} 14179 else x:=hpack(q,natural); 14180@<Simplify a trivial box@>; 14181clean_box:=x; 14182end; 14183 14184@ Here we save memory space in a common case. 14185 14186@<Simplify a trivial box@>= 14187q:=list_ptr(x); 14188if is_char_node(q) then 14189 begin r:=link(q); 14190 if r<>null then if link(r)=null then if not is_char_node(r) then 14191 if type(r)=kern_node then {unneeded italic correction} 14192 begin free_node(r,small_node_size); link(q):=null; 14193 end; 14194 end 14195 14196@ It is convenient to have a procedure that converts a |math_char| 14197field to an ``unpacked'' form. The |fetch| routine sets |cur_f|, |cur_c|, 14198and |cur_i| to the font code, character code, and character information bytes of 14199a given noad field. It also takes care of issuing error messages for 14200nonexistent characters; in such cases, |char_exists(cur_i)| will be |false| 14201after |fetch| has acted, and the field will also have been reset to |empty|. 14202 14203@p procedure fetch(@!a:pointer); {unpack the |math_char| field |a|} 14204begin cur_c:=character(a); cur_f:=fam_fnt(fam(a)+cur_size); 14205if cur_f=null_font then 14206 @<Complain about an undefined family and set |cur_i| null@> 14207else begin if (qo(cur_c)>=font_bc[cur_f])and(qo(cur_c)<=font_ec[cur_f]) then 14208 cur_i:=char_info(cur_f)(cur_c) 14209 else cur_i:=null_character; 14210 if not(char_exists(cur_i)) then 14211 begin char_warning(cur_f,qo(cur_c)); 14212 math_type(a):=empty; 14213 end; 14214 end; 14215end; 14216 14217@ @<Complain about an undefined family...@>= 14218begin print_err(""); print_size(cur_size); print_char(" "); 14219print_int(fam(a)); print(" is undefined (character "); 14220print_ASCII(qo(cur_c)); print_char(")"); 14221help4("Somewhere in the math formula just ended, you used the")@/ 14222("stated character from an undefined font family. For example,")@/ 14223("plain TeX doesn't allow \it or \sl in subscripts. Proceed,")@/ 14224("and I'll try to forget that I needed that character."); 14225error; cur_i:=null_character; math_type(a):=empty; 14226end 14227 14228@ The outputs of |fetch| are placed in global variables. 14229 14230@<Glob...@>= 14231@!cur_f:internal_font_number; {the |font| field of a |math_char|} 14232@!cur_c:quarterword; {the |character| field of a |math_char|} 14233@!cur_i:four_quarters; {the |char_info| of a |math_char|, 14234 or a lig/kern instruction} 14235 14236@ We need to do a lot of different things, so |mlist_to_hlist| makes two 14237passes over the given mlist. 14238 14239The first pass does most of the processing: It removes ``mu'' spacing from 14240glue, it recursively evaluates all subsidiary mlists so that only the 14241top-level mlist remains to be handled, it puts fractions and square roots 14242and such things into boxes, it attaches subscripts and superscripts, and 14243it computes the overall height and depth of the top-level mlist so that 14244the size of delimiters for a |left_noad| and a |right_noad| will be known. 14245The hlist resulting from each noad is recorded in that noad's |new_hlist| 14246field, an integer field that replaces the |nucleus| or |thickness|. 14247@^recursion@> 14248 14249The second pass eliminates all noads and inserts the correct glue and 14250penalties between nodes. 14251 14252@d new_hlist(#)==mem[nucleus(#)].int {the translation of an mlist} 14253 14254@ Here is the overall plan of |mlist_to_hlist|, and the list of its 14255local variables. 14256 14257@d done_with_noad=80 {go here when a noad has been fully translated} 14258@d done_with_node=81 {go here when a node has been fully converted} 14259@d check_dimensions=82 {go here to update |max_h| and |max_d|} 14260@d delete_q=83 {go here to delete |q| and move to the next node} 14261 14262@p@t\4@>@<Declare math construction procedures@> 14263procedure mlist_to_hlist; 14264label reswitch, check_dimensions, done_with_noad, done_with_node, delete_q, 14265 done; 14266var mlist:pointer; {beginning of the given list} 14267@!penalties:boolean; {should penalty nodes be inserted?} 14268@!style:small_number; {the given style} 14269@!save_style:small_number; {holds |cur_style| during recursion} 14270@!q:pointer; {runs through the mlist} 14271@!r:pointer; {the most recent noad preceding |q|} 14272@!r_type:small_number; {the |type| of noad |r|, or |op_noad| if |r=null|} 14273@!t:small_number; {the effective |type| of noad |q| during the second pass} 14274@!p,@!x,@!y,@!z: pointer; {temporary registers for list construction} 14275@!pen:integer; {a penalty to be inserted} 14276@!s:small_number; {the size of a noad to be deleted} 14277@!max_h,@!max_d:scaled; {maximum height and depth of the list translated so far} 14278@!delta:scaled; {offset between subscript and superscript} 14279begin mlist:=cur_mlist; penalties:=mlist_penalties; 14280style:=cur_style; {tuck global parameters away as local variables} 14281q:=mlist; r:=null; r_type:=op_noad; max_h:=0; max_d:=0; 14282@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14283while q<>null do @<Process node-or-noad |q| as much as possible in preparation 14284 for the second pass of |mlist_to_hlist|, then move to the next 14285 item in the mlist@>; 14286@<Convert \(a)a final |bin_noad| to an |ord_noad|@>; 14287@<Make a second pass over the mlist, removing all noads and inserting the 14288 proper spacing and penalties@>; 14289end; 14290 14291@ We use the fact that no character nodes appear in an mlist, hence 14292the field |type(q)| is always present. 14293 14294@<Process node-or-noad...@>= 14295begin @<Do first-pass processing based on |type(q)|; |goto done_with_noad| 14296 if a noad has been fully processed, |goto check_dimensions| if it 14297 has been translated into |new_hlist(q)|, or |goto done_with_node| 14298 if a node has been fully processed@>; 14299check_dimensions: z:=hpack(new_hlist(q),natural); 14300if height(z)>max_h then max_h:=height(z); 14301if depth(z)>max_d then max_d:=depth(z); 14302free_node(z,box_node_size); 14303done_with_noad: r:=q; r_type:=type(r); 14304done_with_node: q:=link(q); 14305end 14306 14307@ One of the things we must do on the first pass is change a |bin_noad| to 14308an |ord_noad| if the |bin_noad| is not in the context of a binary operator. 14309The values of |r| and |r_type| make this fairly easy. 14310 14311@<Do first-pass processing...@>= 14312reswitch: delta:=0; 14313case type(q) of 14314bin_noad: case r_type of 14315 bin_noad,op_noad,rel_noad,open_noad,punct_noad,left_noad: 14316 begin type(q):=ord_noad; goto reswitch; 14317 end; 14318 othercases do_nothing 14319 endcases; 14320rel_noad,close_noad,punct_noad,right_noad: begin@t@>@;@/ 14321 @<Convert \(a)a final |bin_noad| to an |ord_noad|@>; 14322 if type(q)=right_noad then goto done_with_noad; 14323 end; 14324@t\4@>@<Cases for noads that can follow a |bin_noad|@>@; 14325@t\4@>@<Cases for nodes that can appear in an mlist, after which we 14326 |goto done_with_node|@>@; 14327othercases confusion("mlist1") 14328@:this can't happen mlist1}{\quad mlist1@> 14329endcases;@/ 14330@<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@> 14331 14332@ @<Convert \(a)a final |bin_noad| to an |ord_noad|@>= 14333if r_type=bin_noad then type(r):=ord_noad 14334 14335@ @<Cases for nodes that can appear in an mlist...@>= 14336style_node: begin cur_style:=subtype(q); 14337 @<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14338 goto done_with_node; 14339 end; 14340choice_node: @<Change this node to a style node followed by the correct choice, 14341 then |goto done_with_node|@>; 14342ins_node,mark_node,adjust_node, 14343 whatsit_node,penalty_node,disc_node: goto done_with_node; 14344rule_node: begin if height(q)>max_h then max_h:=height(q); 14345 if depth(q)>max_d then max_d:=depth(q); goto done_with_node; 14346 end; 14347glue_node: begin @<Convert \(m)math glue to ordinary glue@>; 14348 goto done_with_node; 14349 end; 14350kern_node: begin math_kern(q,cur_mu); goto done_with_node; 14351 end; 14352 14353@ @d choose_mlist(#)==begin p:=#(q); #(q):=null;@+end 14354 14355@<Change this node to a style node...@>= 14356begin case cur_style div 2 of 143570: choose_mlist(display_mlist); {|display_style=0|} 143581: choose_mlist(text_mlist); {|text_style=2|} 143592: choose_mlist(script_mlist); {|script_style=4|} 143603: choose_mlist(script_script_mlist); {|script_script_style=6|} 14361end; {there are no other cases} 14362flush_node_list(display_mlist(q)); 14363flush_node_list(text_mlist(q)); 14364flush_node_list(script_mlist(q)); 14365flush_node_list(script_script_mlist(q));@/ 14366type(q):=style_node; subtype(q):=cur_style; width(q):=0; depth(q):=0; 14367if p<>null then 14368 begin z:=link(q); link(q):=p; 14369 while link(p)<>null do p:=link(p); 14370 link(p):=z; 14371 end; 14372goto done_with_node; 14373end 14374 14375@ Conditional math glue (`\.{\\nonscript}') results in a |glue_node| 14376pointing to |zero_glue|, with |subtype(q)=cond_math_glue|; in such a case 14377the node following will be eliminated if it is a glue or kern node and if the 14378current size is different from |text_size|. Unconditional math glue 14379(`\.{\\muskip}') is converted to normal glue by multiplying the dimensions 14380by |cur_mu|. 14381@!@:non_script_}{\.{\\nonscript} primitive@> 14382 14383@<Convert \(m)math glue to ordinary glue@>= 14384if subtype(q)=mu_glue then 14385 begin x:=glue_ptr(q); 14386 y:=math_glue(x,cur_mu); delete_glue_ref(x); glue_ptr(q):=y; 14387 subtype(q):=normal; 14388 end 14389else if (cur_size<>text_size)and(subtype(q)=cond_math_glue) then 14390 begin p:=link(q); 14391 if p<>null then if (type(p)=glue_node)or(type(p)=kern_node) then 14392 begin link(q):=link(p); link(p):=null; flush_node_list(p); 14393 end; 14394 end 14395 14396@ @<Cases for noads that can follow a |bin_noad|@>= 14397left_noad: goto done_with_noad; 14398fraction_noad: begin make_fraction(q); goto check_dimensions; 14399 end; 14400op_noad: begin delta:=make_op(q); 14401 if subtype(q)=limits then goto check_dimensions; 14402 end; 14403ord_noad: make_ord(q); 14404open_noad,inner_noad: do_nothing; 14405radical_noad: make_radical(q); 14406over_noad: make_over(q); 14407under_noad: make_under(q); 14408accent_noad: make_math_accent(q); 14409vcenter_noad: make_vcenter(q); 14410 14411@ Most of the actual construction work of |mlist_to_hlist| is done 14412by procedures with names 14413like |make_fraction|, |make_radical|, etc. To illustrate 14414the general setup of such procedures, let's begin with a couple of 14415simple ones. 14416 14417@<Declare math...@>= 14418procedure make_over(@!q:pointer); 14419begin info(nucleus(q)):=@| 14420 overbar(clean_box(nucleus(q),cramped_style(cur_style)),@| 14421 3*default_rule_thickness,default_rule_thickness); 14422math_type(nucleus(q)):=sub_box; 14423end; 14424 14425@ @<Declare math...@>= 14426procedure make_under(@!q:pointer); 14427var p,@!x,@!y: pointer; {temporary registers for box construction} 14428@!delta:scaled; {overall height plus depth} 14429begin x:=clean_box(nucleus(q),cur_style); 14430p:=new_kern(3*default_rule_thickness); link(x):=p; 14431link(p):=fraction_rule(default_rule_thickness); 14432y:=vpack(x,natural); 14433delta:=height(y)+depth(y)+default_rule_thickness; 14434height(y):=height(x); depth(y):=delta-height(y); 14435info(nucleus(q)):=y; math_type(nucleus(q)):=sub_box; 14436end; 14437 14438@ @<Declare math...@>= 14439procedure make_vcenter(@!q:pointer); 14440var v:pointer; {the box that should be centered vertically} 14441@!delta:scaled; {its height plus depth} 14442begin v:=info(nucleus(q)); 14443if type(v)<>vlist_node then confusion("vcenter"); 14444@:this can't happen vcenter}{\quad vcenter@> 14445delta:=height(v)+depth(v); 14446height(v):=axis_height(cur_size)+half(delta); 14447depth(v):=delta-height(v); 14448end; 14449 14450@ According to the rules in the \.{DVI} file specifications, we ensure alignment 14451@^square roots@> 14452between a square root sign and the rule above its nucleus by assuming that the 14453baseline of the square-root symbol is the same as the bottom of the rule. The 14454height of the square-root symbol will be the thickness of the rule, and the 14455depth of the square-root symbol should exceed or equal the height-plus-depth 14456of the nucleus plus a certain minimum clearance~|clr|. The symbol will be 14457placed so that the actual clearance is |clr| plus half the excess. 14458 14459@<Declare math...@>= 14460procedure make_radical(@!q:pointer); 14461var x,@!y:pointer; {temporary registers for box construction} 14462@!delta,@!clr:scaled; {dimensions involved in the calculation} 14463begin x:=clean_box(nucleus(q),cramped_style(cur_style)); 14464if cur_style<text_style then {display style} 14465 clr:=default_rule_thickness+(abs(math_x_height(cur_size)) div 4) 14466else begin clr:=default_rule_thickness; clr:=clr + (abs(clr) div 4); 14467 end; 14468y:=var_delimiter(left_delimiter(q),cur_size,height(x)+depth(x)+clr+ 14469 default_rule_thickness); 14470delta:=depth(y)-(height(x)+depth(x)+clr); 14471if delta>0 then clr:=clr+half(delta); {increase the actual clearance} 14472shift_amount(y):=-(height(x)+clr); 14473link(y):=overbar(x,clr,height(y)); 14474info(nucleus(q)):=hpack(y,natural); math_type(nucleus(q)):=sub_box; 14475end; 14476 14477@ Slants are not considered when placing accents in math mode. The accenter is 14478centered over the accentee, and the accent width is treated as zero with 14479respect to the size of the final box. 14480 14481@<Declare math...@>= 14482procedure make_math_accent(@!q:pointer); 14483label done,done1; 14484var p,@!x,@!y:pointer; {temporary registers for box construction} 14485@!a:integer; {address of lig/kern instruction} 14486@!c:quarterword; {accent character} 14487@!f:internal_font_number; {its font} 14488@!i:four_quarters; {its |char_info|} 14489@!s:scaled; {amount to skew the accent to the right} 14490@!h:scaled; {height of character being accented} 14491@!delta:scaled; {space to remove between accent and accentee} 14492@!w:scaled; {width of the accentee, not including sub/superscripts} 14493begin fetch(accent_chr(q)); 14494if char_exists(cur_i) then 14495 begin i:=cur_i; c:=cur_c; f:=cur_f;@/ 14496 @<Compute the amount of skew@>; 14497 x:=clean_box(nucleus(q),cramped_style(cur_style)); w:=width(x); h:=height(x); 14498 @<Switch to a larger accent if available and appropriate@>; 14499 if h<x_height(f) then delta:=h@+else delta:=x_height(f); 14500 if (math_type(supscr(q))<>empty)or(math_type(subscr(q))<>empty) then 14501 if math_type(nucleus(q))=math_char then 14502 @<Swap the subscript and superscript into box |x|@>; 14503 y:=char_box(f,c); 14504 shift_amount(y):=s+half(w-width(y)); 14505 width(y):=0; p:=new_kern(-delta); link(p):=x; link(y):=p; 14506 y:=vpack(y,natural); width(y):=width(x); 14507 if height(y)<h then @<Make the height of box |y| equal to |h|@>; 14508 info(nucleus(q)):=y; 14509 math_type(nucleus(q)):=sub_box; 14510 end; 14511end; 14512 14513@ @<Make the height of box |y|...@>= 14514begin p:=new_kern(h-height(y)); link(p):=list_ptr(y); list_ptr(y):=p; 14515height(y):=h; 14516end 14517 14518@ @<Switch to a larger accent if available and appropriate@>= 14519loop@+ begin if char_tag(i)<>list_tag then goto done; 14520 y:=rem_byte(i); 14521 i:=char_info(f)(y); 14522 if not char_exists(i) then goto done; 14523 if char_width(f)(i)>w then goto done; 14524 c:=y; 14525 end; 14526done: 14527 14528@ @<Compute the amount of skew@>= 14529s:=0; 14530if math_type(nucleus(q))=math_char then 14531 begin fetch(nucleus(q)); 14532 if char_tag(cur_i)=lig_tag then 14533 begin a:=lig_kern_start(cur_f)(cur_i); 14534 cur_i:=font_info[a].qqqq; 14535 if skip_byte(cur_i)>stop_flag then 14536 begin a:=lig_kern_restart(cur_f)(cur_i); 14537 cur_i:=font_info[a].qqqq; 14538 end; 14539 loop@+ begin if qo(next_char(cur_i))=skew_char[cur_f] then 14540 begin if op_byte(cur_i)>=kern_flag then 14541 if skip_byte(cur_i)<=stop_flag then s:=char_kern(cur_f)(cur_i); 14542 goto done1; 14543 end; 14544 if skip_byte(cur_i)>=stop_flag then goto done1; 14545 a:=a+qo(skip_byte(cur_i))+1; 14546 cur_i:=font_info[a].qqqq; 14547 end; 14548 end; 14549 end; 14550done1: 14551 14552@ @<Swap the subscript and superscript into box |x|@>= 14553begin flush_node_list(x); x:=new_noad; 14554mem[nucleus(x)]:=mem[nucleus(q)]; 14555mem[supscr(x)]:=mem[supscr(q)]; 14556mem[subscr(x)]:=mem[subscr(q)];@/ 14557mem[supscr(q)].hh:=empty_field; 14558mem[subscr(q)].hh:=empty_field;@/ 14559math_type(nucleus(q)):=sub_mlist; info(nucleus(q)):=x; 14560x:=clean_box(nucleus(q),cur_style); delta:=delta+height(x)-h; h:=height(x); 14561end 14562 14563@ The |make_fraction| procedure is a bit different because it sets 14564|new_hlist(q)| directly rather than making a sub-box. 14565 14566@<Declare math...@>= 14567procedure make_fraction(@!q:pointer); 14568var p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction} 14569@!delta,@!delta1,@!delta2,@!shift_up,@!shift_down,@!clr:scaled; 14570 {dimensions for box calculations} 14571begin if thickness(q)=default_code then thickness(q):=default_rule_thickness; 14572@<Create equal-width boxes |x| and |z| for the numerator and denominator, 14573 and compute the default amounts |shift_up| and |shift_down| by which they 14574 are displaced from the baseline@>; 14575if thickness(q)=0 then @<Adjust \(s)|shift_up| and |shift_down| for the case 14576 of no fraction line@> 14577else @<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>; 14578@<Construct a vlist box for the fraction, according to |shift_up| and 14579 |shift_down|@>; 14580@<Put the \(f)fraction into a box with its delimiters, and make |new_hlist(q)| 14581 point to it@>; 14582end; 14583 14584@ @<Create equal-width boxes |x| and |z| for the numerator and denom...@>= 14585x:=clean_box(numerator(q),num_style(cur_style)); 14586z:=clean_box(denominator(q),denom_style(cur_style)); 14587if width(x)<width(z) then x:=rebox(x,width(z)) 14588else z:=rebox(z,width(x)); 14589if cur_style<text_style then {display style} 14590 begin shift_up:=num1(cur_size); shift_down:=denom1(cur_size); 14591 end 14592else begin shift_down:=denom2(cur_size); 14593 if thickness(q)<>0 then shift_up:=num2(cur_size) 14594 else shift_up:=num3(cur_size); 14595 end 14596 14597@ The numerator and denominator must be separated by a certain minimum 14598clearance, called |clr| in the following program. The difference between 14599|clr| and the actual clearance is |2delta|. 14600 14601@<Adjust \(s)|shift_up| and |shift_down| for the case of no fraction line@>= 14602begin if cur_style<text_style then clr:=7*default_rule_thickness 14603else clr:=3*default_rule_thickness; 14604delta:=half(clr-((shift_up-depth(x))-(height(z)-shift_down))); 14605if delta>0 then 14606 begin shift_up:=shift_up+delta; 14607 shift_down:=shift_down+delta; 14608 end; 14609end 14610 14611@ In the case of a fraction line, the minimum clearance depends on the actual 14612thickness of the line. 14613 14614@<Adjust \(s)|shift_up| and |shift_down| for the case of a fraction line@>= 14615begin if cur_style<text_style then clr:=3*thickness(q) 14616else clr:=thickness(q); 14617delta:=half(thickness(q)); 14618delta1:=clr-((shift_up-depth(x))-(axis_height(cur_size)+delta)); 14619delta2:=clr-((axis_height(cur_size)-delta)-(height(z)-shift_down)); 14620if delta1>0 then shift_up:=shift_up+delta1; 14621if delta2>0 then shift_down:=shift_down+delta2; 14622end 14623 14624@ @<Construct a vlist box for the fraction...@>= 14625v:=new_null_box; type(v):=vlist_node; 14626height(v):=shift_up+height(x); depth(v):=depth(z)+shift_down; 14627width(v):=width(x); {this also equals |width(z)|} 14628if thickness(q)=0 then 14629 begin p:=new_kern((shift_up-depth(x))-(height(z)-shift_down)); 14630 link(p):=z; 14631 end 14632else begin y:=fraction_rule(thickness(q));@/ 14633 p:=new_kern((axis_height(cur_size)-delta)-@|(height(z)-shift_down));@/ 14634 link(y):=p; link(p):=z;@/ 14635 p:=new_kern((shift_up-depth(x))-(axis_height(cur_size)+delta)); 14636 link(p):=y; 14637 end; 14638link(x):=p; list_ptr(v):=x 14639 14640@ @<Put the \(f)fraction into a box with its delimiters...@>= 14641if cur_style<text_style then delta:=delim1(cur_size) 14642else delta:=delim2(cur_size); 14643x:=var_delimiter(left_delimiter(q), cur_size, delta); link(x):=v;@/ 14644z:=var_delimiter(right_delimiter(q), cur_size, delta); link(v):=z;@/ 14645new_hlist(q):=hpack(x,natural) 14646 14647@ If the nucleus of an |op_noad| is a single character, it is to be 14648centered vertically with respect to the axis, after first being enlarged 14649(via a character list in the font) if we are in display style. The normal 14650convention for placing displayed limits is to put them above and below the 14651operator in display style. 14652 14653The italic correction is removed from the character if there is a subscript 14654and the limits are not being displayed. The |make_op| 14655routine returns the value that should be used as an offset between 14656subscript and superscript. 14657 14658After |make_op| has acted, |subtype(q)| will be |limits| if and only if 14659the limits have been set above and below the operator. In that case, 14660|new_hlist(q)| will already contain the desired final box. 14661 14662@<Declare math...@>= 14663function make_op(@!q:pointer):scaled; 14664var delta:scaled; {offset between subscript and superscript} 14665@!p,@!v,@!x,@!y,@!z:pointer; {temporary registers for box construction} 14666@!c:quarterword;@+@!i:four_quarters; {registers for character examination} 14667@!shift_up,@!shift_down:scaled; {dimensions for box calculation} 14668begin if (subtype(q)=normal)and(cur_style<text_style) then 14669 subtype(q):=limits; 14670if math_type(nucleus(q))=math_char then 14671 begin fetch(nucleus(q)); 14672 if (cur_style<text_style)and(char_tag(cur_i)=list_tag) then {make it larger} 14673 begin c:=rem_byte(cur_i); i:=char_info(cur_f)(c); 14674 if char_exists(i) then 14675 begin cur_c:=c; cur_i:=i; character(nucleus(q)):=c; 14676 end; 14677 end; 14678 delta:=char_italic(cur_f)(cur_i); x:=clean_box(nucleus(q),cur_style); 14679 if (math_type(subscr(q))<>empty)and(subtype(q)<>limits) then 14680 width(x):=width(x)-delta; {remove italic correction} 14681 shift_amount(x):=half(height(x)-depth(x)) - axis_height(cur_size); 14682 {center vertically} 14683 math_type(nucleus(q)):=sub_box; info(nucleus(q)):=x; 14684 end 14685else delta:=0; 14686if subtype(q)=limits then 14687 @<Construct a box with limits above and below it, skewed by |delta|@>; 14688make_op:=delta; 14689end; 14690 14691@ The following program builds a vlist box |v| for displayed limits. The 14692width of the box is not affected by the fact that the limits may be skewed. 14693 14694@<Construct a box with limits above and below it...@>= 14695begin x:=clean_box(supscr(q),sup_style(cur_style)); 14696y:=clean_box(nucleus(q),cur_style); 14697z:=clean_box(subscr(q),sub_style(cur_style)); 14698v:=new_null_box; type(v):=vlist_node; width(v):=width(y); 14699if width(x)>width(v) then width(v):=width(x); 14700if width(z)>width(v) then width(v):=width(z); 14701x:=rebox(x,width(v)); y:=rebox(y,width(v)); z:=rebox(z,width(v));@/ 14702shift_amount(x):=half(delta); shift_amount(z):=-shift_amount(x); 14703height(v):=height(y); depth(v):=depth(y); 14704@<Attach the limits to |y| and adjust |height(v)|, |depth(v)| to 14705 account for their presence@>; 14706new_hlist(q):=v; 14707end 14708 14709@ We use |shift_up| and |shift_down| in the following program for the 14710amount of glue between the displayed operator |y| and its limits |x| and 14711|z|. The vlist inside box |v| will consist of |x| followed by |y| followed 14712by |z|, with kern nodes for the spaces between and around them. 14713 14714@<Attach the limits to |y| and adjust |height(v)|, |depth(v)|...@>= 14715if math_type(supscr(q))=empty then 14716 begin free_node(x,box_node_size); list_ptr(v):=y; 14717 end 14718else begin shift_up:=big_op_spacing3-depth(x); 14719 if shift_up<big_op_spacing1 then shift_up:=big_op_spacing1; 14720 p:=new_kern(shift_up); link(p):=y; link(x):=p;@/ 14721 p:=new_kern(big_op_spacing5); link(p):=x; list_ptr(v):=p; 14722 height(v):=height(v)+big_op_spacing5+height(x)+depth(x)+shift_up; 14723 end; 14724if math_type(subscr(q))=empty then free_node(z,box_node_size) 14725else begin shift_down:=big_op_spacing4-height(z); 14726 if shift_down<big_op_spacing2 then shift_down:=big_op_spacing2; 14727 p:=new_kern(shift_down); link(y):=p; link(p):=z;@/ 14728 p:=new_kern(big_op_spacing5); link(z):=p; 14729 depth(v):=depth(v)+big_op_spacing5+height(z)+depth(z)+shift_down; 14730 end 14731 14732@ A ligature found in a math formula does not create a |ligature_node|, because 14733there is no question of hyphenation afterwards; the ligature will simply be 14734stored in an ordinary |char_node|, after residing in an |ord_noad|. 14735 14736The |math_type| is converted to |math_text_char| here if we would not want to 14737apply an italic correction to the current character unless it belongs 14738to a math font (i.e., a font with |space=0|). 14739 14740No boundary characters enter into these ligatures. 14741 14742@<Declare math...@>= 14743procedure make_ord(@!q:pointer); 14744label restart,exit; 14745var a:integer; {address of lig/kern instruction} 14746@!p,@!r:pointer; {temporary registers for list manipulation} 14747begin restart:@t@>@;@/ 14748if math_type(subscr(q))=empty then if math_type(supscr(q))=empty then 14749 if math_type(nucleus(q))=math_char then 14750 begin p:=link(q); 14751 if p<>null then if (type(p)>=ord_noad)and(type(p)<=punct_noad) then 14752 if math_type(nucleus(p))=math_char then 14753 if fam(nucleus(p))=fam(nucleus(q)) then 14754 begin math_type(nucleus(q)):=math_text_char; 14755 fetch(nucleus(q)); 14756 if char_tag(cur_i)=lig_tag then 14757 begin a:=lig_kern_start(cur_f)(cur_i); 14758 cur_c:=character(nucleus(p)); 14759 cur_i:=font_info[a].qqqq; 14760 if skip_byte(cur_i)>stop_flag then 14761 begin a:=lig_kern_restart(cur_f)(cur_i); 14762 cur_i:=font_info[a].qqqq; 14763 end; 14764 loop@+ begin @<If instruction |cur_i| is a kern with |cur_c|, attach 14765 the kern after~|q|; or if it is a ligature with |cur_c|, combine 14766 noads |q| and~|p| appropriately; then |return| if the cursor has 14767 moved past a noad, or |goto restart|@>; 14768 if skip_byte(cur_i)>=stop_flag then return; 14769 a:=a+qo(skip_byte(cur_i))+1; 14770 cur_i:=font_info[a].qqqq; 14771 end; 14772 end; 14773 end; 14774 end; 14775exit:end; 14776 14777@ Note that a ligature between an |ord_noad| and another kind of noad 14778is replaced by an |ord_noad|, when the two noads collapse into one. 14779But we could make a parenthesis (say) change shape when it follows 14780certain letters. Presumably a font designer will define such 14781ligatures only when this convention makes sense. 14782 14783\chardef\?='174 % vertical line to indicate character retention 14784 14785@<If instruction |cur_i| is a kern with |cur_c|, ...@>= 14786if next_char(cur_i)=cur_c then if skip_byte(cur_i)<=stop_flag then 14787 if op_byte(cur_i)>=kern_flag then 14788 begin p:=new_kern(char_kern(cur_f)(cur_i)); 14789 link(p):=link(q); link(q):=p; return; 14790 end 14791 else begin check_interrupt; {allow a way out of infinite ligature loop} 14792 case op_byte(cur_i) of 14793 qi(1),qi(5): character(nucleus(q)):=rem_byte(cur_i); {\.{=:\?}, \.{=:\?>}} 14794 qi(2),qi(6): character(nucleus(p)):=rem_byte(cur_i); {\.{\?=:}, \.{\?=:>}} 14795 qi(3),qi(7),qi(11):begin r:=new_noad; {\.{\?=:\?}, \.{\?=:\?>}, \.{\?=:\?>>}} 14796 character(nucleus(r)):=rem_byte(cur_i); 14797 fam(nucleus(r)):=fam(nucleus(q));@/ 14798 link(q):=r; link(r):=p; 14799 if op_byte(cur_i)<qi(11) then math_type(nucleus(r)):=math_char 14800 else math_type(nucleus(r)):=math_text_char; {prevent combination} 14801 end; 14802 othercases begin link(q):=link(p); 14803 character(nucleus(q)):=rem_byte(cur_i); {\.{=:}} 14804 mem[subscr(q)]:=mem[subscr(p)]; mem[supscr(q)]:=mem[supscr(p)];@/ 14805 free_node(p,noad_size); 14806 end 14807 endcases; 14808 if op_byte(cur_i)>qi(3) then return; 14809 math_type(nucleus(q)):=math_char; goto restart; 14810 end 14811 14812@ When we get to the following part of the program, we have ``fallen through'' 14813from cases that did not lead to |check_dimensions| or |done_with_noad| or 14814|done_with_node|. Thus, |q|~points to a noad whose nucleus may need to be 14815converted to an hlist, and whose subscripts and superscripts need to be 14816appended if they are present. 14817 14818If |nucleus(q)| is not a |math_char|, the variable |delta| is the amount 14819by which a superscript should be moved right with respect to a subscript 14820when both are present. 14821@^subscripts@> 14822@^superscripts@> 14823 14824@<Convert \(n)|nucleus(q)| to an hlist and attach the sub/superscripts@>= 14825case math_type(nucleus(q)) of 14826math_char, math_text_char: 14827 @<Create a character node |p| for |nucleus(q)|, possibly followed 14828 by a kern node for the italic correction, and set |delta| to the 14829 italic correction if a subscript is present@>; 14830empty: p:=null; 14831sub_box: p:=info(nucleus(q)); 14832sub_mlist: begin cur_mlist:=info(nucleus(q)); save_style:=cur_style; 14833 mlist_penalties:=false; mlist_to_hlist; {recursive call} 14834@^recursion@> 14835 cur_style:=save_style; @<Set up the values...@>; 14836 p:=hpack(link(temp_head),natural); 14837 end; 14838othercases confusion("mlist2") 14839@:this can't happen mlist2}{\quad mlist2@> 14840endcases;@/ 14841new_hlist(q):=p; 14842if (math_type(subscr(q))=empty)and(math_type(supscr(q))=empty) then 14843 goto check_dimensions; 14844make_scripts(q,delta) 14845 14846@ @<Create a character node |p| for |nucleus(q)|...@>= 14847begin fetch(nucleus(q)); 14848if char_exists(cur_i) then 14849 begin delta:=char_italic(cur_f)(cur_i); p:=new_character(cur_f,qo(cur_c)); 14850 if (math_type(nucleus(q))=math_text_char)and(space(cur_f)<>0) then 14851 delta:=0; {no italic correction in mid-word of text font} 14852 if (math_type(subscr(q))=empty)and(delta<>0) then 14853 begin link(p):=new_kern(delta); delta:=0; 14854 end; 14855 end 14856else p:=null; 14857end 14858 14859@ The purpose of |make_scripts(q,delta)| is to attach the subscript and/or 14860superscript of noad |q| to the list that starts at |new_hlist(q)|, 14861given that the subscript and superscript aren't both empty. The superscript 14862will appear to the right of the subscript by a given distance |delta|. 14863 14864We set |shift_down| and |shift_up| to the minimum amounts to shift the 14865baseline of subscripts and superscripts based on the given nucleus. 14866 14867@<Declare math...@>= 14868procedure make_scripts(@!q:pointer;@!delta:scaled); 14869var p,@!x,@!y,@!z:pointer; {temporary registers for box construction} 14870@!shift_up,@!shift_down,@!clr:scaled; {dimensions in the calculation} 14871@!t:small_number; {subsidiary size code} 14872begin p:=new_hlist(q); 14873if is_char_node(p) then 14874 begin shift_up:=0; shift_down:=0; 14875 end 14876else begin z:=hpack(p,natural); 14877 if cur_style<script_style then t:=script_size@+else t:=script_script_size; 14878 shift_up:=height(z)-sup_drop(t); 14879 shift_down:=depth(z)+sub_drop(t); 14880 free_node(z,box_node_size); 14881 end; 14882if math_type(supscr(q))=empty then 14883 @<Construct a subscript box |x| when there is no superscript@> 14884else begin @<Construct a superscript box |x|@>; 14885 if math_type(subscr(q))=empty then shift_amount(x):=-shift_up 14886 else @<Construct a sub/superscript combination box |x|, with the 14887 superscript offset by |delta|@>; 14888 end; 14889if new_hlist(q)=null then new_hlist(q):=x 14890else begin p:=new_hlist(q); 14891 while link(p)<>null do p:=link(p); 14892 link(p):=x; 14893 end; 14894end; 14895 14896@ When there is a subscript without a superscript, the top of the subscript 14897should not exceed the baseline plus four-fifths of the x-height. 14898 14899@<Construct a subscript box |x| when there is no superscript@>= 14900begin x:=clean_box(subscr(q),sub_style(cur_style)); 14901width(x):=width(x)+script_space; 14902if shift_down<sub1(cur_size) then shift_down:=sub1(cur_size); 14903clr:=height(x)-(abs(math_x_height(cur_size)*4) div 5); 14904if shift_down<clr then shift_down:=clr; 14905shift_amount(x):=shift_down; 14906end 14907 14908@ The bottom of a superscript should never descend below the baseline plus 14909one-fourth of the x-height. 14910 14911@<Construct a superscript box |x|@>= 14912begin x:=clean_box(supscr(q),sup_style(cur_style)); 14913width(x):=width(x)+script_space; 14914if odd(cur_style) then clr:=sup3(cur_size) 14915else if cur_style<text_style then clr:=sup1(cur_size) 14916else clr:=sup2(cur_size); 14917if shift_up<clr then shift_up:=clr; 14918clr:=depth(x)+(abs(math_x_height(cur_size)) div 4); 14919if shift_up<clr then shift_up:=clr; 14920end 14921 14922@ When both subscript and superscript are present, the subscript must be 14923separated from the superscript by at least four times |default_rule_thickness|. 14924If this condition would be violated, the subscript moves down, after which 14925both subscript and superscript move up so that the bottom of the superscript 14926is at least as high as the baseline plus four-fifths of the x-height. 14927 14928@<Construct a sub/superscript combination box |x|...@>= 14929begin y:=clean_box(subscr(q),sub_style(cur_style)); 14930width(y):=width(y)+script_space; 14931if shift_down<sub2(cur_size) then shift_down:=sub2(cur_size); 14932clr:=4*default_rule_thickness- 14933 ((shift_up-depth(x))-(height(y)-shift_down)); 14934if clr>0 then 14935 begin shift_down:=shift_down+clr; 14936 clr:=(abs(math_x_height(cur_size)*4) div 5)-(shift_up-depth(x)); 14937 if clr>0 then 14938 begin shift_up:=shift_up+clr; 14939 shift_down:=shift_down-clr; 14940 end; 14941 end; 14942shift_amount(x):=delta; {superscript is |delta| to the right of the subscript} 14943p:=new_kern((shift_up-depth(x))-(height(y)-shift_down)); link(x):=p; link(p):=y; 14944x:=vpack(x,natural); shift_amount(x):=shift_down; 14945end 14946 14947@ We have now tied up all the loose ends of the first pass of |mlist_to_hlist|. 14948The second pass simply goes through and hooks everything together with the 14949proper glue and penalties. It also handles the |left_noad| and |right_noad| that 14950might be present, since |max_h| and |max_d| are now known. Variable |p| points 14951to a node at the current end of the final hlist. 14952 14953@<Make a second pass over the mlist, ...@>= 14954p:=temp_head; link(p):=null; q:=mlist; r_type:=0; cur_style:=style; 14955@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 14956while q<>null do 14957 begin @<If node |q| is a style node, change the style and |goto delete_q|; 14958 otherwise if it is not a noad, put it into the hlist, 14959 advance |q|, and |goto done|; otherwise set |s| to the size 14960 of noad |q|, set |t| to the associated type (|ord_noad.. 14961 inner_noad|), and set |pen| to the associated penalty@>; 14962 @<Append inter-element spacing based on |r_type| and |t|@>; 14963 @<Append any |new_hlist| entries for |q|, and any appropriate penalties@>; 14964 r_type:=t; 14965 delete_q: r:=q; q:=link(q); free_node(r,s); 14966 done: end 14967 14968@ Just before doing the big |case| switch in the second pass, the program 14969sets up default values so that most of the branches are short. 14970 14971@<If node |q| is a style node, change the style...@>= 14972t:=ord_noad; s:=noad_size; pen:=inf_penalty; 14973case type(q) of 14974op_noad,open_noad,close_noad,punct_noad,inner_noad: t:=type(q); 14975bin_noad: begin t:=bin_noad; pen:=bin_op_penalty; 14976 end; 14977rel_noad: begin t:=rel_noad; pen:=rel_penalty; 14978 end; 14979ord_noad,vcenter_noad,over_noad,under_noad: do_nothing; 14980radical_noad: s:=radical_noad_size; 14981accent_noad: s:=accent_noad_size; 14982fraction_noad: begin t:=inner_noad; s:=fraction_noad_size; 14983 end; 14984left_noad,right_noad: t:=make_left_right(q,style,max_d,max_h); 14985style_node: @<Change the current style and |goto delete_q|@>; 14986whatsit_node,penalty_node,rule_node,disc_node,adjust_node,ins_node,mark_node, 14987 glue_node,kern_node:@t@>@;@/ 14988 begin link(p):=q; p:=q; q:=link(q); link(p):=null; goto done; 14989 end; 14990othercases confusion("mlist3") 14991@:this can't happen mlist3}{\quad mlist3@> 14992endcases 14993 14994@ The |make_left_right| function constructs a left or right delimiter of 14995the required size and returns the value |open_noad| or |close_noad|. The 14996|right_noad| and |left_noad| will both be based on the original |style|, 14997so they will have consistent sizes. 14998 14999We use the fact that |right_noad-left_noad=close_noad-open_noad|. 15000 15001@<Declare math...@>= 15002function make_left_right(@!q:pointer;@!style:small_number; 15003 @!max_d,@!max_h:scaled):small_number; 15004var delta,@!delta1,@!delta2:scaled; {dimensions used in the calculation} 15005begin if style<script_style then cur_size:=text_size 15006else cur_size:=16*((style-text_style) div 2); 15007delta2:=max_d+axis_height(cur_size); 15008delta1:=max_h+max_d-delta2; 15009if delta2>delta1 then delta1:=delta2; {|delta1| is max distance from axis} 15010delta:=(delta1 div 500)*delimiter_factor; 15011delta2:=delta1+delta1-delimiter_shortfall; 15012if delta<delta2 then delta:=delta2; 15013new_hlist(q):=var_delimiter(delimiter(q),cur_size,delta); 15014make_left_right:=type(q)-(left_noad-open_noad); {|open_noad| or |close_noad|} 15015end; 15016 15017@ @<Change the current style and |goto delete_q|@>= 15018begin cur_style:=subtype(q); s:=style_node_size; 15019@<Set up the values of |cur_size| and |cur_mu|, based on |cur_style|@>; 15020goto delete_q; 15021end 15022 15023@ The inter-element spacing in math formulas depends on a $8\times8$ table that 15024\TeX\ preloads as a 64-digit string. The elements of this string have the 15025following significance: 15026$$\vbox{\halign{#\hfil\cr 15027\.0 means no space;\cr 15028\.1 means a conditional thin space (\.{\\nonscript\\mskip\\thinmuskip});\cr 15029\.2 means a thin space (\.{\\mskip\\thinmuskip});\cr 15030\.3 means a conditional medium space 15031 (\.{\\nonscript\\mskip\\medmuskip});\cr 15032\.4 means a conditional thick space 15033 (\.{\\nonscript\\mskip\\thickmuskip});\cr 15034\.* means an impossible case.\cr}}$$ 15035This is all pretty cryptic, but {\sl The \TeX book\/} explains what is 15036supposed to happen, and the string makes it happen. 15037@:TeXbook}{\sl The \TeX book@> 15038 15039A global variable |magic_offset| is computed so that if |a| and |b| are 15040in the range |ord_noad..inner_noad|, then |str_pool[a*8+b+magic_offset]| 15041is the digit for spacing between noad types |a| and |b|. 15042 15043If \PASCAL\ had provided a good way to preload constant arrays, this part of 15044the program would not have been so strange. 15045@:PASCAL}{\PASCAL@> 15046 15047@d math_spacing=@;@/ 15048@t\hskip-35pt@> 15049"0234000122*4000133**3**344*0400400*000000234000111*1111112341011" 15050@t$ \hskip-35pt$@> 15051 15052@<Glob...@>= 15053@!magic_offset:integer; {used to find inter-element spacing} 15054 15055@ @<Compute the magic offset@>= 15056magic_offset:=str_start[math_spacing]-9*ord_noad 15057 15058@ @<Append inter-element spacing based on |r_type| and |t|@>= 15059if r_type>0 then {not the first noad} 15060 begin case so(str_pool[r_type*8+t+magic_offset]) of 15061 "0": x:=0; 15062 "1": if cur_style<script_style then x:=thin_mu_skip_code@+else x:=0; 15063 "2": x:=thin_mu_skip_code; 15064 "3": if cur_style<script_style then x:=med_mu_skip_code@+else x:=0; 15065 "4": if cur_style<script_style then x:=thick_mu_skip_code@+else x:=0; 15066 othercases confusion("mlist4") 15067@:this can't happen mlist4}{\quad mlist4@> 15068 endcases; 15069 if x<>0 then 15070 begin y:=math_glue(glue_par(x),cur_mu); 15071 z:=new_glue(y); glue_ref_count(y):=null; link(p):=z; p:=z;@/ 15072 subtype(z):=x+1; {store a symbolic subtype} 15073 end; 15074 end 15075 15076@ We insert a penalty node after the hlist entries of noad |q| if |pen| 15077is not an ``infinite'' penalty, and if the node immediately following |q| 15078is not a penalty node or a |rel_noad| or absent entirely. 15079 15080@<Append any |new_hlist| entries for |q|, and any appropriate penalties@>= 15081if new_hlist(q)<>null then 15082 begin link(p):=new_hlist(q); 15083 repeat p:=link(p); 15084 until link(p)=null; 15085 end; 15086if penalties then if link(q)<>null then if pen<inf_penalty then 15087 begin r_type:=type(link(q)); 15088 if r_type<>penalty_node then if r_type<>rel_noad then 15089 begin z:=new_penalty(pen); link(p):=z; p:=z; 15090 end; 15091 end 15092 15093@* \[37] Alignment. 15094It's sort of a miracle whenever \.{\\halign} and \.{\\valign} work, because 15095they cut across so many of the control structures of \TeX. 15096 15097Therefore the 15098present page is probably not the best place for a beginner to start reading 15099this program; it is better to master everything else first. 15100 15101Let us focus our thoughts on an example of what the input might be, in order 15102to get some idea about how the alignment miracle happens. The example doesn't 15103do anything useful, but it is sufficiently general to indicate all of the 15104special cases that must be dealt with; please do not be disturbed by its 15105apparent complexity and meaninglessness. 15106$$\vbox{\halign{\.{#}\hfil\cr 15107{}\\tabskip 2pt plus 3pt\cr 15108{}\\halign to 300pt\{u1\#v1\&\cr 15109\hskip 50pt\\tabskip 1pt plus 1fil u2\#v2\&\cr 15110\hskip 50pt u3\#v3\\cr\cr 15111\hskip 25pt a1\&\\omit a2\&\\vrule\\cr\cr 15112\hskip 25pt \\noalign\{\\vskip 3pt\}\cr 15113\hskip 25pt b1\\span b2\\cr\cr 15114\hskip 25pt \\omit\&c2\\span\\omit\\cr\}\cr}}$$ 15115Here's what happens: 15116 15117\yskip 15118(0) When `\.{\\halign to 300pt\{}' is scanned, the |scan_spec| routine 15119places the 300pt dimension onto the |save_stack|, and an |align_group| 15120code is placed above it. This will make it possible to complete the alignment 15121when the matching `\.\}' is found. 15122 15123(1) The preamble is scanned next. Macros in the preamble are not expanded, 15124@^preamble@> 15125except as part of a tabskip specification. For example, if \.{u2} had been 15126a macro in the preamble above, it would have been expanded, since \TeX\ 15127must look for `\.{minus...}' as part of the tabskip glue. A ``preamble list'' 15128is constructed based on the user's preamble; in our case it contains the 15129following seven items: 15130$$\vbox{\halign{\.{#}\hfil\qquad&(#)\hfil\cr 15131{}\\glue 2pt plus 3pt&the tabskip preceding column 1\cr 15132{}\\alignrecord, width $-\infty$&preamble info for column 1\cr 15133{}\\glue 2pt plus 3pt&the tabskip between columns 1 and 2\cr 15134{}\\alignrecord, width $-\infty$&preamble info for column 2\cr 15135{}\\glue 1pt plus 1fil&the tabskip between columns 2 and 3\cr 15136{}\\alignrecord, width $-\infty$&preamble info for column 3\cr 15137{}\\glue 1pt plus 1fil&the tabskip following column 3\cr}}$$ 15138These ``alignrecord'' entries have the same size as an |unset_node|, 15139since they will later be converted into such nodes. However, at the 15140moment they have no |type| or |subtype| fields; they have |info| fields 15141instead, and these |info| fields are initially set to the value |end_span|, 15142for reasons explained below. Furthermore, the alignrecord nodes have no 15143|height| or |depth| fields; these are renamed |u_part| and |v_part|, 15144and they point to token lists for the templates of the alignment. 15145For example, the |u_part| field in the first alignrecord points to the 15146token list `\.{u1}', i.e., the template preceding the `\.\#' for column~1. 15147 15148(2) \TeX\ now looks at what follows the \.{\\cr} that ended the preamble. 15149It is not `\.{\\noalign}' or `\.{\\omit}', so this input is put back to 15150be read again, and the template `\.{u1}' is fed to the scanner. Just 15151before reading `\.{u1}', \TeX\ goes into restricted horizontal mode. 15152Just after reading `\.{u1}', \TeX\ will see `\.{a1}', and then (when the 15153{\.\&} is sensed) \TeX\ will see `\.{v1}'. Then \TeX\ scans an |endv| 15154token, indicating the end of a column. At this point an |unset_node| is 15155created, containing the contents of the current hlist (i.e., `\.{u1a1v1}'). 15156The natural width of this unset node replaces the |width| field of the 15157alignrecord for column~1; in general, the alignrecords will record the 15158maximum natural width that has occurred so far in a given column. 15159 15160(3) Since `\.{\\omit}' follows the `\.\&', the templates for column~2 15161are now bypassed. Again \TeX\ goes into restricted horizontal mode and 15162makes an |unset_node| from the resulting hlist; but this time the 15163hlist contains simply `\.{a2}'. The natural width of the new unset box 15164is remembered in the |width| field of the alignrecord for column~2. 15165 15166(4) A third |unset_node| is created for column 3, using essentially the 15167mechanism that worked for column~1; this unset box contains `\.{u3\\vrule 15168v3}'. The vertical rule in this case has running dimensions that will later 15169extend to the height and depth of the whole first row, since each |unset_node| 15170in a row will eventually inherit the height and depth of its enclosing box. 15171 15172(5) The first row has now ended; it is made into a single unset box 15173comprising the following seven items: 15174$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr 15175{}\\glue 2pt plus 3pt\cr 15176{}\\unsetbox for 1 column: u1a1v1\cr 15177{}\\glue 2pt plus 3pt\cr 15178{}\\unsetbox for 1 column: a2\cr 15179{}\\glue 1pt plus 1fil\cr 15180{}\\unsetbox for 1 column: u3\\vrule v3\cr 15181{}\\glue 1pt plus 1fil\cr}}$$ 15182The width of this unset row is unimportant, but it has the correct height 15183and depth, so the correct baselineskip glue will be computed as the row 15184is inserted into a vertical list. 15185 15186(6) Since `\.{\\noalign}' follows the current \.{\\cr}, \TeX\ appends 15187additional material (in this case \.{\\vskip 3pt}) to the vertical list. 15188While processing this material, \TeX\ will be in internal vertical 15189mode, and |no_align_group| will be on |save_stack|. 15190 15191(7) The next row produces an unset box that looks like this: 15192$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr 15193{}\\glue 2pt plus 3pt\cr 15194{}\\unsetbox for 2 columns: u1b1v1u2b2v2\cr 15195{}\\glue 1pt plus 1fil\cr 15196{}\\unsetbox for 1 column: {\rm(empty)}\cr 15197{}\\glue 1pt plus 1fil\cr}}$$ 15198The natural width of the unset box that spans columns 1~and~2 is stored 15199in a ``span node,'' which we will explain later; the |info| field of the 15200alignrecord for column~1 now points to the new span node, and the |info| 15201of the span node points to |end_span|. 15202 15203(8) The final row produces the unset box 15204$$\vbox{\halign{\hbox to 325pt{\qquad\.{#}\hfil}\cr 15205{}\\glue 2pt plus 3pt\cr 15206{}\\unsetbox for 1 column: {\rm(empty)}\cr 15207{}\\glue 2pt plus 3pt\cr 15208{}\\unsetbox for 2 columns: u2c2v2\cr 15209{}\\glue 1pt plus 1fil\cr}}$$ 15210A new span node is attached to the alignrecord for column 2. 15211 15212(9) The last step is to compute the true column widths and to change all the 15213unset boxes to hboxes, appending the whole works to the vertical list that 15214encloses the \.{\\halign}. The rules for deciding on the final widths of 15215each unset column box will be explained below. 15216 15217\yskip\noindent 15218Note that as \.{\\halign} is being processed, we fearlessly give up control 15219to the rest of \TeX. At critical junctures, an alignment routine is 15220called upon to step in and do some little action, but most of the time 15221these routines just lurk in the background. It's something like 15222post-hypnotic suggestion. 15223 15224@ We have mentioned that alignrecords contain no |height| or |depth| fields. 15225Their |glue_sign| and |glue_order| are pre-empted as well, since it 15226is necessary to store information about what to do when a template ends. 15227This information is called the |extra_info| field. 15228 15229@d u_part(#)==mem[#+height_offset].int {pointer to \<u_j> token list} 15230@d v_part(#)==mem[#+depth_offset].int {pointer to \<v_j> token list} 15231@d extra_info(#)==info(#+list_offset) {info to remember during template} 15232 15233@ Alignments can occur within alignments, so a small stack is used to access 15234the alignrecord information. At each level we have a |preamble| pointer, 15235indicating the beginning of the preamble list; a |cur_align| pointer, 15236indicating the current position in the preamble list; a |cur_span| pointer, 15237indicating the value of |cur_align| at the beginning of a sequence of 15238spanned columns; a |cur_loop| pointer, indicating the tabskip glue before 15239an alignrecord that should be copied next if the current list is extended; 15240and the |align_state| variable, which indicates the nesting of braces so 15241that \.{\\cr} and \.{\\span} and tab marks are properly intercepted. 15242There also are pointers |cur_head| and |cur_tail| to the head and tail 15243of a list of adjustments being moved out from horizontal mode to 15244vertical~mode. 15245 15246The current values of these seven quantities appear in global variables; 15247when they have to be pushed down, they are stored in 5-word nodes, and 15248|align_ptr| points to the topmost such node. 15249 15250@d preamble==link(align_head) {the current preamble list} 15251@d align_stack_node_size=5 {number of |mem| words to save alignment states} 15252 15253@<Glob...@>= 15254@!cur_align:pointer; {current position in preamble list} 15255@!cur_span:pointer; {start of currently spanned columns in preamble list} 15256@!cur_loop:pointer; {place to copy when extending a periodic preamble} 15257@!align_ptr:pointer; {most recently pushed-down alignment stack node} 15258@!cur_head,@!cur_tail:pointer; {adjustment list pointers} 15259 15260@ The |align_state| and |preamble| variables are initialized elsewhere. 15261 15262@<Set init...@>= 15263align_ptr:=null; cur_align:=null; cur_span:=null; cur_loop:=null; 15264cur_head:=null; cur_tail:=null; 15265 15266@ Alignment stack maintenance is handled by a pair of trivial routines 15267called |push_alignment| and |pop_alignment|. 15268 15269@p procedure push_alignment; 15270var p:pointer; {the new alignment stack node} 15271begin p:=get_node(align_stack_node_size); 15272link(p):=align_ptr; info(p):=cur_align; 15273llink(p):=preamble; rlink(p):=cur_span; 15274mem[p+2].int:=cur_loop; mem[p+3].int:=align_state; 15275info(p+4):=cur_head; link(p+4):=cur_tail; 15276align_ptr:=p; 15277cur_head:=get_avail; 15278end; 15279@# 15280procedure pop_alignment; 15281var p:pointer; {the top alignment stack node} 15282begin free_avail(cur_head); 15283p:=align_ptr; 15284cur_tail:=link(p+4); cur_head:=info(p+4); 15285align_state:=mem[p+3].int; cur_loop:=mem[p+2].int; 15286cur_span:=rlink(p); preamble:=llink(p); 15287cur_align:=info(p); align_ptr:=link(p); 15288free_node(p,align_stack_node_size); 15289end; 15290 15291@ \TeX\ has eight procedures that govern alignments: |init_align| and 15292|fin_align| are used at the very beginning and the very end; |init_row| and 15293|fin_row| are used at the beginning and end of individual rows; |init_span| 15294is used at the beginning of a sequence of spanned columns (possibly involving 15295only one column); |init_col| and |fin_col| are used at the beginning and 15296end of individual columns; and |align_peek| is used after \.{\\cr} to see 15297whether the next item is \.{\\noalign}. 15298 15299We shall consider these routines in the order they are first used during 15300the course of a complete \.{\\halign}, namely |init_align|, |align_peek|, 15301|init_row|, |init_span|, |init_col|, |fin_col|, |fin_row|, |fin_align|. 15302 15303@ When \.{\\halign} or \.{\\valign} has been scanned in an appropriate 15304mode, \TeX\ calls |init_align|, whose task is to get everything off to a 15305good start. This mostly involves scanning the preamble and putting its 15306information into the preamble list. 15307@^preamble@> 15308 15309@p @t\4@>@<Declare the procedure called |get_preamble_token|@>@t@>@/ 15310procedure@?align_peek; forward;@t\2@>@/ 15311procedure@?normal_paragraph; forward;@t\2@>@/ 15312procedure init_align; 15313label done, done1, done2, continue; 15314var save_cs_ptr:pointer; {|warning_index| value for error messages} 15315@!p:pointer; {for short-term temporary use} 15316begin save_cs_ptr:=cur_cs; {\.{\\halign} or \.{\\valign}, usually} 15317push_alignment; align_state:=-1000000; {enter a new alignment level} 15318@<Check for improper alignment in displayed math@>; 15319push_nest; {enter a new semantic level} 15320@<Change current mode to |-vmode| for \.{\\halign}, |-hmode| for \.{\\valign}@>; 15321scan_spec(align_group,false);@/ 15322@<Scan the preamble and record it in the |preamble| list@>; 15323new_save_level(align_group); 15324if every_cr<>null then begin_token_list(every_cr,every_cr_text); 15325align_peek; {look for \.{\\noalign} or \.{\\omit}} 15326end; 15327 15328@ In vertical modes, |prev_depth| already has the correct value. But 15329if we are in |mmode| (displayed formula mode), we reach out to the 15330enclosing vertical mode for the |prev_depth| value that produces the 15331correct baseline calculations. 15332 15333@<Change current mode...@>= 15334if mode=mmode then 15335 begin mode:=-vmode; prev_depth:=nest[nest_ptr-2].aux_field.sc; 15336 end 15337else if mode>0 then negate(mode) 15338 15339@ When \.{\\halign} is used as a displayed formula, there should be 15340no other pieces of mlists present. 15341 15342@<Check for improper alignment in displayed math@>= 15343if (mode=mmode)and((tail<>head)or(incompleat_noad<>null)) then 15344 begin print_err("Improper "); print_esc("halign"); print(" inside $$'s"); 15345@.Improper \\halign...@> 15346 help3("Displays can use special alignments (like \eqalignno)")@/ 15347 ("only if nothing but the alignment itself is between $$'s.")@/ 15348 ("So I've deleted the formulas that preceded this alignment."); 15349 error; flush_math; 15350 end 15351 15352@ @<Scan the preamble and record it in the |preamble| list@>= 15353preamble:=null; cur_align:=align_head; cur_loop:=null; scanner_status:=aligning; 15354warning_index:=save_cs_ptr; align_state:=-1000000; 15355 {at this point, |cur_cmd=left_brace|} 15356loop@+ begin @<Append the current tabskip glue to the preamble list@>; 15357 if cur_cmd=car_ret then goto done; {\.{\\cr} ends the preamble} 15358 @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|, 15359 looking for changes in the tabskip glue; append an 15360 alignrecord to the preamble list@>; 15361 end; 15362done: scanner_status:=normal 15363 15364@ @<Append the current tabskip glue to the preamble list@>= 15365link(cur_align):=new_param_glue(tab_skip_code); 15366cur_align:=link(cur_align) 15367 15368@ @<Scan preamble text until |cur_cmd| is |tab_mark| or |car_ret|...@>= 15369@<Scan the template \<u_j>, putting the resulting token list in |hold_head|@>; 15370link(cur_align):=new_null_box; cur_align:=link(cur_align); {a new alignrecord} 15371info(cur_align):=end_span; width(cur_align):=null_flag; 15372u_part(cur_align):=link(hold_head); 15373@<Scan the template \<v_j>, putting the resulting token list in |hold_head|@>; 15374v_part(cur_align):=link(hold_head) 15375 15376@ We enter `\.{\\span}' into |eqtb| with |tab_mark| as its command code, 15377and with |span_code| as the command modifier. This makes \TeX\ interpret it 15378essentially the same as an alignment delimiter like `\.\&', yet it is 15379recognizably different when we need to distinguish it from a normal delimiter. 15380It also turns out to be useful to give a special |cr_code| to `\.{\\cr}', 15381and an even larger |cr_cr_code| to `\.{\\crcr}'. 15382 15383The end of a template is represented by two ``frozen'' control sequences 15384called \.{\\endtemplate}. The first has the command code |end_template|, which 15385is |>outer_call|, so it will not easily disappear in the presence of errors. 15386The |get_x_token| routine converts the first into the second, which has |endv| 15387as its command code. 15388 15389@d span_code=256 {distinct from any character} 15390@d cr_code=257 {distinct from |span_code| and from any character} 15391@d cr_cr_code=cr_code+1 {this distinguishes \.{\\crcr} from \.{\\cr}} 15392@d end_template_token==cs_token_flag+frozen_end_template 15393 15394@<Put each of \TeX's primitives into the hash table@>= 15395primitive("span",tab_mark,span_code);@/ 15396@!@:span_}{\.{\\span} primitive@> 15397primitive("cr",car_ret,cr_code); 15398@!@:cr_}{\.{\\cr} primitive@> 15399text(frozen_cr):="cr"; eqtb[frozen_cr]:=eqtb[cur_val];@/ 15400primitive("crcr",car_ret,cr_cr_code); 15401@!@:cr_cr_}{\.{\\crcr} primitive@> 15402text(frozen_end_template):="endtemplate"; text(frozen_endv):="endtemplate"; 15403eq_type(frozen_endv):=endv; equiv(frozen_endv):=null_list; 15404eq_level(frozen_endv):=level_one;@/ 15405eqtb[frozen_end_template]:=eqtb[frozen_endv]; 15406eq_type(frozen_end_template):=end_template; 15407 15408@ @<Cases of |print_cmd_chr|...@>= 15409tab_mark: if chr_code=span_code then print_esc("span") 15410 else chr_cmd("alignment tab character "); 15411car_ret: if chr_code=cr_code then print_esc("cr") 15412 else print_esc("crcr"); 15413 15414@ The preamble is copied directly, except that \.{\\tabskip} causes a change 15415to the tabskip glue, thereby possibly expanding macros that immediately 15416follow it. An appearance of \.{\\span} also causes such an expansion. 15417 15418Note that if the preamble contains `\.{\\global\\tabskip}', the `\.{\\global}' 15419token survives in the preamble and the `\.{\\tabskip}' defines new 15420tabskip glue (locally). 15421 15422@<Declare the procedure called |get_preamble_token|@>= 15423procedure get_preamble_token; 15424label restart; 15425begin restart: get_token; 15426while (cur_chr=span_code)and(cur_cmd=tab_mark) do 15427 begin get_token; {this token will be expanded once} 15428 if cur_cmd>max_command then 15429 begin expand; get_token; 15430 end; 15431 end; 15432if cur_cmd=endv then 15433 fatal_error("(interwoven alignment preambles are not allowed)"); 15434@.interwoven alignment preambles...@> 15435if (cur_cmd=assign_glue)and(cur_chr=glue_base+tab_skip_code) then 15436 begin scan_optional_equals; scan_glue(glue_val); 15437 if global_defs>0 then geq_define(glue_base+tab_skip_code,glue_ref,cur_val) 15438 else eq_define(glue_base+tab_skip_code,glue_ref,cur_val); 15439 goto restart; 15440 end; 15441end; 15442 15443@ Spaces are eliminated from the beginning of a template. 15444 15445@<Scan the template \<u_j>...@>= 15446p:=hold_head; link(p):=null; 15447loop@+ begin get_preamble_token; 15448 if cur_cmd=mac_param then goto done1; 15449 if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then 15450 if (p=hold_head)and(cur_loop=null)and(cur_cmd=tab_mark) 15451 then cur_loop:=cur_align 15452 else begin print_err("Missing # inserted in alignment preamble"); 15453@.Missing \# inserted...@> 15454 help3("There should be exactly one # between &'s, when an")@/ 15455 ("\halign or \valign is being set up. In this case you had")@/ 15456 ("none, so I've put one in; maybe that will work."); 15457 back_error; goto done1; 15458 end 15459 else if (cur_cmd<>spacer)or(p<>hold_head) then 15460 begin link(p):=get_avail; p:=link(p); info(p):=cur_tok; 15461 end; 15462 end; 15463done1: 15464 15465@ @<Scan the template \<v_j>...@>= 15466p:=hold_head; link(p):=null; 15467loop@+ begin continue: get_preamble_token; 15468 if (cur_cmd<=car_ret)and(cur_cmd>=tab_mark)and(align_state=-1000000) then 15469 goto done2; 15470 if cur_cmd=mac_param then 15471 begin print_err("Only one # is allowed per tab"); 15472@.Only one \# is allowed...@> 15473 help3("There should be exactly one # between &'s, when an")@/ 15474 ("\halign or \valign is being set up. In this case you had")@/ 15475 ("more than one, so I'm ignoring all but the first."); 15476 error; goto continue; 15477 end; 15478 link(p):=get_avail; p:=link(p); info(p):=cur_tok; 15479 end; 15480done2: link(p):=get_avail; p:=link(p); 15481info(p):=end_template_token {put \.{\\endtemplate} at the end} 15482 15483@ The tricky part about alignments is getting the templates into the 15484scanner at the right time, and recovering control when a row or column 15485is finished. 15486 15487We usually begin a row after each \.{\\cr} has been sensed, unless that 15488\.{\\cr} is followed by \.{\\noalign} or by the right brace that terminates 15489the alignment. The |align_peek| routine is used to look ahead and do 15490the right thing; it either gets a new row started, or gets a \.{\\noalign} 15491started, or finishes off the alignment. 15492 15493@<Declare the procedure called |align_peek|@>= 15494procedure align_peek; 15495label restart; 15496begin restart: align_state:=1000000; @<Get the next non-blank non-call token@>; 15497if cur_cmd=no_align then 15498 begin scan_left_brace; new_save_level(no_align_group); 15499 if mode=-vmode then normal_paragraph; 15500 end 15501else if cur_cmd=right_brace then fin_align 15502else if (cur_cmd=car_ret)and(cur_chr=cr_cr_code) then 15503 goto restart {ignore \.{\\crcr}} 15504else begin init_row; {start a new row} 15505 init_col; {start a new column and replace what we peeked at} 15506 end; 15507end; 15508 15509@ To start a row (i.e., a `row' that rhymes with `dough' but not with `bough'), 15510we enter a new semantic level, copy the first tabskip glue, and change 15511from internal vertical mode to restricted horizontal mode or vice versa. 15512The |space_factor| and |prev_depth| are not used on this semantic level, 15513but we clear them to zero just to be tidy. 15514 15515@p @t\4@>@<Declare the procedure called |init_span|@>@t@>@/ 15516procedure init_row; 15517begin push_nest; mode:=(-hmode-vmode)-mode; 15518if mode=-hmode then space_factor:=0 @+else prev_depth:=0; 15519tail_append(new_glue(glue_ptr(preamble))); 15520subtype(tail):=tab_skip_code+1;@/ 15521cur_align:=link(preamble); cur_tail:=cur_head; init_span(cur_align); 15522end; 15523 15524@ The parameter to |init_span| is a pointer to the alignrecord where the 15525next column or group of columns will begin. A new semantic level is 15526entered, so that the columns will generate a list for subsequent packaging. 15527 15528@<Declare the procedure called |init_span|@>= 15529procedure init_span(@!p:pointer); 15530begin push_nest; 15531if mode=-hmode then space_factor:=1000 15532else begin prev_depth:=ignore_depth; normal_paragraph; 15533 end; 15534cur_span:=p; 15535end; 15536 15537@ When a column begins, we assume that |cur_cmd| is either |omit| or else 15538the current token should be put back into the input until the \<u_j> 15539template has been scanned. (Note that |cur_cmd| might be |tab_mark| or 15540|car_ret|.) We also assume that |align_state| is approximately 1000000 at 15541this time. We remain in the same mode, and start the template if it is 15542called for. 15543 15544@p procedure init_col; 15545begin extra_info(cur_align):=cur_cmd; 15546if cur_cmd=omit then align_state:=0 15547else begin back_input; begin_token_list(u_part(cur_align),u_template); 15548 end; {now |align_state=1000000|} 15549end; 15550 15551@ The scanner sets |align_state| to zero when the \<u_j> template ends. When 15552a subsequent \.{\\cr} or \.{\\span} or tab mark occurs with |align_state=0|, 15553the scanner activates the following code, which fires up the \<v_j> template. 15554We need to remember the |cur_chr|, which is either |cr_cr_code|, |cr_code|, 15555|span_code|, or a character code, depending on how the column text has ended. 15556 15557This part of the program had better not be activated when the preamble 15558to another alignment is being scanned, or when no alignment preamble is active. 15559 15560@<Insert the \(v)\<v_j>...@>= 15561begin if (scanner_status=aligning) or (cur_align=null) then 15562 fatal_error("(interwoven alignment preambles are not allowed)"); 15563@.interwoven alignment preambles...@> 15564cur_cmd:=extra_info(cur_align); extra_info(cur_align):=cur_chr; 15565if cur_cmd=omit then begin_token_list(omit_template,v_template) 15566else begin_token_list(v_part(cur_align),v_template); 15567align_state:=1000000; goto restart; 15568end 15569 15570@ The token list |omit_template| just referred to is a constant token 15571list that contains the special control sequence \.{\\endtemplate} only. 15572 15573@<Initialize the special...@>= 15574info(omit_template):=end_template_token; {|link(omit_template)=null|} 15575 15576@ When the |endv| command at the end of a \<v_j> template comes through the 15577scanner, things really start to happen; and it is the |fin_col| routine 15578that makes them happen. This routine returns |true| if a row as well as a 15579column has been finished. 15580 15581@p function fin_col:boolean; 15582label exit; 15583var p:pointer; {the alignrecord after the current one} 15584@!q,@!r:pointer; {temporary pointers for list manipulation} 15585@!s:pointer; {a new span node} 15586@!u:pointer; {a new unset box} 15587@!w:scaled; {natural width} 15588@!o:glue_ord; {order of infinity} 15589@!n:halfword; {span counter} 15590begin if cur_align=null then confusion("endv"); 15591q:=link(cur_align);@+if q=null then confusion("endv"); 15592@:this can't happen endv}{\quad endv@> 15593if align_state<500000 then 15594 fatal_error("(interwoven alignment preambles are not allowed)"); 15595@.interwoven alignment preambles...@> 15596p:=link(q); 15597@<If the preamble list has been traversed, check that the row has ended@>; 15598if extra_info(cur_align)<>span_code then 15599 begin unsave; new_save_level(align_group);@/ 15600 @<Package an unset box for the current column and record its width@>; 15601 @<Copy the tabskip glue between columns@>; 15602 if extra_info(cur_align)>=cr_code then 15603 begin fin_col:=true; return; 15604 end; 15605 init_span(p); 15606 end; 15607align_state:=1000000; @<Get the next non-blank non-call token@>; 15608cur_align:=p; 15609init_col; fin_col:=false; 15610exit: end; 15611 15612@ @<If the preamble list has been traversed, check that the row has ended@>= 15613if (p=null)and(extra_info(cur_align)<cr_code) then 15614 if cur_loop<>null then @<Lengthen the preamble periodically@> 15615 else begin print_err("Extra alignment tab has been changed to "); 15616@.Extra alignment tab...@> 15617 print_esc("cr"); 15618 help3("You have given more \span or & marks than there were")@/ 15619 ("in the preamble to the \halign or \valign now in progress.")@/ 15620 ("So I'll assume that you meant to type \cr instead."); 15621 extra_info(cur_align):=cr_code; error; 15622 end 15623 15624@ @<Lengthen the preamble...@>= 15625begin link(q):=new_null_box; p:=link(q); {a new alignrecord} 15626info(p):=end_span; width(p):=null_flag; cur_loop:=link(cur_loop); 15627@<Copy the templates from node |cur_loop| into node |p|@>; 15628cur_loop:=link(cur_loop); 15629link(p):=new_glue(glue_ptr(cur_loop)); 15630end 15631 15632@ @<Copy the templates from node |cur_loop| into node |p|@>= 15633q:=hold_head; r:=u_part(cur_loop); 15634while r<>null do 15635 begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r); 15636 end; 15637link(q):=null; u_part(p):=link(hold_head); 15638q:=hold_head; r:=v_part(cur_loop); 15639while r<>null do 15640 begin link(q):=get_avail; q:=link(q); info(q):=info(r); r:=link(r); 15641 end; 15642link(q):=null; v_part(p):=link(hold_head) 15643 15644@ @<Copy the tabskip glue...@>= 15645tail_append(new_glue(glue_ptr(link(cur_align)))); 15646subtype(tail):=tab_skip_code+1 15647 15648@ @<Package an unset...@>= 15649begin if mode=-hmode then 15650 begin adjust_tail:=cur_tail; u:=hpack(link(head),natural); w:=width(u); 15651 cur_tail:=adjust_tail; adjust_tail:=null; 15652 end 15653else begin u:=vpackage(link(head),natural,0); w:=height(u); 15654 end; 15655n:=min_quarterword; {this represents a span count of 1} 15656if cur_span<>cur_align then @<Update width entry for spanned columns@> 15657else if w>width(cur_align) then width(cur_align):=w; 15658type(u):=unset_node; span_count(u):=n;@/ 15659@<Determine the stretch order@>; 15660glue_order(u):=o; glue_stretch(u):=total_stretch[o];@/ 15661@<Determine the shrink order@>; 15662glue_sign(u):=o; glue_shrink(u):=total_shrink[o];@/ 15663pop_nest; link(tail):=u; tail:=u; 15664end 15665 15666@ A span node is a 2-word record containing |width|, |info|, and |link| 15667fields. The |link| field is not really a link, it indicates the number of 15668spanned columns; the |info| field points to a span node for the same 15669starting column, having a greater extent of spanning, or to |end_span|, 15670which has the largest possible |link| field; the |width| field holds the 15671largest natural width corresponding to a particular set of spanned columns. 15672 15673A list of the maximum widths so far, for spanned columns starting at a 15674given column, begins with the |info| field of the alignrecord for that 15675column. 15676 15677@d span_node_size=2 {number of |mem| words for a span node} 15678 15679@<Initialize the special list heads...@>= 15680link(end_span):=max_quarterword+1; info(end_span):=null; 15681 15682@ @<Update width entry for spanned columns@>= 15683begin q:=cur_span; 15684repeat incr(n); q:=link(link(q)); 15685until q=cur_align; 15686if n>max_quarterword then confusion("256 spans"); {this can happen, but won't} 15687@^system dependencies@> 15688@:this can't happen 256 spans}{\quad 256 spans@> 15689q:=cur_span; while link(info(q))<n do q:=info(q); 15690if link(info(q))>n then 15691 begin s:=get_node(span_node_size); info(s):=info(q); link(s):=n; 15692 info(q):=s; width(s):=w; 15693 end 15694else if width(info(q))<w then width(info(q)):=w; 15695end 15696 15697@ At the end of a row, we append an unset box to the current vlist (for 15698\.{\\halign}) or the current hlist (for \.{\\valign}). This unset box 15699contains the unset boxes for the columns, separated by the tabskip glue. 15700Everything will be set later. 15701 15702@p procedure fin_row; 15703var p:pointer; {the new unset box} 15704begin if mode=-hmode then 15705 begin p:=hpack(link(head),natural); 15706 pop_nest; append_to_vlist(p); 15707 if cur_head<>cur_tail then 15708 begin link(tail):=link(cur_head); tail:=cur_tail; 15709 end; 15710 end 15711else begin p:=vpack(link(head),natural); pop_nest; 15712 link(tail):=p; tail:=p; space_factor:=1000; 15713 end; 15714type(p):=unset_node; glue_stretch(p):=0; 15715if every_cr<>null then begin_token_list(every_cr,every_cr_text); 15716align_peek; 15717end; {note that |glue_shrink(p)=0| since |glue_shrink==shift_amount|} 15718 15719@ Finally, we will reach the end of the alignment, and we can breathe a 15720sigh of relief that memory hasn't overflowed. All the unset boxes will now be 15721set so that the columns line up, taking due account of spanned columns. 15722 15723@p procedure@?do_assignments; forward;@t\2@>@/ 15724procedure@?resume_after_display; forward;@t\2@>@/ 15725procedure@?build_page; forward;@t\2@>@/ 15726procedure fin_align; 15727var @!p,@!q,@!r,@!s,@!u,@!v: pointer; {registers for the list operations} 15728@!t,@!w:scaled; {width of column} 15729@!o:scaled; {shift offset for unset boxes} 15730@!n:halfword; {matching span amount} 15731@!rule_save:scaled; {temporary storage for |overfull_rule|} 15732@!aux_save:memory_word; {temporary storage for |aux|} 15733begin if cur_group<>align_group then confusion("align1"); 15734@:this can't happen align}{\quad align@> 15735unsave; {that |align_group| was for individual entries} 15736if cur_group<>align_group then confusion("align0"); 15737unsave; {that |align_group| was for the whole alignment} 15738if nest[nest_ptr-1].mode_field=mmode then o:=display_indent 15739 else o:=0; 15740@<Go through the preamble list, determining the column widths and 15741 changing the alignrecords to dummy unset boxes@>; 15742@<Package the preamble list, to determine the actual tabskip glue amounts, 15743 and let |p| point to this prototype box@>; 15744@<Set the glue in all the unset boxes of the current list@>; 15745flush_node_list(p); pop_alignment; 15746@<Insert the \(c)current list into its environment@>; 15747end;@/ 15748@t\4@>@<Declare the procedure called |align_peek|@> 15749 15750@ It's time now to dismantle the preamble list and to compute the column 15751widths. Let $w_{ij}$ be the maximum of the natural widths of all entries 15752that span columns $i$ through $j$, inclusive. The alignrecord for column~$i$ 15753contains $w_{ii}$ in its |width| field, and there is also a linked list of 15754the nonzero $w_{ij}$ for increasing $j$, accessible via the |info| field; 15755these span nodes contain the value $j-i+|min_quarterword|$ in their 15756|link| fields. The values of $w_{ii}$ were initialized to |null_flag|, which 15757we regard as $-\infty$. 15758 15759The final column widths are defined by the formula 15760$$w_j=\max_{1\L i\L j}\biggl( w_{ij}-\sum_{i\L k<j}(t_k+w_k)\biggr),$$ 15761where $t_k$ is the natural width of the tabskip glue between columns 15762$k$ and~$k+1$. However, if $w_{ij}=-\infty$ for all |i| in the range 15763|1<=i<=j| (i.e., if every entry that involved column~|j| also involved 15764column~|j+1|), we let $w_j=0$, and we zero out the tabskip glue after 15765column~|j|. 15766 15767\TeX\ computes these values by using the following scheme: First $w_1=w_{11}$. 15768Then replace $w_{2j}$ by $\max(w_{2j},w_{1j}-t_1-w_1)$, for all $j>1$. 15769Then $w_2=w_{22}$. Then replace $w_{3j}$ by $\max(w_{3j},w_{2j}-t_2-w_2)$ 15770for all $j>2$; and so on. If any $w_j$ turns out to be $-\infty$, its 15771value is changed to zero and so is the next tabskip. 15772 15773@<Go through the preamble list,...@>= 15774q:=link(preamble); 15775repeat flush_list(u_part(q)); flush_list(v_part(q)); 15776p:=link(link(q)); 15777if width(q)=null_flag then 15778 @<Nullify |width(q)| and the tabskip glue following this column@>; 15779if info(q)<>end_span then 15780 @<Merge the widths in the span nodes of |q| with those of |p|, 15781 destroying the span nodes of |q|@>; 15782type(q):=unset_node; span_count(q):=min_quarterword; height(q):=0; 15783depth(q):=0; glue_order(q):=normal; glue_sign(q):=normal; 15784glue_stretch(q):=0; glue_shrink(q):=0; q:=p; 15785until q=null 15786 15787@ @<Nullify |width(q)| and the tabskip glue following this column@>= 15788begin width(q):=0; r:=link(q); s:=glue_ptr(r); 15789if s<>zero_glue then 15790 begin add_glue_ref(zero_glue); delete_glue_ref(s); 15791 glue_ptr(r):=zero_glue; 15792 end; 15793end 15794 15795@ Merging of two span-node lists is a typical exercise in the manipulation of 15796linearly linked data structures. The essential invariant in the following 15797|repeat| loop is that we want to dispense with node |r|, in |q|'s list, 15798and |u| is its successor; all nodes of |p|'s list up to and including |s| 15799have been processed, and the successor of |s| matches |r| or precedes |r| 15800or follows |r|, according as |link(r)=n| or |link(r)>n| or |link(r)<n|. 15801 15802@<Merge the widths...@>= 15803begin t:=width(q)+width(glue_ptr(link(q))); 15804r:=info(q); s:=end_span; info(s):=p; n:=min_quarterword+1; 15805repeat width(r):=width(r)-t; u:=info(r); 15806while link(r)>n do 15807 begin s:=info(s); n:=link(info(s))+1; 15808 end; 15809if link(r)<n then 15810 begin info(r):=info(s); info(s):=r; decr(link(r)); s:=r; 15811 end 15812else begin if width(r)>width(info(s)) then width(info(s)):=width(r); 15813 free_node(r,span_node_size); 15814 end; 15815r:=u; 15816until r=end_span; 15817end 15818 15819@ Now the preamble list has been converted to a list of alternating unset 15820boxes and tabskip glue, where the box widths are equal to the final 15821column sizes. In case of \.{\\valign}, we change the widths to heights, 15822so that a correct error message will be produced if the alignment is 15823overfull or underfull. 15824 15825@<Package the preamble list...@>= 15826save_ptr:=save_ptr-2; pack_begin_line:=-mode_line; 15827if mode=-vmode then 15828 begin rule_save:=overfull_rule; 15829 overfull_rule:=0; {prevent rule from being packaged} 15830 p:=hpack(preamble,saved(1),saved(0)); overfull_rule:=rule_save; 15831 end 15832else begin q:=link(preamble); 15833 repeat height(q):=width(q); width(q):=0; q:=link(link(q)); 15834 until q=null; 15835 p:=vpack(preamble,saved(1),saved(0)); 15836 q:=link(preamble); 15837 repeat width(q):=height(q); height(q):=0; q:=link(link(q)); 15838 until q=null; 15839 end; 15840pack_begin_line:=0 15841 15842@ @<Set the glue in all the unset...@>= 15843q:=link(head); s:=head; 15844while q<>null do 15845 begin if not is_char_node(q) then 15846 if type(q)=unset_node then 15847 @<Set the unset box |q| and the unset boxes in it@> 15848 else if type(q)=rule_node then 15849 @<Make the running dimensions in rule |q| extend to the 15850 boundaries of the alignment@>; 15851 s:=q; q:=link(q); 15852 end 15853 15854@ @<Make the running dimensions in rule |q| extend...@>= 15855begin if is_running(width(q)) then width(q):=width(p); 15856if is_running(height(q)) then height(q):=height(p); 15857if is_running(depth(q)) then depth(q):=depth(p); 15858if o<>0 then 15859 begin r:=link(q); link(q):=null; q:=hpack(q,natural); 15860 shift_amount(q):=o; link(q):=r; link(s):=q; 15861 end; 15862end 15863 15864@ The unset box |q| represents a row that contains one or more unset boxes, 15865depending on how soon \.{\\cr} occurred in that row. 15866 15867@<Set the unset box |q| and the unset boxes in it@>= 15868begin if mode=-vmode then 15869 begin type(q):=hlist_node; width(q):=width(p); 15870 end 15871else begin type(q):=vlist_node; height(q):=height(p); 15872 end; 15873glue_order(q):=glue_order(p); glue_sign(q):=glue_sign(p); 15874glue_set(q):=glue_set(p); shift_amount(q):=o; 15875r:=link(list_ptr(q)); s:=link(list_ptr(p)); 15876repeat @<Set the glue in node |r| and change it from an unset node@>; 15877r:=link(link(r)); s:=link(link(s)); 15878until r=null; 15879end 15880 15881@ A box made from spanned columns will be followed by tabskip glue nodes and 15882by empty boxes as if there were no spanning. This permits perfect alignment 15883of subsequent entries, and it prevents values that depend on floating point 15884arithmetic from entering into the dimensions of any boxes. 15885 15886@<Set the glue in node |r|...@>= 15887n:=span_count(r); t:=width(s); w:=t; u:=hold_head; 15888while n>min_quarterword do 15889 begin decr(n); 15890 @<Append tabskip glue and an empty box to list |u|, 15891 and update |s| and |t| as the prototype nodes are passed@>; 15892 end; 15893if mode=-vmode then 15894 @<Make the unset node |r| into an |hlist_node| of width |w|, 15895 setting the glue as if the width were |t|@> 15896else @<Make the unset node |r| into a |vlist_node| of height |w|, 15897 setting the glue as if the height were |t|@>; 15898shift_amount(r):=0; 15899if u<>hold_head then {append blank boxes to account for spanned nodes} 15900 begin link(u):=link(r); link(r):=link(hold_head); r:=u; 15901 end 15902 15903@ @<Append tabskip glue and an empty box to list |u|...@>= 15904s:=link(s); v:=glue_ptr(s); link(u):=new_glue(v); u:=link(u); 15905subtype(u):=tab_skip_code+1; t:=t+width(v); 15906if glue_sign(p)=stretching then 15907 begin if stretch_order(v)=glue_order(p) then 15908 t:=t+round(float(glue_set(p))*stretch(v)); 15909@^real multiplication@> 15910 end 15911else if glue_sign(p)=shrinking then 15912 begin if shrink_order(v)=glue_order(p) then 15913 t:=t-round(float(glue_set(p))*shrink(v)); 15914 end; 15915s:=link(s); link(u):=new_null_box; u:=link(u); t:=t+width(s); 15916if mode=-vmode then width(u):=width(s)@+else 15917 begin type(u):=vlist_node; height(u):=width(s); 15918 end 15919 15920@ @<Make the unset node |r| into an |hlist_node| of width |w|...@>= 15921begin height(r):=height(q); depth(r):=depth(q); 15922if t=width(r) then 15923 begin glue_sign(r):=normal; glue_order(r):=normal; 15924 set_glue_ratio_zero(glue_set(r)); 15925 end 15926else if t>width(r) then 15927 begin glue_sign(r):=stretching; 15928 if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r)) 15929 else glue_set(r):=unfloat((t-width(r))/glue_stretch(r)); 15930@^real division@> 15931 end 15932else begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking; 15933 if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r)) 15934 else if (glue_order(r)=normal)and(width(r)-t>glue_shrink(r)) then 15935 set_glue_ratio_one(glue_set(r)) 15936 else glue_set(r):=unfloat((width(r)-t)/glue_shrink(r)); 15937 end; 15938width(r):=w; type(r):=hlist_node; 15939end 15940 15941@ @<Make the unset node |r| into a |vlist_node| of height |w|...@>= 15942begin width(r):=width(q); 15943if t=height(r) then 15944 begin glue_sign(r):=normal; glue_order(r):=normal; 15945 set_glue_ratio_zero(glue_set(r)); 15946 end 15947else if t>height(r) then 15948 begin glue_sign(r):=stretching; 15949 if glue_stretch(r)=0 then set_glue_ratio_zero(glue_set(r)) 15950 else glue_set(r):=unfloat((t-height(r))/glue_stretch(r)); 15951@^real division@> 15952 end 15953else begin glue_order(r):=glue_sign(r); glue_sign(r):=shrinking; 15954 if glue_shrink(r)=0 then set_glue_ratio_zero(glue_set(r)) 15955 else if (glue_order(r)=normal)and(height(r)-t>glue_shrink(r)) then 15956 set_glue_ratio_one(glue_set(r)) 15957 else glue_set(r):=unfloat((height(r)-t)/glue_shrink(r)); 15958 end; 15959height(r):=w; type(r):=vlist_node; 15960end 15961 15962@ We now have a completed alignment, in the list that starts at |head| 15963and ends at |tail|. This list will be merged with the one that encloses 15964it. (In case the enclosing mode is |mmode|, for displayed formulas, 15965we will need to insert glue before and after the display; that part of the 15966program will be deferred until we're more familiar with such operations.) 15967 15968In restricted horizontal mode, the |clang| part of |aux| is undefined; 15969an over-cautious \PASCAL\ runtime system may complain about this. 15970@^dirty \PASCAL@> 15971 15972@<Insert the \(c)current list into its environment@>= 15973aux_save:=aux; p:=link(head); q:=tail; pop_nest; 15974if mode=mmode then @<Finish an alignment in a display@> 15975else begin aux:=aux_save; link(tail):=p; 15976 if p<>null then tail:=q; 15977 if mode=vmode then build_page; 15978 end 15979 15980@* \[38] Breaking paragraphs into lines. 15981We come now to what is probably the most interesting algorithm of \TeX: 15982the mechanism for choosing the ``best possible'' breakpoints that yield 15983the individual lines of a paragraph. \TeX's line-breaking algorithm takes 15984a given horizontal list and converts it to a sequence of boxes that are 15985appended to the current vertical list. In the course of doing this, it 15986creates a special data structure containing three kinds of records that are 15987not used elsewhere in \TeX. Such nodes are created while a paragraph is 15988being processed, and they are destroyed afterwards; thus, the other parts 15989of \TeX\ do not need to know anything about how line-breaking is done. 15990 15991The method used here is based on an approach devised by Michael F. Plass and 15992@^Plass, Michael Frederick@> 15993@^Knuth, Donald Ervin@> 15994the author in 1977, subsequently generalized and improved by the same two 15995people in 1980. A detailed discussion appears in {\sl SOFTWARE---Practice 15996\AM\ Experience \bf11} (1981), 1119--1184, where it is shown that the 15997line-breaking problem can be regarded as a special case of the problem of 15998computing the shortest path in an acyclic network. The cited paper includes 15999numerous examples and describes the history of line breaking as it has been 16000practiced by printers through the ages. The present implementation adds two 16001new ideas to the algorithm of 1980: Memory space requirements are considerably 16002reduced by using smaller records for inactive nodes than for active ones, 16003and arithmetic overflow is avoided by using ``delta distances'' instead of 16004keeping track of the total distance from the beginning of the paragraph to the 16005current point. 16006 16007@ The |line_break| procedure should be invoked only in horizontal mode; it 16008leaves that mode and places its output into the current vlist of the 16009enclosing vertical mode (or internal vertical mode). 16010There is one explicit parameter: |final_widow_penalty| is the amount of 16011additional penalty to be inserted before the final line of the paragraph. 16012 16013There are also a number of implicit parameters: The hlist to be broken 16014starts at |link(head)|, and it is nonempty. The value of |prev_graf| in the 16015enclosing semantic level tells where the paragraph should begin in the 16016sequence of line numbers, in case hanging indentation or \.{\\parshape} 16017is in use; |prev_graf| is zero unless this paragraph is being continued 16018after a displayed formula. Other implicit parameters, such as the 16019|par_shape_ptr| and various penalties to use for hyphenation, etc., appear 16020in |eqtb|. 16021 16022After |line_break| has acted, it will have updated the current vlist and the 16023value of |prev_graf|. Furthermore, the global variable |just_box| will 16024point to the final box created by |line_break|, so that the width of this 16025line can be ascertained when it is necessary to decide whether to use 16026|above_display_skip| or |above_display_short_skip| before a displayed formula. 16027 16028@<Glob...@>= 16029@!just_box:pointer; {the |hlist_node| for the last line of the new paragraph} 16030 16031@ Since |line_break| is a rather lengthy procedure---sort of a small world unto 16032itself---we must build it up little by little, somewhat more cautiously 16033than we have done with the simpler procedures of \TeX. Here is the 16034general outline. 16035 16036@p@t\4@>@<Declare subprocedures for |line_break|@> 16037procedure line_break(@!final_widow_penalty:integer); 16038label done,done1,done2,done3,done4,done5,continue; 16039var @<Local variables for line breaking@>@; 16040begin pack_begin_line:=mode_line; {this is for over/underfull box messages} 16041@<Get ready to start line breaking@>; 16042@<Find optimal breakpoints@>; 16043@<Break the paragraph at the chosen breakpoints, justify the resulting lines 16044to the correct widths, and append them to the current vertical list@>; 16045@<Clean up the memory by removing the break nodes@>; 16046pack_begin_line:=0; 16047end; 16048 16049@ The first task is to move the list from |head| to |temp_head| and go 16050into the enclosing semantic level. We also append the \.{\\parfillskip} 16051glue to the end of the paragraph, removing a space (or other glue node) if 16052it was there, since spaces usually precede blank lines and instances of 16053`\.{\$\$}'. The |par_fill_skip| is preceded by an infinite penalty, so 16054it will never be considered as a potential breakpoint. 16055 16056This code assumes that a |glue_node| and a |penalty_node| occupy the 16057same number of |mem|~words. 16058@^data structure assumptions@> 16059 16060@<Get ready to start...@>= 16061link(temp_head):=link(head); 16062if is_char_node(tail) then tail_append(new_penalty(inf_penalty)) 16063else if type(tail)<>glue_node then tail_append(new_penalty(inf_penalty)) 16064else begin type(tail):=penalty_node; delete_glue_ref(glue_ptr(tail)); 16065 flush_node_list(leader_ptr(tail)); penalty(tail):=inf_penalty; 16066 end; 16067link(tail):=new_param_glue(par_fill_skip_code); 16068init_cur_lang:=prev_graf mod @'200000; 16069init_l_hyf:=prev_graf div @'20000000; 16070init_r_hyf:=(prev_graf div @'200000) mod @'100; 16071pop_nest; 16072 16073@ When looking for optimal line breaks, \TeX\ creates a ``break node'' for 16074each break that is {\sl feasible}, in the sense that there is a way to end 16075a line at the given place without requiring any line to stretch more than 16076a given tolerance. A break node is characterized by three things: the position 16077of the break (which is a pointer to a |glue_node|, |math_node|, |penalty_node|, 16078or |disc_node|); the ordinal number of the line that will follow this 16079breakpoint; and the fitness classification of the line that has just 16080ended, i.e., |tight_fit|, |decent_fit|, |loose_fit|, or |very_loose_fit|. 16081 16082@d tight_fit=3 {fitness classification for lines shrinking 0.5 to 1.0 of their 16083 shrinkability} 16084@d loose_fit=1 {fitness classification for lines stretching 0.5 to 1.0 of their 16085 stretchability} 16086@d very_loose_fit=0 {fitness classification for lines stretching more than 16087 their stretchability} 16088@d decent_fit=2 {fitness classification for all other lines} 16089 16090@ The algorithm essentially determines the best possible way to achieve 16091each feasible combination of position, line, and fitness. Thus, it answers 16092questions like, ``What is the best way to break the opening part of the 16093paragraph so that the fourth line is a tight line ending at such-and-such 16094a place?'' However, the fact that all lines are to be the same length 16095after a certain point makes it possible to regard all sufficiently large 16096line numbers as equivalent, when the looseness parameter is zero, and this 16097makes it possible for the algorithm to save space and time. 16098 16099An ``active node'' and a ``passive node'' are created in |mem| for each 16100feasible breakpoint that needs to be considered. Active nodes are three 16101words long and passive nodes are two words long. We need active nodes only 16102for breakpoints near the place in the paragraph that is currently being 16103examined, so they are recycled within a comparatively short time after 16104they are created. 16105 16106@ An active node for a given breakpoint contains six fields: 16107 16108\yskip\hang|link| points to the next node in the list of active nodes; the 16109last active node has |link=last_active|. 16110 16111\yskip\hang|break_node| points to the passive node associated with this 16112breakpoint. 16113 16114\yskip\hang|line_number| is the number of the line that follows this 16115breakpoint. 16116 16117\yskip\hang|fitness| is the fitness classification of the line ending at this 16118breakpoint. 16119 16120\yskip\hang|type| is either |hyphenated| or |unhyphenated|, depending on 16121whether this breakpoint is a |disc_node|. 16122 16123\yskip\hang|total_demerits| is the minimum possible sum of demerits over all 16124lines leading from the beginning of the paragraph to this breakpoint. 16125 16126\yskip\noindent 16127The value of |link(active)| points to the first active node on a linked list 16128of all currently active nodes. This list is in order by |line_number|, 16129except that nodes with |line_number>easy_line| may be in any order relative 16130to each other. 16131 16132@d active_node_size=3 {number of words in active nodes} 16133@d fitness==subtype {|very_loose_fit..tight_fit| on final line for this break} 16134@d break_node==rlink {pointer to the corresponding passive node} 16135@d line_number==llink {line that begins at this breakpoint} 16136@d total_demerits(#)==mem[#+2].int {the quantity that \TeX\ minimizes} 16137@d unhyphenated=0 {the |type| of a normal active break node} 16138@d hyphenated=1 {the |type| of an active node that breaks at a |disc_node|} 16139@d last_active==active {the active list ends where it begins} 16140 16141@ @<Initialize the special list heads...@>= 16142type(last_active):=hyphenated; line_number(last_active):=max_halfword; 16143subtype(last_active):=0; {the |subtype| is never examined by the algorithm} 16144 16145@ The passive node for a given breakpoint contains only four fields: 16146 16147\yskip\hang|link| points to the passive node created just before this one, 16148if any, otherwise it is |null|. 16149 16150\yskip\hang|cur_break| points to the position of this breakpoint in the 16151horizontal list for the paragraph being broken. 16152 16153\yskip\hang|prev_break| points to the passive node that should precede this 16154one in an optimal path to this breakpoint. 16155 16156\yskip\hang|serial| is equal to |n| if this passive node is the |n|th 16157one created during the current pass. (This field is used only when 16158printing out detailed statistics about the line-breaking calculations.) 16159 16160\yskip\noindent 16161There is a global variable called |passive| that points to the most 16162recently created passive node. Another global variable, |printed_node|, 16163is used to help print out the paragraph when detailed information about 16164the line-breaking computation is being displayed. 16165 16166@d passive_node_size=2 {number of words in passive nodes} 16167@d cur_break==rlink {in passive node, points to position of this breakpoint} 16168@d prev_break==llink {points to passive node that should precede this one} 16169@d serial==info {serial number for symbolic identification} 16170 16171@<Glob...@>= 16172@!passive:pointer; {most recent node on passive list} 16173@!printed_node:pointer; {most recent node that has been printed} 16174@!pass_number:halfword; {the number of passive nodes allocated on this pass} 16175 16176@ The active list also contains ``delta'' nodes that help the algorithm 16177compute the badness of individual lines. Such nodes appear only between two 16178active nodes, and they have |type=delta_node|. If |p| and |r| are active nodes 16179and if |q| is a delta node between them, so that |link(p)=q| and |link(q)=r|, 16180then |q| tells the space difference between lines in the horizontal list that 16181start after breakpoint |p| and lines that start after breakpoint |r|. In 16182other words, if we know the length of the line that starts after |p| and 16183ends at our current position, then the corresponding length of the line that 16184starts after |r| is obtained by adding the amounts in node~|q|. A delta node 16185contains six scaled numbers, since it must record the net change in glue 16186stretchability with respect to all orders of infinity. The natural width 16187difference appears in |mem[q+1].sc|; the stretch differences in units of 16188pt, fil, fill, and filll appear in |mem[q+2..q+5].sc|; and the shrink difference 16189appears in |mem[q+6].sc|. The |subtype| field of a delta node is not used. 16190 16191@d delta_node_size=7 {number of words in a delta node} 16192@d delta_node=2 {|type| field in a delta node} 16193 16194@ As the algorithm runs, it maintains a set of six delta-like registers 16195for the length of the line following the first active breakpoint to the 16196current position in the given hlist. When it makes a pass through the 16197active list, it also maintains a similar set of six registers for the 16198length following the active breakpoint of current interest. A third set 16199holds the length of an empty line (namely, the sum of \.{\\leftskip} and 16200\.{\\rightskip}); and a fourth set is used to create new delta nodes. 16201 16202When we pass a delta node we want to do operations like 16203$$\hbox{\ignorespaces|for 16204k:=1 to 6 do cur_active_width[k]:=cur_active_width[k]+mem[q+k].sc|};$$ and we 16205want to do this without the overhead of |for| loops. The |do_all_six| 16206macro makes such six-tuples convenient. 16207 16208@d do_all_six(#)==#(1);#(2);#(3);#(4);#(5);#(6) 16209 16210@<Glob...@>= 16211@!active_width:array[1..6] of scaled; 16212 {distance from first active node to~|cur_p|} 16213@!cur_active_width:array[1..6] of scaled; {distance from current active node} 16214@!background:array[1..6] of scaled; {length of an ``empty'' line} 16215@!break_width:array[1..6] of scaled; {length being computed after current break} 16216 16217@ Let's state the principles of the delta nodes more precisely and concisely, 16218so that the following programs will be less obscure. For each legal 16219breakpoint~|p| in the paragraph, we define two quantities $\alpha(p)$ and 16220$\beta(p)$ such that the length of material in a line from breakpoint~|p| 16221to breakpoint~|q| is $\gamma+\beta(q)-\alpha(p)$, for some fixed $\gamma$. 16222Intuitively, $\alpha(p)$ and $\beta(q)$ are the total length of material from 16223the beginning of the paragraph to a point ``after'' a break at |p| and to a 16224point ``before'' a break at |q|; and $\gamma$ is the width of an empty line, 16225namely the length contributed by \.{\\leftskip} and \.{\\rightskip}. 16226 16227Suppose, for example, that the paragraph consists entirely of alternating 16228boxes and glue skips; let the boxes have widths $x_1\ldots x_n$ and 16229let the skips have widths $y_1\ldots y_n$, so that the paragraph can be 16230represented by $x_1y_1\ldots x_ny_n$. Let $p_i$ be the legal breakpoint 16231at $y_i$; then $\alpha(p_i)=x_1+y_1+\cdots+x_i+y_i$, and $\beta(p_i)= 16232x_1+y_1+\cdots+x_i$. To check this, note that the length of material from 16233$p_2$ to $p_5$, say, is $\gamma+x_3+y_3+x_4+y_4+x_5=\gamma+\beta(p_5) 16234-\alpha(p_2)$. 16235 16236The quantities $\alpha$, $\beta$, $\gamma$ involve glue stretchability and 16237shrinkability as well as a natural width. If we were to compute $\alpha(p)$ 16238and $\beta(p)$ for each |p|, we would need multiple precision arithmetic, and 16239the multiprecise numbers would have to be kept in the active nodes. 16240\TeX\ avoids this problem by working entirely with relative differences 16241or ``deltas.'' Suppose, for example, that the active list contains 16242$a_1\,\delta_1\,a_2\,\delta_2\,a_3$, where the |a|'s are active breakpoints 16243and the $\delta$'s are delta nodes. Then $\delta_1=\alpha(a_1)-\alpha(a_2)$ 16244and $\delta_2=\alpha(a_2)-\alpha(a_3)$. If the line breaking algorithm is 16245currently positioned at some other breakpoint |p|, the |active_width| array 16246contains the value $\gamma+\beta(p)-\alpha(a_1)$. If we are scanning through 16247the list of active nodes and considering a tentative line that runs from 16248$a_2$ to~|p|, say, the |cur_active_width| array will contain the value 16249$\gamma+\beta(p)-\alpha(a_2)$. Thus, when we move from $a_2$ to $a_3$, 16250we want to add $\alpha(a_2)-\alpha(a_3)$ to |cur_active_width|; and this 16251is just $\delta_2$, which appears in the active list between $a_2$ and 16252$a_3$. The |background| array contains $\gamma$. The |break_width| array 16253will be used to calculate values of new delta nodes when the active 16254list is being updated. 16255 16256@ Glue nodes in a horizontal list that is being paragraphed are not supposed to 16257include ``infinite'' shrinkability; that is why the algorithm maintains 16258four registers for stretching but only one for shrinking. If the user tries to 16259introduce infinite shrinkability, the shrinkability will be reset to finite 16260and an error message will be issued. A boolean variable |no_shrink_error_yet| 16261prevents this error message from appearing more than once per paragraph. 16262 16263@d check_shrinkage(#)==if (shrink_order(#)<>normal)and(shrink(#)<>0) then 16264 begin #:=finite_shrink(#); 16265 end 16266 16267@<Glob...@>= 16268@!no_shrink_error_yet:boolean; {have we complained about infinite shrinkage?} 16269 16270@ @<Declare subprocedures for |line_break|@>= 16271function finite_shrink(@!p:pointer):pointer; {recovers from infinite shrinkage} 16272var q:pointer; {new glue specification} 16273begin if no_shrink_error_yet then 16274 begin no_shrink_error_yet:=false; 16275 print_err("Infinite glue shrinkage found in a paragraph"); 16276@.Infinite glue shrinkage...@> 16277 help5("The paragraph just ended includes some glue that has")@/ 16278 ("infinite shrinkability, e.g., `\hskip 0pt minus 1fil'.")@/ 16279 ("Such glue doesn't belong there---it allows a paragraph")@/ 16280 ("of any length to fit on one line. But it's safe to proceed,")@/ 16281 ("since the offensive shrinkability has been made finite."); 16282 error; 16283 end; 16284q:=new_spec(p); shrink_order(q):=normal; 16285delete_glue_ref(p); finite_shrink:=q; 16286end; 16287 16288@ @<Get ready to start...@>= 16289no_shrink_error_yet:=true;@/ 16290check_shrinkage(left_skip); check_shrinkage(right_skip);@/ 16291q:=left_skip; r:=right_skip; background[1]:=width(q)+width(r);@/ 16292background[2]:=0; background[3]:=0; background[4]:=0; background[5]:=0;@/ 16293background[2+stretch_order(q)]:=stretch(q);@/ 16294background[2+stretch_order(r)]:=@|background[2+stretch_order(r)]+stretch(r);@/ 16295background[6]:=shrink(q)+shrink(r); 16296 16297@ A pointer variable |cur_p| runs through the given horizontal list as we look 16298for breakpoints. This variable is global, since it is used both by |line_break| 16299and by its subprocedure |try_break|. 16300 16301Another global variable called |threshold| is used to determine the feasibility 16302of individual lines: Breakpoints are feasible if there is a way to reach 16303them without creating lines whose badness exceeds |threshold|. (The 16304badness is compared to |threshold| before penalties are added, so that 16305penalty values do not affect the feasibility of breakpoints, except that 16306no break is allowed when the penalty is 10000 or more.) If |threshold| 16307is 10000 or more, all legal breaks are considered feasible, since the 16308|badness| function specified above never returns a value greater than~10000. 16309 16310Up to three passes might be made through the paragraph in an attempt to find at 16311least one set of feasible breakpoints. On the first pass, we have 16312|threshold=pretolerance| and |second_pass=final_pass=false|. 16313If this pass fails to find a 16314feasible solution, |threshold| is set to |tolerance|, |second_pass| is set 16315|true|, and an attempt is made to hyphenate as many words as possible. 16316If that fails too, we add |emergency_stretch| to the background 16317stretchability and set |final_pass=true|. 16318 16319@<Glob...@>= 16320@!cur_p:pointer; {the current breakpoint under consideration} 16321@!second_pass:boolean; {is this our second attempt to break this paragraph?} 16322@!final_pass:boolean; {is this our final attempt to break this paragraph?} 16323@!threshold:integer; {maximum badness on feasible lines} 16324 16325@ The heart of the line-breaking procedure is `|try_break|', a subroutine 16326that tests if the current breakpoint |cur_p| is feasible, by running 16327through the active list to see what lines of text can be made from active 16328nodes to~|cur_p|. If feasible breaks are possible, new break nodes are 16329created. If |cur_p| is too far from an active node, that node is 16330deactivated. 16331 16332The parameter |pi| to |try_break| is the penalty associated 16333with a break at |cur_p|; we have |pi=eject_penalty| if the break is forced, 16334and |pi=inf_penalty| if the break is illegal. 16335 16336The other parameter, |break_type|, is set to |hyphenated| or |unhyphenated|, 16337depending on whether or not the current break is at a |disc_node|. The 16338end of a paragraph is also regarded as `|hyphenated|'; this case is 16339distinguishable by the condition |cur_p=null|. 16340 16341@d copy_to_cur_active(#)==cur_active_width[#]:=active_width[#] 16342@d deactivate=60 {go here when node |r| should be deactivated} 16343 16344@<Declare subprocedures for |line_break|@>= 16345procedure try_break(@!pi:integer;@!break_type:small_number); 16346label exit,done,done1,continue,deactivate; 16347var r:pointer; {runs through the active list} 16348@!prev_r:pointer; {stays a step behind |r|} 16349@!old_l:halfword; {maximum line number in current equivalence class of lines} 16350@!no_break_yet:boolean; {have we found a feasible break at |cur_p|?} 16351@<Other local variables for |try_break|@>@; 16352begin @<Make sure that |pi| is in the proper range@>; 16353no_break_yet:=true; prev_r:=active; old_l:=0; 16354do_all_six(copy_to_cur_active); 16355loop@+ begin continue: r:=link(prev_r); 16356 @<If node |r| is of type |delta_node|, update |cur_active_width|, 16357 set |prev_r| and |prev_prev_r|, then |goto continue|@>; 16358 @<If a line number class has ended, create new active nodes for 16359 the best feasible breaks in that class; then |return| 16360 if |r=last_active|, otherwise compute the new |line_width|@>; 16361 @<Consider the demerits for a line from |r| to |cur_p|; 16362 deactivate node |r| if it should no longer be active; 16363 then |goto continue| if a line from |r| to |cur_p| is infeasible, 16364 otherwise record a new feasible break@>; 16365 end; 16366exit: @!stat @<Update the value of |printed_node| for 16367 symbolic displays@>@+tats@; 16368end; 16369 16370@ @<Other local variables for |try_break|@>= 16371@!prev_prev_r:pointer; {a step behind |prev_r|, if |type(prev_r)=delta_node|} 16372@!s:pointer; {runs through nodes ahead of |cur_p|} 16373@!q:pointer; {points to a new node being created} 16374@!v:pointer; {points to a glue specification or a node ahead of |cur_p|} 16375@!t:integer; {node count, if |cur_p| is a discretionary node} 16376@!f:internal_font_number; {used in character width calculation} 16377@!l:halfword; {line number of current active node} 16378@!node_r_stays_active:boolean; {should node |r| remain in the active list?} 16379@!line_width:scaled; {the current line will be justified to this width} 16380@!fit_class:very_loose_fit..tight_fit; {possible fitness class of test line} 16381@!b:halfword; {badness of test line} 16382@!d:integer; {demerits of test line} 16383@!artificial_demerits:boolean; {has |d| been forced to zero?} 16384@!save_link:pointer; {temporarily holds value of |link(cur_p)|} 16385@!shortfall:scaled; {used in badness calculations} 16386 16387@ @<Make sure that |pi| is in the proper range@>= 16388if abs(pi)>=inf_penalty then 16389 if pi>0 then return {this breakpoint is inhibited by infinite penalty} 16390 else pi:=eject_penalty {this breakpoint will be forced} 16391 16392@ The following code uses the fact that |type(last_active)<>delta_node|. 16393 16394@d update_width(#)==@| 16395 cur_active_width[#]:=cur_active_width[#]+mem[r+#].sc 16396 16397@<If node |r|...@>= 16398@^inner loop@> 16399if type(r)=delta_node then 16400 begin do_all_six(update_width); 16401 prev_prev_r:=prev_r; prev_r:=r; goto continue; 16402 end 16403 16404@ As we consider various ways to end a line at |cur_p|, in a given line number 16405class, we keep track of the best total demerits known, in an array with 16406one entry for each of the fitness classifications. For example, 16407|minimal_demerits[tight_fit]| contains the fewest total demerits of feasible 16408line breaks ending at |cur_p| with a |tight_fit| line; |best_place[tight_fit]| 16409points to the passive node for the break before~|cur_p| that achieves such 16410an optimum; and |best_pl_line[tight_fit]| is the |line_number| field in the 16411active node corresponding to |best_place[tight_fit]|. When no feasible break 16412sequence is known, the |minimal_demerits| entries will be equal to 16413|awful_bad|, which is $2^{30}-1$. Another variable, |minimum_demerits|, 16414keeps track of the smallest value in the |minimal_demerits| array. 16415 16416@d awful_bad==@'7777777777 {more than a billion demerits} 16417 16418@<Global...@>= 16419@!minimal_demerits:array[very_loose_fit..tight_fit] of integer; {best total 16420 demerits known for current line class and position, given the fitness} 16421@!minimum_demerits:integer; {best total demerits known for current line class 16422 and position} 16423@!best_place:array[very_loose_fit..tight_fit] of pointer; {how to achieve 16424 |minimal_demerits|} 16425@!best_pl_line:array[very_loose_fit..tight_fit] of halfword; {corresponding 16426 line number} 16427 16428@ @<Get ready to start...@>= 16429minimum_demerits:=awful_bad; 16430minimal_demerits[tight_fit]:=awful_bad; 16431minimal_demerits[decent_fit]:=awful_bad; 16432minimal_demerits[loose_fit]:=awful_bad; 16433minimal_demerits[very_loose_fit]:=awful_bad; 16434 16435@ The first part of the following code is part of \TeX's inner loop, so 16436we don't want to waste any time. The current active node, namely node |r|, 16437contains the line number that will be considered next. At the end of the 16438list we have arranged the data structure so that |r=last_active| and 16439|line_number(last_active)>old_l|. 16440@^inner loop@> 16441 16442@<If a line number class...@>= 16443begin l:=line_number(r); 16444if l>old_l then 16445 begin {now we are no longer in the inner loop} 16446 if (minimum_demerits<awful_bad)and@| 16447 ((old_l<>easy_line)or(r=last_active)) then 16448 @<Create new active nodes for the best feasible breaks 16449 just found@>; 16450 if r=last_active then return; 16451 @<Compute the new line width@>; 16452 end; 16453end 16454 16455@ It is not necessary to create new active nodes having |minimal_demerits| 16456greater than 16457|minimum_demerits+abs(adj_demerits)|, since such active nodes will never 16458be chosen in the final paragraph breaks. This observation allows us to 16459omit a substantial number of feasible breakpoints from further consideration. 16460 16461@<Create new active nodes...@>= 16462begin if no_break_yet then @<Compute the values of |break_width|@>; 16463@<Insert a delta node to prepare for breaks at |cur_p|@>; 16464if abs(adj_demerits)>=awful_bad-minimum_demerits then 16465 minimum_demerits:=awful_bad-1 16466else minimum_demerits:=minimum_demerits+abs(adj_demerits); 16467for fit_class:=very_loose_fit to tight_fit do 16468 begin if minimal_demerits[fit_class]<=minimum_demerits then 16469 @<Insert a new active node 16470 from |best_place[fit_class]| to |cur_p|@>; 16471 minimal_demerits[fit_class]:=awful_bad; 16472 end; 16473minimum_demerits:=awful_bad; 16474@<Insert a delta node to prepare for the next active node@>; 16475end 16476 16477@ When we insert a new active node for a break at |cur_p|, suppose this 16478new node is to be placed just before active node |a|; then we essentially 16479want to insert `$\delta\,|cur_p|\,\delta^\prime$' before |a|, where 16480$\delta=\alpha(a)-\alpha(|cur_p|)$ and $\delta^\prime=\alpha(|cur_p|)-\alpha(a)$ 16481in the notation explained above. The |cur_active_width| array now holds 16482$\gamma+\beta(|cur_p|)-\alpha(a)$; so $\delta$ can be obtained by 16483subtracting |cur_active_width| from the quantity $\gamma+\beta(|cur_p|)- 16484\alpha(|cur_p|)$. The latter quantity can be regarded as the length of a 16485line ``from |cur_p| to |cur_p|''; we call it the |break_width| at |cur_p|. 16486 16487The |break_width| is usually negative, since it consists of the background 16488(which is normally zero) minus the width of nodes following~|cur_p| that are 16489eliminated after a break. If, for example, node |cur_p| is a glue node, the 16490width of this glue is subtracted from the background; and we also look 16491ahead to eliminate all subsequent glue and penalty and kern and math 16492nodes, subtracting their widths as well. 16493 16494Kern nodes do not disappear at a line break unless they are |explicit|. 16495 16496@d set_break_width_to_background(#)==break_width[#]:=background[#] 16497 16498@<Compute the values of |break...@>= 16499begin no_break_yet:=false; do_all_six(set_break_width_to_background); 16500s:=cur_p; 16501if break_type>unhyphenated then if cur_p<>null then 16502 @<Compute the discretionary |break_width| values@>; 16503while s<>null do 16504 begin if is_char_node(s) then goto done; 16505 case type(s) of 16506 glue_node:@<Subtract glue from |break_width|@>; 16507 penalty_node: do_nothing; 16508 math_node: break_width[1]:=break_width[1]-width(s); 16509 kern_node: if subtype(s)<>explicit then goto done 16510 else break_width[1]:=break_width[1]-width(s); 16511 othercases goto done 16512 endcases;@/ 16513 s:=link(s); 16514 end; 16515done: end 16516 16517@ @<Subtract glue from |break...@>= 16518begin v:=glue_ptr(s); break_width[1]:=break_width[1]-width(v); 16519break_width[2+stretch_order(v)]:=break_width[2+stretch_order(v)]-stretch(v); 16520break_width[6]:=break_width[6]-shrink(v); 16521end 16522 16523@ When |cur_p| is a discretionary break, the length of a line ``from |cur_p| to 16524|cur_p|'' has to be defined properly so that the other calculations work out. 16525Suppose that the pre-break text at |cur_p| has length $l_0$, the post-break 16526text has length $l_1$, and the replacement text has length |l|. Suppose 16527also that |q| is the node following the replacement text. Then length of a 16528line from |cur_p| to |q| will be computed as $\gamma+\beta(q)-\alpha(|cur_p|)$, 16529where $\beta(q)=\beta(|cur_p|)-l_0+l$. The actual length will be the background 16530plus $l_1$, so the length from |cur_p| to |cur_p| should be $\gamma+l_0+l_1-l$. 16531If the post-break text of the discretionary is empty, a break may also 16532discard~|q|; in that unusual case we subtract the length of~|q| and any 16533other nodes that will be discarded after the discretionary break. 16534 16535The value of $l_0$ need not be computed, since |line_break| will put 16536it into the global variable |disc_width| before calling |try_break|. 16537 16538@<Glob...@>= 16539@!disc_width:scaled; {the length of discretionary material preceding a break} 16540 16541@ @<Compute the discretionary |break...@>= 16542begin t:=replace_count(cur_p); v:=cur_p; s:=post_break(cur_p); 16543while t>0 do 16544 begin decr(t); v:=link(v); 16545 @<Subtract the width of node |v| from |break_width|@>; 16546 end; 16547while s<>null do 16548 begin @<Add the width of node |s| to |break_width|@>; 16549 s:=link(s); 16550 end; 16551break_width[1]:=break_width[1]+disc_width; 16552if post_break(cur_p)=null then s:=link(v); 16553 {nodes may be discardable after the break} 16554end 16555 16556@ Replacement texts and discretionary texts are supposed to contain 16557only character nodes, kern nodes, ligature nodes, and box or rule nodes. 16558 16559@<Subtract the width of node |v|...@>= 16560if is_char_node(v) then 16561 begin f:=font(v); 16562 break_width[1]:=break_width[1]-char_width(f)(char_info(f)(character(v))); 16563 end 16564else case type(v) of 16565 ligature_node: begin f:=font(lig_char(v));@/ 16566 break_width[1]:=@|break_width[1]- 16567 char_width(f)(char_info(f)(character(lig_char(v)))); 16568 end; 16569 hlist_node,vlist_node,rule_node,kern_node: 16570 break_width[1]:=break_width[1]-width(v); 16571 othercases confusion("disc1") 16572@:this can't happen disc1}{\quad disc1@> 16573 endcases 16574 16575@ @<Add the width of node |s| to |b...@>= 16576if is_char_node(s) then 16577 begin f:=font(s); 16578 break_width[1]:=@|break_width[1]+char_width(f)(char_info(f)(character(s))); 16579 end 16580else case type(s) of 16581 ligature_node: begin f:=font(lig_char(s)); 16582 break_width[1]:=break_width[1]+ 16583 char_width(f)(char_info(f)(character(lig_char(s)))); 16584 end; 16585 hlist_node,vlist_node,rule_node,kern_node: 16586 break_width[1]:=break_width[1]+width(s); 16587 othercases confusion("disc2") 16588@:this can't happen disc2}{\quad disc2@> 16589 endcases 16590 16591@ We use the fact that |type(active)<>delta_node|. 16592 16593@d convert_to_break_width(#)==@| 16594 mem[prev_r+#].sc:=@|@t\hskip10pt@>mem[prev_r+#].sc 16595 -cur_active_width[#]+break_width[#] 16596@d store_break_width(#)==active_width[#]:=break_width[#] 16597@d new_delta_to_break_width(#)==@| 16598 mem[q+#].sc:=break_width[#]-cur_active_width[#] 16599 16600@<Insert a delta node to prepare for breaks at |cur_p|@>= 16601if type(prev_r)=delta_node then {modify an existing delta node} 16602 begin do_all_six(convert_to_break_width); 16603 end 16604else if prev_r=active then {no delta node needed at the beginning} 16605 begin do_all_six(store_break_width); 16606 end 16607else begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/ 16608 subtype(q):=0; {the |subtype| is not used} 16609 do_all_six(new_delta_to_break_width); 16610 link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q; 16611 end 16612 16613@ When the following code is performed, we will have just inserted at 16614least one active node before |r|, so |type(prev_r)<>delta_node|. 16615 16616@d new_delta_from_break_width(#)==@|mem[q+#].sc:= 16617 cur_active_width[#]-break_width[#] 16618 16619@<Insert a delta node to prepare for the next active node@>= 16620if r<>last_active then 16621 begin q:=get_node(delta_node_size); link(q):=r; type(q):=delta_node;@/ 16622 subtype(q):=0; {the |subtype| is not used} 16623 do_all_six(new_delta_from_break_width); 16624 link(prev_r):=q; prev_prev_r:=prev_r; prev_r:=q; 16625 end 16626 16627@ When we create an active node, we also create the corresponding 16628passive node. 16629 16630@<Insert a new active node from |best_place[fit_class]| to |cur_p|@>= 16631begin q:=get_node(passive_node_size); 16632link(q):=passive; passive:=q; cur_break(q):=cur_p; 16633@!stat incr(pass_number); serial(q):=pass_number;@+tats@;@/ 16634prev_break(q):=best_place[fit_class];@/ 16635q:=get_node(active_node_size); break_node(q):=passive; 16636line_number(q):=best_pl_line[fit_class]+1; 16637fitness(q):=fit_class; type(q):=break_type; 16638total_demerits(q):=minimal_demerits[fit_class]; 16639link(q):=r; link(prev_r):=q; prev_r:=q; 16640@!stat if tracing_paragraphs>0 then 16641 @<Print a symbolic description of the new break node@>; 16642tats@;@/ 16643end 16644 16645@ @<Print a symbolic description of the new break node@>= 16646begin print_nl("@@@@"); print_int(serial(passive)); 16647@.\AT!\AT!@> 16648print(": line "); print_int(line_number(q)-1); 16649print_char("."); print_int(fit_class); 16650if break_type=hyphenated then print_char("-"); 16651print(" t="); print_int(total_demerits(q)); 16652print(" -> @@@@"); 16653if prev_break(passive)=null then print_char("0") 16654else print_int(serial(prev_break(passive))); 16655end 16656 16657@ The length of lines depends on whether the user has specified 16658\.{\\parshape} or \.{\\hangindent}. If |par_shape_ptr| is not null, it 16659points to a $(2n+1)$-word record in |mem|, where the |info| in the first 16660word contains the value of |n|, and the other $2n$ words contain the left 16661margins and line lengths for the first |n| lines of the paragraph; the 16662specifications for line |n| apply to all subsequent lines. If 16663|par_shape_ptr=null|, the shape of the paragraph depends on the value of 16664|n=hang_after|; if |n>=0|, hanging indentation takes place on lines |n+1|, 16665|n+2|, \dots, otherwise it takes place on lines 1, \dots, $\vert 16666n\vert$. When hanging indentation is active, the left margin is 16667|hang_indent|, if |hang_indent>=0|, else it is 0; the line length is 16668$|hsize|-\vert|hang_indent|\vert$. The normal setting is 16669|par_shape_ptr=null|, |hang_after=1|, and |hang_indent=0|. 16670Note that if |hang_indent=0|, the value of |hang_after| is irrelevant. 16671@^length of lines@> @^hanging indentation@> 16672 16673@<Glob...@>= 16674@!easy_line:halfword; {line numbers |>easy_line| are equivalent in break nodes} 16675@!last_special_line:halfword; {line numbers |>last_special_line| all have 16676 the same width} 16677@!first_width:scaled; {the width of all lines |<=last_special_line|, if 16678 no \.{\\parshape} has been specified} 16679@!second_width:scaled; {the width of all lines |>last_special_line|} 16680@!first_indent:scaled; {left margin to go with |first_width|} 16681@!second_indent:scaled; {left margin to go with |second_width|} 16682 16683@ We compute the values of |easy_line| and the other local variables relating 16684to line length when the |line_break| procedure is initializing itself. 16685 16686@<Get ready to start...@>= 16687if par_shape_ptr=null then 16688 if hang_indent=0 then 16689 begin last_special_line:=0; second_width:=hsize; 16690 second_indent:=0; 16691 end 16692 else @<Set line length parameters in preparation for hanging indentation@> 16693else begin last_special_line:=info(par_shape_ptr)-1; 16694 second_width:=mem[par_shape_ptr+2*(last_special_line+1)].sc; 16695 second_indent:=mem[par_shape_ptr+2*last_special_line+1].sc; 16696 end; 16697if looseness=0 then easy_line:=last_special_line 16698else easy_line:=max_halfword 16699 16700@ @<Set line length parameters in preparation for hanging indentation@>= 16701begin last_special_line:=abs(hang_after); 16702if hang_after<0 then 16703 begin first_width:=hsize-abs(hang_indent); 16704 if hang_indent>=0 then first_indent:=hang_indent 16705 else first_indent:=0; 16706 second_width:=hsize; second_indent:=0; 16707 end 16708else begin first_width:=hsize; first_indent:=0; 16709 second_width:=hsize-abs(hang_indent); 16710 if hang_indent>=0 then second_indent:=hang_indent 16711 else second_indent:=0; 16712 end; 16713end 16714 16715@ When we come to the following code, we have just encountered the first 16716active node~|r| whose |line_number| field contains |l|. Thus we want to 16717compute the length of the $l\mskip1mu$th line of the current paragraph. Furthermore, 16718we want to set |old_l| to the last number in the class of line numbers 16719equivalent to~|l|. 16720 16721@<Compute the new line width@>= 16722if l>easy_line then 16723 begin line_width:=second_width; old_l:=max_halfword-1; 16724 end 16725else begin old_l:=l; 16726 if l>last_special_line then line_width:=second_width 16727 else if par_shape_ptr=null then line_width:=first_width 16728 else line_width:=mem[par_shape_ptr+2*l@,].sc; 16729 end 16730 16731@ The remaining part of |try_break| deals with the calculation of 16732demerits for a break from |r| to |cur_p|. 16733 16734The first thing to do is calculate the badness, |b|. This value will always 16735be between zero and |inf_bad+1|; the latter value occurs only in the 16736case of lines from |r| to |cur_p| that cannot shrink enough to fit the necessary 16737width. In such cases, node |r| will be deactivated. 16738We also deactivate node~|r| when a break at~|cur_p| is forced, since future 16739breaks must go through a forced break. 16740 16741@<Consider the demerits for a line from |r| to |cur_p|...@>= 16742begin artificial_demerits:=false;@/ 16743@^inner loop@> 16744shortfall:=line_width-cur_active_width[1]; {we're this much too short} 16745if shortfall>0 then 16746 @<Set the value of |b| to the badness for stretching the line, 16747 and compute the corresponding |fit_class|@> 16748else @<Set the value of |b| to the badness for shrinking the line, 16749 and compute the corresponding |fit_class|@>; 16750if (b>inf_bad)or(pi=eject_penalty) then 16751 @<Prepare to deactivate node~|r|, and |goto deactivate| unless 16752 there is a reason to consider lines of text from |r| to |cur_p|@> 16753else begin prev_r:=r; 16754 if b>threshold then goto continue; 16755 node_r_stays_active:=true; 16756 end; 16757@<Record a new feasible break@>; 16758if node_r_stays_active then goto continue; {|prev_r| has been set to |r|} 16759deactivate: @<Deactivate node |r|@>; 16760end 16761 16762@ When a line must stretch, the available stretchability can be found in the 16763subarray |cur_active_width[2..5]|, in units of points, fil, fill, and filll. 16764 16765The present section is part of \TeX's inner loop, and it is most often performed 16766when the badness is infinite; therefore it is worth while to make a quick 16767test for large width excess and small stretchability, before calling the 16768|badness| subroutine. 16769@^inner loop@> 16770 16771@<Set the value of |b| to the badness for stretching...@>= 16772if (cur_active_width[3]<>0)or(cur_active_width[4]<>0)or@| 16773 (cur_active_width[5]<>0) then 16774 begin b:=0; fit_class:=decent_fit; {infinite stretch} 16775 end 16776else begin if shortfall>7230584 then if cur_active_width[2]<1663497 then 16777 begin b:=inf_bad; fit_class:=very_loose_fit; goto done1; 16778 end; 16779 b:=badness(shortfall,cur_active_width[2]); 16780 if b>12 then 16781 if b>99 then fit_class:=very_loose_fit 16782 else fit_class:=loose_fit 16783 else fit_class:=decent_fit; 16784 done1: 16785 end 16786 16787@ Shrinkability is never infinite in a paragraph; 16788we can shrink the line from |r| to |cur_p| by at most |cur_active_width[6]|. 16789 16790@<Set the value of |b| to the badness for shrinking...@>= 16791begin if -shortfall>cur_active_width[6] then b:=inf_bad+1 16792else b:=badness(-shortfall,cur_active_width[6]); 16793if b>12 then fit_class:=tight_fit@+else fit_class:=decent_fit; 16794end 16795 16796@ During the final pass, we dare not lose all active nodes, lest we lose 16797touch with the line breaks already found. The code shown here makes sure 16798that such a catastrophe does not happen, by permitting overfull boxes as 16799a last resort. This particular part of \TeX\ was a source of several subtle 16800bugs before the correct program logic was finally discovered; readers 16801who seek to ``improve'' \TeX\ should therefore think thrice before daring 16802to make any changes here. 16803@^overfull boxes@> 16804 16805@<Prepare to deactivate node~|r|, and |goto deactivate| unless...@>= 16806begin if final_pass and (minimum_demerits=awful_bad) and@| 16807 (link(r)=last_active) and 16808 (prev_r=active) then 16809 artificial_demerits:=true {set demerits zero, this break is forced} 16810else if b>threshold then goto deactivate; 16811node_r_stays_active:=false; 16812end 16813 16814@ When we get to this part of the code, the line from |r| to |cur_p| is 16815feasible, its badness is~|b|, and its fitness classification is |fit_class|. 16816We don't want to make an active node for this break yet, but we will 16817compute the total demerits and record them in the |minimal_demerits| array, 16818if such a break is the current champion among all ways to get to |cur_p| 16819in a given line-number class and fitness class. 16820 16821@<Record a new feasible break@>= 16822if artificial_demerits then d:=0 16823else @<Compute the demerits, |d|, from |r| to |cur_p|@>; 16824@!stat if tracing_paragraphs>0 then 16825 @<Print a symbolic description of this feasible break@>; 16826tats@;@/ 16827d:=d+total_demerits(r); {this is the minimum total demerits 16828 from the beginning to |cur_p| via |r|} 16829if d<=minimal_demerits[fit_class] then 16830 begin minimal_demerits[fit_class]:=d; 16831 best_place[fit_class]:=break_node(r); best_pl_line[fit_class]:=l; 16832 if d<minimum_demerits then minimum_demerits:=d; 16833 end 16834 16835@ @<Print a symbolic description of this feasible break@>= 16836begin if printed_node<>cur_p then 16837 @<Print the list between |printed_node| and |cur_p|, 16838 then set |printed_node:=cur_p|@>; 16839print_nl("@@"); 16840@.\AT!@> 16841if cur_p=null then print_esc("par") 16842else if type(cur_p)<>glue_node then 16843 begin if type(cur_p)=penalty_node then print_esc("penalty") 16844 else if type(cur_p)=disc_node then print_esc("discretionary") 16845 else if type(cur_p)=kern_node then print_esc("kern") 16846 else print_esc("math"); 16847 end; 16848print(" via @@@@"); 16849if break_node(r)=null then print_char("0") 16850else print_int(serial(break_node(r))); 16851print(" b="); 16852if b>inf_bad then print_char("*")@+else print_int(b); 16853@.*\relax@> 16854print(" p="); print_int(pi); print(" d="); 16855if artificial_demerits then print_char("*")@+else print_int(d); 16856end 16857 16858@ @<Print the list between |printed_node| and |cur_p|...@>= 16859begin print_nl(""); 16860if cur_p=null then short_display(link(printed_node)) 16861else begin save_link:=link(cur_p); 16862 link(cur_p):=null; print_nl(""); short_display(link(printed_node)); 16863 link(cur_p):=save_link; 16864 end; 16865printed_node:=cur_p; 16866end 16867 16868@ When the data for a discretionary break is being displayed, we will have 16869printed the |pre_break| and |post_break| lists; we want to skip over the 16870third list, so that the discretionary data will not appear twice. The 16871following code is performed at the very end of |try_break|. 16872 16873@<Update the value of |printed_node|...@>= 16874if cur_p=printed_node then if cur_p<>null then if type(cur_p)=disc_node then 16875 begin t:=replace_count(cur_p); 16876 while t>0 do 16877 begin decr(t); printed_node:=link(printed_node); 16878 end; 16879 end 16880 16881@ @<Compute the demerits, |d|, from |r| to |cur_p|@>= 16882begin d:=line_penalty+b; 16883if abs(d)>=10000 then d:=100000000@+else d:=d*d; 16884if pi<>0 then 16885 if pi>0 then d:=d+pi*pi 16886 else if pi>eject_penalty then d:=d-pi*pi; 16887if (break_type=hyphenated)and(type(r)=hyphenated) then 16888 if cur_p<>null then d:=d+double_hyphen_demerits 16889 else d:=d+final_hyphen_demerits; 16890if abs(fit_class-fitness(r))>1 then d:=d+adj_demerits; 16891end 16892 16893@ When an active node disappears, we must delete an adjacent delta node if the 16894active node was at the beginning or the end of the active list, or if it 16895was surrounded by delta nodes. We also must preserve the property that 16896|cur_active_width| represents the length of material from |link(prev_r)| 16897to~|cur_p|. 16898 16899@d combine_two_deltas(#)==@|mem[prev_r+#].sc:=mem[prev_r+#].sc+mem[r+#].sc 16900@d downdate_width(#)==@|cur_active_width[#]:=cur_active_width[#]- 16901 mem[prev_r+#].sc 16902 16903@<Deactivate node |r|@>= 16904link(prev_r):=link(r); free_node(r,active_node_size); 16905if prev_r=active then @<Update the active widths, since the first active 16906 node has been deleted@> 16907else if type(prev_r)=delta_node then 16908 begin r:=link(prev_r); 16909 if r=last_active then 16910 begin do_all_six(downdate_width); 16911 link(prev_prev_r):=last_active; 16912 free_node(prev_r,delta_node_size); prev_r:=prev_prev_r; 16913 end 16914 else if type(r)=delta_node then 16915 begin do_all_six(update_width); 16916 do_all_six(combine_two_deltas); 16917 link(prev_r):=link(r); free_node(r,delta_node_size); 16918 end; 16919 end 16920 16921@ The following code uses the fact that |type(last_active)<>delta_node|. If the 16922active list has just become empty, we do not need to update the 16923|active_width| array, since it will be initialized when an active 16924node is next inserted. 16925 16926@d update_active(#)==active_width[#]:=active_width[#]+mem[r+#].sc 16927 16928@<Update the active widths,...@>= 16929begin r:=link(active); 16930if type(r)=delta_node then 16931 begin do_all_six(update_active); 16932 do_all_six(copy_to_cur_active); 16933 link(active):=link(r); free_node(r,delta_node_size); 16934 end; 16935end 16936 16937@* \[39] Breaking paragraphs into lines, continued. 16938So far we have gotten a little way into the |line_break| routine, having 16939covered its important |try_break| subroutine. Now let's consider the 16940rest of the process. 16941 16942The main loop of |line_break| traverses the given hlist, 16943starting at |link(temp_head)|, and calls |try_break| at each legal 16944breakpoint. A variable called |auto_breaking| is set to true except 16945within math formulas, since glue nodes are not legal breakpoints when 16946they appear in formulas. 16947 16948The current node of interest in the hlist is pointed to by |cur_p|. Another 16949variable, |prev_p|, is usually one step behind |cur_p|, but the real 16950meaning of |prev_p| is this: If |type(cur_p)=glue_node| then |cur_p| is a legal 16951breakpoint if and only if |auto_breaking| is true and |prev_p| does not 16952point to a glue node, penalty node, explicit kern node, or math node. 16953 16954The following declarations provide for a few other local variables that are 16955used in special calculations. 16956 16957@<Local variables for line breaking@>= 16958@!auto_breaking:boolean; {is node |cur_p| outside a formula?} 16959@!prev_p:pointer; {helps to determine when glue nodes are breakpoints} 16960@!q,@!r,@!s,@!prev_s:pointer; {miscellaneous nodes of temporary interest} 16961@!f:internal_font_number; {used when calculating character widths} 16962 16963@ The `\ignorespaces|loop|\unskip' in the following code is performed at most 16964thrice per call of |line_break|, since it is actually a pass over the 16965entire paragraph. 16966 16967@<Find optimal breakpoints@>= 16968threshold:=pretolerance; 16969if threshold>=0 then 16970 begin @!stat if tracing_paragraphs>0 then 16971 begin begin_diagnostic; print_nl("@@firstpass");@+end;@;@+tats@;@/ 16972 second_pass:=false; final_pass:=false; 16973 end 16974else begin threshold:=tolerance; second_pass:=true; 16975 final_pass:=(emergency_stretch<=0); 16976 @!stat if tracing_paragraphs>0 then begin_diagnostic;@+tats@; 16977 end; 16978loop@+ begin if threshold>inf_bad then threshold:=inf_bad; 16979 if second_pass then @<Initialize for hyphenating a paragraph@>; 16980 @<Create an active breakpoint representing the beginning of the paragraph@>; 16981 cur_p:=link(temp_head); auto_breaking:=true;@/ 16982 prev_p:=cur_p; {glue at beginning is not a legal breakpoint} 16983 while (cur_p<>null)and(link(active)<>last_active) do 16984 @<Call |try_break| if |cur_p| is a legal breakpoint; 16985 on the second pass, also try to hyphenate the next 16986 word, if |cur_p| is a glue node; 16987 then advance |cur_p| to the next node of the paragraph 16988 that could possibly be a legal breakpoint@>; 16989 if cur_p=null then 16990 @<Try the final line break at the end of the paragraph, 16991 and |goto done| if the desired breakpoints have been found@>; 16992 @<Clean up the memory by removing the break nodes@>; 16993 if not second_pass then 16994 begin@!stat if tracing_paragraphs>0 then print_nl("@@secondpass");@;@+tats@/ 16995 threshold:=tolerance; second_pass:=true; final_pass:=(emergency_stretch<=0); 16996 end {if at first you don't succeed, \dots} 16997 else begin @!stat if tracing_paragraphs>0 then 16998 print_nl("@@emergencypass");@;@+tats@/ 16999 background[2]:=background[2]+emergency_stretch; final_pass:=true; 17000 end; 17001 end; 17002done: @!stat if tracing_paragraphs>0 then 17003 begin end_diagnostic(true); normalize_selector; 17004 end;@+tats@/ 17005 17006@ The active node that represents the starting point does not need a 17007corresponding passive node. 17008 17009@d store_background(#)==active_width[#]:=background[#] 17010 17011@<Create an active breakpoint representing the beginning of the paragraph@>= 17012q:=get_node(active_node_size); 17013type(q):=unhyphenated; fitness(q):=decent_fit; 17014link(q):=last_active; break_node(q):=null; 17015line_number(q):=prev_graf+1; total_demerits(q):=0; link(active):=q; 17016do_all_six(store_background);@/ 17017passive:=null; printed_node:=temp_head; pass_number:=0; 17018font_in_short_display:=null_font 17019 17020@ @<Clean...@>= 17021q:=link(active); 17022while q<>last_active do 17023 begin cur_p:=link(q); 17024 if type(q)=delta_node then free_node(q,delta_node_size) 17025 else free_node(q,active_node_size); 17026 q:=cur_p; 17027 end; 17028q:=passive; 17029while q<>null do 17030 begin cur_p:=link(q); 17031 free_node(q,passive_node_size); 17032 q:=cur_p; 17033 end 17034 17035@ Here is the main switch in the |line_break| routine, where legal breaks 17036are determined. As we move through the hlist, we need to keep the |active_width| 17037array up to date, so that the badness of individual lines is readily calculated 17038by |try_break|. It is convenient to use the short name |act_width| for 17039the component of active width that represents real width as opposed to glue. 17040 17041@d act_width==active_width[1] {length from first active node to current node} 17042@d kern_break==begin if not is_char_node(link(cur_p)) and auto_breaking then 17043 if type(link(cur_p))=glue_node then try_break(0,unhyphenated); 17044 act_width:=act_width+width(cur_p); 17045 end 17046 17047@<Call |try_break| if |cur_p| is a legal breakpoint...@>= 17048begin if is_char_node(cur_p) then 17049 @<Advance \(c)|cur_p| to the node following the present 17050 string of characters@>; 17051case type(cur_p) of 17052hlist_node,vlist_node,rule_node: act_width:=act_width+width(cur_p); 17053whatsit_node: @<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>; 17054glue_node: begin @<If node |cur_p| is a legal breakpoint, call |try_break|; 17055 then update the active widths by including the glue in |glue_ptr(cur_p)|@>; 17056 if second_pass and auto_breaking then 17057 @<Try to hyphenate the following word@>; 17058 end; 17059kern_node: if subtype(cur_p)=explicit then kern_break 17060 else act_width:=act_width+width(cur_p); 17061ligature_node: begin f:=font(lig_char(cur_p)); 17062 act_width:=act_width+char_width(f)(char_info(f)(character(lig_char(cur_p)))); 17063 end; 17064disc_node: @<Try to break after a discretionary fragment, then |goto done5|@>; 17065math_node: begin auto_breaking:=(subtype(cur_p)=after); kern_break; 17066 end; 17067penalty_node: try_break(penalty(cur_p),unhyphenated); 17068mark_node,ins_node,adjust_node: do_nothing; 17069othercases confusion("paragraph") 17070@:this can't happen paragraph}{\quad paragraph@> 17071endcases;@/ 17072prev_p:=cur_p; cur_p:=link(cur_p); 17073done5:end 17074 17075@ The code that passes over the characters of words in a paragraph is 17076part of \TeX's inner loop, so it has been streamlined for speed. We use 17077the fact that `\.{\\parfillskip}' glue appears at the end of each paragraph; 17078it is therefore unnecessary to check if |link(cur_p)=null| when |cur_p| is a 17079character node. 17080@^inner loop@> 17081 17082@<Advance \(c)|cur_p| to the node following the present string...@>= 17083begin prev_p:=cur_p; 17084repeat f:=font(cur_p); 17085act_width:=act_width+char_width(f)(char_info(f)(character(cur_p))); 17086cur_p:=link(cur_p); 17087until not is_char_node(cur_p); 17088end 17089 17090@ When node |cur_p| is a glue node, we look at |prev_p| to see whether or not 17091a breakpoint is legal at |cur_p|, as explained above. 17092 17093@<If node |cur_p| is a legal breakpoint, call...@>= 17094if auto_breaking then 17095 begin if is_char_node(prev_p) then try_break(0,unhyphenated) 17096 else if precedes_break(prev_p) then try_break(0,unhyphenated) 17097 else if (type(prev_p)=kern_node)and(subtype(prev_p)<>explicit) then 17098 try_break(0,unhyphenated); 17099 end; 17100check_shrinkage(glue_ptr(cur_p)); q:=glue_ptr(cur_p); 17101act_width:=act_width+width(q);@| 17102active_width[2+stretch_order(q)]:=@| 17103 active_width[2+stretch_order(q)]+stretch(q);@/ 17104active_width[6]:=active_width[6]+shrink(q) 17105 17106@ The following code knows that discretionary texts contain 17107only character nodes, kern nodes, box nodes, rule nodes, and ligature nodes. 17108 17109@<Try to break after a discretionary fragment...@>= 17110begin s:=pre_break(cur_p); disc_width:=0; 17111if s=null then try_break(ex_hyphen_penalty,hyphenated) 17112else begin repeat @<Add the width of node |s| to |disc_width|@>; 17113 s:=link(s); 17114 until s=null; 17115 act_width:=act_width+disc_width; 17116 try_break(hyphen_penalty,hyphenated); 17117 act_width:=act_width-disc_width; 17118 end; 17119r:=replace_count(cur_p); s:=link(cur_p); 17120while r>0 do 17121 begin @<Add the width of node |s| to |act_width|@>; 17122 decr(r); s:=link(s); 17123 end; 17124prev_p:=cur_p; cur_p:=s; goto done5; 17125end 17126 17127@ @<Add the width of node |s| to |disc_width|@>= 17128if is_char_node(s) then 17129 begin f:=font(s); 17130 disc_width:=disc_width+char_width(f)(char_info(f)(character(s))); 17131 end 17132else case type(s) of 17133 ligature_node: begin f:=font(lig_char(s)); 17134 disc_width:=disc_width+ 17135 char_width(f)(char_info(f)(character(lig_char(s)))); 17136 end; 17137 hlist_node,vlist_node,rule_node,kern_node: 17138 disc_width:=disc_width+width(s); 17139 othercases confusion("disc3") 17140@:this can't happen disc3}{\quad disc3@> 17141 endcases 17142 17143@ @<Add the width of node |s| to |act_width|@>= 17144if is_char_node(s) then 17145 begin f:=font(s); 17146 act_width:=act_width+char_width(f)(char_info(f)(character(s))); 17147 end 17148else case type(s) of 17149 ligature_node: begin f:=font(lig_char(s)); 17150 act_width:=act_width+ 17151 char_width(f)(char_info(f)(character(lig_char(s)))); 17152 end; 17153 hlist_node,vlist_node,rule_node,kern_node: 17154 act_width:=act_width+width(s); 17155 othercases confusion("disc4") 17156@:this can't happen disc4}{\quad disc4@> 17157 endcases 17158 17159@ The forced line break at the paragraph's end will reduce the list of 17160breakpoints so that all active nodes represent breaks at |cur_p=null|. 17161On the first pass, we insist on finding an active node that has the 17162correct ``looseness.'' On the final pass, there will be at least one active 17163node, and we will match the desired looseness as well as we can. 17164 17165The global variable |best_bet| will be set to the active node for the best 17166way to break the paragraph, and a few other variables are used to 17167help determine what is best. 17168 17169@<Glob...@>= 17170@!best_bet:pointer; {use this passive node and its predecessors} 17171@!fewest_demerits:integer; {the demerits associated with |best_bet|} 17172@!best_line:halfword; {line number following the last line of the new paragraph} 17173@!actual_looseness:integer; {the difference between |line_number(best_bet)| 17174 and the optimum |best_line|} 17175@!line_diff:integer; {the difference between the current line number and 17176 the optimum |best_line|} 17177 17178@ @<Try the final line break at the end of the paragraph...@>= 17179begin try_break(eject_penalty,hyphenated); 17180if link(active)<>last_active then 17181 begin @<Find an active node with fewest demerits@>; 17182 if looseness=0 then goto done; 17183 @<Find the best active node for the desired looseness@>; 17184 if (actual_looseness=looseness)or final_pass then goto done; 17185 end; 17186end 17187 17188@ @<Find an active node...@>= 17189r:=link(active); fewest_demerits:=awful_bad; 17190repeat if type(r)<>delta_node then if total_demerits(r)<fewest_demerits then 17191 begin fewest_demerits:=total_demerits(r); best_bet:=r; 17192 end; 17193r:=link(r); 17194until r=last_active; 17195best_line:=line_number(best_bet) 17196 17197@ The adjustment for a desired looseness is a slightly more complicated 17198version of the loop just considered. Note that if a paragraph is broken 17199into segments by displayed equations, each segment will be subject to the 17200looseness calculation, independently of the other segments. 17201 17202@<Find the best active node...@>= 17203begin r:=link(active); actual_looseness:=0; 17204repeat if type(r)<>delta_node then 17205 begin line_diff:=line_number(r)-best_line; 17206 if ((line_diff<actual_looseness)and(looseness<=line_diff))or@| 17207 ((line_diff>actual_looseness)and(looseness>=line_diff)) then 17208 begin best_bet:=r; actual_looseness:=line_diff; 17209 fewest_demerits:=total_demerits(r); 17210 end 17211 else if (line_diff=actual_looseness)and@| 17212 (total_demerits(r)<fewest_demerits) then 17213 begin best_bet:=r; fewest_demerits:=total_demerits(r); 17214 end; 17215 end; 17216r:=link(r); 17217until r=last_active; 17218best_line:=line_number(best_bet); 17219end 17220 17221@ Once the best sequence of breakpoints has been found (hurray), we call on the 17222procedure |post_line_break| to finish the remainder of the work. 17223(By introducing this subprocedure, we are able to keep |line_break| 17224from getting extremely long.) 17225 17226@<Break the paragraph at the chosen...@>= 17227post_line_break(final_widow_penalty) 17228 17229@ The total number of lines that will be set by |post_line_break| 17230is |best_line-prev_graf-1|. The last breakpoint is specified by 17231|break_node(best_bet)|, and this passive node points to the other breakpoints 17232via the |prev_break| links. The finishing-up phase starts by linking the 17233relevant passive nodes in forward order, changing |prev_break| to 17234|next_break|. (The |next_break| fields actually reside in the same memory 17235space as the |prev_break| fields did, but we give them a new name because 17236of their new significance.) Then the lines are justified, one by one. 17237 17238@d next_break==prev_break {new name for |prev_break| after links are reversed} 17239 17240@<Declare subprocedures for |line_break|@>= 17241procedure post_line_break(@!final_widow_penalty:integer); 17242label done,done1; 17243var q,@!r,@!s:pointer; {temporary registers for list manipulation} 17244@!disc_break:boolean; {was the current break at a discretionary node?} 17245@!post_disc_break:boolean; {and did it have a nonempty post-break part?} 17246@!cur_width:scaled; {width of line number |cur_line|} 17247@!cur_indent:scaled; {left margin of line number |cur_line|} 17248@!t:quarterword; {used for replacement counts in discretionary nodes} 17249@!pen:integer; {use when calculating penalties between lines} 17250@!cur_line: halfword; {the current line number being justified} 17251begin @<Reverse the links of the relevant passive nodes, setting |cur_p| to the 17252 first breakpoint@>; 17253cur_line:=prev_graf+1; 17254repeat @<Justify the line ending at breakpoint |cur_p|, and append it to the 17255 current vertical list, together with associated penalties and other 17256 insertions@>; 17257incr(cur_line); cur_p:=next_break(cur_p); 17258if cur_p<>null then if not post_disc_break then 17259 @<Prune unwanted nodes at the beginning of the next line@>; 17260until cur_p=null; 17261if (cur_line<>best_line)or(link(temp_head)<>null) then 17262 confusion("line breaking"); 17263@:this can't happen line breaking}{\quad line breaking@> 17264prev_graf:=best_line-1; 17265end; 17266 17267@ The job of reversing links in a list is conveniently regarded as the job 17268of taking items off one stack and putting them on another. In this case we 17269take them off a stack pointed to by |q| and having |prev_break| fields; 17270we put them on a stack pointed to by |cur_p| and having |next_break| fields. 17271Node |r| is the passive node being moved from stack to stack. 17272 17273@<Reverse the links of the relevant passive nodes...@>= 17274q:=break_node(best_bet); cur_p:=null; 17275repeat r:=q; q:=prev_break(q); next_break(r):=cur_p; cur_p:=r; 17276until q=null 17277 17278@ Glue and penalty and kern and math nodes are deleted at the beginning of 17279a line, except in the anomalous case that the node to be deleted is actually 17280one of the chosen breakpoints. Otherwise 17281the pruning done here is designed to match 17282the lookahead computation in |try_break|, where the |break_width| values 17283are computed for non-discretionary breakpoints. 17284 17285@<Prune unwanted nodes at the beginning of the next line@>= 17286begin r:=temp_head; 17287loop@+ begin q:=link(r); 17288 if q=cur_break(cur_p) then goto done1; 17289 {|cur_break(cur_p)| is the next breakpoint} 17290 {now |q| cannot be |null|} 17291 if is_char_node(q) then goto done1; 17292 if non_discardable(q) then goto done1; 17293 if type(q)=kern_node then if subtype(q)<>explicit then goto done1; 17294 r:=q; {now |type(q)=glue_node|, |kern_node|, |math_node| or |penalty_node|} 17295 end; 17296done1: if r<>temp_head then 17297 begin link(r):=null; flush_node_list(link(temp_head)); 17298 link(temp_head):=q; 17299 end; 17300end 17301 17302@ The current line to be justified appears in a horizontal list starting 17303at |link(temp_head)| and ending at |cur_break(cur_p)|. If |cur_break(cur_p)| is 17304a glue node, we reset the glue to equal the |right_skip| glue; otherwise 17305we append the |right_skip| glue at the right. If |cur_break(cur_p)| is a 17306discretionary node, we modify the list so that the discretionary break 17307is compulsory, and we set |disc_break| to |true|. We also append 17308the |left_skip| glue at the left of the line, unless it is zero. 17309 17310@<Justify the line ending at breakpoint |cur_p|, and append it...@>= 17311@<Modify the end of the line to reflect the nature of the break and to include 17312 \.{\\rightskip}; also set the proper value of |disc_break|@>; 17313@<Put the \(l)\.{\\leftskip} glue at the left and detach this line@>; 17314@<Call the packaging subroutine, setting |just_box| to the justified box@>; 17315@<Append the new box to the current vertical list, followed by the list of 17316 special nodes taken out of the box by the packager@>; 17317@<Append a penalty node, if a nonzero penalty is appropriate@> 17318 17319@ At the end of the following code, |q| will point to the final node on the 17320list about to be justified. 17321 17322@<Modify the end of the line...@>= 17323q:=cur_break(cur_p); disc_break:=false; post_disc_break:=false; 17324if q<>null then {|q| cannot be a |char_node|} 17325 if type(q)=glue_node then 17326 begin delete_glue_ref(glue_ptr(q)); 17327 glue_ptr(q):=right_skip; 17328 subtype(q):=right_skip_code+1; add_glue_ref(right_skip); 17329 goto done; 17330 end 17331 else begin if type(q)=disc_node then 17332 @<Change discretionary to compulsory and set 17333 |disc_break:=true|@> 17334 else if (type(q)=math_node)or(type(q)=kern_node) then width(q):=0; 17335 end 17336else begin q:=temp_head; 17337 while link(q)<>null do q:=link(q); 17338 end; 17339@<Put the \(r)\.{\\rightskip} glue after node |q|@>; 17340done: 17341 17342@ @<Change discretionary to compulsory...@>= 17343begin t:=replace_count(q); 17344@<Destroy the |t| nodes following |q|, and 17345 make |r| point to the following node@>; 17346if post_break(q)<>null then @<Transplant the post-break list@>; 17347if pre_break(q)<>null then @<Transplant the pre-break list@>; 17348link(q):=r; disc_break:=true; 17349end 17350 17351@ @<Destroy the |t| nodes following |q|...@>= 17352if t=0 then r:=link(q) 17353else begin r:=q; 17354 while t>1 do 17355 begin r:=link(r); decr(t); 17356 end; 17357 s:=link(r); 17358 r:=link(s); link(s):=null; 17359 flush_node_list(link(q)); replace_count(q):=0; 17360 end 17361 17362@ We move the post-break list from inside node |q| to the main list by 17363re\-attaching it just before the present node |r|, then resetting |r|. 17364 17365@<Transplant the post-break list@>= 17366begin s:=post_break(q); 17367while link(s)<>null do s:=link(s); 17368link(s):=r; r:=post_break(q); post_break(q):=null; post_disc_break:=true; 17369end 17370 17371@ We move the pre-break list from inside node |q| to the main list by 17372re\-attaching it just after the present node |q|, then resetting |q|. 17373 17374@<Transplant the pre-break list@>= 17375begin s:=pre_break(q); link(q):=s; 17376while link(s)<>null do s:=link(s); 17377pre_break(q):=null; q:=s; 17378end 17379 17380@ @<Put the \(r)\.{\\rightskip} glue after node |q|@>= 17381r:=new_param_glue(right_skip_code); link(r):=link(q); link(q):=r; q:=r 17382 17383@ The following code begins with |q| at the end of the list to be 17384justified. It ends with |q| at the beginning of that list, and with 17385|link(temp_head)| pointing to the remainder of the paragraph, if any. 17386 17387@<Put the \(l)\.{\\leftskip} glue at the left...@>= 17388r:=link(q); link(q):=null; q:=link(temp_head); link(temp_head):=r; 17389if left_skip<>zero_glue then 17390 begin r:=new_param_glue(left_skip_code); 17391 link(r):=q; q:=r; 17392 end 17393 17394@ @<Append the new box to the current vertical list...@>= 17395append_to_vlist(just_box); 17396if adjust_head<>adjust_tail then 17397 begin link(tail):=link(adjust_head); tail:=adjust_tail; 17398 end; 17399adjust_tail:=null 17400 17401@ Now |q| points to the hlist that represents the current line of the 17402paragraph. We need to compute the appropriate line width, pack the 17403line into a box of this size, and shift the box by the appropriate 17404amount of indentation. 17405 17406@<Call the packaging subroutine...@>= 17407if cur_line>last_special_line then 17408 begin cur_width:=second_width; cur_indent:=second_indent; 17409 end 17410else if par_shape_ptr=null then 17411 begin cur_width:=first_width; cur_indent:=first_indent; 17412 end 17413else begin cur_width:=mem[par_shape_ptr+2*cur_line].sc; 17414 cur_indent:=mem[par_shape_ptr+2*cur_line-1].sc; 17415 end; 17416adjust_tail:=adjust_head; just_box:=hpack(q,cur_width,exactly); 17417shift_amount(just_box):=cur_indent 17418 17419@ Penalties between the lines of a paragraph come from club and widow lines, 17420from the |inter_line_penalty| parameter, and from lines that end at 17421discretionary breaks. Breaking between lines of a two-line paragraph gets 17422both club-line and widow-line penalties. The local variable |pen| will 17423be set to the sum of all relevant penalties for the current line, except 17424that the final line is never penalized. 17425 17426@<Append a penalty node, if a nonzero penalty is appropriate@>= 17427if cur_line+1<>best_line then 17428 begin pen:=inter_line_penalty; 17429 if cur_line=prev_graf+1 then pen:=pen+club_penalty; 17430 if cur_line+2=best_line then pen:=pen+final_widow_penalty; 17431 if disc_break then pen:=pen+broken_penalty; 17432 if pen<>0 then 17433 begin r:=new_penalty(pen); 17434 link(tail):=r; tail:=r; 17435 end; 17436 end 17437 17438@* \[40] Pre-hyphenation. 17439When the line-breaking routine is unable to find a feasible sequence of 17440breakpoints, it makes a second pass over the paragraph, attempting to 17441hyphenate the hyphenatable words. The goal of hyphenation is to insert 17442discretionary material into the paragraph so that there are more 17443potential places to break. 17444 17445The general rules for hyphenation are somewhat complex and technical, 17446because we want to be able to hyphenate words that are preceded or 17447followed by punctuation marks, and because we want the rules to work 17448for languages other than English. We also must contend with the fact 17449that hyphens might radically alter the ligature and kerning structure 17450of a word. 17451 17452A sequence of characters will be considered for hyphenation only if it 17453belongs to a ``potentially hyphenatable part'' of the current paragraph. 17454This is a sequence of nodes $p_0p_1\ldots p_m$ where $p_0$ is a glue node, 17455$p_1\ldots p_{m-1}$ are either character or ligature or whatsit or 17456implicit kern nodes, and $p_m$ is a glue or penalty or insertion or adjust 17457or mark or whatsit or explicit kern node. (Therefore hyphenation is 17458disabled by boxes, math formulas, and discretionary nodes already inserted 17459by the user.) The ligature nodes among $p_1\ldots p_{m-1}$ are effectively 17460expanded into the original non-ligature characters; the kern nodes and 17461whatsits are ignored. Each character |c| is now classified as either a 17462nonletter (if |lc_code(c)=0|), a lowercase letter (if 17463|lc_code(c)=c|), or an uppercase letter (otherwise); an uppercase letter 17464is treated as if it were |lc_code(c)| for purposes of hyphenation. The 17465characters generated by $p_1\ldots p_{m-1}$ may begin with nonletters; let 17466$c_1$ be the first letter that is not in the middle of a ligature. Whatsit 17467nodes preceding $c_1$ are ignored; a whatsit found after $c_1$ will be the 17468terminating node $p_m$. All characters that do not have the same font as 17469$c_1$ will be treated as nonletters. The |hyphen_char| for that font 17470must be between 0 and 255, otherwise hyphenation will not be attempted. 17471\TeX\ looks ahead for as many consecutive letters $c_1\ldots c_n$ as 17472possible; however, |n| must be less than 64, so a character that would 17473otherwise be $c_{64}$ is effectively not a letter. Furthermore $c_n$ must 17474not be in the middle of a ligature. In this way we obtain a string of 17475letters $c_1\ldots c_n$ that are generated by nodes $p_a\ldots p_b$, where 17476|1<=a<=b+1<=m|. If |n>=l_hyf+r_hyf|, this string qualifies for hyphenation; 17477however, |uc_hyph| must be positive, if $c_1$ is uppercase. 17478 17479The hyphenation process takes place in three stages. First, the candidate 17480sequence $c_1\ldots c_n$ is found; then potential positions for hyphens 17481are determined by referring to hyphenation tables; and finally, the nodes 17482$p_a\ldots p_b$ are replaced by a new sequence of nodes that includes the 17483discretionary breaks found. 17484 17485Fortunately, we do not have to do all this calculation very often, because 17486of the way it has been taken out of \TeX's inner loop. For example, when 17487the second edition of the author's 700-page book {\sl Seminumerical 17488Algorithms} was typeset by \TeX, only about 1.2 hyphenations needed to be 17489@^Knuth, Donald Ervin@> 17490tried per paragraph, since the line breaking algorithm needed to use two 17491passes on only about 5 per cent of the paragraphs. 17492 17493@<Initialize for hyphenating...@>= 17494begin @!init if trie_not_ready then init_trie;@+tini@;@/ 17495cur_lang:=init_cur_lang; l_hyf:=init_l_hyf; r_hyf:=init_r_hyf; 17496end 17497 17498@ The letters $c_1\ldots c_n$ that are candidates for hyphenation are placed 17499into an array called |hc|; the number |n| is placed into |hn|; pointers to 17500nodes $p_{a-1}$ and~$p_b$ in the description above are placed into variables 17501|ha| and |hb|; and the font number is placed into |hf|. 17502 17503@<Glob...@>= 17504@!hc:array[0..65] of 0..256; {word to be hyphenated} 17505@!hn:small_number; {the number of positions occupied in |hc|} 17506@!ha,@!hb:pointer; {nodes |ha..hb| should be replaced by the hyphenated result} 17507@!hf:internal_font_number; {font number of the letters in |hc|} 17508@!hu:array[0..63] of 0..256; {like |hc|, before conversion to lowercase} 17509@!hyf_char:integer; {hyphen character of the relevant font} 17510@!cur_lang,@!init_cur_lang:ASCII_code; {current hyphenation table of interest} 17511@!l_hyf,@!r_hyf,@!init_l_hyf,@!init_r_hyf:integer; {limits on fragment sizes} 17512@!hyf_bchar:halfword; {boundary character after $c_n$} 17513 17514@ Hyphenation routines need a few more local variables. 17515 17516@<Local variables for line...@>= 17517@!j:small_number; {an index into |hc| or |hu|} 17518@!c:0..255; {character being considered for hyphenation} 17519 17520@ When the following code is activated, the |line_break| procedure is in its 17521second pass, and |cur_p| points to a glue node. 17522 17523@<Try to hyphenate...@>= 17524begin prev_s:=cur_p; s:=link(prev_s); 17525if s<>null then 17526 begin @<Skip to node |ha|, or |goto done1| if no hyphenation 17527 should be attempted@>; 17528 if l_hyf+r_hyf>63 then goto done1; 17529 @<Skip to node |hb|, putting letters into |hu| and |hc|@>; 17530 @<Check that the nodes following |hb| permit hyphenation and that at least 17531 |l_hyf+r_hyf| letters have been found, otherwise |goto done1|@>; 17532 hyphenate; 17533 end; 17534done1: end 17535 17536@ @<Declare subprocedures for |line_break|@>= 17537@t\4@>@<Declare the function called |reconstitute|@> 17538procedure hyphenate; 17539label common_ending,done,found,found1,found2,not_found,exit; 17540var @<Local variables for hyphenation@>@; 17541begin @<Find hyphen locations for the word in |hc|, or |return|@>; 17542@<If no hyphens were found, |return|@>; 17543@<Replace nodes |ha..hb| by a sequence of nodes that includes 17544 the discretionary hyphens@>; 17545exit:end; 17546 17547@ The first thing we need to do is find the node |ha| just before the 17548first letter. 17549 17550@<Skip to node |ha|, or |goto done1|...@>= 17551loop@+ begin if is_char_node(s) then 17552 begin c:=qo(character(s)); hf:=font(s); 17553 end 17554 else if type(s)=ligature_node then 17555 if lig_ptr(s)=null then goto continue 17556 else begin q:=lig_ptr(s); c:=qo(character(q)); hf:=font(q); 17557 end 17558 else if (type(s)=kern_node)and(subtype(s)=normal) then goto continue 17559 else if type(s)=whatsit_node then 17560 begin @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>; 17561 goto continue; 17562 end 17563 else goto done1; 17564 if lc_code(c)<>0 then 17565 if (lc_code(c)=c)or(uc_hyph>0) then goto done2 17566 else goto done1; 17567continue: prev_s:=s; s:=link(prev_s); 17568 end; 17569done2: hyf_char:=hyphen_char[hf]; 17570if hyf_char<0 then goto done1; 17571if hyf_char>255 then goto done1; 17572ha:=prev_s 17573 17574@ The word to be hyphenated is now moved to the |hu| and |hc| arrays. 17575 17576@<Skip to node |hb|, putting letters...@>= 17577hn:=0; 17578loop@+ begin if is_char_node(s) then 17579 begin if font(s)<>hf then goto done3; 17580 hyf_bchar:=character(s); c:=qo(hyf_bchar); 17581 if lc_code(c)=0 then goto done3; 17582 if hn=63 then goto done3; 17583 hb:=s; incr(hn); hu[hn]:=c; hc[hn]:=lc_code(c); hyf_bchar:=non_char; 17584 end 17585 else if type(s)=ligature_node then 17586 @<Move the characters of a ligature node to |hu| and |hc|; 17587 but |goto done3| if they are not all letters@> 17588 else if (type(s)=kern_node)and(subtype(s)=normal) then 17589 begin hb:=s; 17590 hyf_bchar:=font_bchar[hf]; 17591 end 17592 else goto done3; 17593 s:=link(s); 17594 end; 17595done3: 17596 17597@ We let |j| be the index of the character being stored when a ligature node 17598is being expanded, since we do not want to advance |hn| until we are sure 17599that the entire ligature consists of letters. Note that it is possible 17600to get to |done3| with |hn=0| and |hb| not set to any value. 17601 17602@<Move the characters of a ligature node to |hu| and |hc|...@>= 17603begin if font(lig_char(s))<>hf then goto done3; 17604j:=hn; q:=lig_ptr(s);@+if q>null then hyf_bchar:=character(q); 17605while q>null do 17606 begin c:=qo(character(q)); 17607 if lc_code(c)=0 then goto done3; 17608 if j=63 then goto done3; 17609 incr(j); hu[j]:=c; hc[j]:=lc_code(c);@/ 17610 q:=link(q); 17611 end; 17612hb:=s; hn:=j; 17613if odd(subtype(s)) then hyf_bchar:=font_bchar[hf]@+else hyf_bchar:=non_char; 17614end 17615 17616@ @<Check that the nodes following |hb| permit hyphenation...@>= 17617if hn<l_hyf+r_hyf then goto done1; {|l_hyf| and |r_hyf| are |>=1|} 17618loop@+ begin if not(is_char_node(s)) then 17619 case type(s) of 17620 ligature_node: do_nothing; 17621 kern_node: if subtype(s)<>normal then goto done4; 17622 whatsit_node,glue_node,penalty_node,ins_node,adjust_node,mark_node: 17623 goto done4; 17624 othercases goto done1 17625 endcases; 17626 s:=link(s); 17627 end; 17628done4: 17629 17630@* \[41] Post-hyphenation. 17631If a hyphen may be inserted between |hc[j]| and |hc[j+1]|, the hyphenation 17632procedure will set |hyf[j]| to some small odd number. But before we look 17633at \TeX's hyphenation procedure, which is independent of the rest of the 17634line-breaking algorithm, let us consider what we will do with the hyphens 17635it finds, since it is better to work on this part of the program before 17636forgetting what |ha| and |hb|, etc., are all about. 17637 17638@<Glob...@>= 17639@!hyf:array [0..64] of 0..9; {odd values indicate discretionary hyphens} 17640@!init_list:pointer; {list of punctuation characters preceding the word} 17641@!init_lig:boolean; {does |init_list| represent a ligature?} 17642@!init_lft:boolean; {if so, did the ligature involve a left boundary?} 17643 17644@ @<Local variables for hyphenation@>= 17645@!i,@!j,@!l:0..65; {indices into |hc| or |hu|} 17646@!q,@!r,@!s:pointer; {temporary registers for list manipulation} 17647@!bchar:halfword; {right boundary character of hyphenated word, or |non_char|} 17648 17649@ \TeX\ will never insert a hyphen that has fewer than 17650\.{\\lefthyphenmin} letters before it or fewer than 17651\.{\\righthyphenmin} after it; hence, a short word has 17652comparatively little chance of being hyphenated. If no hyphens have 17653been found, we can save time by not having to make any changes to the 17654paragraph. 17655 17656@<If no hyphens were found, |return|@>= 17657for j:=l_hyf to hn-r_hyf do if odd(hyf[j]) then goto found1; 17658return; 17659found1: 17660 17661@ If hyphens are in fact going to be inserted, \TeX\ first deletes the 17662subsequence of nodes between |ha| and~|hb|. An attempt is made to 17663preserve the effect that implicit boundary characters and punctuation marks 17664had on ligatures inside the hyphenated word, by storing a left boundary or 17665preceding character in |hu[0]| and by storing a possible right boundary 17666in |bchar|. We set |j:=0| if |hu[0]| is to be part of the reconstruction; 17667otherwise |j:=1|. 17668The variable |s| will point to the tail of the current hlist, and 17669|q| will point to the node following |hb|, so that 17670things can be hooked up after we reconstitute the hyphenated word. 17671 17672@<Replace nodes |ha..hb| by a sequence of nodes...@>= 17673q:=link(hb); link(hb):=null; r:=link(ha); link(ha):=null; bchar:=hyf_bchar; 17674if is_char_node(ha) then 17675 if font(ha)<>hf then goto found2 17676 else begin init_list:=ha; init_lig:=false; hu[0]:=qo(character(ha)); 17677 end 17678else if type(ha)=ligature_node then 17679 if font(lig_char(ha))<>hf then goto found2 17680 else begin init_list:=lig_ptr(ha); init_lig:=true; init_lft:=(subtype(ha)>1); 17681 hu[0]:=qo(character(lig_char(ha))); 17682 if init_list=null then if init_lft then 17683 begin hu[0]:=256; init_lig:=false; 17684 end; {in this case a ligature will be reconstructed from scratch} 17685 free_node(ha,small_node_size); 17686 end 17687else begin {no punctuation found; look for left boundary} 17688 if not is_char_node(r) then if type(r)=ligature_node then 17689 if subtype(r)>1 then goto found2; 17690 j:=1; s:=ha; init_list:=null; goto common_ending; 17691 end; 17692s:=cur_p; {we have |cur_p<>ha| because |type(cur_p)=glue_node|} 17693while link(s)<>ha do s:=link(s); 17694j:=0; goto common_ending; 17695found2: s:=ha; j:=0; hu[0]:=256; init_lig:=false; init_list:=null; 17696common_ending: flush_node_list(r); 17697@<Reconstitute nodes for the hyphenated word, inserting discretionary hyphens@>; 17698flush_list(init_list) 17699 17700@ We must now face the fact that the battle is not over, even though the 17701{\def\!{\kern-1pt}% 17702hyphens have been found: The process of reconstituting a word can be nontrivial 17703because ligatures might change when a hyphen is present. {\sl The \TeX book\/} 17704discusses the difficulties of the word ``difficult'', and 17705the discretionary material surrounding a 17706hyphen can be considerably more complex than that. Suppose 17707\.{abcdef} is a word in a font for which the only ligatures are \.{b\!c}, 17708\.{c\!d}, \.{d\!e}, and \.{e\!f}. If this word permits hyphenation 17709between \.b and \.c, the two patterns with and without hyphenation are 17710$\.a\,\.b\,\.-\,\.{c\!d}\,\.{e\!f}$ and $\.a\,\.{b\!c}\,\.{d\!e}\,\.f$. 17711Thus the insertion of a hyphen might cause effects to ripple arbitrarily 17712far into the rest of the word. A further complication arises if additional 17713hyphens appear together with such rippling, e.g., if the word in the 17714example just given could also be hyphenated between \.c and \.d; \TeX\ 17715avoids this by simply ignoring the additional hyphens in such weird cases.} 17716 17717Still further complications arise in the presence of ligatures that do not 17718delete the original characters. When punctuation precedes the word being 17719hyphenated, \TeX's method is not perfect under all possible scenarios, 17720because punctuation marks and letters can propagate information back and forth. 17721For example, suppose the original pre-hyphenation pair 17722\.{*a} changes to \.{*y} via a \.{\?=:} ligature, which changes to \.{xy} 17723via a \.{=:\?} ligature; if $p_{a-1}=\.x$ and $p_a=\.y$, the reconstitution 17724procedure isn't smart enough to obtain \.{xy} again. In such cases the 17725font designer should include a ligature that goes from \.{xa} to \.{xy}. 17726 17727@ The processing is facilitated by a subroutine called |reconstitute|. Given 17728a string of characters $x_j\ldots x_n$, there is a smallest index $m\ge j$ 17729such that the ``translation'' of $x_j\ldots x_n$ by ligatures and kerning 17730has the form $y_1\ldots y_t$ followed by the translation of $x_{m+1}\ldots x_n$, 17731where $y_1\ldots y_t$ is some nonempty sequence of character, ligature, and 17732kern nodes. We call $x_j\ldots x_m$ a ``cut prefix'' of $x_j\ldots x_n$. 17733For example, if $x_1x_2x_3=\.{fly}$, and if the font contains `fl' as a 17734ligature and a kern between `fl' and `y', then $m=2$, $t=2$, and $y_1$ will 17735be a ligature node for `fl' followed by an appropriate kern node~$y_2$. 17736In the most common case, $x_j$~forms no ligature with $x_{j+1}$ and we 17737simply have $m=j$, $y_1=x_j$. If $m<n$ we can repeat the procedure on 17738$x_{m+1}\ldots x_n$ until the entire translation has been found. 17739 17740The |reconstitute| function returns the integer $m$ and puts the nodes 17741$y_1\ldots y_t$ into a linked list starting at |link(hold_head)|, 17742getting the input $x_j\ldots x_n$ from the |hu| array. If $x_j=256$, 17743we consider $x_j$ to be an implicit left boundary character; in this 17744case |j| must be strictly less than~|n|. There is a 17745parameter |bchar|, which is either 256 or an implicit right boundary character 17746assumed to be present just following~$x_n$. (The value |hu[n+1]| is never 17747explicitly examined, but the algorithm imagines that |bchar| is there.) 17748 17749If there exists an index |k| in the range $j\le k\le m$ such that |hyf[k]| 17750is odd and such that the result of |reconstitute| would have been different 17751if $x_{k+1}$ had been |hchar|, then |reconstitute| sets |hyphen_passed| 17752to the smallest such~|k|. Otherwise it sets |hyphen_passed| to zero. 17753 17754A special convention is used in the case |j=0|: Then we assume that the 17755translation of |hu[0]| appears in a special list of charnodes starting at 17756|init_list|; moreover, if |init_lig| is |true|, then |hu[0]| will be 17757a ligature character, involving a left boundary if |init_lft| is |true|. 17758This facility is provided for cases when a hyphenated 17759word is preceded by punctuation (like single or double quotes) that might 17760affect the translation of the beginning of the word. 17761 17762@<Glob...@>= 17763@!hyphen_passed:small_number; {first hyphen in a ligature, if any} 17764 17765@ @<Declare the function called |reconstitute|@>= 17766function reconstitute(@!j,@!n:small_number;@!bchar,@!hchar:halfword): 17767 small_number; 17768label continue,done; 17769var @!p:pointer; {temporary register for list manipulation} 17770@!t:pointer; {a node being appended to} 17771@!q:four_quarters; {character information or a lig/kern instruction} 17772@!cur_rh:halfword; {hyphen character for ligature testing} 17773@!test_char:halfword; {hyphen or other character for ligature testing} 17774@!w:scaled; {amount of kerning} 17775@!k:font_index; {position of current lig/kern instruction} 17776begin hyphen_passed:=0; t:=hold_head; w:=0; link(hold_head):=null; 17777 {at this point |ligature_present=lft_hit=rt_hit=false|} 17778@<Set up data structures with the cursor following position |j|@>; 17779continue:@<If there's a ligature or kern at the cursor position, update the data 17780 structures, possibly advancing~|j|; continue until the cursor moves@>; 17781@<Append a ligature and/or kern to the translation; 17782 |goto continue| if the stack of inserted ligatures is nonempty@>; 17783reconstitute:=j; 17784end; 17785 17786@ The reconstitution procedure shares many of the global data structures 17787by which \TeX\ has processed the words before they were hyphenated. 17788There is an implied ``cursor'' between characters |cur_l| and |cur_r|; 17789these characters will be tested for possible ligature activity. If 17790|ligature_present| then |cur_l| is a ligature character formed from the 17791original characters following |cur_q| in the current translation list. 17792There is a ``ligature stack'' between the cursor and character |j+1|, 17793consisting of pseudo-ligature nodes linked together by their |link| fields. 17794This stack is normally empty unless a ligature command has created a new 17795character that will need to be processed later. A pseudo-ligature is 17796a special node having a |character| field that represents a potential 17797ligature and a |lig_ptr| field that points to a |char_node| or is |null|. 17798We have 17799$$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr 17800 |qi(hu[j+1])|,&if |lig_stack=null| and |j<n|;\cr 17801 bchar,&if |lig_stack=null| and |j=n|.\cr}$$ 17802 17803@<Glob...@>= 17804@!cur_l,@!cur_r:halfword; {characters before and after the cursor} 17805@!cur_q:pointer; {where a ligature should be detached} 17806@!lig_stack:pointer; {unfinished business to the right of the cursor} 17807@!ligature_present:boolean; {should a ligature node be made for |cur_l|?} 17808@!lft_hit,@!rt_hit:boolean; {did we hit a ligature with a boundary character?} 17809 17810@ @d append_charnode_to_t(#)== begin link(t):=get_avail; t:=link(t); 17811 font(t):=hf; character(t):=#; 17812 end 17813@d set_cur_r==begin if j<n then cur_r:=qi(hu[j+1])@+else cur_r:=bchar; 17814 if odd(hyf[j]) then cur_rh:=hchar@+else cur_rh:=non_char; 17815 end 17816 17817@<Set up data structures with the cursor following position |j|@>= 17818cur_l:=qi(hu[j]); cur_q:=t; 17819if j=0 then 17820 begin ligature_present:=init_lig; p:=init_list; 17821 if ligature_present then lft_hit:=init_lft; 17822 while p>null do 17823 begin append_charnode_to_t(character(p)); p:=link(p); 17824 end; 17825 end 17826else if cur_l<non_char then append_charnode_to_t(cur_l); 17827lig_stack:=null; set_cur_r 17828 17829@ We may want to look at the lig/kern program twice, once for a hyphen 17830and once for a normal letter. (The hyphen might appear after the letter 17831in the program, so we'd better not try to look for both at once.) 17832 17833@<If there's a ligature or kern at the cursor position, update...@>= 17834if cur_l=non_char then 17835 begin k:=bchar_label[hf]; 17836 if k=non_address then goto done@+else q:=font_info[k].qqqq; 17837 end 17838else begin q:=char_info(hf)(cur_l); 17839 if char_tag(q)<>lig_tag then goto done; 17840 k:=lig_kern_start(hf)(q); q:=font_info[k].qqqq; 17841 if skip_byte(q)>stop_flag then 17842 begin k:=lig_kern_restart(hf)(q); q:=font_info[k].qqqq; 17843 end; 17844 end; {now |k| is the starting address of the lig/kern program} 17845if cur_rh<non_char then test_char:=cur_rh@+else test_char:=cur_r; 17846loop@+begin if next_char(q)=test_char then if skip_byte(q)<=stop_flag then 17847 if cur_rh<non_char then 17848 begin hyphen_passed:=j; hchar:=non_char; cur_rh:=non_char; 17849 goto continue; 17850 end 17851 else begin if hchar<non_char then if odd(hyf[j]) then 17852 begin hyphen_passed:=j; hchar:=non_char; 17853 end; 17854 if op_byte(q)<kern_flag then 17855 @<Carry out a ligature replacement, updating the cursor structure 17856 and possibly advancing~|j|; |goto continue| if the cursor doesn't 17857 advance, otherwise |goto done|@>; 17858 w:=char_kern(hf)(q); goto done; {this kern will be inserted below} 17859 end; 17860 if skip_byte(q)>=stop_flag then 17861 if cur_rh=non_char then goto done 17862 else begin cur_rh:=non_char; goto continue; 17863 end; 17864 k:=k+qo(skip_byte(q))+1; q:=font_info[k].qqqq; 17865 end; 17866done: 17867 17868@ @d wrap_lig(#)==if ligature_present then 17869 begin p:=new_ligature(hf,cur_l,link(cur_q)); 17870 if lft_hit then 17871 begin subtype(p):=2; lft_hit:=false; 17872 end; 17873 if # then if lig_stack=null then 17874 begin incr(subtype(p)); rt_hit:=false; 17875 end; 17876 link(cur_q):=p; t:=p; ligature_present:=false; 17877 end 17878@d pop_lig_stack==begin if lig_ptr(lig_stack)>null then 17879 begin link(t):=lig_ptr(lig_stack); {this is a charnode for |hu[j+1]|} 17880 t:=link(t); incr(j); 17881 end; 17882 p:=lig_stack; lig_stack:=link(p); free_node(p,small_node_size); 17883 if lig_stack=null then set_cur_r@+else cur_r:=character(lig_stack); 17884 end {if |lig_stack| isn't |null| we have |cur_rh=non_char|} 17885 17886@<Append a ligature and/or kern to the translation...@>= 17887wrap_lig(rt_hit); 17888if w<>0 then 17889 begin link(t):=new_kern(w); t:=link(t); w:=0; 17890 end; 17891if lig_stack>null then 17892 begin cur_q:=t; cur_l:=character(lig_stack); ligature_present:=true; 17893 pop_lig_stack; goto continue; 17894 end 17895 17896@ @<Carry out a ligature replacement, updating the cursor structure...@>= 17897begin if cur_l=non_char then lft_hit:=true; 17898if j=n then if lig_stack=null then rt_hit:=true; 17899check_interrupt; {allow a way out in case there's an infinite ligature loop} 17900case op_byte(q) of 17901qi(1),qi(5):begin cur_l:=rem_byte(q); {\.{=:\?}, \.{=:\?>}} 17902 ligature_present:=true; 17903 end; 17904qi(2),qi(6):begin cur_r:=rem_byte(q); {\.{\?=:}, \.{\?=:>}} 17905 if lig_stack>null then character(lig_stack):=cur_r 17906 else begin lig_stack:=new_lig_item(cur_r); 17907 if j=n then bchar:=non_char 17908 else begin p:=get_avail; lig_ptr(lig_stack):=p; 17909 character(p):=qi(hu[j+1]); font(p):=hf; 17910 end; 17911 end; 17912 end; 17913qi(3):begin cur_r:=rem_byte(q); {\.{\?=:\?}} 17914 p:=lig_stack; lig_stack:=new_lig_item(cur_r); link(lig_stack):=p; 17915 end; 17916qi(7),qi(11):begin wrap_lig(false); {\.{\?=:\?>}, \.{\?=:\?>>}} 17917 cur_q:=t; cur_l:=rem_byte(q); ligature_present:=true; 17918 end; 17919othercases begin cur_l:=rem_byte(q); ligature_present:=true; {\.{=:}} 17920 if lig_stack>null then pop_lig_stack 17921 else if j=n then goto done 17922 else begin append_charnode_to_t(cur_r); incr(j); set_cur_r; 17923 end; 17924 end 17925endcases; 17926if op_byte(q)>qi(4) then if op_byte(q)<>qi(7) then goto done; 17927goto continue; 17928end 17929 17930@ Okay, we're ready to insert the potential hyphenations that were found. 17931When the following program is executed, we want to append the word 17932|hu[1..hn]| after node |ha|, and node |q| should be appended to the result. 17933During this process, the variable |i| will be a temporary 17934index into |hu|; the variable |j| will be an index to our current position 17935in |hu|; the variable |l| will be the counterpart of |j|, in a discretionary 17936branch; the variable |r| will point to new nodes being created; and 17937we need a few new local variables: 17938 17939@<Local variables for hyph...@>= 17940@!major_tail,@!minor_tail:pointer; {the end of lists in the main and 17941 discretionary branches being reconstructed} 17942@!c:ASCII_code; {character temporarily replaced by a hyphen} 17943@!c_loc:0..63; {where that character came from} 17944@!r_count:integer; {replacement count for discretionary} 17945@!hyf_node:pointer; {the hyphen, if it exists} 17946 17947@ When the following code is performed, |hyf[0]| and |hyf[hn]| will be zero. 17948 17949@<Reconstitute nodes for the hyphenated word...@>= 17950repeat l:=j; j:=reconstitute(j,hn,bchar,qi(hyf_char))+1; 17951if hyphen_passed=0 then 17952 begin link(s):=link(hold_head); 17953 while link(s)>null do s:=link(s); 17954 if odd(hyf[j-1]) then 17955 begin l:=j; hyphen_passed:=j-1; link(hold_head):=null; 17956 end; 17957 end; 17958if hyphen_passed>0 then 17959 @<Create and append a discretionary node as an alternative to the 17960 unhyphenated word, and continue to develop both branches until they 17961 become equivalent@>; 17962until j>hn; 17963link(s):=q 17964 17965@ In this repeat loop we will insert another discretionary if |hyf[j-1]| is 17966odd, when both branches of the previous discretionary end at position |j-1|. 17967Strictly speaking, we aren't justified in doing this, because we don't know 17968that a hyphen after |j-1| is truly independent of those branches. But in almost 17969all applications we would rather not lose a potentially valuable hyphenation 17970point. (Consider the word `difficult', where the letter `c' is in position |j|.) 17971 17972@d advance_major_tail==begin major_tail:=link(major_tail); incr(r_count); 17973 end 17974 17975@<Create and append a discretionary node as an alternative...@>= 17976repeat r:=get_node(small_node_size); 17977link(r):=link(hold_head); type(r):=disc_node; 17978major_tail:=r; r_count:=0; 17979while link(major_tail)>null do advance_major_tail; 17980i:=hyphen_passed; hyf[i]:=0; 17981@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>; 17982@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|, appending to this 17983 list and to |major_tail| until synchronization has been achieved@>; 17984@<Move pointer |s| to the end of the current list, and set |replace_count(r)| 17985 appropriately@>; 17986hyphen_passed:=j-1; link(hold_head):=null; 17987until not odd(hyf[j-1]) 17988 17989@ The new hyphen might combine with the previous character via ligature 17990or kern. At this point we have |l-1<=i<j| and |i<hn|. 17991 17992@<Put the \(c)characters |hu[l..i]| and a hyphen into |pre_break(r)|@>= 17993minor_tail:=null; pre_break(r):=null; hyf_node:=new_character(hf,hyf_char); 17994if hyf_node<>null then 17995 begin incr(i); c:=hu[i]; hu[i]:=hyf_char; free_avail(hyf_node); 17996 end; 17997while l<=i do 17998 begin l:=reconstitute(l,i,font_bchar[hf],non_char)+1; 17999 if link(hold_head)>null then 18000 begin if minor_tail=null then pre_break(r):=link(hold_head) 18001 else link(minor_tail):=link(hold_head); 18002 minor_tail:=link(hold_head); 18003 while link(minor_tail)>null do minor_tail:=link(minor_tail); 18004 end; 18005 end; 18006if hyf_node<>null then 18007 begin hu[i]:=c; {restore the character in the hyphen position} 18008 l:=i; decr(i); 18009 end 18010 18011@ The synchronization algorithm begins with |l=i+1<=j|. 18012 18013@<Put the \(c)characters |hu[i+1..@,]| into |post_break(r)|...@>= 18014minor_tail:=null; post_break(r):=null; c_loc:=0; 18015if bchar_label[hf]<>non_address then {put left boundary at beginning of new line} 18016 begin decr(l); c:=hu[l]; c_loc:=l; hu[l]:=256; 18017 end; 18018while l<j do 18019 begin repeat l:=reconstitute(l,hn,bchar,non_char)+1; 18020 if c_loc>0 then 18021 begin hu[c_loc]:=c; c_loc:=0; 18022 end; 18023 if link(hold_head)>null then 18024 begin if minor_tail=null then post_break(r):=link(hold_head) 18025 else link(minor_tail):=link(hold_head); 18026 minor_tail:=link(hold_head); 18027 while link(minor_tail)>null do minor_tail:=link(minor_tail); 18028 end; 18029 until l>=j; 18030 while l>j do 18031 @<Append characters of |hu[j..@,]| to |major_tail|, advancing~|j|@>; 18032 end 18033 18034@ @<Append characters of |hu[j..@,]|...@>= 18035begin j:=reconstitute(j,hn,bchar,non_char)+1; 18036link(major_tail):=link(hold_head); 18037while link(major_tail)>null do advance_major_tail; 18038end 18039 18040@ Ligature insertion can cause a word to grow exponentially in size. Therefore 18041we must test the size of |r_count| here, even though the hyphenated text 18042was at most 63 characters long. 18043 18044@<Move pointer |s| to the end of the current list...@>= 18045if r_count>127 then {we have to forget the discretionary hyphen} 18046 begin link(s):=link(r); link(r):=null; flush_node_list(r); 18047 end 18048else begin link(s):=r; replace_count(r):=r_count; 18049 end; 18050s:=major_tail 18051 18052@* \[42] Hyphenation. 18053When a word |hc[1..hn]| has been set up to contain a candidate for hyphenation, 18054\TeX\ first looks to see if it is in the user's exception dictionary. If not, 18055hyphens are inserted based on patterns that appear within the given word, 18056using an algorithm due to Frank~M. Liang. 18057@^Liang, Franklin Mark@> 18058 18059Let's consider Liang's method first, since it is much more interesting than the 18060exception-lookup routine. The algorithm begins by setting |hyf[j]| to zero 18061for all |j|, and invalid characters are inserted into |hc[0]| 18062and |hc[hn+1]| to serve as delimiters. Then a reasonably fast method is 18063used to see which of a given set of patterns occurs in the word 18064|hc[0..(hn+1)]|. Each pattern $p_1\ldots p_k$ of length |k| has an associated 18065sequence of |k+1| numbers $n_0\ldots n_k$; and if the pattern occurs in 18066|hc[(j+1)..(j+k)]|, \TeX\ will set |hyf[j+i]:=@tmax@>(hyf[j+i],@t$n_i$@>)| for 18067|0<=i<=k|. After this has been done for each pattern that occurs, a 18068discretionary hyphen will be inserted between |hc[j]| and |hc[j+1]| when 18069|hyf[j]| is odd, as we have already seen. 18070 18071The set of patterns $p_1\ldots p_k$ and associated numbers $n_0\ldots n_k$ 18072depends, of course, on the language whose words are being hyphenated, and 18073on the degree of hyphenation that is desired. A method for finding 18074appropriate |p|'s and |n|'s, from a given dictionary of words and acceptable 18075hyphenations, is discussed in Liang's Ph.D. thesis (Stanford University, 180761983); \TeX\ simply starts with the patterns and works from there. 18077 18078@ The patterns are stored in a compact table that is also efficient for 18079retrieval, using a variant of ``trie memory'' [cf.\ {\sl The Art of 18080Computer Programming \bf3} (1973), 481--505]. We can find each pattern 18081$p_1\ldots p_k$ by letting $z_0$ be one greater than the relevant language 18082index and then, for |1<=i<=k|, 18083setting |@t$z_i$@>:=trie_link@t$(z_{i-1})+p_i$@>|; the pattern will be 18084identified by the number $z_k$. Since all the pattern information is 18085packed together into a single |trie_link| array, it is necessary to 18086prevent confusion between the data from inequivalent patterns, so another 18087table is provided such that |trie_char@t$(z_i)=p_i$@>| for all |i|. There 18088is also a table |trie_op|$(z_k)$ to identify the numbers $n_0\ldots n_k$ 18089associated with $p_1\ldots p_k$. 18090 18091Comparatively few different number sequences $n_0\ldots n_k$ actually occur, 18092since most of the |n|'s are generally zero. Therefore the number sequences 18093are encoded in such a way that |trie_op|$(z_k)$ is only one byte long. 18094If |trie_op(@t$z_k$@>)<>min_quarterword|, when $p_1\ldots p_k$ has matched 18095the letters in |hc[(l-k+1)..l@,]| of language |t|, 18096we perform all of the required operations 18097for this pattern by carrying out the following little program: Set 18098|v:=trie_op(@t$z_k$@>)|. Then set |v:=v+op_start[t]|, 18099|hyf[l-hyf_distance[v]]:=@tmax@>(hyf[l-hyf_distance[v]], hyf_num[v])|, 18100and |v:=hyf_next[v]|; repeat, if necessary, until |v=min_quarterword|. 18101 18102@<Types...@>= 18103@!trie_pointer=0..trie_size; {an index into |trie|} 18104 18105@ @d trie_link(#)==trie[#].rh {``downward'' link in a trie} 18106@d trie_char(#)==trie[#].b1 {character matched at this trie location} 18107@d trie_op(#)==trie[#].b0 {program for hyphenation at this trie location} 18108 18109@<Glob...@>= 18110@!trie:array[trie_pointer] of two_halves; {|trie_link|, |trie_char|, |trie_op|} 18111@!hyf_distance:array[1..trie_op_size] of small_number; {position |k-j| of $n_j$} 18112@!hyf_num:array[1..trie_op_size] of small_number; {value of $n_j$} 18113@!hyf_next:array[1..trie_op_size] of quarterword; {continuation code} 18114@!op_start:array[ASCII_code] of 0..trie_op_size; {offset for current language} 18115 18116@ @<Local variables for hyph...@>= 18117@!z:trie_pointer; {an index into |trie|} 18118@!v:integer; {an index into |hyf_distance|, etc.} 18119 18120@ Assuming that these auxiliary tables have been set up properly, the 18121hyphenation algorithm is quite short. In the following code we set |hc[hn+2]| 18122to the impossible value 256, in order to guarantee that |hc[hn+3]| will 18123never be fetched. 18124 18125@<Find hyphen locations for the word in |hc|...@>= 18126for j:=0 to hn do hyf[j]:=0; 18127@<Look for the word |hc[1..hn]| in the exception table, and |goto found| (with 18128 |hyf| containing the hyphens) if an entry is found@>; 18129if trie_char(cur_lang+1)<>qi(cur_lang) then return; {no patterns for |cur_lang|} 18130hc[0]:=0; hc[hn+1]:=0; hc[hn+2]:=256; {insert delimiters} 18131for j:=0 to hn-r_hyf+1 do 18132 begin z:=trie_link(cur_lang+1)+hc[j]; l:=j; 18133 while hc[l]=qo(trie_char(z)) do 18134 begin if trie_op(z)<>min_quarterword then 18135 @<Store \(m)maximum values in the |hyf| table@>; 18136 incr(l); z:=trie_link(z)+hc[l]; 18137 end; 18138 end; 18139found: for j:=0 to l_hyf-1 do hyf[j]:=0; 18140for j:=0 to r_hyf-1 do hyf[hn-j]:=0 18141 18142@ @<Store \(m)maximum values in the |hyf| table@>= 18143begin v:=trie_op(z); 18144repeat v:=v+op_start[cur_lang]; i:=l-hyf_distance[v]; 18145if hyf_num[v]>hyf[i] then hyf[i]:=hyf_num[v]; 18146v:=hyf_next[v]; 18147until v=min_quarterword; 18148end 18149 18150@ The exception table that is built by \TeX's \.{\\hyphenation} primitive is 18151organized as an ordered hash table [cf.\ Amble and Knuth, {\sl The Computer 18152@^Amble, Ole@> @^Knuth, Donald Ervin@> 18153Journal\/ \bf17} (1974), 135--142] using linear probing. If $\alpha$ and 18154$\beta$ are words, we will say that $\alpha<\beta$ if $\vert\alpha\vert< 18155\vert\beta\vert$ or if $\vert\alpha\vert=\vert\beta\vert$ and 18156$\alpha$ is lexicographically smaller than $\beta$. (The notation $\vert 18157\alpha\vert$ stands for the length of $\alpha$.) The idea of ordered hashing 18158is to arrange the table so that a given word $\alpha$ can be sought by computing 18159a hash address $h=h(\alpha)$ and then looking in table positions |h|, |h-1|, 18160\dots, until encountering the first word $\L\alpha$. If this word is 18161different from $\alpha$, we can conclude that $\alpha$ is not in the table. 18162 18163The words in the table point to lists in |mem| that specify hyphen positions 18164in their |info| fields. The list for $c_1\ldots c_n$ contains the number |k| if 18165the word $c_1\ldots c_n$ has a discretionary hyphen between $c_k$ and 18166$c_{k+1}$. 18167 18168@<Types...@>= 18169@!hyph_pointer=0..hyph_size; {an index into the ordered hash table} 18170 18171@ @<Glob...@>= 18172@!hyph_word:array[hyph_pointer] of str_number; {exception words} 18173@!hyph_list:array[hyph_pointer] of pointer; {lists of hyphen positions} 18174@!hyph_count:hyph_pointer; {the number of words in the exception dictionary} 18175 18176@ @<Local variables for init...@>= 18177@!z:hyph_pointer; {runs through the exception dictionary} 18178 18179@ @<Set init...@>= 18180for z:=0 to hyph_size do 18181 begin hyph_word[z]:=0; hyph_list[z]:=null; 18182 end; 18183hyph_count:=0; 18184 18185@ The algorithm for exception lookup is quite simple, as soon as we have 18186a few more local variables to work with. 18187 18188@<Local variables for hyph...@>= 18189@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|} 18190@!k:str_number; {an index into |str_start|} 18191@!u:pool_pointer; {an index into |str_pool|} 18192 18193@ First we compute the hash code |h|, then we search until we either 18194find the word or we don't. Words from different languages are kept 18195separate by appending the language code to the string. 18196 18197@<Look for the word |hc[1...@>= 18198h:=hc[1]; incr(hn); hc[hn]:=cur_lang; 18199for j:=2 to hn do h:=(h+h+hc[j]) mod hyph_size; 18200loop@+ begin @<If the string |hyph_word[h]| is less than \(hc)|hc[1..hn]|, 18201 |goto not_found|; but if the two strings are equal, 18202 set |hyf| to the hyphen positions and |goto found|@>; 18203 if h>0 then decr(h)@+else h:=hyph_size; 18204 end; 18205not_found: decr(hn) 18206 18207@ @<If the string |hyph_word[h]| is less than \(hc)...@>= 18208k:=hyph_word[h]; if k=0 then goto not_found; 18209if length(k)<hn then goto not_found; 18210if length(k)=hn then 18211 begin j:=1; u:=str_start[k]; 18212 repeat if so(str_pool[u])<hc[j] then goto not_found; 18213 if so(str_pool[u])>hc[j] then goto done; 18214 incr(j); incr(u); 18215 until j>hn; 18216 @<Insert hyphens as specified in |hyph_list[h]|@>; 18217 decr(hn); goto found; 18218 end; 18219done: 18220 18221@ @<Insert hyphens as specified...@>= 18222s:=hyph_list[h]; 18223while s<>null do 18224 begin hyf[info(s)]:=1; s:=link(s); 18225 end 18226 18227@ @<Search |hyph_list| for pointers to |p|@>= 18228for q:=0 to hyph_size do 18229 begin if hyph_list[q]=p then 18230 begin print_nl("HYPH("); print_int(q); print_char(")"); 18231 end; 18232 end 18233 18234@ We have now completed the hyphenation routine, so the |line_break| procedure 18235is finished at last. Since the hyphenation exception table is fresh in our 18236minds, it's a good time to deal with the routine that adds new entries to it. 18237 18238When \TeX\ has scanned `\.{\\hyphenation}', it calls on a procedure named 18239|new_hyph_exceptions| to do the right thing. 18240 18241@d set_cur_lang==if language<=0 then cur_lang:=0 18242 else if language>255 then cur_lang:=0 18243 else cur_lang:=language 18244 18245@p procedure new_hyph_exceptions; {enters new exceptions} 18246label reswitch, exit, found, not_found; 18247var n:0..64; {length of current word; not always a |small_number|} 18248@!j:0..64; {an index into |hc|} 18249@!h:hyph_pointer; {an index into |hyph_word| and |hyph_list|} 18250@!k:str_number; {an index into |str_start|} 18251@!p:pointer; {head of a list of hyphen positions} 18252@!q:pointer; {used when creating a new node for list |p|} 18253@!s,@!t:str_number; {strings being compared or stored} 18254@!u,@!v:pool_pointer; {indices into |str_pool|} 18255begin scan_left_brace; {a left brace must follow \.{\\hyphenation}} 18256set_cur_lang; 18257@<Enter as many hyphenation exceptions as are listed, 18258until coming to a right brace; then |return|@>; 18259exit:end; 18260 18261@ @<Enter as many...@>= 18262n:=0; p:=null; 18263loop@+ begin get_x_token; 18264 reswitch: case cur_cmd of 18265 letter,other_char,char_given:@<Append a new letter or hyphen@>; 18266 char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given; 18267 goto reswitch; 18268 end; 18269 spacer,right_brace: begin if n>1 then @<Enter a hyphenation exception@>; 18270 if cur_cmd=right_brace then return; 18271 n:=0; p:=null; 18272 end; 18273 othercases @<Give improper \.{\\hyphenation} error@> 18274 endcases; 18275 end 18276 18277@ @<Give improper \.{\\hyph...@>= 18278begin print_err("Improper "); print_esc("hyphenation"); 18279@.Improper \\hyphenation...@> 18280 print(" will be flushed"); 18281help2("Hyphenation exceptions must contain only letters")@/ 18282 ("and hyphens. But continue; I'll forgive and forget."); 18283error; 18284end 18285 18286@ @<Append a new letter or hyphen@>= 18287if cur_chr="-" then @<Append the value |n| to list |p|@> 18288else begin if lc_code(cur_chr)=0 then 18289 begin print_err("Not a letter"); 18290@.Not a letter@> 18291 help2("Letters in \hyphenation words must have \lccode>0.")@/ 18292 ("Proceed; I'll ignore the character I just read."); 18293 error; 18294 end 18295 else if n<63 then 18296 begin incr(n); hc[n]:=lc_code(cur_chr); 18297 end; 18298 end 18299 18300@ @<Append the value |n| to list |p|@>= 18301begin if n<63 then 18302 begin q:=get_avail; link(q):=p; info(q):=n; p:=q; 18303 end; 18304end 18305 18306@ @<Enter a hyphenation exception@>= 18307begin incr(n); hc[n]:=cur_lang; str_room(n); h:=0; 18308for j:=1 to n do 18309 begin h:=(h+h+hc[j]) mod hyph_size; 18310 append_char(hc[j]); 18311 end; 18312s:=make_string; 18313@<Insert the \(p)pair |(s,p)| into the exception table@>; 18314end 18315 18316@ @<Insert the \(p)pair |(s,p)|...@>= 18317if hyph_count=hyph_size then overflow("exception dictionary",hyph_size); 18318@:TeX capacity exceeded exception dictionary}{\quad exception dictionary@> 18319incr(hyph_count); 18320while hyph_word[h]<>0 do 18321 begin @<If the string |hyph_word[h]| is less than \(or)or equal to 18322 |s|, interchange |(hyph_word[h],hyph_list[h])| with |(s,p)|@>; 18323 if h>0 then decr(h)@+else h:=hyph_size; 18324 end; 18325hyph_word[h]:=s; hyph_list[h]:=p 18326 18327@ @<If the string |hyph_word[h]| is less than \(or)...@>= 18328k:=hyph_word[h]; 18329if length(k)<length(s) then goto found; 18330if length(k)>length(s) then goto not_found; 18331u:=str_start[k]; v:=str_start[s]; 18332repeat if str_pool[u]<str_pool[v] then goto found; 18333if str_pool[u]>str_pool[v] then goto not_found; 18334incr(u); incr(v); 18335until u=str_start[k+1]; 18336found:q:=hyph_list[h]; hyph_list[h]:=p; p:=q;@/ 18337t:=hyph_word[h]; hyph_word[h]:=s; s:=t; 18338not_found: 18339 18340@* \[43] Initializing the hyphenation tables. 18341The trie for \TeX's hyphenation algorithm is built from a sequence of 18342patterns following a \.{\\patterns} specification. Such a specification 18343is allowed only in \.{INITEX}, since the extra memory for auxiliary tables 18344and for the initialization program itself would only clutter up the 18345production version of \TeX\ with a lot of deadwood. 18346 18347The first step is to build a trie that is linked, instead of packed 18348into sequential storage, so that insertions are readily made. 18349After all patterns have been processed, \.{INITEX} 18350compresses the linked trie by identifying common subtries. Finally the 18351trie is packed into the efficient sequential form that the hyphenation 18352algorithm actually uses. 18353 18354@<Declare subprocedures for |line_break|@>= 18355@!init @<Declare procedures for preprocessing hyphenation patterns@>@; 18356tini 18357 18358@ Before we discuss trie building in detail, let's consider the simpler 18359problem of creating the |hyf_distance|, |hyf_num|, and |hyf_next| arrays. 18360 18361Suppose, for example, that \TeX\ reads the pattern `\.{ab2cde1}'. This is 18362a pattern of length 5, with $n_0\ldots n_5=0\,0\,2\,0\,0\,1$ in the 18363notation above. We want the corresponding |trie_op| code |v| to have 18364|hyf_distance[v]=3|, |hyf_num[v]=2|, and |hyf_next[v]=@t$v^\prime$@>|, 18365where the auxiliary |trie_op| code $v^\prime$ has 18366|hyf_distance[@t$v^\prime$@>]=0|, |hyf_num[@t$v^\prime$@>]=1|, and 18367|hyf_next[@t$v^\prime$@>]=min_quarterword|. 18368 18369\TeX\ computes an appropriate value |v| with the |new_trie_op| subroutine 18370below, by setting 18371$$\hbox{|@t$v^\prime$@>:=new_trie_op(0,1,min_quarterword)|,\qquad 18372|v:=new_trie_op(3,2,@t$v^\prime$@>)|.}$$ 18373This subroutine looks up its three 18374parameters in a special hash table, assigning a new value only if these 18375three have not appeared before for the current language. 18376 18377The hash table is called |trie_op_hash|, and the number of entries it contains 18378is |trie_op_ptr|. 18379 18380@<Glob...@>= 18381@!init@! trie_op_hash:array[-trie_op_size..trie_op_size] of 0..trie_op_size; 18382 {trie op codes for quadruples} 18383@!trie_used:array[ASCII_code] of quarterword; 18384 {largest opcode used so far for this language} 18385@!trie_op_lang:array[1..trie_op_size] of ASCII_code; 18386 {language part of a hashed quadruple} 18387@!trie_op_val:array[1..trie_op_size] of quarterword; 18388 {opcode corresponding to a hashed quadruple} 18389@!trie_op_ptr:0..trie_op_size; {number of stored ops so far} 18390tini 18391 18392@ It's tempting to remove the |overflow| stops in the following procedure; 18393|new_trie_op| could return |min_quarterword| (thereby simply ignoring 18394part of a hyphenation pattern) instead of aborting the job. However, that would 18395lead to different hyphenation results on different installations of \TeX\ 18396using the same patterns. The |overflow| stops are necessary for portability 18397of patterns. 18398 18399@<Declare procedures for preprocessing hyph...@>= 18400function new_trie_op(@!d,@!n:small_number;@!v:quarterword):quarterword; 18401label exit; 18402var h:-trie_op_size..trie_op_size; {trial hash location} 18403@!u:quarterword; {trial op code} 18404@!l:0..trie_op_size; {pointer to stored data} 18405begin h:=abs(n+313*d+361*v+1009*cur_lang) mod (trie_op_size+trie_op_size) 18406 - trie_op_size; 18407loop@+ begin l:=trie_op_hash[h]; 18408 if l=0 then {empty position found for a new op} 18409 begin if trie_op_ptr=trie_op_size then 18410 overflow("pattern memory ops",trie_op_size); 18411 u:=trie_used[cur_lang]; 18412 if u=max_quarterword then 18413 overflow("pattern memory ops per language", 18414 max_quarterword-min_quarterword); 18415 incr(trie_op_ptr); incr(u); trie_used[cur_lang]:=u; 18416 hyf_distance[trie_op_ptr]:=d; 18417 hyf_num[trie_op_ptr]:=n; hyf_next[trie_op_ptr]:=v; 18418 trie_op_lang[trie_op_ptr]:=cur_lang; trie_op_hash[h]:=trie_op_ptr; 18419 trie_op_val[trie_op_ptr]:=u; new_trie_op:=u; return; 18420 end; 18421 if (hyf_distance[l]=d)and(hyf_num[l]=n)and(hyf_next[l]=v) 18422 and(trie_op_lang[l]=cur_lang) then 18423 begin new_trie_op:=trie_op_val[l]; return; 18424 end; 18425 if h>-trie_op_size then decr(h)@+else h:=trie_op_size; 18426 end; 18427exit:end; 18428 18429@ After |new_trie_op| has compressed the necessary opcode information, 18430plenty of information is available to unscramble the data into the 18431final form needed by our hyphenation algorithm. 18432 18433@<Sort \(t)the hyphenation op tables into proper order@>= 18434op_start[0]:=-min_quarterword; 18435for j:=1 to 255 do op_start[j]:=op_start[j-1]+qo(trie_used[j-1]); 18436for j:=1 to trie_op_ptr do 18437 trie_op_hash[j]:=op_start[trie_op_lang[j]]+trie_op_val[j]; {destination} 18438for j:=1 to trie_op_ptr do while trie_op_hash[j]>j do 18439 begin k:=trie_op_hash[j];@/ 18440 t:=hyf_distance[k]; hyf_distance[k]:=hyf_distance[j]; hyf_distance[j]:=t;@/ 18441 t:=hyf_num[k]; hyf_num[k]:=hyf_num[j]; hyf_num[j]:=t;@/ 18442 t:=hyf_next[k]; hyf_next[k]:=hyf_next[j]; hyf_next[j]:=t;@/ 18443 trie_op_hash[j]:=trie_op_hash[k]; trie_op_hash[k]:=k; 18444 end 18445 18446@ Before we forget how to initialize the data structures that have been 18447mentioned so far, let's write down the code that gets them started. 18448 18449@<Initialize table entries...@>= 18450for k:=-trie_op_size to trie_op_size do trie_op_hash[k]:=0; 18451for k:=0 to 255 do trie_used[k]:=min_quarterword; 18452trie_op_ptr:=0; 18453 18454@ The linked trie that is used to preprocess hyphenation patterns appears 18455in several global arrays. Each node represents an instruction of the form 18456``if you see character |c|, then perform operation |o|, move to the 18457next character, and go to node |l|; otherwise go to node |r|.'' 18458The four quantities |c|, |o|, |l|, and |r| are stored in four arrays 18459|trie_c|, |trie_o|, |trie_l|, and |trie_r|. The root of the trie 18460is |trie_l[0]|, and the number of nodes is |trie_ptr|. Null trie 18461pointers are represented by zero. To initialize the trie, we simply 18462set |trie_l[0]| and |trie_ptr| to zero. We also set |trie_c[0]| to some 18463arbitrary value, since the algorithm may access it. 18464 18465The algorithms maintain the condition 18466$$\hbox{|trie_c[trie_r[z]]>trie_c[z]|\qquad 18467whenever |z<>0| and |trie_r[z]<>0|};$$ in other words, sibling nodes are 18468ordered by their |c| fields. 18469 18470@d trie_root==trie_l[0] {root of the linked trie} 18471 18472@<Glob...@>= 18473@!init @!trie_c:packed array[trie_pointer] of packed_ASCII_code; 18474 {characters to match} 18475@t\hskip10pt@>@!trie_o:packed array[trie_pointer] of quarterword; 18476 {operations to perform} 18477@t\hskip10pt@>@!trie_l:packed array[trie_pointer] of trie_pointer; 18478 {left subtrie links} 18479@t\hskip10pt@>@!trie_r:packed array[trie_pointer] of trie_pointer; 18480 {right subtrie links} 18481@t\hskip10pt@>@!trie_ptr:trie_pointer; {the number of nodes in the trie} 18482@t\hskip10pt@>@!trie_hash:packed array[trie_pointer] of trie_pointer; 18483 {used to identify equivalent subtries} 18484tini 18485 18486@ Let us suppose that a linked trie has already been constructed. 18487Experience shows that we can often reduce its size by recognizing common 18488subtries; therefore another hash table is introduced for this purpose, 18489somewhat similar to |trie_op_hash|. The new hash table will be 18490initialized to zero. 18491 18492The function |trie_node(p)| returns |p| if |p| is distinct from other nodes 18493that it has seen, otherwise it returns the number of the first equivalent 18494node that it has seen. 18495 18496Notice that we might make subtries equivalent even if they correspond to 18497patterns for different languages, in which the trie ops might mean quite 18498different things. That's perfectly all right. 18499 18500@<Declare procedures for preprocessing hyph...@>= 18501function trie_node(@!p:trie_pointer):trie_pointer; {converts 18502 to a canonical form} 18503label exit; 18504var h:trie_pointer; {trial hash location} 18505@!q:trie_pointer; {trial trie node} 18506begin h:=abs(trie_c[p]+1009*trie_o[p]+@| 18507 2718*trie_l[p]+3142*trie_r[p]) mod trie_size; 18508loop@+ begin q:=trie_hash[h]; 18509 if q=0 then 18510 begin trie_hash[h]:=p; trie_node:=p; return; 18511 end; 18512 if (trie_c[q]=trie_c[p])and(trie_o[q]=trie_o[p])and@| 18513 (trie_l[q]=trie_l[p])and(trie_r[q]=trie_r[p]) then 18514 begin trie_node:=q; return; 18515 end; 18516 if h>0 then decr(h)@+else h:=trie_size; 18517 end; 18518exit:end; 18519 18520@ A neat recursive procedure is now able to compress a trie by 18521traversing it and applying |trie_node| to its nodes in ``bottom up'' 18522fashion. We will compress the entire trie by clearing |trie_hash| to 18523zero and then saying `|trie_root:=compress_trie(trie_root)|'. 18524@^recursion@> 18525 18526@<Declare procedures for preprocessing hyph...@>= 18527function compress_trie(@!p:trie_pointer):trie_pointer; 18528begin if p=0 then compress_trie:=0 18529else begin trie_l[p]:=compress_trie(trie_l[p]); 18530 trie_r[p]:=compress_trie(trie_r[p]); 18531 compress_trie:=trie_node(p); 18532 end; 18533end; 18534 18535@ The compressed trie will be packed into the |trie| array using a 18536``top-down first-fit'' procedure. This is a little tricky, so the reader 18537should pay close attention: The |trie_hash| array is cleared to zero 18538again and renamed |trie_ref| for this phase of the operation; later on, 18539|trie_ref[p]| will be nonzero only if the linked trie node |p| is the 18540smallest character 18541in a family and if the characters |c| of that family have been allocated to 18542locations |trie_ref[p]+c| in the |trie| array. Locations of |trie| that 18543are in use will have |trie_link=0|, while the unused holes in |trie| 18544will be doubly linked with |trie_link| pointing to the next larger vacant 18545location and |trie_back| pointing to the next smaller one. This double 18546linking will have been carried out only as far as |trie_max|, where 18547|trie_max| is the largest index of |trie| that will be needed. 18548To save time at the low end of the trie, we maintain array entries 18549|trie_min[c]| pointing to the smallest hole that is greater than~|c|. 18550Another array |trie_taken| tells whether or not a given location is 18551equal to |trie_ref[p]| for some |p|; this array is used to ensure that 18552distinct nodes in the compressed trie will have distinct |trie_ref| 18553entries. 18554 18555@d trie_ref==trie_hash {where linked trie families go into |trie|} 18556@d trie_back(#)==trie[#].lh {backward links in |trie| holes} 18557 18558@<Glob...@>= 18559@!init@!trie_taken:packed array[1..trie_size] of boolean; 18560 {does a family start here?} 18561@t\hskip10pt@>@!trie_min:array[ASCII_code] of trie_pointer; 18562 {the first possible slot for each character} 18563@t\hskip10pt@>@!trie_max:trie_pointer; {largest location used in |trie|} 18564@t\hskip10pt@>@!trie_not_ready:boolean; {is the trie still in linked form?} 18565tini 18566 18567@ Each time \.{\\patterns} appears, it contributes further patterns to 18568the future trie, which will be built only when hyphenation is attempted or 18569when a format file is dumped. The boolean variable |trie_not_ready| 18570will change to |false| when the trie is compressed; this will disable 18571further patterns. 18572 18573@<Initialize table entries...@>= 18574trie_not_ready:=true; trie_root:=0; trie_c[0]:=si(0); trie_ptr:=0; 18575 18576@ Here is how the trie-compression data structures are initialized. 18577If storage is tight, it would be possible to overlap |trie_op_hash|, 18578|trie_op_lang|, and |trie_op_val| with |trie|, |trie_hash|, and |trie_taken|, 18579because we finish with the former just before we need the latter. 18580 18581@<Get ready to compress the trie@>= 18582@<Sort \(t)the hyphenation...@>; 18583for p:=0 to trie_size do trie_hash[p]:=0; 18584trie_root:=compress_trie(trie_root); {identify equivalent subtries} 18585for p:=0 to trie_ptr do trie_ref[p]:=0; 18586for p:=0 to 255 do trie_min[p]:=p+1; 18587trie_link(0):=1; trie_max:=0 18588 18589@ The |first_fit| procedure finds the smallest hole |z| in |trie| such that 18590a trie family starting at a given node |p| will fit into vacant positions 18591starting at |z|. If |c=trie_c[p]|, this means that location |z-c| must 18592not already be taken by some other family, and that |z-c+@t$c^\prime$@>| 18593must be vacant for all characters $c^\prime$ in the family. The procedure 18594sets |trie_ref[p]| to |z-c| when the first fit has been found. 18595 18596@<Declare procedures for preprocessing hyph...@>= 18597procedure first_fit(@!p:trie_pointer); {packs a family into |trie|} 18598label not_found,found; 18599var h:trie_pointer; {candidate for |trie_ref[p]|} 18600@!z:trie_pointer; {runs through holes} 18601@!q:trie_pointer; {runs through the family starting at |p|} 18602@!c:ASCII_code; {smallest character in the family} 18603@!l,@!r:trie_pointer; {left and right neighbors} 18604@!ll:1..256; {upper limit of |trie_min| updating} 18605begin c:=so(trie_c[p]); 18606z:=trie_min[c]; {get the first conceivably good hole} 18607loop@+ begin h:=z-c;@/ 18608 @<Ensure that |trie_max>=h+256|@>; 18609 if trie_taken[h] then goto not_found; 18610 @<If all characters of the family fit relative to |h|, then 18611 |goto found|,\30\ otherwise |goto not_found|@>; 18612 not_found: z:=trie_link(z); {move to the next hole} 18613 end; 18614found: @<Pack the family into |trie| relative to |h|@>; 18615end; 18616 18617@ By making sure that |trie_max| is at least |h+256|, we can be sure that 18618|trie_max>z|, since |h=z-c|. It follows that location |trie_max| will 18619never be occupied in |trie|, and we will have |trie_max>=trie_link(z)|. 18620 18621@<Ensure that |trie_max>=h+256|@>= 18622if trie_max<h+256 then 18623 begin if trie_size<=h+256 then overflow("pattern memory",trie_size); 18624@:TeX capacity exceeded pattern memory}{\quad pattern memory@> 18625 repeat incr(trie_max); trie_taken[trie_max]:=false; 18626 trie_link(trie_max):=trie_max+1; trie_back(trie_max):=trie_max-1; 18627 until trie_max=h+256; 18628 end 18629 18630@ @<If all characters of the family fit relative to |h|...@>= 18631q:=trie_r[p]; 18632while q>0 do 18633 begin if trie_link(h+so(trie_c[q]))=0 then goto not_found; 18634 q:=trie_r[q]; 18635 end; 18636goto found 18637 18638@ @<Pack the family into |trie| relative to |h|@>= 18639trie_taken[h]:=true; trie_ref[p]:=h; q:=p; 18640repeat z:=h+so(trie_c[q]); l:=trie_back(z); r:=trie_link(z); 18641trie_back(r):=l; trie_link(l):=r; trie_link(z):=0; 18642if l<256 then 18643 begin if z<256 then ll:=z @+else ll:=256; 18644 repeat trie_min[l]:=r; incr(l); 18645 until l=ll; 18646 end; 18647q:=trie_r[q]; 18648until q=0 18649 18650@ To pack the entire linked trie, we use the following recursive procedure. 18651@^recursion@> 18652 18653@<Declare procedures for preprocessing hyph...@>= 18654procedure trie_pack(@!p:trie_pointer); {pack subtries of a family} 18655var q:trie_pointer; {a local variable that need not be saved on recursive calls} 18656begin repeat q:=trie_l[p]; 18657if (q>0)and(trie_ref[q]=0) then 18658 begin first_fit(q); trie_pack(q); 18659 end; 18660p:=trie_r[p]; 18661until p=0; 18662end; 18663 18664@ When the whole trie has been allocated into the sequential table, we 18665must go through it once again so that |trie| contains the correct 18666information. Null pointers in the linked trie will be represented by the 18667value~0, which properly implements an ``empty'' family. 18668 18669@<Move the data into |trie|@>= 18670h.rh:=0; h.b0:=min_quarterword; h.b1:=min_quarterword; {|trie_link:=0|, 18671 |trie_op:=min_quarterword|, |trie_char:=qi(0)|} 18672if trie_root=0 then {no patterns were given} 18673 begin for r:=0 to 256 do trie[r]:=h; 18674 trie_max:=256; 18675 end 18676else begin trie_fix(trie_root); {this fixes the non-holes in |trie|} 18677 r:=0; {now we will zero out all the holes} 18678 repeat s:=trie_link(r); trie[r]:=h; r:=s; 18679 until r>trie_max; 18680 end; 18681trie_char(0):=qi("?"); {make |trie_char(c)<>c| for all |c|} 18682 18683@ The fixing-up procedure is, of course, recursive. Since the linked trie 18684usually has overlapping subtries, the same data may be moved several 18685times; but that causes no harm, and at most as much work is done as it 18686took to build the uncompressed trie. 18687@^recursion@> 18688 18689@<Declare procedures for preprocessing hyph...@>= 18690procedure trie_fix(@!p:trie_pointer); {moves |p| and its siblings into |trie|} 18691var q:trie_pointer; {a local variable that need not be saved on recursive calls} 18692@!c:ASCII_code; {another one that need not be saved} 18693@!z:trie_pointer; {|trie| reference; this local variable must be saved} 18694begin z:=trie_ref[p]; 18695repeat q:=trie_l[p]; c:=so(trie_c[p]); 18696trie_link(z+c):=trie_ref[q]; trie_char(z+c):=qi(c); trie_op(z+c):=trie_o[p]; 18697if q>0 then trie_fix(q); 18698p:=trie_r[p]; 18699until p=0; 18700end; 18701 18702@ Now let's go back to the easier problem, of building the linked 18703trie. When \.{INITEX} has scanned the `\.{\\patterns}' control 18704sequence, it calls on |new_patterns| to do the right thing. 18705 18706@<Declare procedures for preprocessing hyph...@>= 18707procedure new_patterns; {initializes the hyphenation pattern data} 18708label done, done1; 18709var k,@!l:0..64; {indices into |hc| and |hyf|; 18710 not always in |small_number| range} 18711@!digit_sensed:boolean; {should the next digit be treated as a letter?} 18712@!v:quarterword; {trie op code} 18713@!p,@!q:trie_pointer; {nodes of trie traversed during insertion} 18714@!first_child:boolean; {is |p=trie_l[q]|?} 18715@!c:ASCII_code; {character being inserted} 18716begin if trie_not_ready then 18717 begin set_cur_lang; scan_left_brace; {a left brace must follow \.{\\patterns}} 18718 @<Enter all of the patterns into a linked trie, until coming to a right 18719 brace@>; 18720 end 18721else begin print_err("Too late for "); print_esc("patterns"); 18722 help1("All patterns must be given before typesetting begins."); 18723 error; link(garbage):=scan_toks(false,false); flush_list(def_ref); 18724 end; 18725end; 18726 18727@ Novices are not supposed to be using \.{\\patterns}, so the error 18728messages are terse. (Note that all error messages appear in \TeX's string 18729pool, even if they are used only by \.{INITEX}.) 18730 18731@<Enter all of the patterns into a linked trie...@>= 18732k:=0; hyf[0]:=0; digit_sensed:=false; 18733loop@+ begin get_x_token; 18734 case cur_cmd of 18735 letter,other_char:@<Append a new letter or a hyphen level@>; 18736 spacer,right_brace: begin if k>0 then 18737 @<Insert a new pattern into the linked trie@>; 18738 if cur_cmd=right_brace then goto done; 18739 k:=0; hyf[0]:=0; digit_sensed:=false; 18740 end; 18741 othercases begin print_err("Bad "); print_esc("patterns"); 18742@.Bad \\patterns@> 18743 help1("(See Appendix H.)"); error; 18744 end 18745 endcases; 18746 end; 18747done: 18748 18749@ @<Append a new letter or a hyphen level@>= 18750if digit_sensed or(cur_chr<"0")or(cur_chr>"9") then 18751 begin if cur_chr="." then cur_chr:=0 {edge-of-word delimiter} 18752 else begin cur_chr:=lc_code(cur_chr); 18753 if cur_chr=0 then 18754 begin print_err("Nonletter"); 18755@.Nonletter@> 18756 help1("(See Appendix H.)"); error; 18757 end; 18758 end; 18759 if k<63 then 18760 begin incr(k); hc[k]:=cur_chr; hyf[k]:=0; digit_sensed:=false; 18761 end; 18762 end 18763else if k<63 then 18764 begin hyf[k]:=cur_chr-"0"; digit_sensed:=true; 18765 end 18766 18767@ When the following code comes into play, the pattern $p_1\ldots p_k$ 18768appears in |hc[1..k]|, and the corresponding sequence of numbers $n_0\ldots 18769n_k$ appears in |hyf[0..k]|. 18770 18771@<Insert a new pattern into the linked trie@>= 18772begin @<Compute the trie op code, |v|, and set |l:=0|@>; 18773q:=0; hc[0]:=cur_lang; 18774while l<=k do 18775 begin c:=hc[l]; incr(l); p:=trie_l[q]; first_child:=true; 18776 while (p>0)and(c>so(trie_c[p])) do 18777 begin q:=p; p:=trie_r[q]; first_child:=false; 18778 end; 18779 if (p=0)or(c<so(trie_c[p])) then 18780 @<Insert a new trie node between |q| and |p|, and 18781 make |p| point to it@>; 18782 q:=p; {now node |q| represents $p_1\ldots p_{l-1}$} 18783 end; 18784if trie_o[q]<>min_quarterword then 18785 begin print_err("Duplicate pattern"); 18786@.Duplicate pattern@> 18787 help1("(See Appendix H.)"); error; 18788 end; 18789trie_o[q]:=v; 18790end 18791 18792@ @<Insert a new trie node between |q| and |p|...@>= 18793begin if trie_ptr=trie_size then overflow("pattern memory",trie_size); 18794@:TeX capacity exceeded pattern memory}{\quad pattern memory@> 18795incr(trie_ptr); trie_r[trie_ptr]:=p; p:=trie_ptr; trie_l[p]:=0; 18796if first_child then trie_l[q]:=p@+else trie_r[q]:=p; 18797trie_c[p]:=si(c); trie_o[p]:=min_quarterword; 18798end 18799 18800@ @<Compute the trie op code, |v|...@>= 18801if hc[1]=0 then hyf[0]:=0; 18802if hc[k]=0 then hyf[k]:=0; 18803l:=k; v:=min_quarterword; 18804loop@+ begin if hyf[l]<>0 then v:=new_trie_op(k-l,hyf[l],v); 18805 if l>0 then decr(l)@+else goto done1; 18806 end; 18807done1: 18808 18809@ Finally we put everything together: Here is how the trie gets to its 18810final, efficient form. 18811The following packing routine is rigged so that the root of the linked 18812tree gets mapped into location 1 of |trie|, as required by the hyphenation 18813algorithm. This happens because the first call of |first_fit| will 18814``take'' location~1. 18815 18816@<Declare procedures for preprocessing hyphenation patterns@>= 18817procedure init_trie; 18818var @!p:trie_pointer; {pointer for initialization} 18819@!j,@!k,@!t:integer; {all-purpose registers for initialization} 18820@!r,@!s:trie_pointer; {used to clean up the packed |trie|} 18821@!h:two_halves; {template used to zero out |trie|'s holes} 18822begin @<Get ready to compress the trie@>; 18823if trie_root<>0 then 18824 begin first_fit(trie_root); trie_pack(trie_root); 18825 end; 18826@<Move the data into |trie|@>; 18827trie_not_ready:=false; 18828end; 18829 18830@* \[44] Breaking vertical lists into pages. 18831The |vsplit| procedure, which implements \TeX's \.{\\vsplit} operation, 18832is considerably simpler than |line_break| because it doesn't have to 18833worry about hyphenation, and because its mission is to discover a single 18834break instead of an optimum sequence of breakpoints. But before we get 18835into the details of |vsplit|, we need to consider a few more basic things. 18836 18837@ A subroutine called |prune_page_top| takes a pointer to a vlist and 18838returns a pointer to a modified vlist in which all glue, kern, and penalty nodes 18839have been deleted before the first box or rule node. However, the first 18840box or rule is actually preceded by a newly created glue node designed so that 18841the topmost baseline will be at distance |split_top_skip| from the top, 18842whenever this is possible without backspacing. 18843 18844In this routine and those that follow, we make use of the fact that a 18845vertical list contains no character nodes, hence the |type| field exists 18846for each node in the list. 18847@^data structure assumptions@> 18848 18849@p function prune_page_top(@!p:pointer):pointer; {adjust top after page break} 18850var prev_p:pointer; {lags one step behind |p|} 18851@!q:pointer; {temporary variable for list manipulation} 18852begin prev_p:=temp_head; link(temp_head):=p; 18853while p<>null do 18854 case type(p) of 18855 hlist_node,vlist_node,rule_node:@<Insert glue for |split_top_skip| 18856 and set~|p:=null|@>; 18857 whatsit_node,mark_node,ins_node: begin prev_p:=p; p:=link(prev_p); 18858 end; 18859 glue_node,kern_node,penalty_node: begin q:=p; p:=link(q); link(q):=null; 18860 link(prev_p):=p; flush_node_list(q); 18861 end; 18862 othercases confusion("pruning") 18863@:this can't happen pruning}{\quad pruning@> 18864 endcases; 18865prune_page_top:=link(temp_head); 18866end; 18867 18868@ @<Insert glue for |split_top_skip|...@>= 18869begin q:=new_skip_param(split_top_skip_code); link(prev_p):=q; link(q):=p; 18870 {now |temp_ptr=glue_ptr(q)|} 18871if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p) 18872else width(temp_ptr):=0; 18873p:=null; 18874end 18875 18876@ The next subroutine finds the best place to break a given vertical list 18877so as to obtain a box of height~|h|, with maximum depth~|d|. 18878A pointer to the beginning of the vertical list is given, 18879and a pointer to the optimum breakpoint is returned. The list is effectively 18880followed by a forced break, i.e., a penalty node with the |eject_penalty|; 18881if the best break occurs at this artificial node, the value |null| is returned. 18882 18883An array of six |scaled| distances is used to keep track of the height 18884from the beginning of the list to the current place, just as in |line_break|. 18885In fact, we use one of the same arrays, only changing its name to reflect 18886its new significance. 18887 18888@d active_height==active_width {new name for the six distance variables} 18889@d cur_height==active_height[1] {the natural height} 18890@d set_height_zero(#)==active_height[#]:=0 {initialize the height to zero} 18891@# 18892@d update_heights=90 {go here to record glue in the |active_height| table} 18893 18894@p function vert_break(@!p:pointer; @!h,@!d:scaled):pointer; 18895 {finds optimum page break} 18896label done,not_found,update_heights; 18897var prev_p:pointer; {if |p| is a glue node, |type(prev_p)| determines 18898 whether |p| is a legal breakpoint} 18899@!q,@!r:pointer; {glue specifications} 18900@!pi:integer; {penalty value} 18901@!b:integer; {badness at a trial breakpoint} 18902@!least_cost:integer; {the smallest badness plus penalties found so far} 18903@!best_place:pointer; {the most recent break that leads to |least_cost|} 18904@!prev_dp:scaled; {depth of previous box in the list} 18905@!t:small_number; {|type| of the node following a kern} 18906begin prev_p:=p; {an initial glue node is not a legal breakpoint} 18907least_cost:=awful_bad; do_all_six(set_height_zero); prev_dp:=0; 18908loop@+ begin @<If node |p| is a legal breakpoint, check if this break is 18909 the best known, and |goto done| if |p| is null or 18910 if the page-so-far is already too full to accept more stuff@>; 18911 prev_p:=p; p:=link(prev_p); 18912 end; 18913done: vert_break:=best_place; 18914end; 18915 18916@ A global variable |best_height_plus_depth| will be set to the natural size 18917of the box that corresponds to the optimum breakpoint found by |vert_break|. 18918(This value is used by the insertion-splitting algorithm of the page builder.) 18919 18920@<Glob...@>= 18921@!best_height_plus_depth:scaled; {height of the best box, without stretching or 18922 shrinking} 18923 18924@ A subtle point to be noted here is that the maximum depth~|d| might be 18925negative, so |cur_height| and |prev_dp| might need to be corrected even 18926after a glue or kern node. 18927 18928@<If node |p| is a legal breakpoint, check...@>= 18929if p=null then pi:=eject_penalty 18930else @<Use node |p| to update the current height and depth measurements; 18931 if this node is not a legal breakpoint, |goto not_found| 18932 or |update_heights|, 18933 otherwise set |pi| to the associated penalty at the break@>; 18934@<Check if node |p| is a new champion breakpoint; then \(go)|goto done| 18935 if |p| is a forced break or if the page-so-far is already too full@>; 18936if (type(p)<glue_node)or(type(p)>kern_node) then goto not_found; 18937update_heights: @<Update the current height and depth measurements with 18938 respect to a glue or kern node~|p|@>; 18939not_found: if prev_dp>d then 18940 begin cur_height:=cur_height+prev_dp-d; 18941 prev_dp:=d; 18942 end; 18943 18944@ @<Use node |p| to update the current height and depth measurements...@>= 18945case type(p) of 18946hlist_node,vlist_node,rule_node: begin@t@>@;@/ 18947 cur_height:=cur_height+prev_dp+height(p); prev_dp:=depth(p); 18948 goto not_found; 18949 end; 18950whatsit_node:@<Process whatsit |p| in |vert_break| loop, |goto not_found|@>; 18951glue_node: if precedes_break(prev_p) then pi:=0 18952 else goto update_heights; 18953kern_node: begin if link(p)=null then t:=penalty_node 18954 else t:=type(link(p)); 18955 if t=glue_node then pi:=0@+else goto update_heights; 18956 end; 18957penalty_node: pi:=penalty(p); 18958mark_node,ins_node: goto not_found; 18959othercases confusion("vertbreak") 18960@:this can't happen vertbreak}{\quad vertbreak@> 18961endcases 18962 18963@ @d deplorable==100000 {more than |inf_bad|, but less than |awful_bad|} 18964 18965@<Check if node |p| is a new champion breakpoint; then \(go)...@>= 18966if pi<inf_penalty then 18967 begin @<Compute the badness, |b|, using |awful_bad| 18968 if the box is too full@>; 18969 if b<awful_bad then 18970 if pi<=eject_penalty then b:=pi 18971 else if b<inf_bad then b:=b+pi 18972 else b:=deplorable; 18973 if b<=least_cost then 18974 begin best_place:=p; least_cost:=b; 18975 best_height_plus_depth:=cur_height+prev_dp; 18976 end; 18977 if (b=awful_bad)or(pi<=eject_penalty) then goto done; 18978 end 18979 18980@ @<Compute the badness, |b|, using |awful_bad| if the box is too full@>= 18981if cur_height<h then 18982 if (active_height[3]<>0) or (active_height[4]<>0) or 18983 (active_height[5]<>0) then b:=0 18984 else b:=badness(h-cur_height,active_height[2]) 18985else if cur_height-h>active_height[6] then b:=awful_bad 18986else b:=badness(cur_height-h,active_height[6]) 18987 18988@ Vertical lists that are subject to the |vert_break| procedure should not 18989contain infinite shrinkability, since that would permit any amount of 18990information to ``fit'' on one page. 18991 18992@<Update the current height and depth measurements with...@>= 18993if type(p)=kern_node then q:=p 18994else begin q:=glue_ptr(p); 18995 active_height[2+stretch_order(q)]:=@| 18996 active_height[2+stretch_order(q)]+stretch(q);@/ 18997 active_height[6]:=active_height[6]+shrink(q); 18998 if (shrink_order(q)<>normal)and(shrink(q)<>0) then 18999 begin@t@>@;@/ 19000 print_err("Infinite glue shrinkage found in box being split");@/ 19001@.Infinite glue shrinkage...@> 19002 help4("The box you are \vsplitting contains some infinitely")@/ 19003 ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/ 19004 ("Such glue doesn't belong there; but you can safely proceed,")@/ 19005 ("since the offensive shrinkability has been made finite."); 19006 error; r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q); 19007 glue_ptr(p):=r; q:=r; 19008 end; 19009 end; 19010cur_height:=cur_height+prev_dp+width(q); prev_dp:=0 19011 19012@ Now we are ready to consider |vsplit| itself. Most of 19013its work is accomplished by the two subroutines that we have just considered. 19014 19015Given the number of a vlist box |n|, and given a desired page height |h|, 19016the |vsplit| function finds the best initial segment of the vlist and 19017returns a box for a page of height~|h|. The remainder of the vlist, if 19018any, replaces the original box, after removing glue and penalties and 19019adjusting for |split_top_skip|. Mark nodes in the split-off box are used to 19020set the values of |split_first_mark| and |split_bot_mark|; we use the 19021fact that |split_first_mark=null| if and only if |split_bot_mark=null|. 19022 19023The original box becomes ``void'' if and only if it has been entirely 19024extracted. The extracted box is ``void'' if and only if the original 19025box was void (or if it was, erroneously, an hlist box). 19026 19027@p function vsplit(@!n:eight_bits; @!h:scaled):pointer; 19028 {extracts a page of height |h| from box |n|} 19029label exit,done; 19030var v:pointer; {the box to be split} 19031p:pointer; {runs through the vlist} 19032q:pointer; {points to where the break occurs} 19033begin v:=box(n); 19034if split_first_mark<>null then 19035 begin delete_token_ref(split_first_mark); split_first_mark:=null; 19036 delete_token_ref(split_bot_mark); split_bot_mark:=null; 19037 end; 19038@<Dispense with trivial cases of void or bad boxes@>; 19039q:=vert_break(list_ptr(v),h,split_max_depth); 19040@<Look at all the marks in nodes before the break, and set the final 19041 link to |null| at the break@>; 19042q:=prune_page_top(q); p:=list_ptr(v); free_node(v,box_node_size); 19043if q=null then box(n):=null {the |eq_level| of the box stays the same} 19044else box(n):=vpack(q,natural); 19045vsplit:=vpackage(p,h,exactly,split_max_depth); 19046exit: end; 19047 19048@ @<Dispense with trivial cases of void or bad boxes@>= 19049if v=null then 19050 begin vsplit:=null; return; 19051 end; 19052if type(v)<>vlist_node then 19053 begin print_err(""); print_esc("vsplit"); print(" needs a "); 19054 print_esc("vbox"); 19055@:vsplit_}{\.{\\vsplit needs a \\vbox}@> 19056 help2("The box you are trying to split is an \hbox.")@/ 19057 ("I can't split such a box, so I'll leave it alone."); 19058 error; vsplit:=null; return; 19059 end 19060 19061@ It's possible that the box begins with a penalty node that is the 19062``best'' break, so we must be careful to handle this special case correctly. 19063 19064@<Look at all the marks...@>= 19065p:=list_ptr(v); 19066if p=q then list_ptr(v):=null 19067else loop@+begin if type(p)=mark_node then 19068 if split_first_mark=null then 19069 begin split_first_mark:=mark_ptr(p); 19070 split_bot_mark:=split_first_mark; 19071 token_ref_count(split_first_mark):=@| 19072 token_ref_count(split_first_mark)+2; 19073 end 19074 else begin delete_token_ref(split_bot_mark); 19075 split_bot_mark:=mark_ptr(p); 19076 add_token_ref(split_bot_mark); 19077 end; 19078 if link(p)=q then 19079 begin link(p):=null; goto done; 19080 end; 19081 p:=link(p); 19082 end; 19083done: 19084 19085@* \[45] The page builder. 19086When \TeX\ appends new material to its main vlist in vertical mode, it uses 19087a method something like |vsplit| to decide where a page ends, except that 19088the calculations are done ``on line'' as new items come in. 19089The main complication in this process is that insertions must be put 19090into their boxes and removed from the vlist, in a more-or-less optimum manner. 19091 19092We shall use the term ``current page'' for that part of the main vlist that 19093is being considered as a candidate for being broken off and sent to the 19094user's output routine. The current page starts at |link(page_head)|, and 19095it ends at |page_tail|. We have |page_head=page_tail| if this list is empty. 19096@^current page@> 19097 19098Utter chaos would reign if the user kept changing page specifications 19099while a page is being constructed, so the page builder keeps the pertinent 19100specifications frozen as soon as the page receives its first box or 19101insertion. The global variable |page_contents| is |empty| when the 19102current page contains only mark nodes and content-less whatsit nodes; it 19103is |inserts_only| if the page contains only insertion nodes in addition to 19104marks and whatsits. Glue nodes, kern nodes, and penalty nodes are 19105discarded until a box or rule node appears, at which time |page_contents| 19106changes to |box_there|. As soon as |page_contents| becomes non-|empty|, 19107the current |vsize| and |max_depth| are squirreled away into |page_goal| 19108and |page_max_depth|; the latter values will be used until the page has 19109been forwarded to the user's output routine. The \.{\\topskip} adjustment 19110is made when |page_contents| changes to |box_there|. 19111 19112Although |page_goal| starts out equal to |vsize|, it is decreased by the 19113scaled natural height-plus-depth of the insertions considered so far, and by 19114the \.{\\skip} corrections for those insertions. Therefore it represents 19115the size into which the non-inserted material should fit, assuming that 19116all insertions in the current page have been made. 19117 19118The global variables |best_page_break| and |least_page_cost| correspond 19119respectively to the local variables |best_place| and |least_cost| in the 19120|vert_break| routine that we have already studied; i.e., they record the 19121location and value of the best place currently known for breaking the 19122current page. The value of |page_goal| at the time of the best break is 19123stored in |best_size|. 19124 19125@d inserts_only=1 19126 {|page_contents| when an insert node has been contributed, but no boxes} 19127@d box_there=2 {|page_contents| when a box or rule has been contributed} 19128 19129@<Glob...@>= 19130@!page_tail:pointer; {the final node on the current page} 19131@!page_contents:empty..box_there; {what is on the current page so far?} 19132@!page_max_depth:scaled; {maximum box depth on page being built} 19133@!best_page_break:pointer; {break here to get the best page known so far} 19134@!least_page_cost:integer; {the score for this currently best page} 19135@!best_size:scaled; {its |page_goal|} 19136 19137@ The page builder has another data structure to keep track of insertions. 19138This is a list of four-word nodes, starting and ending at |page_ins_head|. 19139That is, the first element of the list is node |r@t$_1$@>=link(page_ins_head)|; 19140node $r_j$ is followed by |r@t$_{j+1}$@>=link(r@t$_j$@>)|; and if there are 19141|n| items we have |r@t$_{n+1}$@>=page_ins_head|. The |subtype| field of 19142each node in this list refers to an insertion number; for example, `\.{\\insert 19143250}' would correspond to a node whose |subtype| is |qi(250)| 19144(the same as the |subtype| field of the relevant |ins_node|). These |subtype| 19145fields are in increasing order, and |subtype(page_ins_head)= 19146qi(255)|, so |page_ins_head| serves as a convenient sentinel 19147at the end of the list. A record is present for each insertion number that 19148appears in the current page. 19149 19150The |type| field in these nodes distinguishes two possibilities that 19151might occur as we look ahead before deciding on the optimum page break. 19152If |type(r)=inserting|, then |height(r)| contains the total of the 19153height-plus-depth dimensions of the box and all its inserts seen so far. 19154If |type(r)=split_up|, then no more insertions will be made into this box, 19155because at least one previous insertion was too big to fit on the current 19156page; |broken_ptr(r)| points to the node where that insertion will be 19157split, if \TeX\ decides to split it, |broken_ins(r)| points to the 19158insertion node that was tentatively split, and |height(r)| includes also the 19159natural height plus depth of the part that would be split off. 19160 19161In both cases, |last_ins_ptr(r)| points to the last |ins_node| 19162encountered for box |qo(subtype(r))| that would be at least partially 19163inserted on the next page; and |best_ins_ptr(r)| points to the last 19164such |ins_node| that should actually be inserted, to get the page with 19165minimum badness among all page breaks considered so far. We have 19166|best_ins_ptr(r)=null| if and only if no insertion for this box should 19167be made to produce this optimum page. 19168 19169The data structure definitions here use the fact that the |@!height| field 19170appears in the fourth word of a box node. 19171@^data structure assumptions@> 19172 19173@d page_ins_node_size=4 {number of words for a page insertion node} 19174@d inserting=0 {an insertion class that has not yet overflowed} 19175@d split_up=1 {an overflowed insertion class} 19176@d broken_ptr(#)==link(#+1) 19177 {an insertion for this class will break here if anywhere} 19178@d broken_ins(#)==info(#+1) {this insertion might break at |broken_ptr|} 19179@d last_ins_ptr(#)==link(#+2) {the most recent insertion for this |subtype|} 19180@d best_ins_ptr(#)==info(#+2) {the optimum most recent insertion} 19181 19182@<Initialize the special list heads...@>= 19183subtype(page_ins_head):=qi(255); 19184type(page_ins_head):=split_up; link(page_ins_head):=page_ins_head; 19185 19186@ An array |page_so_far| records the heights and depths of everything 19187on the current page. This array contains six |scaled| numbers, like the 19188similar arrays already considered in |line_break| and |vert_break|; and it 19189also contains |page_goal| and |page_depth|, since these values are 19190all accessible to the user via |set_page_dimen| commands. The 19191value of |page_so_far[1]| is also called |page_total|. The stretch 19192and shrink components of the \.{\\skip} corrections for each insertion are 19193included in |page_so_far|, but the natural space components of these 19194corrections are not, since they have been subtracted from |page_goal|. 19195 19196The variable |page_depth| records the depth of the current page; it has been 19197adjusted so that it is at most |page_max_depth|. The variable 19198|last_glue| points to the glue specification of the most recent node 19199contributed from the contribution list, if this was a glue node; otherwise 19200|last_glue=max_halfword|. (If the contribution list is nonempty, 19201however, the value of |last_glue| is not necessarily accurate.) 19202The variables |last_penalty| and |last_kern| are similar. And 19203finally, |insert_penalties| holds the sum of the penalties associated with 19204all split and floating insertions. 19205 19206@d page_goal==page_so_far[0] {desired height of information on page being built} 19207@d page_total==page_so_far[1] {height of the current page} 19208@d page_shrink==page_so_far[6] {shrinkability of the current page} 19209@d page_depth==page_so_far[7] {depth of the current page} 19210 19211@<Glob...@>= 19212@!page_so_far:array [0..7] of scaled; {height and glue of the current page} 19213@!last_glue:pointer; {used to implement \.{\\lastskip}} 19214@!last_penalty:integer; {used to implement \.{\\lastpenalty}} 19215@!last_kern:scaled; {used to implement \.{\\lastkern}} 19216@!insert_penalties:integer; {sum of the penalties for held-over insertions} 19217 19218@ @<Put each...@>= 19219primitive("pagegoal",set_page_dimen,0); 19220@!@:page_goal_}{\.{\\pagegoal} primitive@> 19221primitive("pagetotal",set_page_dimen,1); 19222@!@:page_total_}{\.{\\pagetotal} primitive@> 19223primitive("pagestretch",set_page_dimen,2); 19224@!@:page_stretch_}{\.{\\pagestretch} primitive@> 19225primitive("pagefilstretch",set_page_dimen,3); 19226@!@:page_fil_stretch_}{\.{\\pagefilstretch} primitive@> 19227primitive("pagefillstretch",set_page_dimen,4); 19228@!@:page_fill_stretch_}{\.{\\pagefillstretch} primitive@> 19229primitive("pagefilllstretch",set_page_dimen,5); 19230@!@:page_filll_stretch_}{\.{\\pagefilllstretch} primitive@> 19231primitive("pageshrink",set_page_dimen,6); 19232@!@:page_shrink_}{\.{\\pageshrink} primitive@> 19233primitive("pagedepth",set_page_dimen,7); 19234@!@:page_depth_}{\.{\\pagedepth} primitive@> 19235 19236@ @<Cases of |print_cmd_chr|...@>= 19237set_page_dimen: case chr_code of 192380: print_esc("pagegoal"); 192391: print_esc("pagetotal"); 192402: print_esc("pagestretch"); 192413: print_esc("pagefilstretch"); 192424: print_esc("pagefillstretch"); 192435: print_esc("pagefilllstretch"); 192446: print_esc("pageshrink"); 19245othercases print_esc("pagedepth") 19246endcases; 19247 19248@ @d print_plus_end(#)==print(#);@+end 19249@d print_plus(#)==if page_so_far[#]<>0 then 19250 begin print(" plus "); print_scaled(page_so_far[#]); print_plus_end 19251 19252@p procedure print_totals; 19253begin print_scaled(page_total); 19254print_plus(2)(""); 19255print_plus(3)("fil"); 19256print_plus(4)("fill"); 19257print_plus(5)("filll"); 19258if page_shrink<>0 then 19259 begin print(" minus "); print_scaled(page_shrink); 19260 end; 19261end; 19262 19263@ @<Show the status of the current page@>= 19264if page_head<>page_tail then 19265 begin print_nl("### current page:"); 19266 if output_active then print(" (held over for next output)"); 19267@.held over for next output@> 19268 show_box(link(page_head)); 19269 if page_contents>empty then 19270 begin print_nl("total height "); print_totals; 19271@:total_height}{\.{total height}@> 19272 print_nl(" goal height "); print_scaled(page_goal); 19273@.goal height@> 19274 r:=link(page_ins_head); 19275 while r<>page_ins_head do 19276 begin print_ln; print_esc("insert"); t:=qo(subtype(r)); 19277 print_int(t); print(" adds "); 19278 if count(t)=1000 then t:=height(r) 19279 else t:=x_over_n(height(r),1000)*count(t); 19280 print_scaled(t); 19281 if type(r)=split_up then 19282 begin q:=page_head; t:=0; 19283 repeat q:=link(q); 19284 if (type(q)=ins_node)and(subtype(q)=subtype(r)) then incr(t); 19285 until q=broken_ins(r); 19286 print(", #"); print_int(t); print(" might split"); 19287 end; 19288 r:=link(r); 19289 end; 19290 end; 19291 end 19292 19293@ Here is a procedure that is called when the |page_contents| is changing 19294from |empty| to |inserts_only| or |box_there|. 19295 19296@d set_page_so_far_zero(#)==page_so_far[#]:=0 19297 19298@p procedure freeze_page_specs(@!s:small_number); 19299begin page_contents:=s; 19300page_goal:=vsize; page_max_depth:=max_depth; 19301page_depth:=0; do_all_six(set_page_so_far_zero); 19302least_page_cost:=awful_bad; 19303@!stat if tracing_pages>0 then 19304 begin begin_diagnostic; 19305 print_nl("%% goal height="); print_scaled(page_goal); 19306@.goal height@> 19307 print(", max depth="); print_scaled(page_max_depth); 19308 end_diagnostic(false); 19309 end;@;@+tats@;@/ 19310end; 19311 19312@ Pages are built by appending nodes to the current list in \TeX's 19313vertical mode, which is at the outermost level of the semantic nest. This 19314vlist is split into two parts; the ``current page'' that we have been 19315talking so much about already, and the ``contribution list'' that receives 19316new nodes as they are created. The current page contains everything that 19317the page builder has accounted for in its data structures, as described 19318above, while the contribution list contains other things that have been 19319generated by other parts of \TeX\ but have not yet been 19320seen by the page builder. 19321The contribution list starts at |link(contrib_head)|, and it ends at the 19322current node in \TeX's vertical mode. 19323 19324When \TeX\ has appended new material in vertical mode, it calls the procedure 19325|build_page|, which tries to catch up by moving nodes from the contribution 19326list to the current page. This procedure will succeed in its goal of 19327emptying the contribution list, unless a page break is discovered, i.e., 19328unless the current page has grown to the point where the optimum next 19329page break has been determined. In the latter case, the nodes after the 19330optimum break will go back onto the contribution list, and control will 19331effectively pass to the user's output routine. 19332 19333We make |type(page_head)=glue_node|, so that an initial glue node on 19334the current page will not be considered a valid breakpoint. 19335 19336@<Initialize the special list...@>= 19337type(page_head):=glue_node; subtype(page_head):=normal; 19338 19339@ The global variable |output_active| is true during the time the 19340user's output routine is driving \TeX. 19341 19342@<Glob...@>= 19343@!output_active:boolean; {are we in the midst of an output routine?} 19344 19345@ @<Set init...@>= 19346output_active:=false; insert_penalties:=0; 19347 19348@ The page builder is ready to start a fresh page if we initialize 19349the following state variables. (However, the page insertion list is initialized 19350elsewhere.) 19351 19352@<Start a new current page@>= 19353page_contents:=empty; page_tail:=page_head; link(page_head):=null;@/ 19354last_glue:=max_halfword; last_penalty:=0; last_kern:=0; 19355page_depth:=0; page_max_depth:=0 19356 19357@ At certain times box 255 is supposed to be void (i.e., |null|), 19358or an insertion box is supposed to be ready to accept a vertical list. 19359If not, an error message is printed, and the following subroutine 19360flushes the unwanted contents, reporting them to the user. 19361 19362@p procedure box_error(@!n:eight_bits); 19363begin error; begin_diagnostic; 19364print_nl("The following box has been deleted:"); 19365@.The following...deleted@> 19366show_box(box(n)); end_diagnostic(true); 19367flush_node_list(box(n)); box(n):=null; 19368end; 19369 19370@ The following procedure guarantees that a given box register 19371does not contain an \.{\\hbox}. 19372 19373@p procedure ensure_vbox(@!n:eight_bits); 19374var p:pointer; {the box register contents} 19375begin p:=box(n); 19376if p<>null then if type(p)=hlist_node then 19377 begin print_err("Insertions can only be added to a vbox"); 19378@.Insertions can only...@> 19379 help3("Tut tut: You're trying to \insert into a")@/ 19380 ("\box register that now contains an \hbox.")@/ 19381 ("Proceed, and I'll discard its present contents."); 19382 box_error(n); 19383 end; 19384end; 19385 19386@ \TeX\ is not always in vertical mode at the time |build_page| 19387is called; the current mode reflects what \TeX\ should return to, after 19388the contribution list has been emptied. A call on |build_page| should 19389be immediately followed by `|goto big_switch|', which is \TeX's central 19390control point. 19391 19392@d contribute=80 {go here to link a node into the current page} 19393 19394@p @t\4@>@<Declare the procedure called |fire_up|@>@;@/ 19395procedure build_page; {append contributions to the current page} 19396label exit,done,done1,continue,contribute,update_heights; 19397var p:pointer; {the node being appended} 19398@!q,@!r:pointer; {nodes being examined} 19399@!b,@!c:integer; {badness and cost of current page} 19400@!pi:integer; {penalty to be added to the badness} 19401@!n:min_quarterword..255; {insertion box number} 19402@!delta,@!h,@!w:scaled; {sizes used for insertion calculations} 19403begin if (link(contrib_head)=null)or output_active then return; 19404repeat continue: p:=link(contrib_head);@/ 19405@<Update the values of |last_glue|, |last_penalty|, and |last_kern|@>; 19406@<Move node |p| to the current page; if it is time for a page break, 19407 put the nodes following the break back onto the contribution list, 19408 and |return| to the user's output routine if there is one@>; 19409until link(contrib_head)=null; 19410@<Make the contribution list empty by setting its tail to |contrib_head|@>; 19411exit:end; 19412 19413@ @d contrib_tail==nest[0].tail_field {tail of the contribution list} 19414 19415@<Make the contribution list empty...@>= 19416if nest_ptr=0 then tail:=contrib_head {vertical mode} 19417else contrib_tail:=contrib_head {other modes} 19418 19419@ @<Update the values of |last_glue|...@>= 19420if last_glue<>max_halfword then delete_glue_ref(last_glue); 19421last_penalty:=0; last_kern:=0; 19422if type(p)=glue_node then 19423 begin last_glue:=glue_ptr(p); add_glue_ref(last_glue); 19424 end 19425else begin last_glue:=max_halfword; 19426 if type(p)=penalty_node then last_penalty:=penalty(p) 19427 else if type(p)=kern_node then last_kern:=width(p); 19428 end 19429 19430@ The code here is an example of a many-way switch into routines that 19431merge together in different places. Some people call this unstructured 19432programming, but the author doesn't see much wrong with it, as long as 19433@^Knuth, Donald Ervin@> 19434the various labels have a well-understood meaning. 19435 19436@<Move node |p| to the current page; ...@>= 19437@<If the current page is empty and node |p| is to be deleted, |goto done1|; 19438 otherwise use node |p| to update the state of the current page; 19439 if this node is an insertion, |goto contribute|; otherwise if this node 19440 is not a legal breakpoint, |goto contribute| or |update_heights|; 19441 otherwise set |pi| to the penalty associated with this breakpoint@>; 19442@<Check if node |p| is a new champion breakpoint; then \(if)if it is time for 19443 a page break, prepare for output, and either fire up the user's 19444 output routine and |return| or ship out the page and |goto done|@>; 19445if (type(p)<glue_node)or(type(p)>kern_node) then goto contribute; 19446update_heights:@<Update the current page measurements with respect to the 19447 glue or kern specified by node~|p|@>; 19448contribute: @<Make sure that |page_max_depth| is not exceeded@>; 19449@<Link node |p| into the current page and |goto done|@>; 19450done1:@<Recycle node |p|@>; 19451done: 19452 19453@ @<Link node |p| into the current page and |goto done|@>= 19454link(page_tail):=p; page_tail:=p; 19455link(contrib_head):=link(p); link(p):=null; goto done 19456 19457@ @<Recycle node |p|@>= 19458link(contrib_head):=link(p); link(p):=null; flush_node_list(p) 19459 19460@ The title of this section is already so long, it seems best to avoid 19461making it more accurate but still longer, by mentioning the fact that a 19462kern node at the end of the contribution list will not be contributed until 19463we know its successor. 19464 19465@<If the current page is empty...@>= 19466case type(p) of 19467hlist_node,vlist_node,rule_node: if page_contents<box_there then 19468 @<Initialize the current page, insert the \.{\\topskip} glue 19469 ahead of |p|, and |goto continue|@> 19470 else @<Prepare to move a box or rule node to the current page, 19471 then |goto contribute|@>; 19472whatsit_node: @<Prepare to move whatsit |p| to the current page, 19473 then |goto contribute|@>; 19474glue_node: if page_contents<box_there then goto done1 19475 else if precedes_break(page_tail) then pi:=0 19476 else goto update_heights; 19477kern_node: if page_contents<box_there then goto done1 19478 else if link(p)=null then return 19479 else if type(link(p))=glue_node then pi:=0 19480 else goto update_heights; 19481penalty_node: if page_contents<box_there then goto done1@+else pi:=penalty(p); 19482mark_node: goto contribute; 19483ins_node: @<Append an insertion to the current page and |goto contribute|@>; 19484othercases confusion("page") 19485@:this can't happen page}{\quad page@> 19486endcases 19487 19488@ @<Initialize the current page, insert the \.{\\topskip} glue...@>= 19489begin if page_contents=empty then freeze_page_specs(box_there) 19490else page_contents:=box_there; 19491q:=new_skip_param(top_skip_code); {now |temp_ptr=glue_ptr(q)|} 19492if width(temp_ptr)>height(p) then width(temp_ptr):=width(temp_ptr)-height(p) 19493else width(temp_ptr):=0; 19494link(q):=p; link(contrib_head):=q; goto continue; 19495end 19496 19497@ @<Prepare to move a box or rule node to the current page...@>= 19498begin page_total:=page_total+page_depth+height(p); 19499page_depth:=depth(p); 19500goto contribute; 19501end 19502 19503@ @<Make sure that |page_max_depth| is not exceeded@>= 19504if page_depth>page_max_depth then 19505 begin page_total:=@| 19506 page_total+page_depth-page_max_depth;@/ 19507 page_depth:=page_max_depth; 19508 end; 19509 19510@ @<Update the current page measurements with respect to the glue...@>= 19511if type(p)=kern_node then q:=p 19512else begin q:=glue_ptr(p); 19513 page_so_far[2+stretch_order(q)]:=@| 19514 page_so_far[2+stretch_order(q)]+stretch(q);@/ 19515 page_shrink:=page_shrink+shrink(q); 19516 if (shrink_order(q)<>normal)and(shrink(q)<>0) then 19517 begin@t@>@;@/ 19518 print_err("Infinite glue shrinkage found on current page");@/ 19519@.Infinite glue shrinkage...@> 19520 help4("The page about to be output contains some infinitely")@/ 19521 ("shrinkable glue, e.g., `\vss' or `\vskip 0pt minus 1fil'.")@/ 19522 ("Such glue doesn't belong there; but you can safely proceed,")@/ 19523 ("since the offensive shrinkability has been made finite."); 19524 error; 19525 r:=new_spec(q); shrink_order(r):=normal; delete_glue_ref(q); 19526 glue_ptr(p):=r; q:=r; 19527 end; 19528 end; 19529page_total:=page_total+page_depth+width(q); page_depth:=0 19530 19531@ @<Check if node |p| is a new champion breakpoint; then \(if)...@>= 19532if pi<inf_penalty then 19533 begin @<Compute the badness, |b|, of the current page, 19534 using |awful_bad| if the box is too full@>; 19535 if b<awful_bad then 19536 if pi<=eject_penalty then c:=pi 19537 else if b<inf_bad then c:=b+pi+insert_penalties 19538 else c:=deplorable 19539 else c:=b; 19540 if insert_penalties>=10000 then c:=awful_bad; 19541 @!stat if tracing_pages>0 then @<Display the page break cost@>;@+tats@;@/ 19542 if c<=least_page_cost then 19543 begin best_page_break:=p; best_size:=page_goal; 19544 least_page_cost:=c; 19545 r:=link(page_ins_head); 19546 while r<>page_ins_head do 19547 begin best_ins_ptr(r):=last_ins_ptr(r); 19548 r:=link(r); 19549 end; 19550 end; 19551 if (c=awful_bad)or(pi<=eject_penalty) then 19552 begin fire_up(p); {output the current page at the best place} 19553 if output_active then return; {user's output routine will act} 19554 goto done; {the page has been shipped out by default output routine} 19555 end; 19556 end 19557 19558@ @<Display the page break cost@>= 19559begin begin_diagnostic; print_nl("%"); 19560print(" t="); print_totals;@/ 19561print(" g="); print_scaled(page_goal);@/ 19562print(" b="); 19563if b=awful_bad then print_char("*")@+else print_int(b); 19564@.*\relax@> 19565print(" p="); print_int(pi); 19566print(" c="); 19567if c=awful_bad then print_char("*")@+else print_int(c); 19568if c<=least_page_cost then print_char("#"); 19569end_diagnostic(false); 19570end 19571 19572@ @<Compute the badness, |b|, of the current page...@>= 19573if page_total<page_goal then 19574 if (page_so_far[3]<>0) or (page_so_far[4]<>0) or@| 19575 (page_so_far[5]<>0) then b:=0 19576 else b:=badness(page_goal-page_total,page_so_far[2]) 19577else if page_total-page_goal>page_shrink then b:=awful_bad 19578else b:=badness(page_total-page_goal,page_shrink) 19579 19580@ @<Append an insertion to the current page and |goto contribute|@>= 19581begin if page_contents=empty then freeze_page_specs(inserts_only); 19582n:=subtype(p); r:=page_ins_head; 19583while n>=subtype(link(r)) do r:=link(r); 19584n:=qo(n); 19585if subtype(r)<>qi(n) then 19586 @<Create a page insertion node with |subtype(r)=qi(n)|, and 19587 include the glue correction for box |n| in the 19588 current page state@>; 19589if type(r)=split_up then insert_penalties:=insert_penalties+float_cost(p) 19590else begin last_ins_ptr(r):=p; 19591 delta:=page_goal-page_total-page_depth+page_shrink; 19592 {this much room is left if we shrink the maximum} 19593 if count(n)=1000 then h:=height(p) 19594 else h:=x_over_n(height(p),1000)*count(n); {this much room is needed} 19595 if ((h<=0)or(h<=delta))and(height(p)+height(r)<=dimen(n)) then 19596 begin page_goal:=page_goal-h; height(r):=height(r)+height(p); 19597 end 19598 else @<Find the best way to split the insertion, and change 19599 |type(r)| to |split_up|@>; 19600 end; 19601goto contribute; 19602end 19603 19604@ We take note of the value of \.{\\skip} |n| and the height plus depth 19605of \.{\\box}~|n| only when the first \.{\\insert}~|n| node is 19606encountered for a new page. A user who changes the contents of \.{\\box}~|n| 19607after that first \.{\\insert}~|n| had better be either extremely careful 19608or extremely lucky, or both. 19609 19610@<Create a page insertion node...@>= 19611begin q:=get_node(page_ins_node_size); link(q):=link(r); link(r):=q; r:=q; 19612subtype(r):=qi(n); type(r):=inserting; ensure_vbox(n); 19613if box(n)=null then height(r):=0 19614else height(r):=height(box(n))+depth(box(n)); 19615best_ins_ptr(r):=null;@/ 19616q:=skip(n); 19617if count(n)=1000 then h:=height(r) 19618else h:=x_over_n(height(r),1000)*count(n); 19619page_goal:=page_goal-h-width(q);@/ 19620page_so_far[2+stretch_order(q)]:=@|page_so_far[2+stretch_order(q)]+stretch(q);@/ 19621page_shrink:=page_shrink+shrink(q); 19622if (shrink_order(q)<>normal)and(shrink(q)<>0) then 19623 begin print_err("Infinite glue shrinkage inserted from "); print_esc("skip"); 19624@.Infinite glue shrinkage...@> 19625 print_int(n); 19626 help3("The correction glue for page breaking with insertions")@/ 19627 ("must have finite shrinkability. But you may proceed,")@/ 19628 ("since the offensive shrinkability has been made finite."); 19629 error; 19630 end; 19631end 19632 19633@ Here is the code that will split a long footnote between pages, in an 19634emergency. The current situation deserves to be recapitulated: Node |p| 19635is an insertion into box |n|; the insertion will not fit, in its entirety, 19636either because it would make the total contents of box |n| greater than 19637\.{\\dimen} |n|, or because it would make the incremental amount of growth 19638|h| greater than the available space |delta|, or both. (This amount |h| has 19639been weighted by the insertion scaling factor, i.e., by \.{\\count} |n| 19640over 1000.) Now we will choose the best way to break the vlist of the 19641insertion, using the same criteria as in the \.{\\vsplit} operation. 19642 19643@<Find the best way to split the insertion...@>= 19644begin if count(n)<=0 then w:=max_dimen 19645else begin w:=page_goal-page_total-page_depth; 19646 if count(n)<>1000 then w:=x_over_n(w,count(n))*1000; 19647 end; 19648if w>dimen(n)-height(r) then w:=dimen(n)-height(r); 19649q:=vert_break(ins_ptr(p),w,depth(p)); 19650height(r):=height(r)+best_height_plus_depth; 19651@!stat if tracing_pages>0 then @<Display the insertion split cost@>;@+tats@;@/ 19652if count(n)<>1000 then 19653 best_height_plus_depth:=x_over_n(best_height_plus_depth,1000)*count(n); 19654page_goal:=page_goal-best_height_plus_depth; 19655type(r):=split_up; broken_ptr(r):=q; broken_ins(r):=p; 19656if q=null then insert_penalties:=insert_penalties+eject_penalty 19657else if type(q)=penalty_node then insert_penalties:=insert_penalties+penalty(q); 19658end 19659 19660@ @<Display the insertion split cost@>= 19661begin begin_diagnostic; print_nl("% split"); print_int(n); 19662@.split@> 19663print(" to "); print_scaled(w); 19664print_char(","); print_scaled(best_height_plus_depth);@/ 19665print(" p="); 19666if q=null then print_int(eject_penalty) 19667else if type(q)=penalty_node then print_int(penalty(q)) 19668else print_char("0"); 19669end_diagnostic(false); 19670end 19671 19672@ When the page builder has looked at as much material as could appear before 19673the next page break, it makes its decision. The break that gave minimum 19674badness will be used to put a completed ``page'' into box 255, with insertions 19675appended to their other boxes. 19676 19677We also set the values of |top_mark|, |first_mark|, and |bot_mark|. The 19678program uses the fact that |bot_mark<>null| implies |first_mark<>null|; 19679it also knows that |bot_mark=null| implies |top_mark=first_mark=null|. 19680 19681The |fire_up| subroutine prepares to output the current page at the best 19682place; then it fires up the user's output routine, if there is one, 19683or it simply ships out the page. There is one parameter, |c|, which represents 19684the node that was being contributed to the page when the decision to 19685force an output was made. 19686 19687@<Declare the procedure called |fire_up|@>= 19688procedure fire_up(@!c:pointer); 19689label exit; 19690var p,@!q,@!r,@!s:pointer; {nodes being examined and/or changed} 19691@!prev_p:pointer; {predecessor of |p|} 19692@!n:min_quarterword..255; {insertion box number} 19693@!wait:boolean; {should the present insertion be held over?} 19694@!save_vbadness:integer; {saved value of |vbadness|} 19695@!save_vfuzz: scaled; {saved value of |vfuzz|} 19696@!save_split_top_skip: pointer; {saved value of |split_top_skip|} 19697begin @<Set the value of |output_penalty|@>; 19698if bot_mark<>null then 19699 begin if top_mark<>null then delete_token_ref(top_mark); 19700 top_mark:=bot_mark; add_token_ref(top_mark); 19701 delete_token_ref(first_mark); first_mark:=null; 19702 end; 19703@<Put the \(o)optimal current page into box 255, update |first_mark| and 19704 |bot_mark|, append insertions to their boxes, and put the 19705 remaining nodes back on the contribution list@>; 19706if (top_mark<>null)and(first_mark=null) then 19707 begin first_mark:=top_mark; add_token_ref(top_mark); 19708 end; 19709if output_routine<>null then 19710 if dead_cycles>=max_dead_cycles then 19711 @<Explain that too many dead cycles have occurred in a row@> 19712 else @<Fire up the user's output routine and |return|@>; 19713@<Perform the default output routine@>; 19714exit:end; 19715 19716@ @<Set the value of |output_penalty|@>= 19717if type(best_page_break)=penalty_node then 19718 begin geq_word_define(int_base+output_penalty_code,penalty(best_page_break)); 19719 penalty(best_page_break):=inf_penalty; 19720 end 19721else geq_word_define(int_base+output_penalty_code,inf_penalty) 19722 19723@ As the page is finally being prepared for output, 19724pointer |p| runs through the vlist, with |prev_p| trailing behind; 19725pointer |q| is the tail of a list of insertions that 19726are being held over for a subsequent page. 19727 19728@<Put the \(o)optimal current page into box 255...@>= 19729if c=best_page_break then best_page_break:=null; {|c| not yet linked in} 19730@<Ensure that box 255 is empty before output@>; 19731insert_penalties:=0; {this will count the number of insertions held over} 19732save_split_top_skip:=split_top_skip; 19733if holding_inserts<=0 then 19734 @<Prepare all the boxes involved in insertions to act as queues@>; 19735q:=hold_head; link(q):=null; prev_p:=page_head; p:=link(prev_p); 19736while p<>best_page_break do 19737 begin if type(p)=ins_node then 19738 begin if holding_inserts<=0 then 19739 @<Either insert the material specified by node |p| into the 19740 appropriate box, or hold it for the next page; 19741 also delete node |p| from the current page@>; 19742 end 19743 else if type(p)=mark_node then @<Update the values of 19744 |first_mark| and |bot_mark|@>; 19745 prev_p:=p; p:=link(prev_p); 19746 end; 19747split_top_skip:=save_split_top_skip; 19748@<Break the current page at node |p|, put it in box~255, 19749 and put the remaining nodes on the contribution list@>; 19750@<Delete \(t)the page-insertion nodes@> 19751 19752@ @<Ensure that box 255 is empty before output@>= 19753if box(255)<>null then 19754 begin print_err(""); print_esc("box"); print("255 is not void"); 19755@:box255}{\.{\\box255 is not void}@> 19756 help2("You shouldn't use \box255 except in \output routines.")@/ 19757 ("Proceed, and I'll discard its present contents."); 19758 box_error(255); 19759 end 19760 19761@ @<Update the values of |first_mark| and |bot_mark|@>= 19762begin if first_mark=null then 19763 begin first_mark:=mark_ptr(p); 19764 add_token_ref(first_mark); 19765 end; 19766if bot_mark<>null then delete_token_ref(bot_mark); 19767bot_mark:=mark_ptr(p); add_token_ref(bot_mark); 19768end 19769 19770@ When the following code is executed, the current page runs from node 19771|link(page_head)| to node |prev_p|, and the nodes from |p| to |page_tail| 19772are to be placed back at the front of the contribution list. Furthermore 19773the heldover insertions appear in a list from |link(hold_head)| to |q|; we 19774will put them into the current page list for safekeeping while the user's 19775output routine is active. We might have |q=hold_head|; and |p=null| if 19776and only if |prev_p=page_tail|. Error messages are suppressed within 19777|vpackage|, since the box might appear to be overfull or underfull simply 19778because the stretch and shrink from the \.{\\skip} registers for inserts 19779are not actually present in the box. 19780 19781@<Break the current page at node |p|, put it...@>= 19782if p<>null then 19783 begin if link(contrib_head)=null then 19784 if nest_ptr=0 then tail:=page_tail 19785 else contrib_tail:=page_tail; 19786 link(page_tail):=link(contrib_head); 19787 link(contrib_head):=p; 19788 link(prev_p):=null; 19789 end; 19790save_vbadness:=vbadness; vbadness:=inf_bad; 19791save_vfuzz:=vfuzz; vfuzz:=max_dimen; {inhibit error messages} 19792box(255):=vpackage(link(page_head),best_size,exactly,page_max_depth); 19793vbadness:=save_vbadness; vfuzz:=save_vfuzz; 19794if last_glue<>max_halfword then delete_glue_ref(last_glue); 19795@<Start a new current page@>; {this sets |last_glue:=max_halfword|} 19796if q<>hold_head then 19797 begin link(page_head):=link(hold_head); page_tail:=q; 19798 end 19799 19800@ If many insertions are supposed to go into the same box, we want to know 19801the position of the last node in that box, so that we don't need to waste time 19802when linking further information into it. The |last_ins_ptr| fields of the 19803page insertion nodes are therefore used for this purpose during the 19804packaging phase. 19805 19806@<Prepare all the boxes involved in insertions to act as queues@>= 19807begin r:=link(page_ins_head); 19808while r<>page_ins_head do 19809 begin if best_ins_ptr(r)<>null then 19810 begin n:=qo(subtype(r)); ensure_vbox(n); 19811 if box(n)=null then box(n):=new_null_box; 19812 p:=box(n)+list_offset; 19813 while link(p)<>null do p:=link(p); 19814 last_ins_ptr(r):=p; 19815 end; 19816 r:=link(r); 19817 end; 19818end 19819 19820@ @<Delete \(t)the page-insertion nodes@>= 19821r:=link(page_ins_head); 19822while r<>page_ins_head do 19823 begin q:=link(r); free_node(r,page_ins_node_size); r:=q; 19824 end; 19825link(page_ins_head):=page_ins_head 19826 19827@ We will set |best_ins_ptr:=null| and package the box corresponding to 19828insertion node~|r|, just after making the final insertion into that box. 19829If this final insertion is `|split_up|', the remainder after splitting 19830and pruning (if any) will be carried over to the next page. 19831 19832@<Either insert the material specified by node |p| into...@>= 19833begin r:=link(page_ins_head); 19834while subtype(r)<>subtype(p) do r:=link(r); 19835if best_ins_ptr(r)=null then wait:=true 19836else begin wait:=false; s:=last_ins_ptr(r); link(s):=ins_ptr(p); 19837 if best_ins_ptr(r)=p then 19838 @<Wrap up the box specified by node |r|, splitting node |p| if 19839 called for; set |wait:=true| if node |p| holds a remainder after 19840 splitting@> 19841 else begin while link(s)<>null do s:=link(s); 19842 last_ins_ptr(r):=s; 19843 end; 19844 end; 19845@<Either append the insertion node |p| after node |q|, and remove it 19846 from the current page, or delete |node(p)|@>; 19847end 19848 19849@ @<Wrap up the box specified by node |r|, splitting node |p| if...@>= 19850begin if type(r)=split_up then 19851 if (broken_ins(r)=p)and(broken_ptr(r)<>null) then 19852 begin while link(s)<>broken_ptr(r) do s:=link(s); 19853 link(s):=null; 19854 split_top_skip:=split_top_ptr(p); 19855 ins_ptr(p):=prune_page_top(broken_ptr(r)); 19856 if ins_ptr(p)<>null then 19857 begin temp_ptr:=vpack(ins_ptr(p),natural); 19858 height(p):=height(temp_ptr)+depth(temp_ptr); 19859 free_node(temp_ptr,box_node_size); wait:=true; 19860 end; 19861 end; 19862best_ins_ptr(r):=null; 19863n:=qo(subtype(r)); 19864temp_ptr:=list_ptr(box(n)); 19865free_node(box(n),box_node_size); 19866box(n):=vpack(temp_ptr,natural); 19867end 19868 19869@ @<Either append the insertion node |p|...@>= 19870link(prev_p):=link(p); link(p):=null; 19871if wait then 19872 begin link(q):=p; q:=p; incr(insert_penalties); 19873 end 19874else begin delete_glue_ref(split_top_ptr(p)); 19875 free_node(p,ins_node_size); 19876 end; 19877p:=prev_p 19878 19879@ The list of heldover insertions, running from |link(page_head)| to 19880|page_tail|, must be moved to the contribution list when the user has 19881specified no output routine. 19882 19883@<Perform the default output routine@>= 19884begin if link(page_head)<>null then 19885 begin if link(contrib_head)=null then 19886 if nest_ptr=0 then tail:=page_tail@+else contrib_tail:=page_tail 19887 else link(page_tail):=link(contrib_head); 19888 link(contrib_head):=link(page_head); 19889 link(page_head):=null; page_tail:=page_head; 19890 end; 19891ship_out(box(255)); box(255):=null; 19892end 19893 19894@ @<Explain that too many dead cycles have occurred in a row@>= 19895begin print_err("Output loop---"); print_int(dead_cycles); 19896@.Output loop...@> 19897print(" consecutive dead cycles"); 19898help3("I've concluded that your \output is awry; it never does a")@/ 19899("\shipout, so I'm shipping \box255 out myself. Next time")@/ 19900("increase \maxdeadcycles if you want me to be more patient!"); error; 19901end 19902 19903@ @<Fire up the user's output routine and |return|@>= 19904begin output_active:=true; 19905incr(dead_cycles); 19906push_nest; mode:=-vmode; prev_depth:=ignore_depth; mode_line:=-line; 19907begin_token_list(output_routine,output_text); 19908new_save_level(output_group); normal_paragraph; 19909scan_left_brace; 19910return; 19911end 19912 19913@ When the user's output routine finishes, it has constructed a vlist 19914in internal vertical mode, and \TeX\ will do the following: 19915 19916@<Resume the page builder after an output routine has come to an end@>= 19917begin if (loc<>null) or 19918 ((token_type<>output_text)and(token_type<>backed_up)) then 19919 @<Recover from an unbalanced output routine@>; 19920end_token_list; {conserve stack space in case more outputs are triggered} 19921end_graf; unsave; output_active:=false; insert_penalties:=0;@/ 19922@<Ensure that box 255 is empty after output@>; 19923if tail<>head then {current list goes after heldover insertions} 19924 begin link(page_tail):=link(head); 19925 page_tail:=tail; 19926 end; 19927if link(page_head)<>null then {and both go before heldover contributions} 19928 begin if link(contrib_head)=null then contrib_tail:=page_tail; 19929 link(page_tail):=link(contrib_head); 19930 link(contrib_head):=link(page_head); 19931 link(page_head):=null; page_tail:=page_head; 19932 end; 19933pop_nest; build_page; 19934end 19935 19936@ @<Recover from an unbalanced output routine@>= 19937begin print_err("Unbalanced output routine"); 19938@.Unbalanced output routine@> 19939help2("Your sneaky output routine has problematic {'s and/or }'s.")@/ 19940("I can't handle that very well; good luck."); error; 19941repeat get_token; 19942until loc=null; 19943end {loops forever if reading from a file, since |null=min_halfword<=0|} 19944 19945@ @<Ensure that box 255 is empty after output@>= 19946if box(255)<>null then 19947 begin print_err("Output routine didn't use all of "); 19948 print_esc("box"); print_int(255); 19949@.Output routine didn't use...@> 19950 help3("Your \output commands should empty \box255,")@/ 19951 ("e.g., by saying `\shipout\box255'.")@/ 19952 ("Proceed; I'll discard its present contents."); 19953 box_error(255); 19954 end 19955 19956@* \[46] The chief executive. 19957We come now to the |main_control| routine, which contains the master 19958switch that causes all the various pieces of \TeX\ to do their things, 19959in the right order. 19960 19961In a sense, this is the grand climax of the program: It applies all the 19962tools that we have worked so hard to construct. In another sense, this is 19963the messiest part of the program: It necessarily refers to other pieces 19964of code all over the place, so that a person can't fully understand what is 19965going on without paging back and forth to be reminded of conventions that 19966are defined elsewhere. We are now at the hub of the web, the central nervous 19967system that touches most of the other parts and ties them together. 19968@^brain@> 19969 19970The structure of |main_control| itself is quite simple. There's a label 19971called |big_switch|, at which point the next token of input is fetched 19972using |get_x_token|. Then the program branches at high speed into one of 19973about 100 possible directions, based on the value of the current 19974mode and the newly fetched command code; the sum |abs(mode)+cur_cmd| 19975indicates what to do next. For example, the case `|vmode+letter|' arises 19976when a letter occurs in vertical mode (or internal vertical mode); this 19977case leads to instructions that initialize a new paragraph and enter 19978horizontal mode. 19979 19980The big |case| statement that contains this multiway switch has been labeled 19981|reswitch|, so that the program can |goto reswitch| when the next token 19982has already been fetched. Most of the cases are quite short; they call 19983an ``action procedure'' that does the work for that case, and then they 19984either |goto reswitch| or they ``fall through'' to the end of the |case| 19985statement, which returns control back to |big_switch|. Thus, |main_control| 19986is not an extremely large procedure, in spite of the multiplicity of things 19987it must do; it is small enough to be handled by \PASCAL\ compilers that put 19988severe restrictions on procedure size. 19989@!@^action procedure@> 19990 19991One case is singled out for special treatment, because it accounts for most 19992of \TeX's activities in typical applications. The process of reading simple 19993text and converting it into |char_node| records, while looking for ligatures 19994and kerns, is part of \TeX's ``inner loop''; the whole program runs 19995efficiently when its inner loop is fast, so this part has been written 19996with particular care. 19997 19998@ We shall concentrate first on the inner loop of |main_control|, deferring 19999consideration of the other cases until later. 20000@^inner loop@> 20001 20002@d big_switch=60 {go here to branch on the next token of input} 20003@d main_loop=70 {go here to typeset a string of consecutive characters} 20004@d main_loop_wrapup=80 {go here to finish a character or ligature} 20005@d main_loop_move=90 {go here to advance the ligature cursor} 20006@d main_loop_move_lig=95 {same, when advancing past a generated ligature} 20007@d main_loop_lookahead=100 {go here to bring in another character, if any} 20008@d main_lig_loop=110 {go here to check for ligatures or kerning} 20009@d append_normal_space=120 {go here to append a normal space between words} 20010 20011@p @t\4@>@<Declare action procedures for use by |main_control|@>@; 20012@t\4@>@<Declare the procedure called |handle_right_brace|@>@; 20013procedure main_control; {governs \TeX's activities} 20014label big_switch,reswitch,main_loop,main_loop_wrapup, 20015 main_loop_move,main_loop_move+1,main_loop_move+2,main_loop_move_lig, 20016 main_loop_lookahead,main_loop_lookahead+1, 20017 main_lig_loop,main_lig_loop+1,main_lig_loop+2, 20018 append_normal_space,exit; 20019var@!t:integer; {general-purpose temporary variable} 20020begin if every_job<>null then begin_token_list(every_job,every_job_text); 20021big_switch: get_x_token;@/ 20022reswitch: @<Give diagnostic information, if requested@>; 20023case abs(mode)+cur_cmd of 20024hmode+letter,hmode+other_char,hmode+char_given: goto main_loop; 20025hmode+char_num: begin scan_char_num; cur_chr:=cur_val; goto main_loop;@+end; 20026hmode+no_boundary: begin get_x_token; 20027 if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given)or 20028 (cur_cmd=char_num) then cancel_boundary:=true; 20029 goto reswitch; 20030 end; 20031hmode+spacer: if space_factor=1000 then goto append_normal_space 20032 else app_space; 20033hmode+ex_space,mmode+ex_space: goto append_normal_space; 20034@t\4@>@<Cases of |main_control| that are not part of the inner loop@>@; 20035end; {of the big |case| statement} 20036goto big_switch; 20037main_loop:@<Append character |cur_chr| and the following characters (if~any) 20038 to the current hlist in the current font; |goto reswitch| when 20039 a non-character has been fetched@>; 20040append_normal_space:@<Append a normal inter-word space to the current list, 20041 then |goto big_switch|@>; 20042exit:end; 20043 20044@ When a new token has just been fetched at |big_switch|, we have an 20045ideal place to monitor \TeX's activity. 20046@^debugging@> 20047 20048@<Give diagnostic information, if requested@>= 20049if interrupt<>0 then if OK_to_interrupt then 20050 begin back_input; check_interrupt; goto big_switch; 20051 end; 20052@!debug if panicking then check_mem(false);@+@;@+gubed 20053if tracing_commands>0 then show_cur_cmd_chr 20054 20055@ The following part of the program was first written in a structured 20056manner, according to the philosophy that ``premature optimization is 20057the root of all evil.'' Then it was rearranged into pieces of 20058spaghetti so that the most common actions could proceed with little or 20059no redundancy. 20060 20061The original unoptimized form of this algorithm resembles the 20062|reconstitute| procedure, which was described earlier in connection with 20063hyphenation. Again we have an implied ``cursor'' between characters 20064|cur_l| and |cur_r|. The main difference is that the |lig_stack| can now 20065contain a charnode as well as pseudo-ligatures; that stack is now 20066usually nonempty, because the next character of input (if any) has been 20067appended to it. In |main_control| we have 20068$$|cur_r|=\cases{|character(lig_stack)|,&if |lig_stack>null|;\cr 20069 |font_bchar[cur_font]|,&otherwise;\cr}$$ 20070except when |character(lig_stack)=font_false_bchar[cur_font]|. 20071Several additional global variables are needed. 20072 20073@<Glob...@>= 20074@!main_f:internal_font_number; {the current font} 20075@!main_i:four_quarters; {character information bytes for |cur_l|} 20076@!main_j:four_quarters; {ligature/kern command} 20077@!main_k:font_index; {index into |font_info|} 20078@!main_p:pointer; {temporary register for list manipulation} 20079@!main_s:integer; {space factor value} 20080@!bchar:halfword; {right boundary character of current font, or |non_char|} 20081@!false_bchar:halfword; {nonexistent character matching |bchar|, or |non_char|} 20082@!cancel_boundary:boolean; {should the left boundary be ignored?} 20083@!ins_disc:boolean; {should we insert a discretionary node?} 20084 20085@ The boolean variables of the main loop are normally false, and always reset 20086to false before the loop is left. That saves us the extra work of initializing 20087each time. 20088 20089@<Set init...@>= 20090ligature_present:=false; cancel_boundary:=false; lft_hit:=false; rt_hit:=false; 20091ins_disc:=false; 20092 20093@ We leave the |space_factor| unchanged if |sf_code(cur_chr)=0|; otherwise we 20094set it equal to |sf_code(cur_chr)|, except that it should never change 20095from a value less than 1000 to a value exceeding 1000. The most common 20096case is |sf_code(cur_chr)=1000|, so we want that case to be fast. 20097 20098The overall structure of the main loop is presented here. Some program labels 20099are inside the individual sections. 20100@^inner loop@> 20101 20102@d adjust_space_factor==@t@>@;@/ 20103 main_s:=sf_code(cur_chr); 20104 if main_s=1000 then space_factor:=1000 20105 else if main_s<1000 then 20106 begin if main_s>0 then space_factor:=main_s; 20107 end 20108 else if space_factor<1000 then space_factor:=1000 20109 else space_factor:=main_s 20110 20111@<Append character |cur_chr|...@>= 20112adjust_space_factor;@/ 20113main_f:=cur_font; 20114bchar:=font_bchar[main_f]; false_bchar:=font_false_bchar[main_f]; 20115if mode>0 then if language<>clang then fix_language; 20116fast_get_avail(lig_stack); font(lig_stack):=main_f; cur_l:=qi(cur_chr); 20117character(lig_stack):=cur_l;@/ 20118cur_q:=tail; 20119if cancel_boundary then 20120 begin cancel_boundary:=false; main_k:=non_address; 20121 end 20122else main_k:=bchar_label[main_f]; 20123if main_k=non_address then goto main_loop_move+2; {no left boundary processing} 20124cur_r:=cur_l; cur_l:=non_char; 20125goto main_lig_loop+1; {begin with cursor after left boundary} 20126@# 20127main_loop_wrapup:@<Make a ligature node, if |ligature_present|; 20128 insert a null discretionary, if appropriate@>; 20129main_loop_move:@<If the cursor is immediately followed by the right boundary, 20130 |goto reswitch|; if it's followed by an invalid character, |goto big_switch|; 20131 otherwise move the cursor one step to the right and |goto main_lig_loop|@>; 20132main_loop_lookahead:@<Look ahead for another character, or leave |lig_stack| 20133 empty if there's none there@>; 20134main_lig_loop:@<If there's a ligature/kern command relevant to |cur_l| and 20135 |cur_r|, adjust the text appropriately; exit to |main_loop_wrapup|@>; 20136main_loop_move_lig:@<Move the cursor past a pseudo-ligature, then 20137 |goto main_loop_lookahead| or |main_lig_loop|@> 20138 20139@ If |link(cur_q)| is nonnull when |wrapup| is invoked, |cur_q| points to 20140the list of characters that were consumed while building the ligature 20141character~|cur_l|. 20142 20143A discretionary break is not inserted for an explicit hyphen when we are in 20144restricted horizontal mode. In particular, this avoids putting discretionary 20145nodes inside of other discretionaries. 20146@^inner loop@> 20147 20148@d pack_lig(#)== {the parameter is either |rt_hit| or |false|} 20149 begin main_p:=new_ligature(main_f,cur_l,link(cur_q)); 20150 if lft_hit then 20151 begin subtype(main_p):=2; lft_hit:=false; 20152 end; 20153 if # then if lig_stack=null then 20154 begin incr(subtype(main_p)); rt_hit:=false; 20155 end; 20156 link(cur_q):=main_p; tail:=main_p; ligature_present:=false; 20157 end 20158 20159@d wrapup(#)==if cur_l<non_char then 20160 begin if link(cur_q)>null then 20161 if character(tail)=qi(hyphen_char[main_f]) then ins_disc:=true; 20162 if ligature_present then pack_lig(#); 20163 if ins_disc then 20164 begin ins_disc:=false; 20165 if mode>0 then tail_append(new_disc); 20166 end; 20167 end 20168 20169@<Make a ligature node, if |ligature_present|;...@>= 20170wrapup(rt_hit) 20171 20172@ @<If the cursor is immediately followed by the right boundary...@>= 20173@^inner loop@> 20174if lig_stack=null then goto reswitch; 20175cur_q:=tail; cur_l:=character(lig_stack); 20176main_loop_move+1:if not is_char_node(lig_stack) then goto main_loop_move_lig; 20177main_loop_move+2:if(cur_chr<font_bc[main_f])or(cur_chr>font_ec[main_f]) then 20178 begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch; 20179 end; 20180main_i:=char_info(main_f)(cur_l); 20181if not char_exists(main_i) then 20182 begin char_warning(main_f,cur_chr); free_avail(lig_stack); goto big_switch; 20183 end; 20184link(tail):=lig_stack; tail:=lig_stack {|main_loop_lookahead| is next} 20185 20186@ Here we are at |main_loop_move_lig|. 20187When we begin this code we have |cur_q=tail| and |cur_l=character(lig_stack)|. 20188 20189@<Move the cursor past a pseudo-ligature...@>= 20190main_p:=lig_ptr(lig_stack); 20191if main_p>null then tail_append(main_p); {append a single character} 20192temp_ptr:=lig_stack; lig_stack:=link(temp_ptr); 20193free_node(temp_ptr,small_node_size); 20194main_i:=char_info(main_f)(cur_l); ligature_present:=true; 20195if lig_stack=null then 20196 if main_p>null then goto main_loop_lookahead 20197 else cur_r:=bchar 20198else cur_r:=character(lig_stack); 20199goto main_lig_loop 20200 20201@ The result of \.{\\char} can participate in a ligature or kern, so we must 20202look ahead for it. 20203 20204@<Look ahead for another character...@>= 20205get_next; {set only |cur_cmd| and |cur_chr|, for speed} 20206if cur_cmd=letter then goto main_loop_lookahead+1; 20207if cur_cmd=other_char then goto main_loop_lookahead+1; 20208if cur_cmd=char_given then goto main_loop_lookahead+1; 20209x_token; {now expand and set |cur_cmd|, |cur_chr|, |cur_tok|} 20210if cur_cmd=letter then goto main_loop_lookahead+1; 20211if cur_cmd=other_char then goto main_loop_lookahead+1; 20212if cur_cmd=char_given then goto main_loop_lookahead+1; 20213if cur_cmd=char_num then 20214 begin scan_char_num; cur_chr:=cur_val; goto main_loop_lookahead+1; 20215 end; 20216if cur_cmd=no_boundary then bchar:=non_char; 20217cur_r:=bchar; lig_stack:=null; goto main_lig_loop; 20218main_loop_lookahead+1: adjust_space_factor; 20219fast_get_avail(lig_stack); font(lig_stack):=main_f; 20220cur_r:=qi(cur_chr); character(lig_stack):=cur_r; 20221if cur_r=false_bchar then cur_r:=non_char {this prevents spurious ligatures} 20222 20223@ Even though comparatively few characters have a lig/kern program, several 20224of the instructions here count as part of \TeX's inner loop, since a 20225@^inner loop@> 20226potentially long sequential search must be performed. For example, tests with 20227Computer Modern Roman showed that about 40 per cent of all characters 20228actually encountered in practice had a lig/kern program, and that about four 20229lig/kern commands were investigated for every such character. 20230 20231At the beginning of this code we have |main_i=char_info(main_f)(cur_l)|. 20232 20233@<If there's a ligature/kern command...@>= 20234if char_tag(main_i)<>lig_tag then goto main_loop_wrapup; 20235if cur_r=non_char then goto main_loop_wrapup; 20236main_k:=lig_kern_start(main_f)(main_i); main_j:=font_info[main_k].qqqq; 20237if skip_byte(main_j)<=stop_flag then goto main_lig_loop+2; 20238main_k:=lig_kern_restart(main_f)(main_j); 20239main_lig_loop+1:main_j:=font_info[main_k].qqqq; 20240main_lig_loop+2:if next_char(main_j)=cur_r then 20241 if skip_byte(main_j)<=stop_flag then 20242 @<Do ligature or kern command, returning to |main_lig_loop| 20243 or |main_loop_wrapup| or |main_loop_move|@>; 20244if skip_byte(main_j)=qi(0) then incr(main_k) 20245else begin if skip_byte(main_j)>=stop_flag then goto main_loop_wrapup; 20246 main_k:=main_k+qo(skip_byte(main_j))+1; 20247 end; 20248goto main_lig_loop+1 20249 20250@ When a ligature or kern instruction matches a character, we know from 20251|read_font_info| that the character exists in the font, even though we 20252haven't verified its existence in the normal way. 20253 20254This section could be made into a subroutine, if the code inside 20255|main_control| needs to be shortened. 20256 20257\chardef\?='174 % vertical line to indicate character retention 20258 20259@<Do ligature or kern command...@>= 20260begin if op_byte(main_j)>=kern_flag then 20261 begin wrapup(rt_hit); 20262 tail_append(new_kern(char_kern(main_f)(main_j))); goto main_loop_move; 20263 end; 20264if cur_l=non_char then lft_hit:=true 20265else if lig_stack=null then rt_hit:=true; 20266check_interrupt; {allow a way out in case there's an infinite ligature loop} 20267case op_byte(main_j) of 20268qi(1),qi(5):begin cur_l:=rem_byte(main_j); {\.{=:\?}, \.{=:\?>}} 20269 main_i:=char_info(main_f)(cur_l); ligature_present:=true; 20270 end; 20271qi(2),qi(6):begin cur_r:=rem_byte(main_j); {\.{\?=:}, \.{\?=:>}} 20272 if lig_stack=null then {right boundary character is being consumed} 20273 begin lig_stack:=new_lig_item(cur_r); bchar:=non_char; 20274 end 20275 else if is_char_node(lig_stack) then {|link(lig_stack)=null|} 20276 begin main_p:=lig_stack; lig_stack:=new_lig_item(cur_r); 20277 lig_ptr(lig_stack):=main_p; 20278 end 20279 else character(lig_stack):=cur_r; 20280 end; 20281qi(3):begin cur_r:=rem_byte(main_j); {\.{\?=:\?}} 20282 main_p:=lig_stack; lig_stack:=new_lig_item(cur_r); 20283 link(lig_stack):=main_p; 20284 end; 20285qi(7),qi(11):begin wrapup(false); {\.{\?=:\?>}, \.{\?=:\?>>}} 20286 cur_q:=tail; cur_l:=rem_byte(main_j); 20287 main_i:=char_info(main_f)(cur_l); ligature_present:=true; 20288 end; 20289othercases begin cur_l:=rem_byte(main_j); ligature_present:=true; {\.{=:}} 20290 if lig_stack=null then goto main_loop_wrapup 20291 else goto main_loop_move+1; 20292 end 20293endcases; 20294if op_byte(main_j)>qi(4) then 20295 if op_byte(main_j)<>qi(7) then goto main_loop_wrapup; 20296if cur_l<non_char then goto main_lig_loop; 20297main_k:=bchar_label[main_f]; goto main_lig_loop+1; 20298end 20299 20300@ The occurrence of blank spaces is almost part of \TeX's inner loop, 20301@^inner loop@> 20302since we usually encounter about one space for every five non-blank characters. 20303Therefore |main_control| gives second-highest priority to ordinary spaces. 20304 20305When a glue parameter like \.{\\spaceskip} is set to `\.{0pt}', we will 20306see to it later that the corresponding glue specification is precisely 20307|zero_glue|, not merely a pointer to some specification that happens 20308to be full of zeroes. Therefore it is simple to test whether a glue parameter 20309is zero or~not. 20310 20311@<Append a normal inter-word space...@>= 20312if space_skip=zero_glue then 20313 begin @<Find the glue specification, |main_p|, for 20314 text spaces in the current font@>; 20315 temp_ptr:=new_glue(main_p); 20316 end 20317else temp_ptr:=new_param_glue(space_skip_code); 20318link(tail):=temp_ptr; tail:=temp_ptr; 20319goto big_switch 20320 20321@ Having |font_glue| allocated for each text font saves both time and memory. 20322If any of the three spacing parameters are subsequently changed by the 20323use of \.{\\fontdimen}, the |find_font_dimen| procedure deallocates the 20324|font_glue| specification allocated here. 20325 20326@<Find the glue specification...@>= 20327begin main_p:=font_glue[cur_font]; 20328if main_p=null then 20329 begin main_p:=new_spec(zero_glue); main_k:=param_base[cur_font]+space_code; 20330 width(main_p):=font_info[main_k].sc; {that's |space(cur_font)|} 20331 stretch(main_p):=font_info[main_k+1].sc; {and |space_stretch(cur_font)|} 20332 shrink(main_p):=font_info[main_k+2].sc; {and |space_shrink(cur_font)|} 20333 font_glue[cur_font]:=main_p; 20334 end; 20335end 20336 20337@ @<Declare act...@>= 20338procedure app_space; {handle spaces when |space_factor<>1000|} 20339var@!q:pointer; {glue node} 20340begin if (space_factor>=2000)and(xspace_skip<>zero_glue) then 20341 q:=new_param_glue(xspace_skip_code) 20342else begin if space_skip<>zero_glue then main_p:=space_skip 20343 else @<Find the glue specification...@>; 20344 main_p:=new_spec(main_p); 20345 @<Modify the glue specification in |main_p| according to the space factor@>; 20346 q:=new_glue(main_p); glue_ref_count(main_p):=null; 20347 end; 20348link(tail):=q; tail:=q; 20349end; 20350 20351@ @<Modify the glue specification in |main_p| according to the space factor@>= 20352if space_factor>=2000 then width(main_p):=width(main_p)+extra_space(cur_font); 20353stretch(main_p):=xn_over_d(stretch(main_p),space_factor,1000); 20354shrink(main_p):=xn_over_d(shrink(main_p),1000,space_factor) 20355 20356@ Whew---that covers the main loop. We can now proceed at a leisurely 20357pace through the other combinations of possibilities. 20358 20359@d any_mode(#)==vmode+#,hmode+#,mmode+# {for mode-independent commands} 20360 20361@<Cases of |main_control| that are not part of the inner loop@>= 20362any_mode(relax),vmode+spacer,mmode+spacer,mmode+no_boundary:do_nothing; 20363any_mode(ignore_spaces): begin @<Get the next non-blank non-call...@>; 20364 goto reswitch; 20365 end; 20366vmode+stop: if its_all_over then return; {this is the only way out} 20367@t\4@>@<Forbidden cases detected in |main_control|@>@+@,any_mode(mac_param): 20368 report_illegal_case; 20369@<Math-only cases in non-math modes, or vice versa@>: insert_dollar_sign; 20370@t\4@>@<Cases of |main_control| that build boxes and lists@>@; 20371@t\4@>@<Cases of |main_control| that don't depend on |mode|@>@; 20372@t\4@>@<Cases of |main_control| that are for extensions to \TeX@>@; 20373 20374@ Here is a list of cases where the user has probably gotten into or out of math 20375mode by mistake. \TeX\ will insert a dollar sign and rescan the current token. 20376 20377@d non_math(#)==vmode+#,hmode+# 20378 20379@<Math-only cases in non-math modes...@>= 20380non_math(sup_mark), non_math(sub_mark), non_math(math_char_num), 20381non_math(math_given), non_math(math_comp), non_math(delim_num), 20382non_math(left_right), non_math(above), non_math(radical), 20383non_math(math_style), non_math(math_choice), non_math(vcenter), 20384non_math(non_script), non_math(mkern), non_math(limit_switch), 20385non_math(mskip), non_math(math_accent), 20386mmode+endv, mmode+par_end, mmode+stop, mmode+vskip, mmode+un_vbox, 20387mmode+valign, mmode+hrule 20388 20389@ @<Declare action...@>= 20390procedure insert_dollar_sign; 20391begin back_input; cur_tok:=math_shift_token+"$"; 20392print_err("Missing $ inserted"); 20393@.Missing \$ inserted@> 20394help2("I've inserted a begin-math/end-math symbol since I think")@/ 20395("you left one out. Proceed, with fingers crossed."); ins_error; 20396end; 20397 20398@ When erroneous situations arise, \TeX\ usually issues an error message 20399specific to the particular error. For example, `\.{\\noalign}' should 20400not appear in any mode, since it is recognized by the |align_peek| routine 20401in all of its legitimate appearances; a special error message is given 20402when `\.{\\noalign}' occurs elsewhere. But sometimes the most appropriate 20403error message is simply that the user is not allowed to do what he or she 20404has attempted. For example, `\.{\\moveleft}' is allowed only in vertical mode, 20405and `\.{\\lower}' only in non-vertical modes. Such cases are enumerated 20406here and in the other sections referred to under `See also \dots.' 20407 20408@<Forbidden cases...@>= 20409vmode+vmove,hmode+hmove,mmode+hmove,any_mode(last_item), 20410 20411@ The `|you_cant|' procedure prints a line saying that the current command 20412is illegal in the current mode; it identifies these things symbolically. 20413 20414@<Declare action...@>= 20415procedure you_cant; 20416begin print_err("You can't use `"); 20417@.You can't use x in y mode@> 20418print_cmd_chr(cur_cmd,cur_chr); 20419print("' in "); print_mode(mode); 20420end; 20421 20422@ @<Declare act...@>= 20423procedure report_illegal_case; 20424begin you_cant; 20425help4("Sorry, but I'm not programmed to handle this case;")@/ 20426("I'll just pretend that you didn't ask for it.")@/ 20427("If you're in the wrong mode, you might be able to")@/ 20428("return to the right one by typing `I}' or `I$' or `I\par'.");@/ 20429error; 20430end; 20431 20432@ Some operations are allowed only in privileged modes, i.e., in cases 20433that |mode>0|. The |privileged| function is used to detect violations 20434of this rule; it issues an error message and returns |false| if the 20435current |mode| is negative. 20436 20437@<Declare act...@>= 20438function privileged:boolean; 20439begin if mode>0 then privileged:=true 20440else begin report_illegal_case; privileged:=false; 20441 end; 20442end; 20443 20444@ Either \.{\\dump} or \.{\\end} will cause |main_control| to enter the 20445endgame, since both of them have `|stop|' as their command code. 20446 20447@<Put each...@>= 20448primitive("end",stop,0);@/ 20449@!@:end_}{\.{\\end} primitive@> 20450primitive("dump",stop,1);@/ 20451@!@:dump_}{\.{\\dump} primitive@> 20452 20453@ @<Cases of |print_cmd_chr|...@>= 20454stop:if chr_code=1 then print_esc("dump")@+else print_esc("end"); 20455 20456@ We don't want to leave |main_control| immediately when a |stop| command 20457is sensed, because it may be necessary to invoke an \.{\\output} routine 20458several times before things really grind to a halt. (The output routine 20459might even say `\.{\\gdef\\end\{...\}}', to prolong the life of the job.) 20460Therefore |its_all_over| is |true| only when the current page 20461and contribution list are empty, and when the last output was not a 20462``dead cycle.'' 20463 20464@<Declare act...@>= 20465function its_all_over:boolean; {do this when \.{\\end} or \.{\\dump} occurs} 20466label exit; 20467begin if privileged then 20468 begin if (page_head=page_tail)and(head=tail)and(dead_cycles=0) then 20469 begin its_all_over:=true; return; 20470 end; 20471 back_input; {we will try to end again after ejecting residual material} 20472 tail_append(new_null_box); 20473 width(tail):=hsize; 20474 tail_append(new_glue(fill_glue)); 20475 tail_append(new_penalty(-@'10000000000));@/ 20476 build_page; {append \.{\\hbox to \\hsize\{\}\\vfill\\penalty-'10000000000}} 20477 end; 20478its_all_over:=false; 20479exit:end; 20480 20481@* \[47] Building boxes and lists. 20482The most important parts of |main_control| are concerned with \TeX's 20483chief mission of box-making. We need to control the activities that put 20484entries on vlists and hlists, as well as the activities that convert 20485those lists into boxes. All of the necessary machinery has already been 20486developed; it remains for us to ``push the buttons'' at the right times. 20487 20488@ As an introduction to these routines, let's consider one of the simplest 20489cases: What happens when `\.{\\hrule}' occurs in vertical mode, or 20490`\.{\\vrule}' in horizontal mode or math mode? The code in |main_control| 20491is short, since the |scan_rule_spec| routine already does most of what is 20492required; thus, there is no need for a special action procedure. 20493 20494Note that baselineskip calculations are disabled after a rule in vertical 20495mode, by setting |prev_depth:=ignore_depth|. 20496 20497@<Cases of |main_control| that build...@>= 20498vmode+hrule,hmode+vrule,mmode+vrule: begin tail_append(scan_rule_spec); 20499 if abs(mode)=vmode then prev_depth:=ignore_depth 20500 else if abs(mode)=hmode then space_factor:=1000; 20501 end; 20502 20503@ The processing of things like \.{\\hskip} and \.{\\vskip} is slightly 20504more complicated. But the code in |main_control| is very short, since 20505it simply calls on the action routine |append_glue|. Similarly, \.{\\kern} 20506activates |append_kern|. 20507 20508@<Cases of |main_control| that build...@>= 20509vmode+vskip,hmode+hskip,mmode+hskip,mmode+mskip: append_glue; 20510any_mode(kern),mmode+mkern: append_kern; 20511 20512@ The |hskip| and |vskip| command codes are used for control sequences 20513like \.{\\hss} and \.{\\vfil} as well as for \.{\\hskip} and \.{\\vskip}. 20514The difference is in the value of |cur_chr|. 20515 20516@d fil_code=0 {identifies \.{\\hfil} and \.{\\vfil}} 20517@d fill_code=1 {identifies \.{\\hfill} and \.{\\vfill}} 20518@d ss_code=2 {identifies \.{\\hss} and \.{\\vss}} 20519@d fil_neg_code=3 {identifies \.{\\hfilneg} and \.{\\vfilneg}} 20520@d skip_code=4 {identifies \.{\\hskip} and \.{\\vskip}} 20521@d mskip_code=5 {identifies \.{\\mskip}} 20522 20523@<Put each...@>= 20524primitive("hskip",hskip,skip_code);@/ 20525@!@:hskip_}{\.{\\hskip} primitive@> 20526primitive("hfil",hskip,fil_code); 20527@!@:hfil_}{\.{\\hfil} primitive@> 20528primitive("hfill",hskip,fill_code);@/ 20529@!@:hfill_}{\.{\\hfill} primitive@> 20530primitive("hss",hskip,ss_code); 20531@!@:hss_}{\.{\\hss} primitive@> 20532primitive("hfilneg",hskip,fil_neg_code);@/ 20533@!@:hfil_neg_}{\.{\\hfilneg} primitive@> 20534primitive("vskip",vskip,skip_code);@/ 20535@!@:vskip_}{\.{\\vskip} primitive@> 20536primitive("vfil",vskip,fil_code); 20537@!@:vfil_}{\.{\\vfil} primitive@> 20538primitive("vfill",vskip,fill_code);@/ 20539@!@:vfill_}{\.{\\vfill} primitive@> 20540primitive("vss",vskip,ss_code); 20541@!@:vss_}{\.{\\vss} primitive@> 20542primitive("vfilneg",vskip,fil_neg_code);@/ 20543@!@:vfil_neg_}{\.{\\vfilneg} primitive@> 20544primitive("mskip",mskip,mskip_code);@/ 20545@!@:mskip_}{\.{\\mskip} primitive@> 20546primitive("kern",kern,explicit); 20547@!@:kern_}{\.{\\kern} primitive@> 20548primitive("mkern",mkern,mu_glue);@/ 20549@!@:mkern_}{\.{\\mkern} primitive@> 20550 20551@ @<Cases of |print_cmd_chr|...@>= 20552hskip: case chr_code of 20553 skip_code:print_esc("hskip"); 20554 fil_code:print_esc("hfil"); 20555 fill_code:print_esc("hfill"); 20556 ss_code:print_esc("hss"); 20557 othercases print_esc("hfilneg") 20558 endcases; 20559vskip: case chr_code of 20560 skip_code:print_esc("vskip"); 20561 fil_code:print_esc("vfil"); 20562 fill_code:print_esc("vfill"); 20563 ss_code:print_esc("vss"); 20564 othercases print_esc("vfilneg") 20565 endcases; 20566mskip: print_esc("mskip"); 20567kern: print_esc("kern"); 20568mkern: print_esc("mkern"); 20569 20570@ All the work relating to glue creation has been relegated to the 20571following subroutine. It does not call |build_page|, because it is 20572used in at least one place where that would be a mistake. 20573 20574@<Declare action...@>= 20575procedure append_glue; 20576var s:small_number; {modifier of skip command} 20577begin s:=cur_chr; 20578case s of 20579fil_code: cur_val:=fil_glue; 20580fill_code: cur_val:=fill_glue; 20581ss_code: cur_val:=ss_glue; 20582fil_neg_code: cur_val:=fil_neg_glue; 20583skip_code: scan_glue(glue_val); 20584mskip_code: scan_glue(mu_val); 20585end; {now |cur_val| points to the glue specification} 20586tail_append(new_glue(cur_val)); 20587if s>=skip_code then 20588 begin decr(glue_ref_count(cur_val)); 20589 if s>skip_code then subtype(tail):=mu_glue; 20590 end; 20591end; 20592 20593@ @<Declare act...@>= 20594procedure append_kern; 20595var s:quarterword; {|subtype| of the kern node} 20596begin s:=cur_chr; scan_dimen(s=mu_glue,false,false); 20597tail_append(new_kern(cur_val)); subtype(tail):=s; 20598end; 20599 20600@ Many of the actions related to box-making are triggered by the appearance 20601of braces in the input. For example, when the user says `\.{\\hbox} 20602\.{to} \.{100pt\{$\langle\,\hbox{hlist}\,\rangle$\}}' in vertical mode, 20603the information about the box size (100pt, |exactly|) is put onto |save_stack| 20604with a level boundary word just above it, and |cur_group:=adjusted_hbox_group|; 20605\TeX\ enters restricted horizontal mode to process the hlist. The right 20606brace eventually causes |save_stack| to be restored to its former state, 20607at which time the information about the box size (100pt, |exactly|) is 20608available once again; a box is packaged and we leave restricted horizontal 20609mode, appending the new box to the current list of the enclosing mode 20610(in this case to the current list of vertical mode), followed by any 20611vertical adjustments that were removed from the box by |hpack|. 20612 20613The next few sections of the program are therefore concerned with the 20614treatment of left and right curly braces. 20615 20616@ If a left brace occurs in the middle of a page or paragraph, it simply 20617introduces a new level of grouping, and the matching right brace will not have 20618such a drastic effect. Such grouping affects neither the mode nor the 20619current list. 20620 20621@<Cases of |main_control| that build...@>= 20622non_math(left_brace): new_save_level(simple_group); 20623any_mode(begin_group): new_save_level(semi_simple_group); 20624any_mode(end_group): if cur_group=semi_simple_group then unsave 20625 else off_save; 20626 20627@ We have to deal with errors in which braces and such things are not 20628properly nested. Sometimes the user makes an error of commission by 20629inserting an extra symbol, but sometimes the user makes an error of omission. 20630\TeX\ can't always tell one from the other, so it makes a guess and tries 20631to avoid getting into a loop. 20632 20633The |off_save| routine is called when the current group code is wrong. It tries 20634to insert something into the user's input that will help clean off 20635the top level. 20636 20637@<Declare act...@>= 20638procedure off_save; 20639var p:pointer; {inserted token} 20640begin if cur_group=bottom_level then 20641 @<Drop current token and complain that it was unmatched@> 20642else begin back_input; p:=get_avail; link(temp_head):=p; 20643 print_err("Missing "); 20644 @<Prepare to insert a token that matches |cur_group|, 20645 and print what it is@>; 20646 print(" inserted"); ins_list(link(temp_head)); 20647 help5("I've inserted something that you may have forgotten.")@/ 20648 ("(See the <inserted text> above.)")@/ 20649 ("With luck, this will get me unwedged. But if you")@/ 20650 ("really didn't forget anything, try typing `2' now; then")@/ 20651 ("my insertion and my current dilemma will both disappear."); 20652 error; 20653 end; 20654end; 20655 20656@ At this point, |link(temp_head)=p|, a pointer to an empty one-word node. 20657 20658@<Prepare to insert a token that matches |cur_group|...@>= 20659case cur_group of 20660semi_simple_group: begin info(p):=cs_token_flag+frozen_end_group; 20661 print_esc("endgroup"); 20662@.Missing \\endgroup inserted@> 20663 end; 20664math_shift_group: begin info(p):=math_shift_token+"$"; print_char("$"); 20665@.Missing \$ inserted@> 20666 end; 20667math_left_group: begin info(p):=cs_token_flag+frozen_right; link(p):=get_avail; 20668 p:=link(p); info(p):=other_token+"."; print_esc("right."); 20669@.Missing \\right\hbox{.} inserted@> 20670@^null delimiter@> 20671 end; 20672othercases begin info(p):=right_brace_token+"}"; print_char("}"); 20673@.Missing \} inserted@> 20674 end 20675endcases 20676 20677@ @<Drop current token and complain that it was unmatched@>= 20678begin print_err("Extra "); print_cmd_chr(cur_cmd,cur_chr); 20679@.Extra x@> 20680help1("Things are pretty mixed up, but I think the worst is over.");@/ 20681error; 20682end 20683 20684@ The routine for a |right_brace| character branches into many subcases, 20685since a variety of things may happen, depending on |cur_group|. Some 20686types of groups are not supposed to be ended by a right brace; error 20687messages are given in hopes of pinpointing the problem. Most branches 20688of this routine will be filled in later, when we are ready to understand 20689them; meanwhile, we must prepare ourselves to deal with such errors. 20690 20691@<Cases of |main_control| that build...@>= 20692any_mode(right_brace): handle_right_brace; 20693 20694@ @<Declare the procedure called |handle_right_brace|@>= 20695procedure handle_right_brace; 20696var p,@!q:pointer; {for short-term use} 20697@!d:scaled; {holds |split_max_depth| in |insert_group|} 20698@!f:integer; {holds |floating_penalty| in |insert_group|} 20699begin case cur_group of 20700simple_group: unsave; 20701bottom_level: begin print_err("Too many }'s"); 20702@.Too many \}'s@> 20703 help2("You've closed more groups than you opened.")@/ 20704 ("Such booboos are generally harmless, so keep going."); error; 20705 end; 20706semi_simple_group,math_shift_group,math_left_group: extra_right_brace; 20707@t\4@>@<Cases of |handle_right_brace| where a |right_brace| triggers 20708 a delayed action@>@; 20709othercases confusion("rightbrace") 20710@:this can't happen rightbrace}{\quad rightbrace@> 20711endcases; 20712end; 20713 20714@ @<Declare act...@>= 20715procedure extra_right_brace; 20716begin print_err("Extra }, or forgotten "); 20717@.Extra \}, or forgotten x@> 20718case cur_group of 20719semi_simple_group: print_esc("endgroup"); 20720math_shift_group: print_char("$"); 20721math_left_group: print_esc("right"); 20722end;@/ 20723help5("I've deleted a group-closing symbol because it seems to be")@/ 20724("spurious, as in `$x}$'. But perhaps the } is legitimate and")@/ 20725("you forgot something else, as in `\hbox{$x}'. In such cases")@/ 20726("the way to recover is to insert both the forgotten and the")@/ 20727("deleted material, e.g., by typing `I$}'."); error; 20728incr(align_state); 20729end; 20730 20731@ Here is where we clear the parameters that are supposed to revert to their 20732default values after every paragraph and when internal vertical mode is entered. 20733 20734@<Declare act...@>= 20735procedure normal_paragraph; 20736begin if looseness<>0 then eq_word_define(int_base+looseness_code,0); 20737if hang_indent<>0 then eq_word_define(dimen_base+hang_indent_code,0); 20738if hang_after<>1 then eq_word_define(int_base+hang_after_code,1); 20739if par_shape_ptr<>null then eq_define(par_shape_loc,shape_ref,null); 20740end; 20741 20742@ Now let's turn to the question of how \.{\\hbox} is treated. We actually 20743need to consider also a slightly larger context, since constructions like 20744`\.{\\setbox3=}\penalty0\.{\\hbox...}' and 20745`\.{\\leaders}\penalty0\.{\\hbox...}' and 20746`\.{\\lower3.8pt\\hbox...}' 20747are supposed to invoke quite 20748different actions after the box has been packaged. Conversely, 20749constructions like `\.{\\setbox3=}' can be followed by a variety of 20750different kinds of boxes, and we would like to encode such things in an 20751efficient way. 20752 20753In other words, there are two problems: to represent the context of a box, 20754and to represent its type. 20755 20756The first problem is solved by putting a ``context code'' on the |save_stack|, 20757just below the two entries that give the dimensions produced by |scan_spec|. 20758The context code is either a (signed) shift amount, or it is a large 20759integer |>=box_flag|, where |box_flag=@t$2^{30}$@>|. Codes |box_flag| through 20760|box_flag+255| represent `\.{\\setbox0}' through `\.{\\setbox255}'; 20761codes |box_flag+256| through |box_flag+511| represent `\.{\\global\\setbox0}' 20762through `\.{\\global\\setbox255}'; 20763code |box_flag+512| represents `\.{\\shipout}'; and codes |box_flag+513| 20764through |box_flag+515| represent `\.{\\leaders}', `\.{\\cleaders}', 20765and `\.{\\xleaders}'. 20766 20767The second problem is solved by giving the command code |make_box| to all 20768control sequences that produce a box, and by using the following |chr_code| 20769values to distinguish between them: |box_code|, |copy_code|, |last_box_code|, 20770|vsplit_code|, |vtop_code|, |vtop_code+vmode|, and |vtop_code+hmode|, where 20771the latter two are used to denote \.{\\vbox} and \.{\\hbox}, respectively. 20772 20773@d box_flag==@'10000000000 {context code for `\.{\\setbox0}'} 20774@d ship_out_flag==box_flag+512 {context code for `\.{\\shipout}'} 20775@d leader_flag==box_flag+513 {context code for `\.{\\leaders}'} 20776@d box_code=0 {|chr_code| for `\.{\\box}'} 20777@d copy_code=1 {|chr_code| for `\.{\\copy}'} 20778@d last_box_code=2 {|chr_code| for `\.{\\lastbox}'} 20779@d vsplit_code=3 {|chr_code| for `\.{\\vsplit}'} 20780@d vtop_code=4 {|chr_code| for `\.{\\vtop}'} 20781 20782@<Put each...@>= 20783primitive("moveleft",hmove,1); 20784@!@:move_left_}{\.{\\moveleft} primitive@> 20785primitive("moveright",hmove,0);@/ 20786@!@:move_right_}{\.{\\moveright} primitive@> 20787primitive("raise",vmove,1); 20788@!@:raise_}{\.{\\raise} primitive@> 20789primitive("lower",vmove,0); 20790@!@:lower_}{\.{\\lower} primitive@> 20791@# 20792primitive("box",make_box,box_code); 20793@!@:box_}{\.{\\box} primitive@> 20794primitive("copy",make_box,copy_code); 20795@!@:copy_}{\.{\\copy} primitive@> 20796primitive("lastbox",make_box,last_box_code); 20797@!@:last_box_}{\.{\\lastbox} primitive@> 20798primitive("vsplit",make_box,vsplit_code); 20799@!@:vsplit_}{\.{\\vsplit} primitive@> 20800primitive("vtop",make_box,vtop_code);@/ 20801@!@:vtop_}{\.{\\vtop} primitive@> 20802primitive("vbox",make_box,vtop_code+vmode); 20803@!@:vbox_}{\.{\\vbox} primitive@> 20804primitive("hbox",make_box,vtop_code+hmode);@/ 20805@!@:hbox_}{\.{\\hbox} primitive@> 20806primitive("shipout",leader_ship,a_leaders-1); {|ship_out_flag=leader_flag-1|} 20807@!@:ship_out_}{\.{\\shipout} primitive@> 20808primitive("leaders",leader_ship,a_leaders); 20809@!@:leaders_}{\.{\\leaders} primitive@> 20810primitive("cleaders",leader_ship,c_leaders); 20811@!@:c_leaders_}{\.{\\cleaders} primitive@> 20812primitive("xleaders",leader_ship,x_leaders); 20813@!@:x_leaders_}{\.{\\xleaders} primitive@> 20814 20815@ @<Cases of |print_cmd_chr|...@>= 20816hmove: if chr_code=1 then print_esc("moveleft")@+else print_esc("moveright"); 20817vmove: if chr_code=1 then print_esc("raise")@+else print_esc("lower"); 20818make_box: case chr_code of 20819 box_code: print_esc("box"); 20820 copy_code: print_esc("copy"); 20821 last_box_code: print_esc("lastbox"); 20822 vsplit_code: print_esc("vsplit"); 20823 vtop_code: print_esc("vtop"); 20824 vtop_code+vmode: print_esc("vbox"); 20825 othercases print_esc("hbox") 20826 endcases; 20827leader_ship: if chr_code=a_leaders then print_esc("leaders") 20828 else if chr_code=c_leaders then print_esc("cleaders") 20829 else if chr_code=x_leaders then print_esc("xleaders") 20830 else print_esc("shipout"); 20831 20832@ Constructions that require a box are started by calling |scan_box| with 20833a specified context code. The |scan_box| routine verifies 20834that a |make_box| command comes next and then it calls |begin_box|. 20835 20836@<Cases of |main_control| that build...@>= 20837vmode+hmove,hmode+vmove,mmode+vmove: begin t:=cur_chr; 20838 scan_normal_dimen; 20839 if t=0 then scan_box(cur_val)@+else scan_box(-cur_val); 20840 end; 20841any_mode(leader_ship): scan_box(leader_flag-a_leaders+cur_chr); 20842any_mode(make_box): begin_box(0); 20843 20844@ The global variable |cur_box| will point to a newly made box. If the box 20845is void, we will have |cur_box=null|. Otherwise we will have 20846|type(cur_box)=hlist_node| or |vlist_node| or |rule_node|; the |rule_node| 20847case can occur only with leaders. 20848 20849@<Glob...@>= 20850@!cur_box:pointer; {box to be placed into its context} 20851 20852@ The |box_end| procedure does the right thing with |cur_box|, if 20853|box_context| represents the context as explained above. 20854 20855@<Declare act...@>= 20856procedure box_end(@!box_context:integer); 20857var p:pointer; {|ord_noad| for new box in math mode} 20858begin if box_context<box_flag then @<Append box |cur_box| to the current list, 20859 shifted by |box_context|@> 20860else if box_context<ship_out_flag then @<Store \(c)|cur_box| in a box register@> 20861else if cur_box<>null then 20862 if box_context>ship_out_flag then @<Append a new leader node that 20863 uses |cur_box|@> 20864 else ship_out(cur_box); 20865end; 20866 20867@ The global variable |adjust_tail| will be non-null if and only if the 20868current box might include adjustments that should be appended to the 20869current vertical list. 20870 20871@<Append box |cur_box| to the current...@>= 20872begin if cur_box<>null then 20873 begin shift_amount(cur_box):=box_context; 20874 if abs(mode)=vmode then 20875 begin append_to_vlist(cur_box); 20876 if adjust_tail<>null then 20877 begin if adjust_head<>adjust_tail then 20878 begin link(tail):=link(adjust_head); tail:=adjust_tail; 20879 end; 20880 adjust_tail:=null; 20881 end; 20882 if mode>0 then build_page; 20883 end 20884 else begin if abs(mode)=hmode then space_factor:=1000 20885 else begin p:=new_noad; 20886 math_type(nucleus(p)):=sub_box; 20887 info(nucleus(p)):=cur_box; cur_box:=p; 20888 end; 20889 link(tail):=cur_box; tail:=cur_box; 20890 end; 20891 end; 20892end 20893 20894@ @<Store \(c)|cur_box| in a box register@>= 20895if box_context<box_flag+256 then 20896 eq_define(box_base-box_flag+box_context,box_ref,cur_box) 20897else geq_define(box_base-box_flag-256+box_context,box_ref,cur_box) 20898 20899@ @<Append a new leader node ...@>= 20900begin @<Get the next non-blank non-relax...@>; 20901if ((cur_cmd=hskip)and(abs(mode)<>vmode))or@| 20902 ((cur_cmd=vskip)and(abs(mode)=vmode)) then 20903 begin append_glue; subtype(tail):=box_context-(leader_flag-a_leaders); 20904 leader_ptr(tail):=cur_box; 20905 end 20906else begin print_err("Leaders not followed by proper glue"); 20907@.Leaders not followed by...@> 20908 help3("You should say `\leaders <box or rule><hskip or vskip>'.")@/ 20909 ("I found the <box or rule>, but there's no suitable")@/ 20910 ("<hskip or vskip>, so I'm ignoring these leaders."); back_error; 20911 flush_node_list(cur_box); 20912 end; 20913end 20914 20915@ Now that we can see what eventually happens to boxes, we can consider 20916the first steps in their creation. The |begin_box| routine is called when 20917|box_context| is a context specification, |cur_chr| specifies the type of 20918box desired, and |cur_cmd=make_box|. 20919 20920@<Declare act...@>= 20921procedure begin_box(@!box_context:integer); 20922label exit, done; 20923var @!p,@!q:pointer; {run through the current list} 20924@!m:quarterword; {the length of a replacement list} 20925@!k:halfword; {0 or |vmode| or |hmode|} 20926@!n:eight_bits; {a box number} 20927begin case cur_chr of 20928box_code: begin scan_eight_bit_int; cur_box:=box(cur_val); 20929 box(cur_val):=null; {the box becomes void, at the same level} 20930 end; 20931copy_code: begin scan_eight_bit_int; cur_box:=copy_node_list(box(cur_val)); 20932 end; 20933last_box_code: @<If the current list ends with a box node, delete it from 20934 the list and make |cur_box| point to it; otherwise set |cur_box:=null|@>; 20935vsplit_code: @<Split off part of a vertical box, make |cur_box| point to it@>; 20936othercases @<Initiate the construction of an hbox or vbox, then |return|@> 20937endcases;@/ 20938box_end(box_context); {in simple cases, we use the box immediately} 20939exit:end; 20940 20941@ Note that the condition |not is_char_node(tail)| implies that |head<>tail|, 20942since |head| is a one-word node. 20943 20944@<If the current list ends with a box node, delete it...@>= 20945begin cur_box:=null; 20946if abs(mode)=mmode then 20947 begin you_cant; help1("Sorry; this \lastbox will be void."); error; 20948 end 20949else if (mode=vmode)and(head=tail) then 20950 begin you_cant; 20951 help2("Sorry...I usually can't take things from the current page.")@/ 20952 ("This \lastbox will therefore be void."); error; 20953 end 20954else begin if not is_char_node(tail) then 20955 if (type(tail)=hlist_node)or(type(tail)=vlist_node) then 20956 @<Remove the last box, unless it's part of a discretionary@>; 20957 end; 20958end 20959 20960@ @<Remove the last box...@>= 20961begin q:=head; 20962repeat p:=q; 20963if not is_char_node(q) then if type(q)=disc_node then 20964 begin for m:=1 to replace_count(q) do p:=link(p); 20965 if p=tail then goto done; 20966 end; 20967q:=link(p); 20968until q=tail; 20969cur_box:=tail; shift_amount(cur_box):=0; 20970tail:=p; link(p):=null; 20971done:end 20972 20973@ Here we deal with things like `\.{\\vsplit 13 to 100pt}'. 20974 20975@<Split off part of a vertical box, make |cur_box| point to it@>= 20976begin scan_eight_bit_int; n:=cur_val; 20977if not scan_keyword("to") then 20978@.to@> 20979 begin print_err("Missing `to' inserted"); 20980@.Missing `to' inserted@> 20981 help2("I'm working on `\vsplit<box number> to <dimen>';")@/ 20982 ("will look for the <dimen> next."); error; 20983 end; 20984scan_normal_dimen; 20985cur_box:=vsplit(n,cur_val); 20986end 20987 20988@ Here is where we enter restricted horizontal mode or internal vertical 20989mode, in order to make a box. 20990 20991@<Initiate the construction of an hbox or vbox, then |return|@>= 20992begin k:=cur_chr-vtop_code; saved(0):=box_context; 20993if k=hmode then 20994 if (box_context<box_flag)and(abs(mode)=vmode) then 20995 scan_spec(adjusted_hbox_group,true) 20996 else scan_spec(hbox_group,true) 20997else begin if k=vmode then scan_spec(vbox_group,true) 20998 else begin scan_spec(vtop_group,true); k:=vmode; 20999 end; 21000 normal_paragraph; 21001 end; 21002push_nest; mode:=-k; 21003if k=vmode then 21004 begin prev_depth:=ignore_depth; 21005 if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text); 21006 end 21007else begin space_factor:=1000; 21008 if every_hbox<>null then begin_token_list(every_hbox,every_hbox_text); 21009 end; 21010return; 21011end 21012 21013@ @<Declare act...@>= 21014procedure scan_box(@!box_context:integer); 21015 {the next input should specify a box or perhaps a rule} 21016begin @<Get the next non-blank non-relax...@>; 21017if cur_cmd=make_box then begin_box(box_context) 21018else if (box_context>=leader_flag)and((cur_cmd=hrule)or(cur_cmd=vrule)) then 21019 begin cur_box:=scan_rule_spec; box_end(box_context); 21020 end 21021else begin@t@>@;@/ 21022 print_err("A <box> was supposed to be here");@/ 21023@.A <box> was supposed to...@> 21024 help3("I was expecting to see \hbox or \vbox or \copy or \box or")@/ 21025 ("something like that. So you might find something missing in")@/ 21026 ("your output. But keep trying; you can fix this later."); back_error; 21027 end; 21028end; 21029 21030@ When the right brace occurs at the end of an \.{\\hbox} or \.{\\vbox} or 21031\.{\\vtop} construction, the |package| routine comes into action. We might 21032also have to finish a paragraph that hasn't ended. 21033 21034@<Cases of |handle...@>= 21035hbox_group: package(0); 21036adjusted_hbox_group: begin adjust_tail:=adjust_head; package(0); 21037 end; 21038vbox_group: begin end_graf; package(0); 21039 end; 21040vtop_group: begin end_graf; package(vtop_code); 21041 end; 21042 21043@ @<Declare action...@>= 21044procedure package(@!c:small_number); 21045var h:scaled; {height of box} 21046@!p:pointer; {first node in a box} 21047@!d:scaled; {max depth} 21048begin d:=box_max_depth; unsave; save_ptr:=save_ptr-3; 21049if mode=-hmode then cur_box:=hpack(link(head),saved(2),saved(1)) 21050else begin cur_box:=vpackage(link(head),saved(2),saved(1),d); 21051 if c=vtop_code then @<Readjust the height and depth of |cur_box|, 21052 for \.{\\vtop}@>; 21053 end; 21054pop_nest; box_end(saved(0)); 21055end; 21056 21057@ The height of a `\.{\\vtop}' box is inherited from the first item on its list, 21058if that item is an |hlist_node|, |vlist_node|, or |rule_node|; otherwise 21059the \.{\\vtop} height is zero. 21060 21061 21062@<Readjust the height...@>= 21063begin h:=0; p:=list_ptr(cur_box); 21064if p<>null then if type(p)<=rule_node then h:=height(p); 21065depth(cur_box):=depth(cur_box)-h+height(cur_box); height(cur_box):=h; 21066end 21067 21068@ A paragraph begins when horizontal-mode material occurs in vertical mode, 21069or when the paragraph is explicitly started by `\.{\\indent}' or 21070`\.{\\noindent}'. 21071 21072@<Put each...@>= 21073primitive("indent",start_par,1); 21074@!@:indent_}{\.{\\indent} primitive@> 21075primitive("noindent",start_par,0); 21076@!@:no_indent_}{\.{\\noindent} primitive@> 21077 21078@ @<Cases of |print_cmd_chr|...@>= 21079start_par: if chr_code=0 then print_esc("noindent")@+ else print_esc("indent"); 21080 21081@ @<Cases of |main_control| that build...@>= 21082vmode+start_par: new_graf(cur_chr>0); 21083vmode+letter,vmode+other_char,vmode+char_num,vmode+char_given, 21084 vmode+math_shift,vmode+un_hbox,vmode+vrule, 21085 vmode+accent,vmode+discretionary,vmode+hskip,vmode+valign, 21086 vmode+ex_space,vmode+no_boundary:@t@>@;@/ 21087 begin back_input; new_graf(true); 21088 end; 21089 21090@ @<Declare act...@>= 21091function norm_min(@!h:integer):small_number; 21092begin if h<=0 then norm_min:=1@+else if h>=63 then norm_min:=63@+ 21093else norm_min:=h; 21094end; 21095@# 21096procedure new_graf(@!indented:boolean); 21097begin prev_graf:=0; 21098if (mode=vmode)or(head<>tail) then 21099 tail_append(new_param_glue(par_skip_code)); 21100push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang; 21101prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min)) 21102 *@'200000+cur_lang; 21103if indented then 21104 begin tail:=new_null_box; link(head):=tail; width(tail):=par_indent;@+ 21105 end; 21106if every_par<>null then begin_token_list(every_par,every_par_text); 21107if nest_ptr=1 then build_page; {put |par_skip| glue on current page} 21108end; 21109 21110@ @<Cases of |main_control| that build...@>= 21111hmode+start_par,mmode+start_par: indent_in_hmode; 21112 21113@ @<Declare act...@>= 21114procedure indent_in_hmode; 21115var p,@!q:pointer; 21116begin if cur_chr>0 then {\.{\\indent}} 21117 begin p:=new_null_box; width(p):=par_indent; 21118 if abs(mode)=hmode then space_factor:=1000 21119 else begin q:=new_noad; math_type(nucleus(q)):=sub_box; 21120 info(nucleus(q)):=p; p:=q; 21121 end; 21122 tail_append(p); 21123 end; 21124end; 21125 21126@ A paragraph ends when a |par_end| command is sensed, or when we are in 21127horizontal mode when reaching the right brace of vertical-mode routines 21128like \.{\\vbox}, \.{\\insert}, or \.{\\output}. 21129 21130@<Cases of |main_control| that build...@>= 21131vmode+par_end: begin normal_paragraph; 21132 if mode>0 then build_page; 21133 end; 21134hmode+par_end: begin if align_state<0 then off_save; {this tries to 21135 recover from an alignment that didn't end properly} 21136 end_graf; {this takes us to the enclosing mode, if |mode>0|} 21137 if mode=vmode then build_page; 21138 end; 21139hmode+stop,hmode+vskip,hmode+hrule,hmode+un_vbox,hmode+halign: head_for_vmode; 21140 21141@ @<Declare act...@>= 21142procedure head_for_vmode; 21143begin if mode<0 then 21144 if cur_cmd<>hrule then off_save 21145 else begin print_err("You can't use `"); 21146 print_esc("hrule"); print("' here except with leaders"); 21147@.You can't use \\hrule...@> 21148 help2("To put a horizontal rule in an hbox or an alignment,")@/ 21149 ("you should use \leaders or \hrulefill (see The TeXbook)."); 21150 error; 21151 end 21152else begin back_input; cur_tok:=par_token; back_input; token_type:=inserted; 21153 end; 21154end; 21155 21156@ @<Declare act...@>= 21157procedure end_graf; 21158begin if mode=hmode then 21159 begin if head=tail then pop_nest {null paragraphs are ignored} 21160 else line_break(widow_penalty); 21161 normal_paragraph; 21162 error_count:=0; 21163 end; 21164end; 21165 21166@ Insertion and adjustment and mark nodes are constructed by the following 21167pieces of the program. 21168 21169@<Cases of |main_control| that build...@>= 21170any_mode(insert),hmode+vadjust,mmode+vadjust: begin_insert_or_adjust; 21171any_mode(mark): make_mark; 21172 21173@ @<Forbidden...@>= 21174vmode+vadjust, 21175 21176@ @<Declare act...@>= 21177procedure begin_insert_or_adjust; 21178begin if cur_cmd=vadjust then cur_val:=255 21179else begin scan_eight_bit_int; 21180 if cur_val=255 then 21181 begin print_err("You can't "); print_esc("insert"); print_int(255); 21182@.You can't \\insert255@> 21183 help1("I'm changing to \insert0; box 255 is special."); 21184 error; cur_val:=0; 21185 end; 21186 end; 21187saved(0):=cur_val; incr(save_ptr); 21188new_save_level(insert_group); scan_left_brace; normal_paragraph; 21189push_nest; mode:=-vmode; prev_depth:=ignore_depth; 21190end; 21191 21192@ @<Cases of |handle...@>= 21193insert_group: begin end_graf; q:=split_top_skip; add_glue_ref(q); 21194 d:=split_max_depth; f:=floating_penalty; unsave; decr(save_ptr); 21195 {now |saved(0)| is the insertion number, or 255 for |vadjust|} 21196 p:=vpack(link(head),natural); pop_nest; 21197 if saved(0)<255 then 21198 begin tail_append(get_node(ins_node_size)); 21199 type(tail):=ins_node; subtype(tail):=qi(saved(0)); 21200 height(tail):=height(p)+depth(p); ins_ptr(tail):=list_ptr(p); 21201 split_top_ptr(tail):=q; depth(tail):=d; float_cost(tail):=f; 21202 end 21203 else begin tail_append(get_node(small_node_size)); 21204 type(tail):=adjust_node;@/ 21205 subtype(tail):=0; {the |subtype| is not used} 21206 adjust_ptr(tail):=list_ptr(p); delete_glue_ref(q); 21207 end; 21208 free_node(p,box_node_size); 21209 if nest_ptr=0 then build_page; 21210 end; 21211output_group: @<Resume the page builder...@>; 21212 21213@ @<Declare act...@>= 21214procedure make_mark; 21215var p:pointer; {new node} 21216begin p:=scan_toks(false,true); p:=get_node(small_node_size); 21217type(p):=mark_node; subtype(p):=0; {the |subtype| is not used} 21218mark_ptr(p):=def_ref; link(tail):=p; tail:=p; 21219end; 21220 21221@ Penalty nodes get into a list via the |break_penalty| command. 21222@^penalties@> 21223 21224@<Cases of |main_control| that build...@>= 21225any_mode(break_penalty): append_penalty; 21226 21227@ @<Declare action...@>= 21228procedure append_penalty; 21229begin scan_int; tail_append(new_penalty(cur_val)); 21230if mode=vmode then build_page; 21231end; 21232 21233@ The |remove_item| command removes a penalty, kern, or glue node if it 21234appears at the tail of the current list, using a brute-force linear scan. 21235Like \.{\\lastbox}, this command is not allowed in vertical mode (except 21236internal vertical mode), since the current list in vertical mode is sent 21237to the page builder. But if we happen to be able to implement it in 21238vertical mode, we do. 21239 21240@<Cases of |main_control| that build...@>= 21241any_mode(remove_item): delete_last; 21242 21243@ When |delete_last| is called, |cur_chr| is the |type| of node that 21244will be deleted, if present. 21245 21246@<Declare action...@>= 21247procedure delete_last; 21248label exit; 21249var @!p,@!q:pointer; {run through the current list} 21250@!m:quarterword; {the length of a replacement list} 21251begin if (mode=vmode)and(tail=head) then 21252 @<Apologize for inability to do the operation now, 21253 unless \.{\\unskip} follows non-glue@> 21254else begin if not is_char_node(tail) then if type(tail)=cur_chr then 21255 begin q:=head; 21256 repeat p:=q; 21257 if not is_char_node(q) then if type(q)=disc_node then 21258 begin for m:=1 to replace_count(q) do p:=link(p); 21259 if p=tail then return; 21260 end; 21261 q:=link(p); 21262 until q=tail; 21263 link(p):=null; flush_node_list(tail); tail:=p; 21264 end; 21265 end; 21266exit:end; 21267 21268@ @<Apologize for inability to do the operation...@>= 21269begin if (cur_chr<>glue_node)or(last_glue<>max_halfword) then 21270 begin you_cant; 21271 help2("Sorry...I usually can't take things from the current page.")@/ 21272 ("Try `I\vskip-\lastskip' instead."); 21273 if cur_chr=kern_node then help_line[0]:= 21274 ("Try `I\kern-\lastkern' instead.") 21275 else if cur_chr<>glue_node then help_line[0]:=@| 21276 ("Perhaps you can make the output routine do it."); 21277 error; 21278 end; 21279end 21280 21281@ @<Put each...@>= 21282primitive("unpenalty",remove_item,penalty_node);@/ 21283@!@:un_penalty_}{\.{\\unpenalty} primitive@> 21284primitive("unkern",remove_item,kern_node);@/ 21285@!@:un_kern_}{\.{\\unkern} primitive@> 21286primitive("unskip",remove_item,glue_node);@/ 21287@!@:un_skip_}{\.{\\unskip} primitive@> 21288primitive("unhbox",un_hbox,box_code);@/ 21289@!@:un_hbox_}{\.{\\unhbox} primitive@> 21290primitive("unhcopy",un_hbox,copy_code);@/ 21291@!@:un_hcopy_}{\.{\\unhcopy} primitive@> 21292primitive("unvbox",un_vbox,box_code);@/ 21293@!@:un_vbox_}{\.{\\unvbox} primitive@> 21294primitive("unvcopy",un_vbox,copy_code);@/ 21295@!@:un_vcopy_}{\.{\\unvcopy} primitive@> 21296 21297@ @<Cases of |print_cmd_chr|...@>= 21298remove_item: if chr_code=glue_node then print_esc("unskip") 21299 else if chr_code=kern_node then print_esc("unkern") 21300 else print_esc("unpenalty"); 21301un_hbox: if chr_code=copy_code then print_esc("unhcopy") 21302 else print_esc("unhbox"); 21303un_vbox: if chr_code=copy_code then print_esc("unvcopy") 21304 else print_esc("unvbox"); 21305 21306@ The |un_hbox| and |un_vbox| commands unwrap one of the 256 current boxes. 21307 21308@<Cases of |main_control| that build...@>= 21309vmode+un_vbox,hmode+un_hbox,mmode+un_hbox: unpackage; 21310 21311@ @<Declare act...@>= 21312procedure unpackage; 21313label exit; 21314var p:pointer; {the box} 21315@!c:box_code..copy_code; {should we copy?} 21316begin c:=cur_chr; scan_eight_bit_int; p:=box(cur_val); 21317if p=null then return; 21318if (abs(mode)=mmode)or((abs(mode)=vmode)and(type(p)<>vlist_node))or@| 21319 ((abs(mode)=hmode)and(type(p)<>hlist_node)) then 21320 begin print_err("Incompatible list can't be unboxed"); 21321@.Incompatible list...@> 21322 help3("Sorry, Pandora. (You sneaky devil.)")@/ 21323 ("I refuse to unbox an \hbox in vertical mode or vice versa.")@/ 21324 ("And I can't open any boxes in math mode.");@/ 21325 error; return; 21326 end; 21327if c=copy_code then link(tail):=copy_node_list(list_ptr(p)) 21328else begin link(tail):=list_ptr(p); box(cur_val):=null; 21329 free_node(p,box_node_size); 21330 end; 21331while link(tail)<>null do tail:=link(tail); 21332exit:end; 21333 21334@ @<Forbidden...@>=vmode+ital_corr, 21335 21336@ Italic corrections are converted to kern nodes when the |ital_corr| command 21337follows a character. In math mode the same effect is achieved by appending 21338a kern of zero here, since italic corrections are supplied later. 21339 21340@<Cases of |main_control| that build...@>= 21341hmode+ital_corr: append_italic_correction; 21342mmode+ital_corr: tail_append(new_kern(0)); 21343 21344@ @<Declare act...@>= 21345procedure append_italic_correction; 21346label exit; 21347var p:pointer; {|char_node| at the tail of the current list} 21348@!f:internal_font_number; {the font in the |char_node|} 21349begin if tail<>head then 21350 begin if is_char_node(tail) then p:=tail 21351 else if type(tail)=ligature_node then p:=lig_char(tail) 21352 else return; 21353 f:=font(p); 21354 tail_append(new_kern(char_italic(f)(char_info(f)(character(p))))); 21355 subtype(tail):=explicit; 21356 end; 21357exit:end; 21358 21359@ Discretionary nodes are easy in the common case `\.{\\-}', but in the 21360general case we must process three braces full of items. 21361 21362@<Put each...@>= 21363primitive("-",discretionary,1); 21364@!@:Single-character primitives -}{\quad\.{\\-}@> 21365primitive("discretionary",discretionary,0); 21366@!@:discretionary_}{\.{\\discretionary} primitive@> 21367 21368@ @<Cases of |print_cmd_chr|...@>= 21369discretionary: if chr_code=1 then 21370 print_esc("-")@+else print_esc("discretionary"); 21371 21372@ @<Cases of |main_control| that build...@>= 21373hmode+discretionary,mmode+discretionary: append_discretionary; 21374 21375@ The space factor does not change when we append a discretionary node, 21376but it starts out as 1000 in the subsidiary lists. 21377 21378@<Declare act...@>= 21379procedure append_discretionary; 21380var c:integer; {hyphen character} 21381begin tail_append(new_disc); 21382if cur_chr=1 then 21383 begin c:=hyphen_char[cur_font]; 21384 if c>=0 then if c<256 then pre_break(tail):=new_character(cur_font,c); 21385 end 21386else begin incr(save_ptr); saved(-1):=0; new_save_level(disc_group); 21387 scan_left_brace; push_nest; mode:=-hmode; space_factor:=1000; 21388 end; 21389end; 21390 21391@ The three discretionary lists are constructed somewhat as if they were 21392hboxes. A~subroutine called |build_discretionary| handles the transitions. 21393(This is sort of fun.) 21394 21395@<Cases of |handle...@>= 21396disc_group: build_discretionary; 21397 21398@ @<Declare act...@>= 21399procedure build_discretionary; 21400label done,exit; 21401var p,@!q:pointer; {for link manipulation} 21402@!n:integer; {length of discretionary list} 21403begin unsave; 21404@<Prune the current list, if necessary, until it contains only 21405 |char_node|, |kern_node|, |hlist_node|, |vlist_node|, |rule_node|, 21406 and |ligature_node| items; set |n| to the length of the list, 21407 and set |q| to the list's tail@>; 21408p:=link(head); pop_nest; 21409case saved(-1) of 214100:pre_break(tail):=p; 214111:post_break(tail):=p; 214122:@<Attach list |p| to the current list, and record its length; 21413 then finish up and |return|@>; 21414end; {there are no other cases} 21415incr(saved(-1)); new_save_level(disc_group); scan_left_brace; 21416push_nest; mode:=-hmode; space_factor:=1000; 21417exit:end; 21418 21419@ @<Attach list |p| to the current...@>= 21420begin if (n>0)and(abs(mode)=mmode) then 21421 begin print_err("Illegal math "); print_esc("discretionary"); 21422@.Illegal math \\disc...@> 21423 help2("Sorry: The third part of a discretionary break must be")@/ 21424 ("empty, in math formulas. I had to delete your third part."); 21425 flush_node_list(p); n:=0; error; 21426 end 21427else link(tail):=p; 21428if n<=max_quarterword then replace_count(tail):=n 21429else begin print_err("Discretionary list is too long"); 21430@.Discretionary list is too long@> 21431 help2("Wow---I never thought anybody would tweak me here.")@/ 21432 ("You can't seriously need such a huge discretionary list?"); 21433 error; 21434 end; 21435if n>0 then tail:=q; 21436decr(save_ptr); return; 21437end 21438 21439@ During this loop, |p=link(q)| and there are |n| items preceding |p|. 21440 21441@<Prune the current list, if necessary...@>= 21442q:=head; p:=link(q); n:=0; 21443while p<>null do 21444 begin if not is_char_node(p) then if type(p)>rule_node then 21445 if type(p)<>kern_node then if type(p)<>ligature_node then 21446 begin print_err("Improper discretionary list"); 21447@.Improper discretionary list@> 21448 help1("Discretionary lists must contain only boxes and kerns.");@/ 21449 error; 21450 begin_diagnostic; 21451 print_nl("The following discretionary sublist has been deleted:"); 21452@.The following...deleted@> 21453 show_box(p); 21454 end_diagnostic(true); 21455 flush_node_list(p); link(q):=null; goto done; 21456 end; 21457 q:=p; p:=link(q); incr(n); 21458 end; 21459done: 21460 21461@ We need only one more thing to complete the horizontal mode routines, namely 21462the \.{\\accent} primitive. 21463 21464@<Cases of |main_control| that build...@>= 21465hmode+accent: make_accent; 21466 21467@ The positioning of accents is straightforward but tedious. Given an accent 21468of width |a|, designed for characters of height |x| and slant |s|; 21469and given a character of width |w|, height |h|, and slant |t|: We will shift 21470the accent down by |x-h|, and we will insert kern nodes that have the effect of 21471centering the accent over the character and shifting the accent to the 21472right by $\delta={1\over2}(w-a)+h\cdot t-x\cdot s$. If either character is 21473absent from the font, we will simply use the other, without shifting. 21474 21475@<Declare act...@>= 21476procedure make_accent; 21477var s,@!t: real; {amount of slant} 21478@!p,@!q,@!r:pointer; {character, box, and kern nodes} 21479@!f:internal_font_number; {relevant font} 21480@!a,@!h,@!x,@!w,@!delta:scaled; {heights and widths, as explained above} 21481@!i:four_quarters; {character information} 21482begin scan_char_num; f:=cur_font; p:=new_character(f,cur_val); 21483if p<>null then 21484 begin x:=x_height(f); s:=slant(f)/float_constant(65536); 21485@^real division@> 21486 a:=char_width(f)(char_info(f)(character(p)));@/ 21487 do_assignments;@/ 21488 @<Create a character node |q| for the next character, 21489 but set |q:=null| if problems arise@>; 21490 if q<>null then @<Append the accent with appropriate kerns, 21491 then set |p:=q|@>; 21492 link(tail):=p; tail:=p; space_factor:=1000; 21493 end; 21494end; 21495 21496@ @<Create a character node |q| for the next...@>= 21497q:=null; f:=cur_font; 21498if (cur_cmd=letter)or(cur_cmd=other_char)or(cur_cmd=char_given) then 21499 q:=new_character(f,cur_chr) 21500else if cur_cmd=char_num then 21501 begin scan_char_num; q:=new_character(f,cur_val); 21502 end 21503else back_input 21504 21505@ The kern nodes appended here must be distinguished from other kerns, lest 21506they be wiped away by the hyphenation algorithm or by a previous line break. 21507 21508The two kerns are computed with (machine-dependent) |real| arithmetic, but 21509their sum is machine-independent; the net effect is machine-independent, 21510because the user cannot remove these nodes nor access them via \.{\\lastkern}. 21511 21512@<Append the accent with appropriate kerns...@>= 21513begin t:=slant(f)/float_constant(65536); 21514@^real division@> 21515i:=char_info(f)(character(q)); 21516w:=char_width(f)(i); h:=char_height(f)(height_depth(i)); 21517if h<>x then {the accent must be shifted up or down} 21518 begin p:=hpack(p,natural); shift_amount(p):=x-h; 21519 end; 21520delta:=round((w-a)/float_constant(2)+h*t-x*s); 21521@^real multiplication@> 21522@^real addition@> 21523r:=new_kern(delta); subtype(r):=acc_kern; link(tail):=r; link(r):=p; 21524tail:=new_kern(-a-delta); subtype(tail):=acc_kern; link(p):=tail; p:=q; 21525end 21526 21527@ When `\.{\\cr}' or `\.{\\span}' or a tab mark comes through the scanner 21528into |main_control|, it might be that the user has foolishly inserted 21529one of them into something that has nothing to do with alignment. But it is 21530far more likely that a left brace or right brace has been omitted, since 21531|get_next| takes actions appropriate to alignment only when `\.{\\cr}' 21532or `\.{\\span}' or tab marks occur with |align_state=0|. The following 21533program attempts to make an appropriate recovery. 21534 21535@<Cases of |main_control| that build...@>= 21536any_mode(car_ret), any_mode(tab_mark): align_error; 21537any_mode(no_align): no_align_error; 21538any_mode(omit): omit_error; 21539 21540@ @<Declare act...@>= 21541procedure align_error; 21542begin if abs(align_state)>2 then 21543 @<Express consternation over the fact that no alignment is in progress@> 21544else begin back_input; 21545 if align_state<0 then 21546 begin print_err("Missing { inserted"); 21547@.Missing \{ inserted@> 21548 incr(align_state); cur_tok:=left_brace_token+"{"; 21549 end 21550 else begin print_err("Missing } inserted"); 21551@.Missing \} inserted@> 21552 decr(align_state); cur_tok:=right_brace_token+"}"; 21553 end; 21554 help3("I've put in what seems to be necessary to fix")@/ 21555 ("the current column of the current alignment.")@/ 21556 ("Try to go on, since this might almost work."); ins_error; 21557 end; 21558end; 21559 21560@ @<Express consternation...@>= 21561begin print_err("Misplaced "); print_cmd_chr(cur_cmd,cur_chr); 21562@.Misplaced \&@> 21563@.Misplaced \\span@> 21564@.Misplaced \\cr@> 21565if cur_tok=tab_token+"&" then 21566 begin help6("I can't figure out why you would want to use a tab mark")@/ 21567 ("here. If you just want an ampersand, the remedy is")@/ 21568 ("simple: Just type `I\&' now. But if some right brace")@/ 21569 ("up above has ended a previous alignment prematurely,")@/ 21570 ("you're probably due for more error messages, and you")@/ 21571 ("might try typing `S' now just to see what is salvageable."); 21572 end 21573else begin help5("I can't figure out why you would want to use a tab mark")@/ 21574 ("or \cr or \span just now. If something like a right brace")@/ 21575 ("up above has ended a previous alignment prematurely,")@/ 21576 ("you're probably due for more error messages, and you")@/ 21577 ("might try typing `S' now just to see what is salvageable."); 21578 end; 21579error; 21580end 21581 21582@ The help messages here contain a little white lie, since \.{\\noalign} 21583and \.{\\omit} are allowed also after `\.{\\noalign\{...\}}'. 21584 21585@<Declare act...@>= 21586procedure no_align_error; 21587begin print_err("Misplaced "); print_esc("noalign"); 21588@.Misplaced \\noalign@> 21589help2("I expect to see \noalign only after the \cr of")@/ 21590 ("an alignment. Proceed, and I'll ignore this case."); error; 21591end; 21592procedure omit_error; 21593begin print_err("Misplaced "); print_esc("omit"); 21594@.Misplaced \\omit@> 21595help2("I expect to see \omit only after tab marks or the \cr of")@/ 21596 ("an alignment. Proceed, and I'll ignore this case."); error; 21597end; 21598 21599@ We've now covered most of the abuses of \.{\\halign} and \.{\\valign}. 21600Let's take a look at what happens when they are used correctly. 21601 21602@<Cases of |main_control| that build...@>= 21603vmode+halign,hmode+valign:init_align; 21604mmode+halign: if privileged then 21605 if cur_group=math_shift_group then init_align 21606 else off_save; 21607vmode+endv,hmode+endv: do_endv; 21608 21609@ An |align_group| code is supposed to remain on the |save_stack| 21610during an entire alignment, until |fin_align| removes it. 21611 21612A devious user might force an |endv| command to occur just about anywhere; 21613we must defeat such hacks. 21614 21615@<Declare act...@>= 21616procedure do_endv; 21617begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input; 21618while (input_stack[base_ptr].index_field<>v_template) and 21619 (input_stack[base_ptr].loc_field=null) and 21620 (input_stack[base_ptr].state_field=token_list) do decr(base_ptr); 21621if (input_stack[base_ptr].index_field<>v_template) or 21622 (input_stack[base_ptr].loc_field<>null) or 21623 (input_stack[base_ptr].state_field<>token_list) then 21624 fatal_error("(interwoven alignment preambles are not allowed)"); 21625@.interwoven alignment preambles...@> 21626 if cur_group=align_group then 21627 begin end_graf; 21628 if fin_col then fin_row; 21629 end 21630else off_save; 21631end; 21632 21633@ @<Cases of |handle_right_brace|...@>= 21634align_group: begin back_input; cur_tok:=cs_token_flag+frozen_cr; 21635 print_err("Missing "); print_esc("cr"); print(" inserted"); 21636@.Missing \\cr inserted@> 21637 help1("I'm guessing that you meant to end an alignment here."); 21638 ins_error; 21639 end; 21640 21641@ @<Cases of |handle_right_brace|...@>= 21642no_align_group: begin end_graf; unsave; align_peek; 21643 end; 21644 21645@ Finally, \.{\\endcsname} is not supposed to get through to |main_control|. 21646 21647@<Cases of |main_control| that build...@>= 21648any_mode(end_cs_name): cs_error; 21649 21650@ @<Declare act...@>= 21651procedure cs_error; 21652begin print_err("Extra "); print_esc("endcsname"); 21653@.Extra \\endcsname@> 21654help1("I'm ignoring this, since I wasn't doing a \csname."); 21655error; 21656end; 21657 21658@* \[48] Building math lists. 21659The routines that \TeX\ uses to create mlists are similar to those we have 21660just seen for the generation of hlists and vlists. But it is necessary to 21661make ``noads'' as well as nodes, so the reader should review the 21662discussion of math mode data structures before trying to make sense out of 21663the following program. 21664 21665Here is a little routine that needs to be done whenever a subformula 21666is about to be processed. The parameter is a code like |math_group|. 21667 21668@<Declare act...@>= 21669procedure push_math(@!c:group_code); 21670begin push_nest; mode:=-mmode; incompleat_noad:=null; new_save_level(c); 21671end; 21672 21673@ We get into math mode from horizontal mode when a `\.\$' (i.e., a 21674|math_shift| character) is scanned. We must check to see whether this 21675`\.\$' is immediately followed by another, in case display math mode is 21676called for. 21677 21678@<Cases of |main_control| that build...@>= 21679hmode+math_shift:init_math; 21680 21681@ @<Declare act...@>= 21682procedure init_math; 21683label reswitch,found,not_found,done; 21684var w:scaled; {new or partial |pre_display_size|} 21685@!l:scaled; {new |display_width|} 21686@!s:scaled; {new |display_indent|} 21687@!p:pointer; {current node when calculating |pre_display_size|} 21688@!q:pointer; {glue specification when calculating |pre_display_size|} 21689@!f:internal_font_number; {font in current |char_node|} 21690@!n:integer; {scope of paragraph shape specification} 21691@!v:scaled; {|w| plus possible glue amount} 21692@!d:scaled; {increment to |v|} 21693begin get_token; {|get_x_token| would fail on \.{\\ifmmode}\thinspace!} 21694if (cur_cmd=math_shift)and(mode>0) then @<Go into display math mode@> 21695else begin back_input; @<Go into ordinary math mode@>; 21696 end; 21697end; 21698 21699@ @<Go into ordinary math mode@>= 21700begin push_math(math_shift_group); eq_word_define(int_base+cur_fam_code,-1); 21701if every_math<>null then begin_token_list(every_math,every_math_text); 21702end 21703 21704@ We get into ordinary math mode from display math mode when `\.{\\eqno}' or 21705`\.{\\leqno}' appears. In such cases |cur_chr| will be 0 or~1, respectively; 21706the value of |cur_chr| is placed onto |save_stack| for safe keeping. 21707 21708@<Cases of |main_control| that build...@>= 21709mmode+eq_no: if privileged then 21710 if cur_group=math_shift_group then start_eq_no 21711 else off_save; 21712 21713@ @<Put each...@>= 21714primitive("eqno",eq_no,0); 21715@!@:eq_no_}{\.{\\eqno} primitive@> 21716primitive("leqno",eq_no,1); 21717@!@:leq_no_}{\.{\\leqno} primitive@> 21718 21719@ When \TeX\ is in display math mode, |cur_group=math_shift_group|, 21720so it is not necessary for the |start_eq_no| procedure to test for 21721this condition. 21722 21723@<Declare act...@>= 21724procedure start_eq_no; 21725begin saved(0):=cur_chr; incr(save_ptr); 21726@<Go into ordinary math mode@>; 21727end; 21728 21729@ @<Cases of |print_cmd_chr|...@>= 21730eq_no:if chr_code=1 then print_esc("leqno")@+else print_esc("eqno"); 21731 21732@ @<Forbidden...@>=non_math(eq_no), 21733 21734@ When we enter display math mode, we need to call |line_break| to 21735process the partial paragraph that has just been interrupted by the 21736display. Then we can set the proper values of |display_width| and 21737|display_indent| and |pre_display_size|. 21738 21739@<Go into display math mode@>= 21740begin if head=tail then {`\.{\\noindent\$\$}' or `\.{\$\${ }\$\$}'} 21741 begin pop_nest; w:=-max_dimen; 21742 end 21743else begin line_break(display_widow_penalty);@/ 21744 @<Calculate the natural width, |w|, by which the characters of the 21745 final line extend to the right of the reference point, 21746 plus two ems; or set |w:=max_dimen| if the non-blank information 21747 on that line is affected by stretching or shrinking@>; 21748 end; 21749{now we are in vertical mode, working on the list that will contain the display} 21750@<Calculate the length, |l|, and the shift amount, |s|, of the display lines@>; 21751push_math(math_shift_group); mode:=mmode; 21752eq_word_define(int_base+cur_fam_code,-1);@/ 21753eq_word_define(dimen_base+pre_display_size_code,w); 21754eq_word_define(dimen_base+display_width_code,l); 21755eq_word_define(dimen_base+display_indent_code,s); 21756if every_display<>null then begin_token_list(every_display,every_display_text); 21757if nest_ptr=1 then build_page; 21758end 21759 21760@ @<Calculate the natural width, |w|, by which...@>= 21761v:=shift_amount(just_box)+2*quad(cur_font); w:=-max_dimen; 21762p:=list_ptr(just_box); 21763while p<>null do 21764 begin @<Let |d| be the natural width of node |p|; 21765 if the node is ``visible,'' |goto found|; 21766 if the node is glue that stretches or shrinks, set |v:=max_dimen|@>; 21767 if v<max_dimen then v:=v+d; 21768 goto not_found; 21769 found: if v<max_dimen then 21770 begin v:=v+d; w:=v; 21771 end 21772 else begin w:=max_dimen; goto done; 21773 end; 21774 not_found: p:=link(p); 21775 end; 21776done: 21777 21778@ @<Let |d| be the natural width of node |p|...@>= 21779reswitch: if is_char_node(p) then 21780 begin f:=font(p); d:=char_width(f)(char_info(f)(character(p))); 21781 goto found; 21782 end; 21783case type(p) of 21784hlist_node,vlist_node,rule_node: begin d:=width(p); goto found; 21785 end; 21786ligature_node:@<Make node |p| look like a |char_node|...@>; 21787kern_node,math_node: d:=width(p); 21788glue_node:@<Let |d| be the natural width of this glue; if stretching 21789 or shrinking, set |v:=max_dimen|; |goto found| in the case of leaders@>; 21790whatsit_node: @<Let |d| be the width of the whatsit |p|@>; 21791othercases d:=0 21792endcases 21793 21794@ We need to be careful that |w|, |v|, and |d| do not depend on any |glue_set| 21795values, since such values are subject to system-dependent rounding. 21796System-dependent numbers are not allowed to infiltrate parameters like 21797|pre_display_size|, since \TeX82 is supposed to make the same decisions on all 21798machines. 21799 21800@<Let |d| be the natural width of this glue...@>= 21801begin q:=glue_ptr(p); d:=width(q); 21802if glue_sign(just_box)=stretching then 21803 begin if (glue_order(just_box)=stretch_order(q))and@| 21804 (stretch(q)<>0) then 21805 v:=max_dimen; 21806 end 21807else if glue_sign(just_box)=shrinking then 21808 begin if (glue_order(just_box)=shrink_order(q))and@| 21809 (shrink(q)<>0) then 21810 v:=max_dimen; 21811 end; 21812if subtype(p)>=a_leaders then goto found; 21813end 21814 21815@ A displayed equation is considered to be three lines long, so we 21816calculate the length and offset of line number |prev_graf+2|. 21817 21818@<Calculate the length, |l|, ...@>= 21819if par_shape_ptr=null then 21820 if (hang_indent<>0)and@| 21821 (((hang_after>=0)and(prev_graf+2>hang_after))or@| 21822 (prev_graf+1<-hang_after)) then 21823 begin l:=hsize-abs(hang_indent); 21824 if hang_indent>0 then s:=hang_indent@+else s:=0; 21825 end 21826 else begin l:=hsize; s:=0; 21827 end 21828else begin n:=info(par_shape_ptr); 21829 if prev_graf+2>=n then p:=par_shape_ptr+2*n 21830 else p:=par_shape_ptr+2*(prev_graf+2); 21831 s:=mem[p-1].sc; l:=mem[p].sc; 21832 end 21833 21834@ Subformulas of math formulas cause a new level of math mode to be entered, 21835on the semantic nest as well as the save stack. These subformulas arise in 21836several ways: (1)~A left brace by itself indicates the beginning of a 21837subformula that will be put into a box, thereby freezing its glue and 21838preventing line breaks. (2)~A subscript or superscript is treated as a 21839subformula if it is not a single character; the same applies to 21840the nucleus of things like \.{\\underline}. (3)~The \.{\\left} primitive 21841initiates a subformula that will be terminated by a matching \.{\\right}. 21842The group codes placed on |save_stack| in these three cases are 21843|math_group|, |math_group|, and |math_left_group|, respectively. 21844 21845Here is the code that handles case (1); the other cases are not quite as 21846trivial, so we shall consider them later. 21847 21848@<Cases of |main_control| that build...@>= 21849mmode+left_brace: begin tail_append(new_noad); 21850 back_input; scan_math(nucleus(tail)); 21851 end; 21852 21853@ Recall that the |nucleus|, |subscr|, and |supscr| fields in a noad are 21854broken down into subfields called |math_type| and either |info| or 21855|(fam,character)|. The job of |scan_math| is to figure out what to place 21856in one of these principal fields; it looks at the subformula that 21857comes next in the input, and places an encoding of that subformula 21858into a given word of |mem|. 21859 21860@d fam_in_range==((cur_fam>=0)and(cur_fam<16)) 21861 21862@<Declare act...@>= 21863procedure scan_math(@!p:pointer); 21864label restart,reswitch,exit; 21865var c:integer; {math character code} 21866begin restart:@<Get the next non-blank non-relax...@>; 21867reswitch:case cur_cmd of 21868letter,other_char,char_given: begin c:=ho(math_code(cur_chr)); 21869 if c=@'100000 then 21870 begin @<Treat |cur_chr| as an active character@>; 21871 goto restart; 21872 end; 21873 end; 21874char_num: begin scan_char_num; cur_chr:=cur_val; cur_cmd:=char_given; 21875 goto reswitch; 21876 end; 21877math_char_num: begin scan_fifteen_bit_int; c:=cur_val; 21878 end; 21879math_given: c:=cur_chr; 21880delim_num: begin scan_twenty_seven_bit_int; c:=cur_val div @'10000; 21881 end; 21882othercases @<Scan a subformula enclosed in braces and |return|@> 21883endcases;@/ 21884math_type(p):=math_char; character(p):=qi(c mod 256); 21885if (c>=var_code)and fam_in_range then fam(p):=cur_fam 21886else fam(p):=(c div 256) mod 16; 21887exit:end; 21888 21889@ An active character that is an |outer_call| is allowed here. 21890 21891@<Treat |cur_chr|...@>= 21892begin cur_cs:=cur_chr+active_base; 21893cur_cmd:=eq_type(cur_cs); cur_chr:=equiv(cur_cs); 21894x_token; back_input; 21895end 21896 21897@ The pointer |p| is placed on |save_stack| while a complex subformula 21898is being scanned. 21899 21900@<Scan a subformula...@>= 21901begin back_input; scan_left_brace;@/ 21902saved(0):=p; incr(save_ptr); push_math(math_group); return; 21903end 21904 21905@ The simplest math formula is, of course, `\.{\${ }\$}', when no noads are 21906generated. The next simplest cases involve a single character, e.g., 21907`\.{\$x\$}'. Even though such cases may not seem to be very interesting, 21908the reader can perhaps understand how happy the author was when `\.{\$x\$}' 21909was first properly typeset by \TeX. The code in this section was used. 21910@^Knuth, Donald Ervin@> 21911 21912@<Cases of |main_control| that build...@>= 21913mmode+letter,mmode+other_char,mmode+char_given: 21914 set_math_char(ho(math_code(cur_chr))); 21915mmode+char_num: begin scan_char_num; cur_chr:=cur_val; 21916 set_math_char(ho(math_code(cur_chr))); 21917 end; 21918mmode+math_char_num: begin scan_fifteen_bit_int; set_math_char(cur_val); 21919 end; 21920mmode+math_given: set_math_char(cur_chr); 21921mmode+delim_num: begin scan_twenty_seven_bit_int; 21922 set_math_char(cur_val div @'10000); 21923 end; 21924 21925@ The |set_math_char| procedure creates a new noad appropriate to a given 21926math code, and appends it to the current mlist. However, if the math code 21927is sufficiently large, the |cur_chr| is treated as an active character and 21928nothing is appended. 21929 21930@<Declare act...@>= 21931procedure set_math_char(@!c:integer); 21932var p:pointer; {the new noad} 21933begin if c>=@'100000 then 21934 @<Treat |cur_chr|...@> 21935else begin p:=new_noad; math_type(nucleus(p)):=math_char; 21936 character(nucleus(p)):=qi(c mod 256); 21937 fam(nucleus(p)):=(c div 256) mod 16; 21938 if c>=var_code then 21939 begin if fam_in_range then fam(nucleus(p)):=cur_fam; 21940 type(p):=ord_noad; 21941 end 21942 else type(p):=ord_noad+(c div @'10000); 21943 link(tail):=p; tail:=p; 21944 end; 21945end; 21946 21947@ Primitive math operators like \.{\\mathop} and \.{\\underline} are given 21948the command code |math_comp|, supplemented by the noad type that they 21949generate. 21950 21951@<Put each...@>= 21952primitive("mathord",math_comp,ord_noad); 21953@!@:math_ord_}{\.{\\mathord} primitive@> 21954primitive("mathop",math_comp,op_noad); 21955@!@:math_op_}{\.{\\mathop} primitive@> 21956primitive("mathbin",math_comp,bin_noad); 21957@!@:math_bin_}{\.{\\mathbin} primitive@> 21958primitive("mathrel",math_comp,rel_noad); 21959@!@:math_rel_}{\.{\\mathrel} primitive@> 21960primitive("mathopen",math_comp,open_noad); 21961@!@:math_open_}{\.{\\mathopen} primitive@> 21962primitive("mathclose",math_comp,close_noad); 21963@!@:math_close_}{\.{\\mathclose} primitive@> 21964primitive("mathpunct",math_comp,punct_noad); 21965@!@:math_punct_}{\.{\\mathpunct} primitive@> 21966primitive("mathinner",math_comp,inner_noad); 21967@!@:math_inner_}{\.{\\mathinner} primitive@> 21968primitive("underline",math_comp,under_noad); 21969@!@:underline_}{\.{\\underline} primitive@> 21970primitive("overline",math_comp,over_noad);@/ 21971@!@:overline_}{\.{\\overline} primitive@> 21972primitive("displaylimits",limit_switch,normal); 21973@!@:display_limits_}{\.{\\displaylimits} primitive@> 21974primitive("limits",limit_switch,limits); 21975@!@:limits_}{\.{\\limits} primitive@> 21976primitive("nolimits",limit_switch,no_limits); 21977@!@:no_limits_}{\.{\\nolimits} primitive@> 21978 21979@ @<Cases of |print_cmd_chr|...@>= 21980math_comp: case chr_code of 21981 ord_noad: print_esc("mathord"); 21982 op_noad: print_esc("mathop"); 21983 bin_noad: print_esc("mathbin"); 21984 rel_noad: print_esc("mathrel"); 21985 open_noad: print_esc("mathopen"); 21986 close_noad: print_esc("mathclose"); 21987 punct_noad: print_esc("mathpunct"); 21988 inner_noad: print_esc("mathinner"); 21989 under_noad: print_esc("underline"); 21990 othercases print_esc("overline") 21991 endcases; 21992limit_switch: if chr_code=limits then print_esc("limits") 21993 else if chr_code=no_limits then print_esc("nolimits") 21994 else print_esc("displaylimits"); 21995 21996@ @<Cases of |main_control| that build...@>= 21997mmode+math_comp: begin tail_append(new_noad); 21998 type(tail):=cur_chr; scan_math(nucleus(tail)); 21999 end; 22000mmode+limit_switch: math_limit_switch; 22001 22002@ @<Declare act...@>= 22003procedure math_limit_switch; 22004label exit; 22005begin if head<>tail then if type(tail)=op_noad then 22006 begin subtype(tail):=cur_chr; return; 22007 end; 22008print_err("Limit controls must follow a math operator"); 22009@.Limit controls must follow...@> 22010help1("I'm ignoring this misplaced \limits or \nolimits command."); error; 22011exit:end; 22012 22013@ Delimiter fields of noads are filled in by the |scan_delimiter| routine. 22014The first parameter of this procedure is the |mem| address where the 22015delimiter is to be placed; the second tells if this delimiter follows 22016\.{\\radical} or not. 22017 22018@<Declare act...@>= 22019procedure scan_delimiter(@!p:pointer;@!r:boolean); 22020begin if r then scan_twenty_seven_bit_int 22021else begin @<Get the next non-blank non-relax...@>; 22022 case cur_cmd of 22023 letter,other_char: cur_val:=del_code(cur_chr); 22024 delim_num: scan_twenty_seven_bit_int; 22025 othercases cur_val:=-1 22026 endcases; 22027 end; 22028if cur_val<0 then @<Report that an invalid delimiter code is being changed 22029 to null; set~|cur_val:=0|@>; 22030small_fam(p):=(cur_val div @'4000000) mod 16; 22031small_char(p):=qi((cur_val div @'10000) mod 256); 22032large_fam(p):=(cur_val div 256) mod 16; 22033large_char(p):=qi(cur_val mod 256); 22034end; 22035 22036@ @<Report that an invalid delimiter...@>= 22037begin print_err("Missing delimiter (. inserted)"); 22038@.Missing delimiter...@> 22039help6("I was expecting to see something like `(' or `\{' or")@/ 22040 ("`\}' here. If you typed, e.g., `{' instead of `\{', you")@/ 22041 ("should probably delete the `{' by typing `1' now, so that")@/ 22042 ("braces don't get unbalanced. Otherwise just proceed.")@/ 22043 ("Acceptable delimiters are characters whose \delcode is")@/ 22044 ("nonnegative, or you can use `\delimiter <delimiter code>'."); 22045back_error; cur_val:=0; 22046end 22047 22048@ @<Cases of |main_control| that build...@>= 22049mmode+radical:math_radical; 22050 22051@ @<Declare act...@>= 22052procedure math_radical; 22053begin tail_append(get_node(radical_noad_size)); 22054type(tail):=radical_noad; subtype(tail):=normal; 22055mem[nucleus(tail)].hh:=empty_field; 22056mem[subscr(tail)].hh:=empty_field; 22057mem[supscr(tail)].hh:=empty_field; 22058scan_delimiter(left_delimiter(tail),true); scan_math(nucleus(tail)); 22059end; 22060 22061@ @<Cases of |main_control| that build...@>= 22062mmode+accent,mmode+math_accent:math_ac; 22063 22064@ @<Declare act...@>= 22065procedure math_ac; 22066begin if cur_cmd=accent then 22067 @<Complain that the user should have said \.{\\mathaccent}@>; 22068tail_append(get_node(accent_noad_size)); 22069type(tail):=accent_noad; subtype(tail):=normal; 22070mem[nucleus(tail)].hh:=empty_field; 22071mem[subscr(tail)].hh:=empty_field; 22072mem[supscr(tail)].hh:=empty_field; 22073math_type(accent_chr(tail)):=math_char; 22074scan_fifteen_bit_int; 22075character(accent_chr(tail)):=qi(cur_val mod 256); 22076if (cur_val>=var_code)and fam_in_range then fam(accent_chr(tail)):=cur_fam 22077else fam(accent_chr(tail)):=(cur_val div 256) mod 16; 22078scan_math(nucleus(tail)); 22079end; 22080 22081@ @<Complain that the user should have said \.{\\mathaccent}@>= 22082begin print_err("Please use "); print_esc("mathaccent"); 22083print(" for accents in math mode"); 22084@.Please use \\mathaccent...@> 22085help2("I'm changing \accent to \mathaccent here; wish me luck.")@/ 22086 ("(Accents are not the same in formulas as they are in text.)"); 22087error; 22088end 22089 22090@ @<Cases of |main_control| that build...@>= 22091mmode+vcenter: begin scan_spec(vcenter_group,false); normal_paragraph; 22092 push_nest; mode:=-vmode; prev_depth:=ignore_depth; 22093 if every_vbox<>null then begin_token_list(every_vbox,every_vbox_text); 22094 end; 22095 22096@ @<Cases of |handle...@>= 22097vcenter_group: begin end_graf; unsave; save_ptr:=save_ptr-2; 22098 p:=vpack(link(head),saved(1),saved(0)); pop_nest; 22099 tail_append(new_noad); type(tail):=vcenter_noad; 22100 math_type(nucleus(tail)):=sub_box; info(nucleus(tail)):=p; 22101 end; 22102 22103@ The routine that inserts a |style_node| holds no surprises. 22104 22105@<Put each...@>= 22106primitive("displaystyle",math_style,display_style); 22107@!@:display_style_}{\.{\\displaystyle} primitive@> 22108primitive("textstyle",math_style,text_style); 22109@!@:text_style_}{\.{\\textstyle} primitive@> 22110primitive("scriptstyle",math_style,script_style); 22111@!@:script_style_}{\.{\\scriptstyle} primitive@> 22112primitive("scriptscriptstyle",math_style,script_script_style); 22113@!@:script_script_style_}{\.{\\scriptscriptstyle} primitive@> 22114 22115@ @<Cases of |print_cmd_chr|...@>= 22116math_style: print_style(chr_code); 22117 22118@ @<Cases of |main_control| that build...@>= 22119mmode+math_style: tail_append(new_style(cur_chr)); 22120mmode+non_script: begin tail_append(new_glue(zero_glue)); 22121 subtype(tail):=cond_math_glue; 22122 end; 22123mmode+math_choice: append_choices; 22124 22125@ The routine that scans the four mlists of a \.{\\mathchoice} is very 22126much like the routine that builds discretionary nodes. 22127 22128@<Declare act...@>= 22129procedure append_choices; 22130begin tail_append(new_choice); incr(save_ptr); saved(-1):=0; 22131push_math(math_choice_group); scan_left_brace; 22132end; 22133 22134@ @<Cases of |handle_right_brace|...@>= 22135math_choice_group: build_choices; 22136 22137@ @<Declare act...@>= 22138@t\4@>@<Declare the function called |fin_mlist|@>@t@>@;@/ 22139procedure build_choices; 22140label exit; 22141var p:pointer; {the current mlist} 22142begin unsave; p:=fin_mlist(null); 22143case saved(-1) of 221440:display_mlist(tail):=p; 221451:text_mlist(tail):=p; 221462:script_mlist(tail):=p; 221473:begin script_script_mlist(tail):=p; decr(save_ptr); return; 22148 end; 22149end; {there are no other cases} 22150incr(saved(-1)); push_math(math_choice_group); scan_left_brace; 22151exit:end; 22152 22153@ Subscripts and superscripts are attached to the previous nucleus by the 22154@^superscripts@>@^subscripts@> 22155action procedure called |sub_sup|. We use the facts that |sub_mark=sup_mark+1| 22156and |subscr(p)=supscr(p)+1|. 22157 22158@<Cases of |main_control| that build...@>= 22159mmode+sub_mark,mmode+sup_mark: sub_sup; 22160 22161@ @<Declare act...@>= 22162procedure sub_sup; 22163var t:small_number; {type of previous sub/superscript} 22164@!p:pointer; {field to be filled by |scan_math|} 22165begin t:=empty; p:=null; 22166if tail<>head then if scripts_allowed(tail) then 22167 begin p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|} 22168 t:=math_type(p); 22169 end; 22170if (p=null)or(t<>empty) then @<Insert a dummy noad to be sub/superscripted@>; 22171scan_math(p); 22172end; 22173 22174@ @<Insert a dummy...@>= 22175begin tail_append(new_noad); 22176p:=supscr(tail)+cur_cmd-sup_mark; {|supscr| or |subscr|} 22177if t<>empty then 22178 begin if cur_cmd=sup_mark then 22179 begin print_err("Double superscript"); 22180@.Double superscript@> 22181 help1("I treat `x^1^2' essentially like `x^1{}^2'."); 22182 end 22183 else begin print_err("Double subscript"); 22184@.Double subscript@> 22185 help1("I treat `x_1_2' essentially like `x_1{}_2'."); 22186 end; 22187 error; 22188 end; 22189end 22190 22191@ An operation like `\.{\\over}' causes the current mlist to go into a 22192state of suspended animation: |incompleat_noad| points to a |fraction_noad| 22193that contains the mlist-so-far as its numerator, while the denominator 22194is yet to come. Finally when the mlist is finished, the denominator will 22195go into the incompleat fraction noad, and that noad will become the 22196whole formula, unless it is surrounded by `\.{\\left}' and `\.{\\right}' 22197delimiters. 22198 22199@d above_code=0 { `\.{\\above}' } 22200@d over_code=1 { `\.{\\over}' } 22201@d atop_code=2 { `\.{\\atop}' } 22202@d delimited_code=3 { `\.{\\abovewithdelims}', etc.} 22203 22204@<Put each...@>= 22205primitive("above",above,above_code);@/ 22206@!@:above_}{\.{\\above} primitive@> 22207primitive("over",above,over_code);@/ 22208@!@:over_}{\.{\\over} primitive@> 22209primitive("atop",above,atop_code);@/ 22210@!@:atop_}{\.{\\atop} primitive@> 22211primitive("abovewithdelims",above,delimited_code+above_code);@/ 22212@!@:above_with_delims_}{\.{\\abovewithdelims} primitive@> 22213primitive("overwithdelims",above,delimited_code+over_code);@/ 22214@!@:over_with_delims_}{\.{\\overwithdelims} primitive@> 22215primitive("atopwithdelims",above,delimited_code+atop_code); 22216@!@:atop_with_delims_}{\.{\\atopwithdelims} primitive@> 22217 22218@ @<Cases of |print_cmd_chr|...@>= 22219above: case chr_code of 22220 over_code:print_esc("over"); 22221 atop_code:print_esc("atop"); 22222 delimited_code+above_code:print_esc("abovewithdelims"); 22223 delimited_code+over_code:print_esc("overwithdelims"); 22224 delimited_code+atop_code:print_esc("atopwithdelims"); 22225 othercases print_esc("above") 22226 endcases; 22227 22228@ @<Cases of |main_control| that build...@>= 22229mmode+above: math_fraction; 22230 22231@ @<Declare act...@>= 22232procedure math_fraction; 22233var c:small_number; {the type of generalized fraction we are scanning} 22234begin c:=cur_chr; 22235if incompleat_noad<>null then 22236 @<Ignore the fraction operation and complain about this ambiguous case@> 22237else begin incompleat_noad:=get_node(fraction_noad_size); 22238 type(incompleat_noad):=fraction_noad; 22239 subtype(incompleat_noad):=normal; 22240 math_type(numerator(incompleat_noad)):=sub_mlist; 22241 info(numerator(incompleat_noad)):=link(head); 22242 mem[denominator(incompleat_noad)].hh:=empty_field; 22243 mem[left_delimiter(incompleat_noad)].qqqq:=null_delimiter; 22244 mem[right_delimiter(incompleat_noad)].qqqq:=null_delimiter;@/ 22245 link(head):=null; tail:=head; 22246 @<Use code |c| to distinguish between generalized fractions@>; 22247 end; 22248end; 22249 22250@ @<Use code |c|...@>= 22251if c>=delimited_code then 22252 begin scan_delimiter(left_delimiter(incompleat_noad),false); 22253 scan_delimiter(right_delimiter(incompleat_noad),false); 22254 end; 22255case c mod delimited_code of 22256above_code: begin scan_normal_dimen; 22257 thickness(incompleat_noad):=cur_val; 22258 end; 22259over_code: thickness(incompleat_noad):=default_code; 22260atop_code: thickness(incompleat_noad):=0; 22261end {there are no other cases} 22262 22263@ @<Ignore the fraction...@>= 22264begin if c>=delimited_code then 22265 begin scan_delimiter(garbage,false); scan_delimiter(garbage,false); 22266 end; 22267if c mod delimited_code=above_code then scan_normal_dimen; 22268print_err("Ambiguous; you need another { and }"); 22269@.Ambiguous...@> 22270help3("I'm ignoring this fraction specification, since I don't")@/ 22271 ("know whether a construction like `x \over y \over z'")@/ 22272 ("means `{x \over y} \over z' or `x \over {y \over z}'."); 22273error; 22274end 22275 22276@ At the end of a math formula or subformula, the |fin_mlist| routine is 22277called upon to return a pointer to the newly completed mlist, and to 22278pop the nest back to the enclosing semantic level. The parameter to 22279|fin_mlist|, if not null, points to a |right_noad| that ends the 22280current mlist; this |right_noad| has not yet been appended. 22281 22282@<Declare the function called |fin_mlist|@>= 22283function fin_mlist(@!p:pointer):pointer; 22284var q:pointer; {the mlist to return} 22285begin if incompleat_noad<>null then @<Compleat the incompleat noad@> 22286else begin link(tail):=p; q:=link(head); 22287 end; 22288pop_nest; fin_mlist:=q; 22289end; 22290 22291@ @<Compleat...@>= 22292begin math_type(denominator(incompleat_noad)):=sub_mlist; 22293info(denominator(incompleat_noad)):=link(head); 22294if p=null then q:=incompleat_noad 22295else begin q:=info(numerator(incompleat_noad)); 22296 if type(q)<>left_noad then confusion("right"); 22297@:this can't happen right}{\quad right@> 22298 info(numerator(incompleat_noad)):=link(q); 22299 link(q):=incompleat_noad; link(incompleat_noad):=p; 22300 end; 22301end 22302 22303@ Now at last we're ready to see what happens when a right brace occurs 22304in a math formula. Two special cases are simplified here: Braces are effectively 22305removed when they surround a single Ord without sub/superscripts, or when they 22306surround an accent that is the nucleus of an Ord atom. 22307 22308@<Cases of |handle...@>= 22309math_group: begin unsave; decr(save_ptr);@/ 22310 math_type(saved(0)):=sub_mlist; p:=fin_mlist(null); info(saved(0)):=p; 22311 if p<>null then if link(p)=null then 22312 if type(p)=ord_noad then 22313 begin if math_type(subscr(p))=empty then 22314 if math_type(supscr(p))=empty then 22315 begin mem[saved(0)].hh:=mem[nucleus(p)].hh; 22316 free_node(p,noad_size); 22317 end; 22318 end 22319 else if type(p)=accent_noad then if saved(0)=nucleus(tail) then 22320 if type(tail)=ord_noad then @<Replace the tail of the list by |p|@>; 22321 end; 22322 22323@ @<Replace the tail...@>= 22324begin q:=head; while link(q)<>tail do q:=link(q); 22325link(q):=p; free_node(tail,noad_size); tail:=p; 22326end 22327 22328@ We have dealt with all constructions of math mode except `\.{\\left}' and 22329`\.{\\right}', so the picture is completed by the following sections of 22330the program. 22331 22332@<Put each...@>= 22333primitive("left",left_right,left_noad); 22334@!@:left_}{\.{\\left} primitive@> 22335primitive("right",left_right,right_noad); 22336@!@:right_}{\.{\\right} primitive@> 22337text(frozen_right):="right"; eqtb[frozen_right]:=eqtb[cur_val]; 22338 22339@ @<Cases of |print_cmd_chr|...@>= 22340left_right: if chr_code=left_noad then print_esc("left") 22341else print_esc("right"); 22342 22343@ @<Cases of |main_control| that build...@>= 22344mmode+left_right: math_left_right; 22345 22346@ @<Declare act...@>= 22347procedure math_left_right; 22348var t:small_number; {|left_noad| or |right_noad|} 22349@!p:pointer; {new noad} 22350begin t:=cur_chr; 22351if (t=right_noad)and(cur_group<>math_left_group) then 22352 @<Try to recover from mismatched \.{\\right}@> 22353else begin p:=new_noad; type(p):=t; 22354 scan_delimiter(delimiter(p),false); 22355 if t=left_noad then 22356 begin push_math(math_left_group); link(head):=p; tail:=p; 22357 end 22358 else begin p:=fin_mlist(p); unsave; {end of |math_left_group|} 22359 tail_append(new_noad); type(tail):=inner_noad; 22360 math_type(nucleus(tail)):=sub_mlist; 22361 info(nucleus(tail)):=p; 22362 end; 22363 end; 22364end; 22365 22366@ @<Try to recover from mismatch...@>= 22367begin if cur_group=math_shift_group then 22368 begin scan_delimiter(garbage,false); 22369 print_err("Extra "); print_esc("right"); 22370@.Extra \\right.@> 22371 help1("I'm ignoring a \right that had no matching \left."); 22372 error; 22373 end 22374else off_save; 22375end 22376 22377@ Here is the only way out of math mode. 22378 22379@<Cases of |main_control| that build...@>= 22380mmode+math_shift: if cur_group=math_shift_group then after_math 22381 else off_save; 22382 22383@ @<Declare act...@>= 22384procedure after_math; 22385var l:boolean; {`\.{\\leqno}' instead of `\.{\\eqno}'} 22386@!danger:boolean; {not enough symbol fonts are present} 22387@!m:integer; {|mmode| or |-mmode|} 22388@!p:pointer; {the formula} 22389@!a:pointer; {box containing equation number} 22390@<Local variables for finishing a displayed formula@>@; 22391begin danger:=false; 22392@<Check that the necessary fonts for math symbols are present; 22393 if not, flush the current math lists and set |danger:=true|@>; 22394m:=mode; l:=false; p:=fin_mlist(null); {this pops the nest} 22395if mode=-m then {end of equation number} 22396 begin @<Check that another \.\$ follows@>; 22397 cur_mlist:=p; cur_style:=text_style; mlist_penalties:=false; 22398 mlist_to_hlist; a:=hpack(link(temp_head),natural); 22399 unsave; decr(save_ptr); {now |cur_group=math_shift_group|} 22400 if saved(0)=1 then l:=true; 22401 danger:=false; 22402 @<Check that the necessary fonts for math symbols are present; 22403 if not, flush the current math lists and set |danger:=true|@>; 22404 m:=mode; p:=fin_mlist(null); 22405 end 22406else a:=null; 22407if m<0 then @<Finish math in text@> 22408else begin if a=null then @<Check that another \.\$ follows@>; 22409 @<Finish displayed math@>; 22410 end; 22411end; 22412 22413@ @<Check that the necessary fonts...@>= 22414if (font_params[fam_fnt(2+text_size)]<total_mathsy_params)or@| 22415 (font_params[fam_fnt(2+script_size)]<total_mathsy_params)or@| 22416 (font_params[fam_fnt(2+script_script_size)]<total_mathsy_params) then 22417 begin print_err("Math formula deleted: Insufficient symbol fonts");@/ 22418@.Math formula deleted...@> 22419 help3("Sorry, but I can't typeset math unless \textfont 2")@/ 22420 ("and \scriptfont 2 and \scriptscriptfont 2 have all")@/ 22421 ("the \fontdimen values needed in math symbol fonts."); 22422 error; flush_math; danger:=true; 22423 end 22424else if (font_params[fam_fnt(3+text_size)]<total_mathex_params)or@| 22425 (font_params[fam_fnt(3+script_size)]<total_mathex_params)or@| 22426 (font_params[fam_fnt(3+script_script_size)]<total_mathex_params) then 22427 begin print_err("Math formula deleted: Insufficient extension fonts");@/ 22428 help3("Sorry, but I can't typeset math unless \textfont 3")@/ 22429 ("and \scriptfont 3 and \scriptscriptfont 3 have all")@/ 22430 ("the \fontdimen values needed in math extension fonts."); 22431 error; flush_math; danger:=true; 22432 end 22433 22434@ The |unsave| is done after everything else here; hence an appearance of 22435`\.{\\mathsurround}' inside of `\.{\$...\$}' affects the spacing at these 22436particular \.\$'s. This is consistent with the conventions of 22437`\.{\$\$...\$\$}', since `\.{\\abovedisplayskip}' inside a display affects the 22438space above that display. 22439 22440@<Finish math in text@>= 22441begin tail_append(new_math(math_surround,before)); 22442cur_mlist:=p; cur_style:=text_style; mlist_penalties:=(mode>0); mlist_to_hlist; 22443link(tail):=link(temp_head); 22444while link(tail)<>null do tail:=link(tail); 22445tail_append(new_math(math_surround,after)); 22446space_factor:=1000; unsave; 22447end 22448 22449@ \TeX\ gets to the following part of the program when the first `\.\$' ending 22450a display has been scanned. 22451 22452@<Check that another \.\$ follows@>= 22453begin get_x_token; 22454if cur_cmd<>math_shift then 22455 begin print_err("Display math should end with $$"); 22456@.Display math...with \$\$@> 22457 help2("The `$' that I just saw supposedly matches a previous `$$'.")@/ 22458 ("So I shall assume that you typed `$$' both times."); 22459 back_error; 22460 end; 22461end 22462 22463@ We have saved the worst for last: The fussiest part of math mode processing 22464occurs when a displayed formula is being centered and placed with an optional 22465equation number. 22466 22467@<Local variables for finishing...@>= 22468@!b:pointer; {box containing the equation} 22469@!w:scaled; {width of the equation} 22470@!z:scaled; {width of the line} 22471@!e:scaled; {width of equation number} 22472@!q:scaled; {width of equation number plus space to separate from equation} 22473@!d:scaled; {displacement of equation in the line} 22474@!s:scaled; {move the line right this much} 22475@!g1,@!g2:small_number; {glue parameter codes for before and after} 22476@!r:pointer; {kern node used to position the display} 22477@!t:pointer; {tail of adjustment list} 22478 22479@ At this time |p| points to the mlist for the formula; |a| is either 22480|null| or it points to a box containing the equation number; and we are in 22481vertical mode (or internal vertical mode). 22482 22483@<Finish displayed math@>= 22484cur_mlist:=p; cur_style:=display_style; mlist_penalties:=false; 22485mlist_to_hlist; p:=link(temp_head);@/ 22486adjust_tail:=adjust_head; b:=hpack(p,natural); p:=list_ptr(b); 22487t:=adjust_tail; adjust_tail:=null;@/ 22488w:=width(b); z:=display_width; s:=display_indent; 22489if (a=null)or danger then 22490 begin e:=0; q:=0; 22491 end 22492else begin e:=width(a); q:=e+math_quad(text_size); 22493 end; 22494if w+q>z then 22495 @<Squeeze the equation as much as possible; if there is an equation 22496 number that should go on a separate line by itself, 22497 set~|e:=0|@>; 22498@<Determine the displacement, |d|, of the left edge of the equation, with 22499 respect to the line size |z|, assuming that |l=false|@>; 22500@<Append the glue or equation number preceding the display@>; 22501@<Append the display and perhaps also the equation number@>; 22502@<Append the glue or equation number following the display@>; 22503resume_after_display 22504 22505@ @<Declare act...@>= 22506procedure resume_after_display; 22507begin if cur_group<>math_shift_group then confusion("display"); 22508@:this can't happen display}{\quad display@> 22509unsave; prev_graf:=prev_graf+3; 22510push_nest; mode:=hmode; space_factor:=1000; set_cur_lang; clang:=cur_lang; 22511prev_graf:=(norm_min(left_hyphen_min)*@'100+norm_min(right_hyphen_min)) 22512 *@'200000+cur_lang; 22513@<Scan an optional space@>; 22514if nest_ptr=1 then build_page; 22515end; 22516 22517@ The user can force the equation number to go on a separate line 22518by causing its width to be zero. 22519 22520@<Squeeze the equation as much as possible...@>= 22521begin if (e<>0)and((w-total_shrink[normal]+q<=z)or@| 22522 (total_shrink[fil]<>0)or(total_shrink[fill]<>0)or 22523 (total_shrink[filll]<>0)) then 22524 begin free_node(b,box_node_size); 22525 b:=hpack(p,z-q,exactly); 22526 end 22527else begin e:=0; 22528 if w>z then 22529 begin free_node(b,box_node_size); 22530 b:=hpack(p,z,exactly); 22531 end; 22532 end; 22533w:=width(b); 22534end 22535 22536@ We try first to center the display without regard to the existence of 22537the equation number. If that would make it too close (where ``too close'' 22538means that the space between display and equation number is less than the 22539width of the equation number), we either center it in the remaining space 22540or move it as far from the equation number as possible. The latter alternative 22541is taken only if the display begins with glue, since we assume that the 22542user put glue there to control the spacing precisely. 22543 22544@<Determine the displacement, |d|, of the left edge of the equation...@>= 22545d:=half(z-w); 22546if (e>0)and(d<2*e) then {too close} 22547 begin d:=half(z-w-e); 22548 if p<>null then if not is_char_node(p) then if type(p)=glue_node then d:=0; 22549 end 22550 22551@ If the equation number is set on a line by itself, either before or 22552after the formula, we append an infinite penalty so that no page break will 22553separate the display from its number; and we use the same size and 22554displacement for all three potential lines of the display, even though 22555`\.{\\parshape}' may specify them differently. 22556 22557@<Append the glue or equation number preceding the display@>= 22558tail_append(new_penalty(pre_display_penalty));@/ 22559if (d+s<=pre_display_size)or l then {not enough clearance} 22560 begin g1:=above_display_skip_code; g2:=below_display_skip_code; 22561 end 22562else begin g1:=above_display_short_skip_code; 22563 g2:=below_display_short_skip_code; 22564 end; 22565if l and(e=0) then {it follows that |type(a)=hlist_node|} 22566 begin shift_amount(a):=s; append_to_vlist(a); 22567 tail_append(new_penalty(inf_penalty)); 22568 end 22569else tail_append(new_param_glue(g1)) 22570 22571@ @<Append the display and perhaps also the equation number@>= 22572if e<>0 then 22573 begin r:=new_kern(z-w-e-d); 22574 if l then 22575 begin link(a):=r; link(r):=b; b:=a; d:=0; 22576 end 22577 else begin link(b):=r; link(r):=a; 22578 end; 22579 b:=hpack(b,natural); 22580 end; 22581shift_amount(b):=s+d; append_to_vlist(b) 22582 22583@ @<Append the glue or equation number following the display@>= 22584if (a<>null)and(e=0)and not l then 22585 begin tail_append(new_penalty(inf_penalty)); 22586 shift_amount(a):=s+z-width(a); 22587 append_to_vlist(a); 22588 g2:=0; 22589 end; 22590if t<>adjust_head then {migrating material comes after equation number} 22591 begin link(tail):=link(adjust_head); tail:=t; 22592 end; 22593tail_append(new_penalty(post_display_penalty)); 22594if g2>0 then tail_append(new_param_glue(g2)) 22595 22596@ When \.{\\halign} appears in a display, the alignment routines operate 22597essentially as they do in vertical mode. Then the following program is 22598activated, with |p| and |q| pointing to the beginning and end of the 22599resulting list, and with |aux_save| holding the |prev_depth| value. 22600 22601@<Finish an alignment in a display@>= 22602begin do_assignments; 22603if cur_cmd<>math_shift then @<Pontificate about improper alignment in display@> 22604else @<Check that another \.\$ follows@>; 22605pop_nest; 22606tail_append(new_penalty(pre_display_penalty)); 22607tail_append(new_param_glue(above_display_skip_code)); 22608link(tail):=p; 22609if p<>null then tail:=q; 22610tail_append(new_penalty(post_display_penalty)); 22611tail_append(new_param_glue(below_display_skip_code)); 22612prev_depth:=aux_save.sc; resume_after_display; 22613end 22614 22615@ @<Pontificate...@>= 22616begin print_err("Missing $$ inserted"); 22617@.Missing {\$\$} inserted@> 22618help2("Displays can use special alignments (like \eqalignno)")@/ 22619 ("only if nothing but the alignment itself is between $$'s."); 22620back_error; 22621end 22622 22623@* \[49] Mode-independent processing. 22624The long |main_control| procedure has now been fully specified, except for 22625certain activities that are independent of the current mode. These activities 22626do not change the current vlist or hlist or mlist; if they change anything, 22627it is the value of a parameter or the meaning of a control sequence. 22628 22629Assignments to values in |eqtb| can be global or local. Furthermore, a 22630control sequence can be defined to be `\.{\\long}' or `\.{\\outer}', and 22631it might or might not be expanded. The prefixes `\.{\\global}', `\.{\\long}', 22632and `\.{\\outer}' can occur in any order. Therefore we assign binary numeric 22633codes, making it possible to accumulate the union of all specified prefixes 22634by adding the corresponding codes. (\PASCAL's |set| operations could also 22635have been used.) 22636 22637@<Put each...@>= 22638primitive("long",prefix,1); 22639@!@:long_}{\.{\\long} primitive@> 22640primitive("outer",prefix,2); 22641@!@:outer_}{\.{\\outer} primitive@> 22642primitive("global",prefix,4); 22643@!@:global_}{\.{\\global} primitive@> 22644primitive("def",def,0); 22645@!@:def_}{\.{\\def} primitive@> 22646primitive("gdef",def,1); 22647@!@:gdef_}{\.{\\gdef} primitive@> 22648primitive("edef",def,2); 22649@!@:edef_}{\.{\\edef} primitive@> 22650primitive("xdef",def,3); 22651@!@:xdef_}{\.{\\xdef} primitive@> 22652 22653@ @<Cases of |print_cmd_chr|...@>= 22654prefix: if chr_code=1 then print_esc("long") 22655 else if chr_code=2 then print_esc("outer") 22656 else print_esc("global"); 22657def: if chr_code=0 then print_esc("def") 22658 else if chr_code=1 then print_esc("gdef") 22659 else if chr_code=2 then print_esc("edef") 22660 else print_esc("xdef"); 22661 22662@ Every prefix, and every command code that might or might not be prefixed, 22663calls the action procedure |prefixed_command|. This routine accumulates 22664a sequence of prefixes until coming to a non-prefix, then it carries out 22665the command. 22666 22667@<Cases of |main_control| that don't...@>= 22668any_mode(toks_register), 22669any_mode(assign_toks), 22670any_mode(assign_int), 22671any_mode(assign_dimen), 22672any_mode(assign_glue), 22673any_mode(assign_mu_glue), 22674any_mode(assign_font_dimen), 22675any_mode(assign_font_int), 22676any_mode(set_aux), 22677any_mode(set_prev_graf), 22678any_mode(set_page_dimen), 22679any_mode(set_page_int), 22680any_mode(set_box_dimen), 22681any_mode(set_shape), 22682any_mode(def_code), 22683any_mode(def_family), 22684any_mode(set_font), 22685any_mode(def_font), 22686any_mode(register), 22687any_mode(advance), 22688any_mode(multiply), 22689any_mode(divide), 22690any_mode(prefix), 22691any_mode(let), 22692any_mode(shorthand_def), 22693any_mode(read_to_cs), 22694any_mode(def), 22695any_mode(set_box), 22696any_mode(hyph_data), 22697any_mode(set_interaction):prefixed_command; 22698 22699@ If the user says, e.g., `\.{\\global\\global}', the redundancy is 22700silently accepted. 22701 22702@<Declare act...@>= 22703@t\4@>@<Declare subprocedures for |prefixed_command|@>@t@>@;@/ 22704procedure prefixed_command; 22705label done,exit; 22706var a:small_number; {accumulated prefix codes so far} 22707@!f:internal_font_number; {identifies a font} 22708@!j:halfword; {index into a \.{\\parshape} specification} 22709@!k:font_index; {index into |font_info|} 22710@!p,@!q:pointer; {for temporary short-term use} 22711@!n:integer; {ditto} 22712@!e:boolean; {should a definition be expanded? or was \.{\\let} not done?} 22713begin a:=0; 22714while cur_cmd=prefix do 22715 begin if not odd(a div cur_chr) then a:=a+cur_chr; 22716 @<Get the next non-blank non-relax...@>; 22717 if cur_cmd<=max_non_prefixed_command then 22718 @<Discard erroneous prefixes and |return|@>; 22719 end; 22720@<Discard the prefixes \.{\\long} and \.{\\outer} if they are irrelevant@>; 22721@<Adjust \(f)for the setting of \.{\\globaldefs}@>; 22722case cur_cmd of 22723@t\4@>@<Assignments@>@; 22724othercases confusion("prefix") 22725@:this can't happen prefix}{\quad prefix@> 22726endcases; 22727done: @<Insert a token saved by \.{\\afterassignment}, if any@>; 22728exit:end; 22729 22730@ @<Discard erroneous...@>= 22731begin print_err("You can't use a prefix with `"); 22732@.You can't use a prefix with x@> 22733print_cmd_chr(cur_cmd,cur_chr); print_char("'"); 22734help1("I'll pretend you didn't say \long or \outer or \global."); 22735back_error; return; 22736end 22737 22738@ @<Discard the prefixes...@>= 22739if (cur_cmd<>def)and(a mod 4<>0) then 22740 begin print_err("You can't use `"); print_esc("long"); print("' or `"); 22741 print_esc("outer"); print("' with `"); 22742@.You can't use \\long...@> 22743 print_cmd_chr(cur_cmd,cur_chr); print_char("'"); 22744 help1("I'll pretend you didn't say \long or \outer here."); 22745 error; 22746 end 22747 22748@ The previous routine does not have to adjust |a| so that |a mod 4=0|, 22749since the following routines test for the \.{\\global} prefix as follows. 22750 22751@d global==(a>=4) 22752@d define(#)==if global then geq_define(#)@+else eq_define(#) 22753@d word_define(#)==if global then geq_word_define(#)@+else eq_word_define(#) 22754 22755@<Adjust \(f)for the setting of \.{\\globaldefs}@>= 22756if global_defs<>0 then 22757 if global_defs<0 then 22758 begin if global then a:=a-4; 22759 end 22760 else begin if not global then a:=a+4; 22761 end 22762 22763@ When a control sequence is to be defined, by \.{\\def} or \.{\\let} or 22764something similar, the |get_r_token| routine will substitute a special 22765control sequence for a token that is not redefinable. 22766 22767@<Declare subprocedures for |prefixed_command|@>= 22768procedure get_r_token; 22769label restart; 22770begin restart: repeat get_token; 22771until cur_tok<>space_token; 22772if (cur_cs=0)or(cur_cs>frozen_control_sequence) then 22773 begin print_err("Missing control sequence inserted"); 22774@.Missing control...@> 22775 help5("Please don't say `\def cs{...}', say `\def\cs{...}'.")@/ 22776 ("I've inserted an inaccessible control sequence so that your")@/ 22777 ("definition will be completed without mixing me up too badly.")@/ 22778 ("You can recover graciously from this error, if you're")@/ 22779 ("careful; see exercise 27.2 in The TeXbook."); 22780@:TeXbook}{\sl The \TeX book@> 22781 if cur_cs=0 then back_input; 22782 cur_tok:=cs_token_flag+frozen_protection; ins_error; goto restart; 22783 end; 22784end; 22785 22786@ @<Initialize table entries...@>= 22787text(frozen_protection):="inaccessible"; 22788 22789@ Here's an example of the way many of the following routines operate. 22790(Unfortunately, they aren't all as simple as this.) 22791 22792@<Assignments@>= 22793set_font: define(cur_font_loc,data,cur_chr); 22794 22795@ When a |def| command has been scanned, 22796|cur_chr| is odd if the definition is supposed to be global, and 22797|cur_chr>=2| if the definition is supposed to be expanded. 22798 22799@<Assignments@>= 22800def: begin if odd(cur_chr)and not global and(global_defs>=0) then a:=a+4; 22801 e:=(cur_chr>=2); get_r_token; p:=cur_cs; 22802 q:=scan_toks(true,e); define(p,call+(a mod 4),def_ref); 22803 end; 22804 22805@ Both \.{\\let} and \.{\\futurelet} share the command code |let|. 22806 22807@<Put each...@>= 22808primitive("let",let,normal);@/ 22809@!@:let_}{\.{\\let} primitive@> 22810primitive("futurelet",let,normal+1);@/ 22811@!@:future_let_}{\.{\\futurelet} primitive@> 22812 22813@ @<Cases of |print_cmd_chr|...@>= 22814let: if chr_code<>normal then print_esc("futurelet")@+else print_esc("let"); 22815 22816@ @<Assignments@>= 22817let: begin n:=cur_chr; 22818 get_r_token; p:=cur_cs; 22819 if n=normal then 22820 begin repeat get_token; 22821 until cur_cmd<>spacer; 22822 if cur_tok=other_token+"=" then 22823 begin get_token; 22824 if cur_cmd=spacer then get_token; 22825 end; 22826 end 22827 else begin get_token; q:=cur_tok; get_token; back_input; 22828 cur_tok:=q; back_input; {look ahead, then back up} 22829 end; {note that |back_input| doesn't affect |cur_cmd|, |cur_chr|} 22830 if cur_cmd>=call then add_token_ref(cur_chr); 22831 define(p,cur_cmd,cur_chr); 22832 end; 22833 22834@ A \.{\\chardef} creates a control sequence whose |cmd| is |char_given|; 22835a \.{\\mathchardef} creates a control sequence whose |cmd| is |math_given|; 22836and the corresponding |chr| is the character code or math code. A \.{\\countdef} 22837or \.{\\dimendef} or \.{\\skipdef} or \.{\\muskipdef} creates a control 22838sequence whose |cmd| is |assign_int| or \dots\ or |assign_mu_glue|, and the 22839corresponding |chr| is the |eqtb| location of the internal register in question. 22840 22841@d char_def_code=0 {|shorthand_def| for \.{\\chardef}} 22842@d math_char_def_code=1 {|shorthand_def| for \.{\\mathchardef}} 22843@d count_def_code=2 {|shorthand_def| for \.{\\countdef}} 22844@d dimen_def_code=3 {|shorthand_def| for \.{\\dimendef}} 22845@d skip_def_code=4 {|shorthand_def| for \.{\\skipdef}} 22846@d mu_skip_def_code=5 {|shorthand_def| for \.{\\muskipdef}} 22847@d toks_def_code=6 {|shorthand_def| for \.{\\toksdef}} 22848 22849@<Put each...@>= 22850primitive("chardef",shorthand_def,char_def_code);@/ 22851@!@:char_def_}{\.{\\chardef} primitive@> 22852primitive("mathchardef",shorthand_def,math_char_def_code);@/ 22853@!@:math_char_def_}{\.{\\mathchardef} primitive@> 22854primitive("countdef",shorthand_def,count_def_code);@/ 22855@!@:count_def_}{\.{\\countdef} primitive@> 22856primitive("dimendef",shorthand_def,dimen_def_code);@/ 22857@!@:dimen_def_}{\.{\\dimendef} primitive@> 22858primitive("skipdef",shorthand_def,skip_def_code);@/ 22859@!@:skip_def_}{\.{\\skipdef} primitive@> 22860primitive("muskipdef",shorthand_def,mu_skip_def_code);@/ 22861@!@:mu_skip_def_}{\.{\\muskipdef} primitive@> 22862primitive("toksdef",shorthand_def,toks_def_code);@/ 22863@!@:toks_def_}{\.{\\toksdef} primitive@> 22864 22865@ @<Cases of |print_cmd_chr|...@>= 22866shorthand_def: case chr_code of 22867 char_def_code: print_esc("chardef"); 22868 math_char_def_code: print_esc("mathchardef"); 22869 count_def_code: print_esc("countdef"); 22870 dimen_def_code: print_esc("dimendef"); 22871 skip_def_code: print_esc("skipdef"); 22872 mu_skip_def_code: print_esc("muskipdef"); 22873 othercases print_esc("toksdef") 22874 endcases; 22875char_given: begin print_esc("char"); print_hex(chr_code); 22876 end; 22877math_given: begin print_esc("mathchar"); print_hex(chr_code); 22878 end; 22879 22880@ We temporarily define |p| to be |relax|, so that an occurrence of |p| 22881while scanning the definition will simply stop the scanning instead of 22882producing an ``undefined control sequence'' error or expanding the 22883previous meaning. This allows, for instance, `\.{\\chardef\\foo=123\\foo}'. 22884 22885@<Assignments@>= 22886shorthand_def: begin n:=cur_chr; get_r_token; p:=cur_cs; define(p,relax,256); 22887 scan_optional_equals; 22888 case n of 22889 char_def_code: begin scan_char_num; define(p,char_given,cur_val); 22890 end; 22891 math_char_def_code: begin scan_fifteen_bit_int; define(p,math_given,cur_val); 22892 end; 22893 othercases begin scan_eight_bit_int; 22894 case n of 22895 count_def_code: define(p,assign_int,count_base+cur_val); 22896 dimen_def_code: define(p,assign_dimen,scaled_base+cur_val); 22897 skip_def_code: define(p,assign_glue,skip_base+cur_val); 22898 mu_skip_def_code: define(p,assign_mu_glue,mu_skip_base+cur_val); 22899 toks_def_code: define(p,assign_toks,toks_base+cur_val); 22900 end; {there are no other cases} 22901 end 22902 endcases; 22903 end; 22904 22905@ @<Assignments@>= 22906read_to_cs: begin scan_int; n:=cur_val; 22907 if not scan_keyword("to") then 22908@.to@> 22909 begin print_err("Missing `to' inserted"); 22910@.Missing `to'...@> 22911 help2("You should have said `\read<number> to \cs'.")@/ 22912 ("I'm going to look for the \cs now."); error; 22913 end; 22914 get_r_token; 22915 p:=cur_cs; read_toks(n,p); define(p,call,cur_val); 22916 end; 22917 22918@ The token-list parameters, \.{\\output} and \.{\\everypar}, etc., receive 22919their values in the following way. (For safety's sake, we place an 22920enclosing pair of braces around an \.{\\output} list.) 22921 22922@<Assignments@>= 22923toks_register,assign_toks: begin q:=cur_cs; 22924 if cur_cmd=toks_register then 22925 begin scan_eight_bit_int; p:=toks_base+cur_val; 22926 end 22927 else p:=cur_chr; {|p=every_par_loc| or |output_routine_loc| or \dots} 22928 scan_optional_equals; 22929 @<Get the next non-blank non-relax non-call token@>; 22930 if cur_cmd<>left_brace then @<If the right-hand side is a token parameter 22931 or token register, finish the assignment and |goto done|@>; 22932 back_input; cur_cs:=q; q:=scan_toks(false,false); 22933 if link(def_ref)=null then {empty list: revert to the default} 22934 begin define(p,undefined_cs,null); free_avail(def_ref); 22935 end 22936 else begin if p=output_routine_loc then {enclose in curlies} 22937 begin link(q):=get_avail; q:=link(q); 22938 info(q):=right_brace_token+"}"; 22939 q:=get_avail; info(q):=left_brace_token+"{"; 22940 link(q):=link(def_ref); link(def_ref):=q; 22941 end; 22942 define(p,call,def_ref); 22943 end; 22944 end; 22945 22946@ @<If the right-hand side is a token parameter...@>= 22947begin if cur_cmd=toks_register then 22948 begin scan_eight_bit_int; cur_cmd:=assign_toks; cur_chr:=toks_base+cur_val; 22949 end; 22950if cur_cmd=assign_toks then 22951 begin q:=equiv(cur_chr); 22952 if q=null then define(p,undefined_cs,null) 22953 else begin add_token_ref(q); define(p,call,q); 22954 end; 22955 goto done; 22956 end; 22957end 22958 22959@ Similar routines are used to assign values to the numeric parameters. 22960 22961@<Assignments@>= 22962assign_int: begin p:=cur_chr; scan_optional_equals; scan_int; 22963 word_define(p,cur_val); 22964 end; 22965assign_dimen: begin p:=cur_chr; scan_optional_equals; 22966 scan_normal_dimen; word_define(p,cur_val); 22967 end; 22968assign_glue,assign_mu_glue: begin p:=cur_chr; n:=cur_cmd; scan_optional_equals; 22969 if n=assign_mu_glue then scan_glue(mu_val)@+else scan_glue(glue_val); 22970 trap_zero_glue; 22971 define(p,glue_ref,cur_val); 22972 end; 22973 22974@ When a glue register or parameter becomes zero, it will always point to 22975|zero_glue| because of the following procedure. (Exception: The tabskip 22976glue isn't trapped while preambles are being scanned.) 22977 22978@<Declare subprocedures for |prefixed_command|@>= 22979procedure trap_zero_glue; 22980begin if (width(cur_val)=0)and(stretch(cur_val)=0)and(shrink(cur_val)=0) then 22981 begin add_glue_ref(zero_glue); 22982 delete_glue_ref(cur_val); cur_val:=zero_glue; 22983 end; 22984end; 22985 22986@ The various character code tables are changed by the |def_code| commands, 22987and the font families are declared by |def_family|. 22988 22989@<Put each...@>= 22990primitive("catcode",def_code,cat_code_base); 22991@!@:cat_code_}{\.{\\catcode} primitive@> 22992primitive("mathcode",def_code,math_code_base); 22993@!@:math_code_}{\.{\\mathcode} primitive@> 22994primitive("lccode",def_code,lc_code_base); 22995@!@:lc_code_}{\.{\\lccode} primitive@> 22996primitive("uccode",def_code,uc_code_base); 22997@!@:uc_code_}{\.{\\uccode} primitive@> 22998primitive("sfcode",def_code,sf_code_base); 22999@!@:sf_code_}{\.{\\sfcode} primitive@> 23000primitive("delcode",def_code,del_code_base); 23001@!@:del_code_}{\.{\\delcode} primitive@> 23002primitive("textfont",def_family,math_font_base); 23003@!@:text_font_}{\.{\\textfont} primitive@> 23004primitive("scriptfont",def_family,math_font_base+script_size); 23005@!@:script_font_}{\.{\\scriptfont} primitive@> 23006primitive("scriptscriptfont",def_family,math_font_base+script_script_size); 23007@!@:script_script_font_}{\.{\\scriptscriptfont} primitive@> 23008 23009@ @<Cases of |print_cmd_chr|...@>= 23010def_code: if chr_code=cat_code_base then print_esc("catcode") 23011 else if chr_code=math_code_base then print_esc("mathcode") 23012 else if chr_code=lc_code_base then print_esc("lccode") 23013 else if chr_code=uc_code_base then print_esc("uccode") 23014 else if chr_code=sf_code_base then print_esc("sfcode") 23015 else print_esc("delcode"); 23016def_family: print_size(chr_code-math_font_base); 23017 23018@ The different types of code values have different legal ranges; the 23019following program is careful to check each case properly. 23020 23021@<Assignments@>= 23022def_code: begin @<Let |n| be the largest legal code value, based on |cur_chr|@>; 23023 p:=cur_chr; scan_char_num; p:=p+cur_val; scan_optional_equals; 23024 scan_int; 23025 if ((cur_val<0)and(p<del_code_base))or(cur_val>n) then 23026 begin print_err("Invalid code ("); print_int(cur_val); 23027@.Invalid code@> 23028 if p<del_code_base then print("), should be in the range 0..") 23029 else print("), should be at most "); 23030 print_int(n); 23031 help1("I'm going to use 0 instead of that illegal code value.");@/ 23032 error; cur_val:=0; 23033 end; 23034 if p<math_code_base then define(p,data,cur_val) 23035 else if p<del_code_base then define(p,data,hi(cur_val)) 23036 else word_define(p,cur_val); 23037 end; 23038 23039@ @<Let |n| be the largest...@>= 23040if cur_chr=cat_code_base then n:=max_char_code 23041else if cur_chr=math_code_base then n:=@'100000 23042else if cur_chr=sf_code_base then n:=@'77777 23043else if cur_chr=del_code_base then n:=@'77777777 23044else n:=255 23045 23046@ @<Assignments@>= 23047def_family: begin p:=cur_chr; scan_four_bit_int; p:=p+cur_val; 23048 scan_optional_equals; scan_font_ident; define(p,data,cur_val); 23049 end; 23050 23051@ Next we consider changes to \TeX's numeric registers. 23052 23053@<Assignments@>= 23054register,advance,multiply,divide: do_register_command(a); 23055 23056@ We use the fact that |register<advance<multiply<divide|. 23057 23058@<Declare subprocedures for |prefixed_command|@>= 23059procedure do_register_command(@!a:small_number); 23060label found,exit; 23061var l,@!q,@!r,@!s:pointer; {for list manipulation} 23062@!p:int_val..mu_val; {type of register involved} 23063begin q:=cur_cmd; 23064@<Compute the register location |l| and its type |p|; but |return| if invalid@>; 23065if q=register then scan_optional_equals 23066else if scan_keyword("by") then do_nothing; {optional `\.{by}'} 23067@.by@> 23068arith_error:=false; 23069if q<multiply then @<Compute result of |register| or 23070 |advance|, put it in |cur_val|@> 23071else @<Compute result of |multiply| or |divide|, put it in |cur_val|@>; 23072if arith_error then 23073 begin print_err("Arithmetic overflow"); 23074@.Arithmetic overflow@> 23075 help2("I can't carry out that multiplication or division,")@/ 23076 ("since the result is out of range."); 23077 if p>=glue_val then delete_glue_ref(cur_val); 23078 error; return; 23079 end; 23080if p<glue_val then word_define(l,cur_val) 23081else begin trap_zero_glue; define(l,glue_ref,cur_val); 23082 end; 23083exit: end; 23084 23085@ Here we use the fact that the consecutive codes |int_val..mu_val| and 23086|assign_int..assign_mu_glue| correspond to each other nicely. 23087 23088@<Compute the register location |l| and its type |p|...@>= 23089begin if q<>register then 23090 begin get_x_token; 23091 if (cur_cmd>=assign_int)and(cur_cmd<=assign_mu_glue) then 23092 begin l:=cur_chr; p:=cur_cmd-assign_int; goto found; 23093 end; 23094 if cur_cmd<>register then 23095 begin print_err("You can't use `"); print_cmd_chr(cur_cmd,cur_chr); 23096@.You can't use x after ...@> 23097 print("' after "); print_cmd_chr(q,0); 23098 help1("I'm forgetting what you said and not changing anything."); 23099 error; return; 23100 end; 23101 end; 23102p:=cur_chr; scan_eight_bit_int; 23103case p of 23104int_val: l:=cur_val+count_base; 23105dimen_val: l:=cur_val+scaled_base; 23106glue_val: l:=cur_val+skip_base; 23107mu_val: l:=cur_val+mu_skip_base; 23108end; {there are no other cases} 23109end; 23110found: 23111 23112@ @<Compute result of |register| or |advance|...@>= 23113if p<glue_val then 23114 begin if p=int_val then scan_int@+else scan_normal_dimen; 23115 if q=advance then cur_val:=cur_val+eqtb[l].int; 23116 end 23117else begin scan_glue(p); 23118 if q=advance then @<Compute the sum of two glue specs@>; 23119 end 23120 23121@ @<Compute the sum of two glue specs@>= 23122begin q:=new_spec(cur_val); r:=equiv(l); 23123delete_glue_ref(cur_val); 23124width(q):=width(q)+width(r); 23125if stretch(q)=0 then stretch_order(q):=normal; 23126if stretch_order(q)=stretch_order(r) then stretch(q):=stretch(q)+stretch(r) 23127else if (stretch_order(q)<stretch_order(r))and(stretch(r)<>0) then 23128 begin stretch(q):=stretch(r); stretch_order(q):=stretch_order(r); 23129 end; 23130if shrink(q)=0 then shrink_order(q):=normal; 23131if shrink_order(q)=shrink_order(r) then shrink(q):=shrink(q)+shrink(r) 23132else if (shrink_order(q)<shrink_order(r))and(shrink(r)<>0) then 23133 begin shrink(q):=shrink(r); shrink_order(q):=shrink_order(r); 23134 end; 23135cur_val:=q; 23136end 23137 23138@ @<Compute result of |multiply| or |divide|...@>= 23139begin scan_int; 23140if p<glue_val then 23141 if q=multiply then 23142 if p=int_val then cur_val:=mult_integers(eqtb[l].int,cur_val) 23143 else cur_val:=nx_plus_y(eqtb[l].int,cur_val,0) 23144 else cur_val:=x_over_n(eqtb[l].int,cur_val) 23145else begin s:=equiv(l); r:=new_spec(s); 23146 if q=multiply then 23147 begin width(r):=nx_plus_y(width(s),cur_val,0); 23148 stretch(r):=nx_plus_y(stretch(s),cur_val,0); 23149 shrink(r):=nx_plus_y(shrink(s),cur_val,0); 23150 end 23151 else begin width(r):=x_over_n(width(s),cur_val); 23152 stretch(r):=x_over_n(stretch(s),cur_val); 23153 shrink(r):=x_over_n(shrink(s),cur_val); 23154 end; 23155 cur_val:=r; 23156 end; 23157end 23158 23159@ The processing of boxes is somewhat different, because we may need 23160to scan and create an entire box before we actually change the value of the old 23161one. 23162 23163@<Assignments@>= 23164set_box: begin scan_eight_bit_int; 23165 if global then n:=256+cur_val@+else n:=cur_val; 23166 scan_optional_equals; 23167 if set_box_allowed then scan_box(box_flag+n) 23168 else begin print_err("Improper "); print_esc("setbox"); 23169@.Improper \\setbox@> 23170 help2("Sorry, \setbox is not allowed after \halign in a display,")@/ 23171 ("or between \accent and an accented character."); error; 23172 end; 23173 end; 23174 23175@ The |space_factor| or |prev_depth| settings are changed when a |set_aux| 23176command is sensed. Similarly, |prev_graf| is changed in the presence of 23177|set_prev_graf|, and |dead_cycles| or |insert_penalties| in the presence of 23178|set_page_int|. These definitions are always global. 23179 23180When some dimension of a box register is changed, the change isn't exactly 23181global; but \TeX\ does not look at the \.{\\global} switch. 23182 23183@<Assignments@>= 23184set_aux:alter_aux; 23185set_prev_graf:alter_prev_graf; 23186set_page_dimen:alter_page_so_far; 23187set_page_int:alter_integer; 23188set_box_dimen:alter_box_dimen; 23189 23190@ @<Declare subprocedures for |prefixed_command|@>= 23191procedure alter_aux; 23192var c:halfword; {|hmode| or |vmode|} 23193begin if cur_chr<>abs(mode) then report_illegal_case 23194else begin c:=cur_chr; scan_optional_equals; 23195 if c=vmode then 23196 begin scan_normal_dimen; prev_depth:=cur_val; 23197 end 23198 else begin scan_int; 23199 if (cur_val<=0)or(cur_val>32767) then 23200 begin print_err("Bad space factor"); 23201@.Bad space factor@> 23202 help1("I allow only values in the range 1..32767 here."); 23203 int_error(cur_val); 23204 end 23205 else space_factor:=cur_val; 23206 end; 23207 end; 23208end; 23209 23210@ @<Declare subprocedures for |prefixed_command|@>= 23211procedure alter_prev_graf; 23212var p:0..nest_size; {index into |nest|} 23213begin nest[nest_ptr]:=cur_list; p:=nest_ptr; 23214while abs(nest[p].mode_field)<>vmode do decr(p); 23215scan_optional_equals; scan_int; 23216if cur_val<0 then 23217 begin print_err("Bad "); print_esc("prevgraf"); 23218@.Bad \\prevgraf@> 23219 help1("I allow only nonnegative values here."); 23220 int_error(cur_val); 23221 end 23222else begin nest[p].pg_field:=cur_val; cur_list:=nest[nest_ptr]; 23223 end; 23224end; 23225 23226@ @<Declare subprocedures for |prefixed_command|@>= 23227procedure alter_page_so_far; 23228var c:0..7; {index into |page_so_far|} 23229begin c:=cur_chr; scan_optional_equals; scan_normal_dimen; 23230page_so_far[c]:=cur_val; 23231end; 23232 23233@ @<Declare subprocedures for |prefixed_command|@>= 23234procedure alter_integer; 23235var c:0..1; {0 for \.{\\deadcycles}, 1 for \.{\\insertpenalties}} 23236begin c:=cur_chr; scan_optional_equals; scan_int; 23237if c=0 then dead_cycles:=cur_val 23238else insert_penalties:=cur_val; 23239end; 23240 23241@ @<Declare subprocedures for |prefixed_command|@>= 23242procedure alter_box_dimen; 23243var c:small_number; {|width_offset| or |height_offset| or |depth_offset|} 23244@!b:eight_bits; {box number} 23245begin c:=cur_chr; scan_eight_bit_int; b:=cur_val; scan_optional_equals; 23246scan_normal_dimen; 23247if box(b)<>null then mem[box(b)+c].sc:=cur_val; 23248end; 23249 23250@ Paragraph shapes are set up in the obvious way. 23251 23252@<Assignments@>= 23253set_shape: begin scan_optional_equals; scan_int; n:=cur_val; 23254 if n<=0 then p:=null 23255 else begin p:=get_node(2*n+1); info(p):=n; 23256 for j:=1 to n do 23257 begin scan_normal_dimen; 23258 mem[p+2*j-1].sc:=cur_val; {indentation} 23259 scan_normal_dimen; 23260 mem[p+2*j].sc:=cur_val; {width} 23261 end; 23262 end; 23263 define(par_shape_loc,shape_ref,p); 23264 end; 23265 23266@ Here's something that isn't quite so obvious. It guarantees that 23267|info(par_shape_ptr)| can hold any positive~|n| for which |get_node(2*n+1)| 23268doesn't overflow the memory capacity. 23269 23270@<Check the ``constant''...@>= 23271if 2*max_halfword<mem_top-mem_min then bad:=41; 23272 23273@ New hyphenation data is loaded by the |hyph_data| command. 23274 23275@<Put each...@>= 23276primitive("hyphenation",hyph_data,0); 23277@!@:hyphenation_}{\.{\\hyphenation} primitive@> 23278primitive("patterns",hyph_data,1); 23279@!@:patterns_}{\.{\\patterns} primitive@> 23280 23281@ @<Cases of |print_cmd_chr|...@>= 23282hyph_data: if chr_code=1 then print_esc("patterns") 23283 else print_esc("hyphenation"); 23284 23285@ @<Assignments@>= 23286hyph_data: if cur_chr=1 then 23287 begin @!init new_patterns; goto done;@;@+tini@/ 23288 print_err("Patterns can be loaded only by INITEX"); 23289@.Patterns can be...@> 23290 help0; error; 23291 repeat get_token; until cur_cmd=right_brace; {flush the patterns} 23292 return; 23293 end 23294 else begin new_hyph_exceptions; goto done; 23295 end; 23296 23297@ All of \TeX's parameters are kept in |eqtb| except the font information, 23298the interaction mode, and the hyphenation tables; these are strictly global. 23299 23300@<Assignments@>= 23301assign_font_dimen: begin find_font_dimen(true); k:=cur_val; 23302 scan_optional_equals; scan_normal_dimen; font_info[k].sc:=cur_val; 23303 end; 23304assign_font_int: begin n:=cur_chr; scan_font_ident; f:=cur_val; 23305 scan_optional_equals; scan_int; 23306 if n=0 then hyphen_char[f]:=cur_val@+else skew_char[f]:=cur_val; 23307 end; 23308 23309@ @<Put each...@>= 23310primitive("hyphenchar",assign_font_int,0); 23311@!@:hyphen_char_}{\.{\\hyphenchar} primitive@> 23312primitive("skewchar",assign_font_int,1); 23313@!@:skew_char_}{\.{\\skewchar} primitive@> 23314 23315@ @<Cases of |print_cmd_chr|...@>= 23316assign_font_int: if chr_code=0 then print_esc("hyphenchar") 23317 else print_esc("skewchar"); 23318 23319@ Here is where the information for a new font gets loaded. 23320 23321@<Assignments@>= 23322def_font: new_font(a); 23323 23324@ @<Declare subprocedures for |prefixed_command|@>= 23325procedure new_font(@!a:small_number); 23326label common_ending; 23327var u:pointer; {user's font identifier} 23328@!s:scaled; {stated ``at'' size, or negative of scaled magnification} 23329@!f:internal_font_number; {runs through existing fonts} 23330@!t:str_number; {name for the frozen font identifier} 23331@!old_setting:0..max_selector; {holds |selector| setting} 23332@!flushable_string:str_number; {string not yet referenced} 23333begin if job_name=0 then open_log_file; 23334 {avoid confusing \.{texput} with the font name} 23335@.texput@> 23336get_r_token; u:=cur_cs; 23337if u>=hash_base then t:=text(u) 23338else if u>=single_base then 23339 if u=null_cs then t:="FONT"@+else t:=u-single_base 23340else begin old_setting:=selector; selector:=new_string; 23341 print("FONT"); print(u-active_base); selector:=old_setting; 23342@.FONTx@> 23343 str_room(1); t:=make_string; 23344 end; 23345define(u,set_font,null_font); scan_optional_equals; scan_file_name; 23346@<Scan the font size specification@>; 23347@<If this font has already been loaded, set |f| to the internal 23348 font number and |goto common_ending|@>; 23349f:=read_font_info(u,cur_name,cur_area,s); 23350common_ending: equiv(u):=f; eqtb[font_id_base+f]:=eqtb[u]; font_id_text(f):=t; 23351end; 23352 23353@ @<Scan the font size specification@>= 23354name_in_progress:=true; {this keeps |cur_name| from being changed} 23355if scan_keyword("at") then @<Put the \(p)(positive) `at' size into |s|@> 23356@.at@> 23357else if scan_keyword("scaled") then 23358@.scaled@> 23359 begin scan_int; s:=-cur_val; 23360 if (cur_val<=0)or(cur_val>32768) then 23361 begin print_err("Illegal magnification has been changed to 1000");@/ 23362@.Illegal magnification...@> 23363 help1("The magnification ratio must be between 1 and 32768."); 23364 int_error(cur_val); s:=-1000; 23365 end; 23366 end 23367else s:=-1000; 23368name_in_progress:=false 23369 23370@ @<Put the \(p)(positive) `at' size into |s|@>= 23371begin scan_normal_dimen; s:=cur_val; 23372if (s<=0)or(s>=@'1000000000) then 23373 begin print_err("Improper `at' size ("); 23374 print_scaled(s); print("pt), replaced by 10pt"); 23375@.Improper `at' size...@> 23376 help2("I can only handle fonts at positive sizes that are")@/ 23377 ("less than 2048pt, so I've changed what you said to 10pt."); 23378 error; s:=10*unity; 23379 end; 23380end 23381 23382@ When the user gives a new identifier to a font that was previously loaded, 23383the new name becomes the font identifier of record. Font names `\.{xyz}' and 23384`\.{XYZ}' are considered to be different. 23385 23386@<If this font has already been loaded...@>= 23387flushable_string:=str_ptr-1; 23388for f:=font_base+1 to font_ptr do 23389 if str_eq_str(font_name[f],cur_name)and str_eq_str(font_area[f],cur_area) then 23390 begin if cur_name=flushable_string then 23391 begin flush_string; cur_name:=font_name[f]; 23392 end; 23393 if s>0 then 23394 begin if s=font_size[f] then goto common_ending; 23395 end 23396 else if font_size[f]=xn_over_d(font_dsize[f],-s,1000) then 23397 goto common_ending; 23398 end 23399 23400@ @<Cases of |print_cmd_chr|...@>= 23401set_font:begin print("select font "); slow_print(font_name[chr_code]); 23402 if font_size[chr_code]<>font_dsize[chr_code] then 23403 begin print(" at "); print_scaled(font_size[chr_code]); 23404 print("pt"); 23405 end; 23406 end; 23407 23408@ @<Put each...@>= 23409primitive("batchmode",set_interaction,batch_mode); 23410@!@:batch_mode_}{\.{\\batchmode} primitive@> 23411primitive("nonstopmode",set_interaction,nonstop_mode); 23412@!@:nonstop_mode_}{\.{\\nonstopmode} primitive@> 23413primitive("scrollmode",set_interaction,scroll_mode); 23414@!@:scroll_mode_}{\.{\\scrollmode} primitive@> 23415primitive("errorstopmode",set_interaction,error_stop_mode); 23416@!@:error_stop_mode_}{\.{\\errorstopmode} primitive@> 23417 23418@ @<Cases of |print_cmd_chr|...@>= 23419set_interaction: case chr_code of 23420 batch_mode: print_esc("batchmode"); 23421 nonstop_mode: print_esc("nonstopmode"); 23422 scroll_mode: print_esc("scrollmode"); 23423 othercases print_esc("errorstopmode") 23424 endcases; 23425 23426@ @<Assignments@>= 23427set_interaction: new_interaction; 23428 23429@ @<Declare subprocedures for |prefixed_command|@>= 23430procedure new_interaction; 23431begin print_ln; 23432interaction:=cur_chr; 23433@<Initialize the print |selector| based on |interaction|@>; 23434if log_opened then selector:=selector+2; 23435end; 23436 23437@ The \.{\\afterassignment} command puts a token into the global 23438variable |after_token|. This global variable is examined just after 23439every assignment has been performed. 23440 23441@<Glob...@>= 23442@!after_token:halfword; {zero, or a saved token} 23443 23444@ @<Set init...@>= 23445after_token:=0; 23446 23447@ @<Cases of |main_control| that don't...@>= 23448any_mode(after_assignment):begin get_token; after_token:=cur_tok; 23449 end; 23450 23451@ @<Insert a token saved by \.{\\afterassignment}, if any@>= 23452if after_token<>0 then 23453 begin cur_tok:=after_token; back_input; after_token:=0; 23454 end 23455 23456@ Here is a procedure that might be called `Get the next non-blank non-relax 23457non-call non-assignment token'. 23458 23459@<Declare act...@>= 23460procedure do_assignments; 23461label exit; 23462begin loop begin @<Get the next non-blank non-relax...@>; 23463 if cur_cmd<=max_non_prefixed_command then return; 23464 set_box_allowed:=false; prefixed_command; set_box_allowed:=true; 23465 end; 23466exit:end; 23467 23468@ @<Cases of |main_control| that don't...@>= 23469any_mode(after_group):begin get_token; save_for_after(cur_tok); 23470 end; 23471 23472@ Files for \.{\\read} are opened and closed by the |in_stream| command. 23473 23474@<Put each...@>= 23475primitive("openin",in_stream,1); 23476@!@:open_in_}{\.{\\openin} primitive@> 23477primitive("closein",in_stream,0); 23478@!@:close_in_}{\.{\\closein} primitive@> 23479 23480@ @<Cases of |print_cmd_chr|...@>= 23481in_stream: if chr_code=0 then print_esc("closein") 23482 else print_esc("openin"); 23483 23484@ @<Cases of |main_control| that don't...@>= 23485any_mode(in_stream): open_or_close_in; 23486 23487@ @<Declare act...@>= 23488procedure open_or_close_in; 23489var c:0..1; {1 for \.{\\openin}, 0 for \.{\\closein}} 23490@!n:0..15; {stream number} 23491begin c:=cur_chr; scan_four_bit_int; n:=cur_val; 23492if read_open[n]<>closed then 23493 begin a_close(read_file[n]); read_open[n]:=closed; 23494 end; 23495if c<>0 then 23496 begin scan_optional_equals; scan_file_name; 23497 if cur_ext="" then cur_ext:=".tex"; 23498 pack_cur_name; 23499 if a_open_in(read_file[n]) then read_open[n]:=just_open; 23500 end; 23501end; 23502 23503@ The user can issue messages to the terminal, regardless of the 23504current mode. 23505 23506@<Cases of |main_control| that don't...@>= 23507any_mode(message):issue_message; 23508 23509@ @<Put each...@>= 23510primitive("message",message,0); 23511@!@:message_}{\.{\\message} primitive@> 23512primitive("errmessage",message,1); 23513@!@:err_message_}{\.{\\errmessage} primitive@> 23514 23515@ @<Cases of |print_cmd_chr|...@>= 23516message: if chr_code=0 then print_esc("message") 23517 else print_esc("errmessage"); 23518 23519@ @<Declare act...@>= 23520procedure issue_message; 23521var old_setting:0..max_selector; {holds |selector| setting} 23522@!c:0..1; {identifies \.{\\message} and \.{\\errmessage}} 23523@!s:str_number; {the message} 23524begin c:=cur_chr; link(garbage):=scan_toks(false,true); 23525old_setting:=selector; selector:=new_string; 23526token_show(def_ref); selector:=old_setting; 23527flush_list(def_ref); 23528str_room(1); s:=make_string; 23529if c=0 then @<Print string |s| on the terminal@> 23530else @<Print string |s| as an error message@>; 23531flush_string; 23532end; 23533 23534@ @<Print string |s| on the terminal@>= 23535begin if term_offset+length(s)>max_print_line-2 then print_ln 23536else if (term_offset>0)or(file_offset>0) then print_char(" "); 23537slow_print(s); update_terminal; 23538end 23539 23540@ If \.{\\errmessage} occurs often in |scroll_mode|, without user-defined 23541\.{\\errhelp}, we don't want to give a long help message each time. So we 23542give a verbose explanation only once. 23543 23544@<Glob...@>= 23545@!long_help_seen:boolean; {has the long \.{\\errmessage} help been used?} 23546 23547@ @<Set init...@>=long_help_seen:=false; 23548 23549@ @<Print string |s| as an error message@>= 23550begin print_err(""); slow_print(s); 23551if err_help<>null then use_err_help:=true 23552else if long_help_seen then help1("(That was another \errmessage.)") 23553else begin if interaction<error_stop_mode then long_help_seen:=true; 23554 help4("This error message was generated by an \errmessage")@/ 23555 ("command, so I can't give any explicit help.")@/ 23556 ("Pretend that you're Hercule Poirot: Examine all clues,")@/ 23557@^Poirot, Hercule@> 23558 ("and deduce the truth by order and method."); 23559 end; 23560error; use_err_help:=false; 23561end 23562 23563@ The |error| routine calls on |give_err_help| if help is requested from 23564the |err_help| parameter. 23565 23566@p procedure give_err_help; 23567begin token_show(err_help); 23568end; 23569 23570@ The \.{\\uppercase} and \.{\\lowercase} commands are implemented by 23571building a token list and then changing the cases of the letters in it. 23572 23573@<Cases of |main_control| that don't...@>= 23574any_mode(case_shift):shift_case; 23575 23576@ @<Put each...@>= 23577primitive("lowercase",case_shift,lc_code_base); 23578@!@:lowercase_}{\.{\\lowercase} primitive@> 23579primitive("uppercase",case_shift,uc_code_base); 23580@!@:uppercase_}{\.{\\uppercase} primitive@> 23581 23582@ @<Cases of |print_cmd_chr|...@>= 23583case_shift:if chr_code=lc_code_base then print_esc("lowercase") 23584 else print_esc("uppercase"); 23585 23586@ @<Declare act...@>= 23587procedure shift_case; 23588var b:pointer; {|lc_code_base| or |uc_code_base|} 23589@!p:pointer; {runs through the token list} 23590@!t:halfword; {token} 23591@!c:eight_bits; {character code} 23592begin b:=cur_chr; p:=scan_toks(false,false); p:=link(def_ref); 23593while p<>null do 23594 begin @<Change the case of the token in |p|, if a change is appropriate@>; 23595 p:=link(p); 23596 end; 23597back_list(link(def_ref)); free_avail(def_ref); {omit reference count} 23598end; 23599 23600@ When the case of a |chr_code| changes, we don't change the |cmd|. 23601We also change active characters, using the fact that 23602|cs_token_flag+active_base| is a multiple of~256. 23603@^data structure assumptions@> 23604 23605@<Change the case of the token in |p|, if a change is appropriate@>= 23606t:=info(p); 23607if t<cs_token_flag+single_base then 23608 begin c:=t mod 256; 23609 if equiv(b+c)<>0 then info(p):=t-c+equiv(b+c); 23610 end 23611 23612@ We come finally to the last pieces missing from |main_control|, namely the 23613`\.{\\show}' commands that are useful when debugging. 23614 23615@<Cases of |main_control| that don't...@>= 23616any_mode(xray): show_whatever; 23617 23618@ @d show_code=0 { \.{\\show} } 23619@d show_box_code=1 { \.{\\showbox} } 23620@d show_the_code=2 { \.{\\showthe} } 23621@d show_lists=3 { \.{\\showlists} } 23622 23623@<Put each...@>= 23624primitive("show",xray,show_code); 23625@!@:show_}{\.{\\show} primitive@> 23626primitive("showbox",xray,show_box_code); 23627@!@:show_box_}{\.{\\showbox} primitive@> 23628primitive("showthe",xray,show_the_code); 23629@!@:show_the_}{\.{\\showthe} primitive@> 23630primitive("showlists",xray,show_lists); 23631@!@:show_lists_}{\.{\\showlists} primitive@> 23632 23633@ @<Cases of |print_cmd_chr|...@>= 23634xray: case chr_code of 23635 show_box_code:print_esc("showbox"); 23636 show_the_code:print_esc("showthe"); 23637 show_lists:print_esc("showlists"); 23638 othercases print_esc("show") 23639 endcases; 23640 23641@ @<Declare act...@>= 23642procedure show_whatever; 23643label common_ending; 23644var p:pointer; {tail of a token list to show} 23645begin case cur_chr of 23646show_lists: begin begin_diagnostic; show_activities; 23647 end; 23648show_box_code: @<Show the current contents of a box@>; 23649show_code: @<Show the current meaning of a token, then |goto common_ending|@>; 23650othercases @<Show the current value of some parameter or register, 23651 then |goto common_ending|@> 23652endcases;@/ 23653@<Complete a potentially long \.{\\show} command@>; 23654common_ending: if interaction<error_stop_mode then 23655 begin help0; decr(error_count); 23656 end 23657else if tracing_online>0 then 23658 begin@t@>@;@/ 23659 help3("This isn't an error message; I'm just \showing something.")@/ 23660 ("Type `I\show...' to show more (e.g., \show\cs,")@/ 23661 ("\showthe\count10, \showbox255, \showlists)."); 23662 end 23663else begin@t@>@;@/ 23664 help5("This isn't an error message; I'm just \showing something.")@/ 23665 ("Type `I\show...' to show more (e.g., \show\cs,")@/ 23666 ("\showthe\count10, \showbox255, \showlists).")@/ 23667 ("And type `I\tracingonline=1\show...' to show boxes and")@/ 23668 ("lists on your terminal as well as in the transcript file."); 23669 end; 23670error; 23671end; 23672 23673@ @<Show the current meaning of a token...@>= 23674begin get_token; 23675if interaction=error_stop_mode then wake_up_terminal; 23676print_nl("> "); 23677if cur_cs<>0 then 23678 begin sprint_cs(cur_cs); print_char("="); 23679 end; 23680print_meaning; goto common_ending; 23681end 23682 23683@ @<Cases of |print_cmd_chr|...@>= 23684undefined_cs: print("undefined"); 23685call: print("macro"); 23686long_call: print_esc("long macro"); 23687outer_call: print_esc("outer macro"); 23688long_outer_call: begin print_esc("long"); print_esc("outer macro"); 23689 end; 23690end_template: print_esc("outer endtemplate"); 23691 23692@ @<Show the current contents of a box@>= 23693begin scan_eight_bit_int; begin_diagnostic; 23694print_nl("> \box"); print_int(cur_val); print_char("="); 23695if box(cur_val)=null then print("void") 23696else show_box(box(cur_val)); 23697end 23698 23699@ @<Show the current value of some parameter...@>= 23700begin p:=the_toks; 23701if interaction=error_stop_mode then wake_up_terminal; 23702print_nl("> "); token_show(temp_head); 23703flush_list(link(temp_head)); goto common_ending; 23704end 23705 23706@ @<Complete a potentially long \.{\\show} command@>= 23707end_diagnostic(true); print_err("OK"); 23708@.OK@> 23709if selector=term_and_log then if tracing_online<=0 then 23710 begin selector:=term_only; print(" (see the transcript file)"); 23711 selector:=term_and_log; 23712 end 23713 23714@* \[50] Dumping and undumping the tables. 23715After \.{INITEX} has seen a collection of fonts and macros, it 23716can write all the necessary information on an auxiliary file so 23717that production versions of \TeX\ are able to initialize their 23718memory at high speed. The present section of the program takes 23719care of such output and input. We shall consider simultaneously 23720the processes of storing and restoring, 23721so that the inverse relation between them is clear. 23722@.INITEX@> 23723 23724The global variable |format_ident| is a string that is printed right 23725after the |banner| line when \TeX\ is ready to start. For \.{INITEX} this 23726string says simply `\.{(INITEX)}'; for other versions of \TeX\ it says, 23727for example, `\.{(preloaded format=plain 1982.11.19)}', showing the year, 23728month, and day that the format file was created. We have |format_ident=0| 23729before \TeX's tables are loaded. 23730 23731@<Glob...@>= 23732@!format_ident:str_number; 23733 23734@ @<Set init...@>= 23735format_ident:=0; 23736 23737@ @<Initialize table entries...@>= 23738format_ident:=" (INITEX)"; 23739 23740@ @<Declare act...@>= 23741@!init procedure store_fmt_file; 23742label found1,found2,done1,done2; 23743var j,@!k,@!l:integer; {all-purpose indices} 23744@!p,@!q: pointer; {all-purpose pointers} 23745@!x: integer; {something to dump} 23746@!w: four_quarters; {four ASCII codes} 23747begin @<If dumping is not allowed, abort@>; 23748@<Create the |format_ident|, open the format file, 23749 and inform the user that dumping has begun@>; 23750@<Dump constants for consistency check@>; 23751@<Dump the string pool@>; 23752@<Dump the dynamic memory@>; 23753@<Dump the table of equivalents@>; 23754@<Dump the font information@>; 23755@<Dump the hyphenation tables@>; 23756@<Dump a couple more things and the closing check word@>; 23757@<Close the format file@>; 23758end; 23759tini 23760 23761@ Corresponding to the procedure that dumps a format file, we have a function 23762that reads one in. The function returns |false| if the dumped format is 23763incompatible with the present \TeX\ table sizes, etc. 23764 23765@d bad_fmt=6666 {go here if the format file is unacceptable} 23766@d too_small(#)==begin wake_up_terminal; 23767 wterm_ln('---! Must increase the ',#); 23768@.Must increase the x@> 23769 goto bad_fmt; 23770 end 23771 23772@p @t\4@>@<Declare the function called |open_fmt_file|@>@; 23773function load_fmt_file:boolean; 23774label bad_fmt,exit; 23775var j,@!k:integer; {all-purpose indices} 23776@!p,@!q: pointer; {all-purpose pointers} 23777@!x: integer; {something undumped} 23778@!w: four_quarters; {four ASCII codes} 23779begin @<Undump constants for consistency check@>; 23780@<Undump the string pool@>; 23781@<Undump the dynamic memory@>; 23782@<Undump the table of equivalents@>; 23783@<Undump the font information@>; 23784@<Undump the hyphenation tables@>; 23785@<Undump a couple more things and the closing check word@>; 23786load_fmt_file:=true; return; {it worked!} 23787bad_fmt: wake_up_terminal; 23788 wterm_ln('(Fatal format file error; I''m stymied)'); 23789@.Fatal format file error@> 23790load_fmt_file:=false; 23791exit:end; 23792 23793@ The user is not allowed to dump a format file unless |save_ptr=0|. 23794This condition implies that |cur_level=level_one|, hence 23795the |xeq_level| array is constant and it need not be dumped. 23796 23797@<If dumping is not allowed, abort@>= 23798if save_ptr<>0 then 23799 begin print_err("You can't dump inside a group"); 23800@.You can't dump...@> 23801 help1("`{...\dump}' is a no-no."); succumb; 23802 end 23803 23804@ Format files consist of |memory_word| items, and we use the following 23805macros to dump words of different types: 23806 23807@d dump_wd(#)==begin fmt_file^:=#; put(fmt_file);@+end 23808@d dump_int(#)==begin fmt_file^.int:=#; put(fmt_file);@+end 23809@d dump_hh(#)==begin fmt_file^.hh:=#; put(fmt_file);@+end 23810@d dump_qqqq(#)==begin fmt_file^.qqqq:=#; put(fmt_file);@+end 23811 23812@<Glob...@>= 23813@!fmt_file:word_file; {for input or output of format information} 23814 23815@ The inverse macros are slightly more complicated, since we need to check 23816the range of the values we are reading in. We say `|undump(a)(b)(x)|' to 23817read an integer value |x| that is supposed to be in the range |a<=x<=b|. 23818 23819@d undump_wd(#)==begin get(fmt_file); #:=fmt_file^;@+end 23820@d undump_int(#)==begin get(fmt_file); #:=fmt_file^.int;@+end 23821@d undump_hh(#)==begin get(fmt_file); #:=fmt_file^.hh;@+end 23822@d undump_qqqq(#)==begin get(fmt_file); #:=fmt_file^.qqqq;@+end 23823@d undump_end_end(#)==#:=x;@+end 23824@d undump_end(#)==(x>#) then goto bad_fmt@+else undump_end_end 23825@d undump(#)==begin undump_int(x); if (x<#) or undump_end 23826@d undump_size_end_end(#)==too_small(#)@+else undump_end_end 23827@d undump_size_end(#)==if x># then undump_size_end_end 23828@d undump_size(#)==begin undump_int(x); 23829 if x<# then goto bad_fmt; undump_size_end 23830 23831@ The next few sections of the program should make it clear how we use the 23832dump/undump macros. 23833 23834@<Dump constants for consistency check@>= 23835dump_int(@$);@/ 23836dump_int(mem_bot);@/ 23837dump_int(mem_top);@/ 23838dump_int(eqtb_size);@/ 23839dump_int(hash_prime);@/ 23840dump_int(hyph_size) 23841 23842@ Sections of a \.{WEB} program that are ``commented out'' still contribute 23843strings to the string pool; therefore \.{INITEX} and \TeX\ will have 23844the same strings. (And it is, of course, a good thing that they do.) 23845@.WEB@> 23846@^string pool@> 23847 23848@<Undump constants for consistency check@>= 23849x:=fmt_file^.int; 23850if x<>@$ then goto bad_fmt; {check that strings are the same} 23851undump_int(x); 23852if x<>mem_bot then goto bad_fmt; 23853undump_int(x); 23854if x<>mem_top then goto bad_fmt; 23855undump_int(x); 23856if x<>eqtb_size then goto bad_fmt; 23857undump_int(x); 23858if x<>hash_prime then goto bad_fmt; 23859undump_int(x); 23860if x<>hyph_size then goto bad_fmt 23861 23862@ @d dump_four_ASCII== 23863 w.b0:=qi(so(str_pool[k])); w.b1:=qi(so(str_pool[k+1])); 23864 w.b2:=qi(so(str_pool[k+2])); w.b3:=qi(so(str_pool[k+3])); 23865 dump_qqqq(w) 23866 23867@<Dump the string pool@>= 23868dump_int(pool_ptr); 23869dump_int(str_ptr); 23870for k:=0 to str_ptr do dump_int(str_start[k]); 23871k:=0; 23872while k+4<pool_ptr do 23873 begin dump_four_ASCII; k:=k+4; 23874 end; 23875k:=pool_ptr-4; dump_four_ASCII; 23876print_ln; print_int(str_ptr); print(" strings of total length "); 23877print_int(pool_ptr) 23878 23879@ @d undump_four_ASCII== 23880 undump_qqqq(w); 23881 str_pool[k]:=si(qo(w.b0)); str_pool[k+1]:=si(qo(w.b1)); 23882 str_pool[k+2]:=si(qo(w.b2)); str_pool[k+3]:=si(qo(w.b3)) 23883 23884@<Undump the string pool@>= 23885undump_size(0)(pool_size)('string pool size')(pool_ptr); 23886undump_size(0)(max_strings)('max strings')(str_ptr); 23887for k:=0 to str_ptr do undump(0)(pool_ptr)(str_start[k]); 23888k:=0; 23889while k+4<pool_ptr do 23890 begin undump_four_ASCII; k:=k+4; 23891 end; 23892k:=pool_ptr-4; undump_four_ASCII; 23893init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr 23894 23895@ By sorting the list of available spaces in the variable-size portion of 23896|mem|, we are usually able to get by without having to dump very much 23897of the dynamic memory. 23898 23899We recompute |var_used| and |dyn_used|, so that \.{INITEX} dumps valid 23900information even when it has not been gathering statistics. 23901 23902@<Dump the dynamic memory@>= 23903sort_avail; var_used:=0; 23904dump_int(lo_mem_max); dump_int(rover); 23905p:=mem_bot; q:=rover; x:=0; 23906repeat for k:=p to q+1 do dump_wd(mem[k]); 23907x:=x+q+2-p; var_used:=var_used+q-p; 23908p:=q+node_size(q); q:=rlink(q); 23909until q=rover; 23910var_used:=var_used+lo_mem_max-p; dyn_used:=mem_end+1-hi_mem_min;@/ 23911for k:=p to lo_mem_max do dump_wd(mem[k]); 23912x:=x+lo_mem_max+1-p; 23913dump_int(hi_mem_min); dump_int(avail); 23914for k:=hi_mem_min to mem_end do dump_wd(mem[k]); 23915x:=x+mem_end+1-hi_mem_min; 23916p:=avail; 23917while p<>null do 23918 begin decr(dyn_used); p:=link(p); 23919 end; 23920dump_int(var_used); dump_int(dyn_used); 23921print_ln; print_int(x); 23922print(" memory locations dumped; current usage is "); 23923print_int(var_used); print_char("&"); print_int(dyn_used) 23924 23925@ @<Undump the dynamic memory@>= 23926undump(lo_mem_stat_max+1000)(hi_mem_stat_min-1)(lo_mem_max); 23927undump(lo_mem_stat_max+1)(lo_mem_max)(rover); 23928p:=mem_bot; q:=rover; 23929repeat for k:=p to q+1 do undump_wd(mem[k]); 23930p:=q+node_size(q); 23931if (p>lo_mem_max)or((q>=rlink(q))and(rlink(q)<>rover)) then goto bad_fmt; 23932q:=rlink(q); 23933until q=rover; 23934for k:=p to lo_mem_max do undump_wd(mem[k]); 23935if mem_min<mem_bot-2 then {make more low memory available} 23936 begin p:=llink(rover); q:=mem_min+1; 23937 link(mem_min):=null; info(mem_min):=null; {we don't use the bottom word} 23938 rlink(p):=q; llink(rover):=q;@/ 23939 rlink(q):=rover; llink(q):=p; link(q):=empty_flag; 23940 node_size(q):=mem_bot-q; 23941 end; 23942undump(lo_mem_max+1)(hi_mem_stat_min)(hi_mem_min); 23943undump(null)(mem_top)(avail); mem_end:=mem_top; 23944for k:=hi_mem_min to mem_end do undump_wd(mem[k]); 23945undump_int(var_used); undump_int(dyn_used) 23946 23947@ @<Dump the table of equivalents@>= 23948@<Dump regions 1 to 4 of |eqtb|@>; 23949@<Dump regions 5 and 6 of |eqtb|@>; 23950dump_int(par_loc); dump_int(write_loc);@/ 23951@<Dump the hash table@> 23952 23953@ @<Undump the table of equivalents@>= 23954@<Undump regions 1 to 6 of |eqtb|@>; 23955undump(hash_base)(frozen_control_sequence)(par_loc); 23956par_token:=cs_token_flag+par_loc;@/ 23957undump(hash_base)(frozen_control_sequence)(write_loc);@/ 23958@<Undump the hash table@> 23959 23960@ The table of equivalents usually contains repeated information, so we dump it 23961in compressed form: The sequence of $n+2$ values $(n,x_1,\ldots,x_n,m)$ in the 23962format file represents $n+m$ consecutive entries of |eqtb|, with |m| extra 23963copies of $x_n$, namely $(x_1,\ldots,x_n,x_n,\ldots,x_n)$. 23964 23965@<Dump regions 1 to 4 of |eqtb|@>= 23966k:=active_base; 23967repeat j:=k; 23968while j<int_base-1 do 23969 begin if (equiv(j)=equiv(j+1))and(eq_type(j)=eq_type(j+1))and@| 23970 (eq_level(j)=eq_level(j+1)) then goto found1; 23971 incr(j); 23972 end; 23973l:=int_base; goto done1; {|j=int_base-1|} 23974found1: incr(j); l:=j; 23975while j<int_base-1 do 23976 begin if (equiv(j)<>equiv(j+1))or(eq_type(j)<>eq_type(j+1))or@| 23977 (eq_level(j)<>eq_level(j+1)) then goto done1; 23978 incr(j); 23979 end; 23980done1:dump_int(l-k); 23981while k<l do 23982 begin dump_wd(eqtb[k]); incr(k); 23983 end; 23984k:=j+1; dump_int(k-l); 23985until k=int_base 23986 23987@ @<Dump regions 5 and 6 of |eqtb|@>= 23988repeat j:=k; 23989while j<eqtb_size do 23990 begin if eqtb[j].int=eqtb[j+1].int then goto found2; 23991 incr(j); 23992 end; 23993l:=eqtb_size+1; goto done2; {|j=eqtb_size|} 23994found2: incr(j); l:=j; 23995while j<eqtb_size do 23996 begin if eqtb[j].int<>eqtb[j+1].int then goto done2; 23997 incr(j); 23998 end; 23999done2:dump_int(l-k); 24000while k<l do 24001 begin dump_wd(eqtb[k]); incr(k); 24002 end; 24003k:=j+1; dump_int(k-l); 24004until k>eqtb_size 24005 24006@ @<Undump regions 1 to 6 of |eqtb|@>= 24007k:=active_base; 24008repeat undump_int(x); 24009if (x<1)or(k+x>eqtb_size+1) then goto bad_fmt; 24010for j:=k to k+x-1 do undump_wd(eqtb[j]); 24011k:=k+x; 24012undump_int(x); 24013if (x<0)or(k+x>eqtb_size+1) then goto bad_fmt; 24014for j:=k to k+x-1 do eqtb[j]:=eqtb[k-1]; 24015k:=k+x; 24016until k>eqtb_size 24017 24018@ A different scheme is used to compress the hash table, since its lower 24019region is usually sparse. When |text(p)<>0| for |p<=hash_used|, we output 24020two words, |p| and |hash[p]|. The hash table is, of course, densely packed 24021for |p>=hash_used|, so the remaining entries are output in a~block. 24022 24023@<Dump the hash table@>= 24024dump_int(hash_used); cs_count:=frozen_control_sequence-1-hash_used; 24025for p:=hash_base to hash_used do if text(p)<>0 then 24026 begin dump_int(p); dump_hh(hash[p]); incr(cs_count); 24027 end; 24028for p:=hash_used+1 to undefined_control_sequence-1 do dump_hh(hash[p]); 24029dump_int(cs_count);@/ 24030print_ln; print_int(cs_count); print(" multiletter control sequences") 24031 24032@ @<Undump the hash table@>= 24033undump(hash_base)(frozen_control_sequence)(hash_used); p:=hash_base-1; 24034repeat undump(p+1)(hash_used)(p); undump_hh(hash[p]); 24035until p=hash_used; 24036for p:=hash_used+1 to undefined_control_sequence-1 do undump_hh(hash[p]); 24037undump_int(cs_count) 24038 24039@ @<Dump the font information@>= 24040dump_int(fmem_ptr); 24041for k:=0 to fmem_ptr-1 do dump_wd(font_info[k]); 24042dump_int(font_ptr); 24043for k:=null_font to font_ptr do 24044 @<Dump the array info for internal font number |k|@>; 24045print_ln; print_int(fmem_ptr-7); print(" words of font info for "); 24046print_int(font_ptr-font_base); print(" preloaded font"); 24047if font_ptr<>font_base+1 then print_char("s") 24048 24049@ @<Undump the font information@>= 24050undump_size(7)(font_mem_size)('font mem size')(fmem_ptr); 24051for k:=0 to fmem_ptr-1 do undump_wd(font_info[k]); 24052undump_size(font_base)(font_max)('font max')(font_ptr); 24053for k:=null_font to font_ptr do 24054 @<Undump the array info for internal font number |k|@> 24055 24056@ @<Dump the array info for internal font number |k|@>= 24057begin dump_qqqq(font_check[k]); 24058dump_int(font_size[k]); 24059dump_int(font_dsize[k]); 24060dump_int(font_params[k]);@/ 24061dump_int(hyphen_char[k]); 24062dump_int(skew_char[k]);@/ 24063dump_int(font_name[k]); 24064dump_int(font_area[k]);@/ 24065dump_int(font_bc[k]); 24066dump_int(font_ec[k]);@/ 24067dump_int(char_base[k]); 24068dump_int(width_base[k]); 24069dump_int(height_base[k]);@/ 24070dump_int(depth_base[k]); 24071dump_int(italic_base[k]); 24072dump_int(lig_kern_base[k]);@/ 24073dump_int(kern_base[k]); 24074dump_int(exten_base[k]); 24075dump_int(param_base[k]);@/ 24076dump_int(font_glue[k]);@/ 24077dump_int(bchar_label[k]); 24078dump_int(font_bchar[k]); 24079dump_int(font_false_bchar[k]);@/ 24080print_nl("\font"); print_esc(font_id_text(k)); print_char("="); 24081print_file_name(font_name[k],font_area[k],""); 24082if font_size[k]<>font_dsize[k] then 24083 begin print(" at "); print_scaled(font_size[k]); print("pt"); 24084 end; 24085end 24086 24087@ @<Undump the array info for internal font number |k|@>= 24088begin undump_qqqq(font_check[k]);@/ 24089undump_int(font_size[k]); 24090undump_int(font_dsize[k]); 24091undump(min_halfword)(max_halfword)(font_params[k]);@/ 24092undump_int(hyphen_char[k]); 24093undump_int(skew_char[k]);@/ 24094undump(0)(str_ptr)(font_name[k]); 24095undump(0)(str_ptr)(font_area[k]);@/ 24096undump(0)(255)(font_bc[k]); 24097undump(0)(255)(font_ec[k]);@/ 24098undump_int(char_base[k]); 24099undump_int(width_base[k]); 24100undump_int(height_base[k]);@/ 24101undump_int(depth_base[k]); 24102undump_int(italic_base[k]); 24103undump_int(lig_kern_base[k]);@/ 24104undump_int(kern_base[k]); 24105undump_int(exten_base[k]); 24106undump_int(param_base[k]);@/ 24107undump(min_halfword)(lo_mem_max)(font_glue[k]);@/ 24108undump(0)(fmem_ptr-1)(bchar_label[k]); 24109undump(min_quarterword)(non_char)(font_bchar[k]); 24110undump(min_quarterword)(non_char)(font_false_bchar[k]); 24111end 24112 24113@ @<Dump the hyphenation tables@>= 24114dump_int(hyph_count); 24115for k:=0 to hyph_size do if hyph_word[k]<>0 then 24116 begin dump_int(k); dump_int(hyph_word[k]); dump_int(hyph_list[k]); 24117 end; 24118print_ln; print_int(hyph_count); print(" hyphenation exception"); 24119if hyph_count<>1 then print_char("s"); 24120if trie_not_ready then init_trie; 24121dump_int(trie_max); 24122for k:=0 to trie_max do dump_hh(trie[k]); 24123dump_int(trie_op_ptr); 24124for k:=1 to trie_op_ptr do 24125 begin dump_int(hyf_distance[k]); 24126 dump_int(hyf_num[k]); 24127 dump_int(hyf_next[k]); 24128 end; 24129print_nl("Hyphenation trie of length "); print_int(trie_max); 24130@.Hyphenation trie...@> 24131print(" has "); print_int(trie_op_ptr); print(" op"); 24132if trie_op_ptr<>1 then print_char("s"); 24133print(" out of "); print_int(trie_op_size); 24134for k:=255 downto 0 do if trie_used[k]>min_quarterword then 24135 begin print_nl(" "); print_int(qo(trie_used[k])); 24136 print(" for language "); print_int(k); 24137 dump_int(k); dump_int(qo(trie_used[k])); 24138 end 24139 24140@ Only ``nonempty'' parts of |op_start| need to be restored. 24141 24142@<Undump the hyphenation tables@>= 24143undump(0)(hyph_size)(hyph_count); 24144for k:=1 to hyph_count do 24145 begin undump(0)(hyph_size)(j); 24146 undump(0)(str_ptr)(hyph_word[j]); 24147 undump(min_halfword)(max_halfword)(hyph_list[j]); 24148 end; 24149undump_size(0)(trie_size)('trie size')(j); @+init trie_max:=j;@+tini 24150for k:=0 to j do undump_hh(trie[k]); 24151undump_size(0)(trie_op_size)('trie op size')(j); @+init trie_op_ptr:=j;@+tini 24152for k:=1 to j do 24153 begin undump(0)(63)(hyf_distance[k]); {a |small_number|} 24154 undump(0)(63)(hyf_num[k]); 24155 undump(min_quarterword)(max_quarterword)(hyf_next[k]); 24156 end; 24157init for k:=0 to 255 do trie_used[k]:=min_quarterword;@+tini@;@/ 24158k:=256; 24159while j>0 do 24160 begin undump(0)(k-1)(k); undump(1)(j)(x);@+init trie_used[k]:=qi(x);@+tini@;@/ 24161 j:=j-x; op_start[k]:=qo(j); 24162 end; 24163@!init trie_not_ready:=false @+tini 24164 24165@ We have already printed a lot of statistics, so we set |tracing_stats:=0| 24166to prevent them from appearing again. 24167 24168@<Dump a couple more things and the closing check word@>= 24169dump_int(interaction); dump_int(format_ident); dump_int(69069); 24170tracing_stats:=0 24171 24172@ @<Undump a couple more things and the closing check word@>= 24173undump(batch_mode)(error_stop_mode)(interaction); 24174undump(0)(str_ptr)(format_ident); 24175undump_int(x); 24176if (x<>69069)or eof(fmt_file) then goto bad_fmt 24177 24178@ @<Create the |format_ident|...@>= 24179selector:=new_string; 24180print(" (preloaded format="); print(job_name); print_char(" "); 24181print_int(year); print_char("."); 24182print_int(month); print_char("."); print_int(day); print_char(")"); 24183if interaction=batch_mode then selector:=log_only 24184else selector:=term_and_log; 24185str_room(1); 24186format_ident:=make_string; 24187pack_job_name(format_extension); 24188while not w_open_out(fmt_file) do 24189 prompt_file_name("format file name",format_extension); 24190print_nl("Beginning to dump on file "); 24191@.Beginning to dump...@> 24192slow_print(w_make_name_string(fmt_file)); flush_string; 24193print_nl(""); slow_print(format_ident) 24194 24195@ @<Close the format file@>= 24196w_close(fmt_file) 24197 24198@* \[51] The main program. 24199This is it: the part of \TeX\ that executes all those procedures we have 24200written. 24201 24202Well---almost. Let's leave space for a few more routines that we may 24203have forgotten. 24204 24205@p @<Last-minute procedures@> 24206 24207@ We have noted that there are two versions of \TeX82. One, called \.{INITEX}, 24208@.INITEX@> 24209has to be run first; it initializes everything from scratch, without 24210reading a format file, and it has the capability of dumping a format file. 24211The other one is called `\.{VIRTEX}'; it is a ``virgin'' program that needs 24212@.VIRTEX@> 24213to input a format file in order to get started. \.{VIRTEX} typically has 24214more memory capacity than \.{INITEX}, because it does not need the space 24215consumed by the auxiliary hyphenation tables and the numerous calls on 24216|primitive|, etc. 24217 24218The \.{VIRTEX} program cannot read a format file instantaneously, of course; 24219the best implementations therefore allow for production versions of \TeX\ that 24220not only avoid the loading routine for \PASCAL\ object code, they also have 24221a format file pre-loaded. This is impossible to do if we stick to standard 24222\PASCAL; but there is a simple way to fool many systems into avoiding the 24223initialization, as follows:\quad(1)~We declare a global integer variable 24224called |ready_already|. The probability is negligible that this 24225variable holds any particular value like 314159 when \.{VIRTEX} is first 24226loaded.\quad(2)~After we have read in a format file and initialized 24227everything, we set |ready_already:=314159|.\quad(3)~Soon \.{VIRTEX} 24228will print `\.*', waiting for more input; and at this point we 24229interrupt the program and save its core image in some form that the 24230operating system can reload speedily.\quad(4)~When that core image is 24231activated, the program starts again at the beginning; but now 24232|ready_already=314159| and all the other global variables have 24233their initial values too. The former chastity has vanished! 24234 24235In other words, if we allow ourselves to test the condition 24236|ready_already=314159|, before |ready_already| has been 24237assigned a value, we can avoid the lengthy initialization. Dirty tricks 24238rarely pay off so handsomely. 24239@^dirty \PASCAL@> 24240@^system dependencies@> 24241 24242On systems that allow such preloading, the standard program called \.{TeX} 24243should be the one that has \.{plain} format preloaded, since that agrees 24244with {\sl The \TeX book}. Other versions, e.g., \.{AmSTeX}, should also 24245@:TeXbook}{\sl The \TeX book@> 24246@.AmSTeX@> 24247@.plain@> 24248be provided for commonly used formats. 24249 24250@<Glob...@>= 24251@!ready_already:integer; {a sacrifice of purity for economy} 24252 24253@ Now this is really it: \TeX\ starts and ends here. 24254 24255The initial test involving |ready_already| should be deleted if the 24256\PASCAL\ runtime system is smart enough to detect such a ``mistake.'' 24257@^system dependencies@> 24258 24259@p begin @!{|start_here|} 24260history:=fatal_error_stop; {in case we quit during initialization} 24261t_open_out; {open the terminal for output} 24262if ready_already=314159 then goto start_of_TEX; 24263@<Check the ``constant'' values...@>@; 24264if bad>0 then 24265 begin wterm_ln('Ouch---my internal constants have been clobbered!', 24266 '---case ',bad:1); 24267@.Ouch...clobbered@> 24268 goto final_end; 24269 end; 24270initialize; {set global variables to their starting values} 24271@!init if not get_strings_started then goto final_end; 24272init_prim; {call |primitive| for each primitive} 24273init_str_ptr:=str_ptr; init_pool_ptr:=pool_ptr; fix_date_and_time; 24274tini@/ 24275ready_already:=314159; 24276start_of_TEX: @<Initialize the output routines@>; 24277@<Get the first line of input and prepare to start@>; 24278history:=spotless; {ready to go!} 24279main_control; {come to life} 24280final_cleanup; {prepare for death} 24281end_of_TEX: close_files_and_terminate; 24282final_end: ready_already:=0; 24283end. 24284 24285@ Here we do whatever is needed to complete \TeX's job gracefully on the 24286local operating system. The code here might come into play after a fatal 24287error; it must therefore consist entirely of ``safe'' operations that 24288cannot produce error messages. For example, it would be a mistake to call 24289|str_room| or |make_string| at this time, because a call on |overflow| 24290might lead to an infinite loop. 24291@^system dependencies@> 24292 24293Actually there's one way to get error messages, via |prepare_mag|; 24294but that can't cause infinite recursion. 24295@^recursion@> 24296 24297This program doesn't bother to close the input files that may still be open. 24298 24299@<Last-minute...@>= 24300procedure close_files_and_terminate; 24301var k:integer; {all-purpose index} 24302begin @<Finish the extensions@>; 24303@!stat if tracing_stats>0 then @<Output statistics about this job@>;@;@+tats@/ 24304wake_up_terminal; @<Finish the \.{DVI} file@>; 24305if log_opened then 24306 begin wlog_cr; a_close(log_file); selector:=selector-2; 24307 if selector=term_only then 24308 begin print_nl("Transcript written on "); 24309@.Transcript written...@> 24310 slow_print(log_name); print_char("."); 24311 end; 24312 end; 24313end; 24314 24315@ The present section goes directly to the log file instead of using 24316|print| commands, because there's no need for these strings to take 24317up |str_pool| memory when a non-{\bf stat} version of \TeX\ is being used. 24318 24319@<Output statistics...@>= 24320if log_opened then 24321 begin wlog_ln(' '); 24322 wlog_ln('Here is how much of TeX''s memory',' you used:'); 24323@.Here is how much...@> 24324 wlog(' ',str_ptr-init_str_ptr:1,' string'); 24325 if str_ptr<>init_str_ptr+1 then wlog('s'); 24326 wlog_ln(' out of ', max_strings-init_str_ptr:1);@/ 24327 wlog_ln(' ',pool_ptr-init_pool_ptr:1,' string characters out of ', 24328 pool_size-init_pool_ptr:1);@/ 24329 wlog_ln(' ',lo_mem_max-mem_min+mem_end-hi_mem_min+2:1,@| 24330 ' words of memory out of ',mem_end+1-mem_min:1);@/ 24331 wlog_ln(' ',cs_count:1,' multiletter control sequences out of ', 24332 hash_size:1);@/ 24333 wlog(' ',fmem_ptr:1,' words of font info for ', 24334 font_ptr-font_base:1,' font'); 24335 if font_ptr<>font_base+1 then wlog('s'); 24336 wlog_ln(', out of ',font_mem_size:1,' for ',font_max-font_base:1);@/ 24337 wlog(' ',hyph_count:1,' hyphenation exception'); 24338 if hyph_count<>1 then wlog('s'); 24339 wlog_ln(' out of ',hyph_size:1);@/ 24340 wlog_ln(' ',max_in_stack:1,'i,',max_nest_stack:1,'n,',@| 24341 max_param_stack:1,'p,',@| 24342 max_buf_stack+1:1,'b,',@| 24343 max_save_stack+6:1,'s stack positions out of ',@| 24344 stack_size:1,'i,', 24345 nest_size:1,'n,', 24346 param_size:1,'p,', 24347 buf_size:1,'b,', 24348 save_size:1,'s'); 24349 end 24350 24351@ We get to the |final_cleanup| routine when \.{\\end} or \.{\\dump} has 24352been scanned and |its_all_over|\kern-2pt. 24353 24354@<Last-minute...@>= 24355procedure final_cleanup; 24356label exit; 24357var c:small_number; {0 for \.{\\end}, 1 for \.{\\dump}} 24358begin c:=cur_chr; 24359if job_name=0 then open_log_file; 24360while input_ptr>0 do 24361 if state=token_list then end_token_list@+else end_file_reading; 24362while open_parens>0 do 24363 begin print(" )"); decr(open_parens); 24364 end; 24365if cur_level>level_one then 24366 begin print_nl("("); print_esc("end occurred "); 24367 print("inside a group at level "); 24368@:end_}{\.{(\\end occurred...)}@> 24369 print_int(cur_level-level_one); print_char(")"); 24370 end; 24371while cond_ptr<>null do 24372 begin print_nl("("); print_esc("end occurred "); 24373 print("when "); print_cmd_chr(if_test,cur_if); 24374 if if_line<>0 then 24375 begin print(" on line "); print_int(if_line); 24376 end; 24377 print(" was incomplete)"); 24378 if_line:=if_line_field(cond_ptr); 24379 cur_if:=subtype(cond_ptr); temp_ptr:=cond_ptr; 24380 cond_ptr:=link(cond_ptr); free_node(temp_ptr,if_node_size); 24381 end; 24382if history<>spotless then 24383 if ((history=warning_issued)or(interaction<error_stop_mode)) then 24384 if selector=term_and_log then 24385 begin selector:=term_only; 24386 print_nl("(see the transcript file for additional information)"); 24387@.see the transcript file...@> 24388 selector:=term_and_log; 24389 end; 24390if c=1 then 24391 begin @!init for c:=top_mark_code to split_bot_mark_code do 24392 if cur_mark[c]<>null then delete_token_ref(cur_mark[c]); 24393 if last_glue<>max_halfword then delete_glue_ref(last_glue); 24394 store_fmt_file; return;@+tini@/ 24395 print_nl("(\dump is performed only by INITEX)"); return; 24396@:dump_}{\.{\\dump...only by INITEX}@> 24397 end; 24398exit:end; 24399 24400@ @<Last-minute...@>= 24401@!init procedure init_prim; {initialize all the primitives} 24402begin no_new_control_sequence:=false; 24403@<Put each...@>; 24404no_new_control_sequence:=true; 24405end; 24406tini 24407 24408@ When we begin the following code, \TeX's tables may still contain garbage; 24409the strings might not even be present. Thus we must proceed cautiously to get 24410bootstrapped in. 24411 24412But when we finish this part of the program, \TeX\ is ready to call on the 24413|main_control| routine to do its work. 24414 24415@<Get the first line...@>= 24416begin @<Initialize the input routines@>; 24417if (format_ident=0)or(buffer[loc]="&") then 24418 begin if format_ident<>0 then initialize; {erase preloaded format} 24419 if not open_fmt_file then goto final_end; 24420 if not load_fmt_file then 24421 begin w_close(fmt_file); goto final_end; 24422 end; 24423 w_close(fmt_file); 24424 while (loc<limit)and(buffer[loc]=" ") do incr(loc); 24425 end; 24426if end_line_char_inactive then decr(limit) 24427else buffer[limit]:=end_line_char; 24428fix_date_and_time;@/ 24429@<Compute the magic offset@>; 24430@<Initialize the print |selector|...@>; 24431if (loc<limit)and(cat_code(buffer[loc])<>escape) then start_input; 24432 {\.{\\input} assumed} 24433end 24434 24435@* \[52] Debugging. 24436Once \TeX\ is working, you should be able to diagnose most errors with 24437the \.{\\show} commands and other diagnostic features. But for the initial 24438stages of debugging, and for the revelation of really deep mysteries, you 24439can compile \TeX\ with a few more aids, including the \PASCAL\ runtime 24440checks and its debugger. An additional routine called |debug_help| 24441will also come into play when you type `\.D' after an error message; 24442|debug_help| also occurs just before a fatal error causes \TeX\ to succumb. 24443@^debugging@> 24444@^system dependencies@> 24445 24446The interface to |debug_help| is primitive, but it is good enough when used 24447with a \PASCAL\ debugger that allows you to set breakpoints and to read 24448variables and change their values. After getting the prompt `\.{debug \#}', you 24449type either a negative number (this exits |debug_help|), or zero (this 24450goes to a location where you can set a breakpoint, thereby entering into 24451dialog with the \PASCAL\ debugger), or a positive number |m| followed by 24452an argument |n|. The meaning of |m| and |n| will be clear from the 24453program below. (If |m=13|, there is an additional argument, |l|.) 24454@.debug \#@> 24455 24456@d breakpoint=888 {place where a breakpoint is desirable} 24457 24458@<Last-minute...@>= 24459@!debug procedure debug_help; {routine to display various things} 24460label breakpoint,exit; 24461var k,@!l,@!m,@!n:integer; 24462begin loop begin wake_up_terminal; 24463 print_nl("debug # (-1 to exit):"); update_terminal; 24464@.debug \#@> 24465 read(term_in,m); 24466 if m<0 then return 24467 else if m=0 then 24468 begin goto breakpoint;@\ {go to every label at least once} 24469 breakpoint: m:=0; @{'BREAKPOINT'@}@\ 24470 end 24471 else begin read(term_in,n); 24472 case m of 24473 @t\4@>@<Numbered cases for |debug_help|@>@; 24474 othercases print("?") 24475 endcases; 24476 end; 24477 end; 24478exit:end; 24479gubed 24480 24481@ @<Numbered cases...@>= 244821: print_word(mem[n]); {display |mem[n]| in all forms} 244832: print_int(info(n)); 244843: print_int(link(n)); 244854: print_word(eqtb[n]); 244865: print_word(font_info[n]); 244876: print_word(save_stack[n]); 244887: show_box(n); 24489 {show a box, abbreviated by |show_box_depth| and |show_box_breadth|} 244908: begin breadth_max:=10000; depth_threshold:=pool_size-pool_ptr-10; 24491 show_node_list(n); {show a box in its entirety} 24492 end; 244939: show_token_list(n,null,1000); 2449410: slow_print(n); 2449511: check_mem(n>0); {check wellformedness; print new busy locations if |n>0|} 2449612: search_mem(n); {look for pointers to |n|} 2449713: begin read(term_in,l); print_cmd_chr(n,l); 24498 end; 2449914: for k:=0 to n do print(buffer[k]); 2450015: begin font_in_short_display:=null_font; short_display(n); 24501 end; 2450216: panicking:=not panicking; 24503 24504@* \[53] Extensions. 24505The program above includes a bunch of ``hooks'' that allow further 24506capabilities to be added without upsetting \TeX's basic structure. 24507Most of these hooks are concerned with ``whatsit'' nodes, which are 24508intended to be used for special purposes; whenever a new extension to 24509\TeX\ involves a new kind of whatsit node, a corresponding change needs 24510to be made to the routines below that deal with such nodes, 24511but it will usually be unnecessary to make many changes to the 24512other parts of this program. 24513 24514In order to demonstrate how extensions can be made, we shall treat 24515`\.{\\write}', `\.{\\openout}', `\.{\\closeout}', `\.{\\immediate}', 24516`\.{\\special}', and `\.{\\setlanguage}' as if they were extensions. 24517These commands are actually primitives of \TeX, and they should 24518appear in all implementations of the system; but let's try to imagine 24519that they aren't. Then the program below illustrates how a person 24520could add them. 24521 24522Sometimes, of course, an extension will require changes to \TeX\ itself; 24523no system of hooks could be complete enough for all conceivable extensions. 24524The features associated with `\.{\\write}' are almost all confined to the 24525following paragraphs, but there are small parts of the |print_ln| and 24526|print_char| procedures that were introduced specifically to \.{\\write} 24527characters. Furthermore one of the token lists recognized by the scanner 24528is a |write_text|; and there are a few other miscellaneous places where we 24529have already provided for some aspect of \.{\\write}. The goal of a \TeX\ 24530extender should be to minimize alterations to the standard parts of the 24531program, and to avoid them completely if possible. He or she should also 24532be quite sure that there's no easy way to accomplish the desired goals 24533with the standard features that \TeX\ already has. ``Think thrice before 24534extending,'' because that may save a lot of work, and it will also keep 24535incompatible extensions of \TeX\ from proliferating. 24536@^system dependencies@> 24537@^extensions to \TeX@> 24538 24539@ First let's consider the format of whatsit nodes that are used to represent 24540the data associated with \.{\\write} and its relatives. Recall that a whatsit 24541has |type=whatsit_node|, and the |subtype| is supposed to distinguish 24542different kinds of whatsits. Each node occupies two or more words; the 24543exact number is immaterial, as long as it is readily determined from the 24544|subtype| or other data. 24545 24546We shall introduce five |subtype| values here, corresponding to the 24547control sequences \.{\\openout}, \.{\\write}, \.{\\closeout}, \.{\\special}, and 24548\.{\\setlanguage}. The second word of I/O whatsits has a |write_stream| field 24549that identifies the write-stream number (0 to 15, or 16 for out-of-range and 24550positive, or 17 for out-of-range and negative). 24551In the case of \.{\\write} and \.{\\special}, there is also a field that 24552points to the reference count of a token list that should be sent. In the 24553case of \.{\\openout}, we need three words and three auxiliary subfields 24554to hold the string numbers for name, area, and extension. 24555 24556@d write_node_size=2 {number of words in a write/whatsit node} 24557@d open_node_size=3 {number of words in an open/whatsit node} 24558@d open_node=0 {|subtype| in whatsits that represent files to \.{\\openout}} 24559@d write_node=1 {|subtype| in whatsits that represent things to \.{\\write}} 24560@d close_node=2 {|subtype| in whatsits that represent streams to \.{\\closeout}} 24561@d special_node=3 {|subtype| in whatsits that represent \.{\\special} things} 24562@d language_node=4 {|subtype| in whatsits that change the current language} 24563@d what_lang(#)==link(#+1) {language number, in the range |0..255|} 24564@d what_lhm(#)==type(#+1) {minimum left fragment, in the range |1..63|} 24565@d what_rhm(#)==subtype(#+1) {minimum right fragment, in the range |1..63|} 24566@d write_tokens(#) == link(#+1) {reference count of token list to write} 24567@d write_stream(#) == info(#+1) {stream number (0 to 17)} 24568@d open_name(#) == link(#+1) {string number of file name to open} 24569@d open_area(#) == info(#+2) {string number of file area for |open_name|} 24570@d open_ext(#) == link(#+2) {string number of file extension for |open_name|} 24571 24572@ The sixteen possible \.{\\write} streams are represented by the |write_file| 24573array. The |j|th file is open if and only if |write_open[j]=true|. The last 24574two streams are special; |write_open[16]| represents a stream number 24575greater than 15, while |write_open[17]| represents a negative stream number, 24576and both of these variables are always |false|. 24577 24578@<Glob...@>= 24579@!write_file:array[0..15] of alpha_file; 24580@!write_open:array[0..17] of boolean; 24581 24582@ @<Set init...@>= 24583for k:=0 to 17 do write_open[k]:=false; 24584 24585@ Extensions might introduce new command codes; but it's best to use 24586|extension| with a modifier, whenever possible, so that |main_control| 24587stays the same. 24588 24589@d immediate_code=4 {command modifier for \.{\\immediate}} 24590@d set_language_code=5 {command modifier for \.{\\setlanguage}} 24591 24592@<Put each...@>= 24593primitive("openout",extension,open_node);@/ 24594@!@:open_out_}{\.{\\openout} primitive@> 24595primitive("write",extension,write_node); write_loc:=cur_val;@/ 24596@!@:write_}{\.{\\write} primitive@> 24597primitive("closeout",extension,close_node);@/ 24598@!@:close_out_}{\.{\\closeout} primitive@> 24599primitive("special",extension,special_node);@/ 24600@!@:special_}{\.{\\special} primitive@> 24601primitive("immediate",extension,immediate_code);@/ 24602@!@:immediate_}{\.{\\immediate} primitive@> 24603primitive("setlanguage",extension,set_language_code);@/ 24604@!@:set_language_}{\.{\\setlanguage} primitive@> 24605 24606@ The variable |write_loc| just introduced is used to provide an 24607appropriate error message in case of ``runaway'' write texts. 24608 24609@<Glob...@>= 24610@!write_loc:pointer; {|eqtb| address of \.{\\write}} 24611 24612@ @<Cases of |print_cmd_chr|...@>= 24613extension: case chr_code of 24614 open_node:print_esc("openout"); 24615 write_node:print_esc("write"); 24616 close_node:print_esc("closeout"); 24617 special_node:print_esc("special"); 24618 immediate_code:print_esc("immediate"); 24619 set_language_code:print_esc("setlanguage"); 24620 othercases print("[unknown extension!]") 24621 endcases; 24622 24623@ When an |extension| command occurs in |main_control|, in any mode, 24624the |do_extension| routine is called. 24625 24626@<Cases of |main_control| that are for extensions...@>= 24627any_mode(extension):do_extension; 24628 24629@ @<Declare act...@>= 24630@t\4@>@<Declare procedures needed in |do_extension|@>@; 24631procedure do_extension; 24632var i,@!j,@!k:integer; {all-purpose integers} 24633@!p,@!q,@!r:pointer; {all-purpose pointers} 24634begin case cur_chr of 24635open_node:@<Implement \.{\\openout}@>; 24636write_node:@<Implement \.{\\write}@>; 24637close_node:@<Implement \.{\\closeout}@>; 24638special_node:@<Implement \.{\\special}@>; 24639immediate_code:@<Implement \.{\\immediate}@>; 24640set_language_code:@<Implement \.{\\setlanguage}@>; 24641othercases confusion("ext1") 24642@:this can't happen ext1}{\quad ext1@> 24643endcases; 24644end; 24645 24646@ Here is a subroutine that creates a whatsit node having a given |subtype| 24647and a given number of words. It initializes only the first word of the whatsit, 24648and appends it to the current list. 24649 24650@<Declare procedures needed in |do_extension|@>= 24651procedure new_whatsit(@!s:small_number;@!w:small_number); 24652var p:pointer; {the new node} 24653begin p:=get_node(w); type(p):=whatsit_node; subtype(p):=s; 24654link(tail):=p; tail:=p; 24655end; 24656 24657@ The next subroutine uses |cur_chr| to decide what sort of whatsit is 24658involved, and also inserts a |write_stream| number. 24659 24660@<Declare procedures needed in |do_ext...@>= 24661procedure new_write_whatsit(@!w:small_number); 24662begin new_whatsit(cur_chr,w); 24663if w<>write_node_size then scan_four_bit_int 24664else begin scan_int; 24665 if cur_val<0 then cur_val:=17 24666 else if cur_val>15 then cur_val:=16; 24667 end; 24668write_stream(tail):=cur_val; 24669end; 24670 24671@ @<Implement \.{\\openout}@>= 24672begin new_write_whatsit(open_node_size); 24673scan_optional_equals; scan_file_name;@/ 24674open_name(tail):=cur_name; open_area(tail):=cur_area; open_ext(tail):=cur_ext; 24675end 24676 24677@ When `\.{\\write 12\{...\}}' appears, we scan the token list `\.{\{...\}}' 24678without expanding its macros; the macros will be expanded later when this 24679token list is rescanned. 24680 24681@<Implement \.{\\write}@>= 24682begin k:=cur_cs; new_write_whatsit(write_node_size);@/ 24683cur_cs:=k; p:=scan_toks(false,false); write_tokens(tail):=def_ref; 24684end 24685 24686@ @<Implement \.{\\closeout}@>= 24687begin new_write_whatsit(write_node_size); write_tokens(tail):=null; 24688end 24689 24690@ When `\.{\\special\{...\}}' appears, we expand the macros in the token 24691list as in \.{\\xdef} and \.{\\mark}. 24692 24693@<Implement \.{\\special}@>= 24694begin new_whatsit(special_node,write_node_size); write_stream(tail):=null; 24695p:=scan_toks(false,true); write_tokens(tail):=def_ref; 24696end 24697 24698@ Each new type of node that appears in our data structure must be capable 24699of being displayed, copied, destroyed, and so on. The routines that we 24700need for write-oriented whatsits are somewhat like those for mark nodes; 24701other extensions might, of course, involve more subtlety here. 24702 24703@<Basic printing...@>= 24704procedure print_write_whatsit(@!s:str_number;@!p:pointer); 24705begin print_esc(s); 24706if write_stream(p)<16 then print_int(write_stream(p)) 24707else if write_stream(p)=16 then print_char("*") 24708@.*\relax@> 24709else print_char("-"); 24710end; 24711 24712@ @<Display the whatsit...@>= 24713case subtype(p) of 24714open_node:begin print_write_whatsit("openout",p); 24715 print_char("="); print_file_name(open_name(p),open_area(p),open_ext(p)); 24716 end; 24717write_node:begin print_write_whatsit("write",p); 24718 print_mark(write_tokens(p)); 24719 end; 24720close_node:print_write_whatsit("closeout",p); 24721special_node:begin print_esc("special"); 24722 print_mark(write_tokens(p)); 24723 end; 24724language_node:begin print_esc("setlanguage"); 24725 print_int(what_lang(p)); print(" (hyphenmin "); 24726 print_int(what_lhm(p)); print_char(","); 24727 print_int(what_rhm(p)); print_char(")"); 24728 end; 24729othercases print("whatsit?") 24730endcases 24731 24732@ @<Make a partial copy of the whatsit...@>= 24733case subtype(p) of 24734open_node: begin r:=get_node(open_node_size); words:=open_node_size; 24735 end; 24736write_node,special_node: begin r:=get_node(write_node_size); 24737 add_token_ref(write_tokens(p)); words:=write_node_size; 24738 end; 24739close_node,language_node: begin r:=get_node(small_node_size); 24740 words:=small_node_size; 24741 end; 24742othercases confusion("ext2") 24743@:this can't happen ext2}{\quad ext2@> 24744endcases 24745 24746@ @<Wipe out the whatsit...@>= 24747begin case subtype(p) of 24748open_node: free_node(p,open_node_size); 24749write_node,special_node: begin delete_token_ref(write_tokens(p)); 24750 free_node(p,write_node_size); goto done; 24751 end; 24752close_node,language_node: free_node(p,small_node_size); 24753othercases confusion("ext3") 24754@:this can't happen ext3}{\quad ext3@> 24755endcases;@/ 24756goto done; 24757end 24758 24759@ @<Incorporate a whatsit node into a vbox@>=do_nothing 24760 24761@ @<Incorporate a whatsit node into an hbox@>=do_nothing 24762 24763@ @<Let |d| be the width of the whatsit |p|@>=d:=0 24764 24765@ @d adv_past(#)==@+if subtype(#)=language_node then 24766 begin cur_lang:=what_lang(#); l_hyf:=what_lhm(#); r_hyf:=what_rhm(#);@+end 24767 24768@<Advance \(p)past a whatsit node in the \(l)|line_break| loop@>=@+ 24769adv_past(cur_p) 24770 24771@ @<Advance \(p)past a whatsit node in the \(p)pre-hyphenation loop@>=@+ 24772adv_past(s) 24773 24774@ @<Prepare to move whatsit |p| to the current page, then |goto contribute|@>= 24775goto contribute 24776 24777@ @<Process whatsit |p| in |vert_break| loop, |goto not_found|@>= 24778goto not_found 24779 24780@ @<Output the whatsit node |p| in a vlist@>= 24781out_what(p) 24782 24783@ @<Output the whatsit node |p| in an hlist@>= 24784out_what(p) 24785 24786@ After all this preliminary shuffling, we come finally to the routines 24787that actually send out the requested data. Let's do \.{\\special} first 24788(it's easier). 24789 24790@<Declare procedures needed in |hlist_out|, |vlist_out|@>= 24791procedure special_out(@!p:pointer); 24792var old_setting:0..max_selector; {holds print |selector|} 24793@!k:pool_pointer; {index into |str_pool|} 24794begin synch_h; synch_v;@/ 24795old_setting:=selector; selector:=new_string; 24796show_token_list(link(write_tokens(p)),null,pool_size-pool_ptr); 24797selector:=old_setting; 24798str_room(1); 24799if cur_length<256 then 24800 begin dvi_out(xxx1); dvi_out(cur_length); 24801 end 24802else begin dvi_out(xxx4); dvi_four(cur_length); 24803 end; 24804for k:=str_start[str_ptr] to pool_ptr-1 do dvi_out(so(str_pool[k])); 24805pool_ptr:=str_start[str_ptr]; {erase the string} 24806end; 24807 24808@ To write a token list, we must run it through \TeX's scanner, expanding 24809macros and \.{\\the} and \.{\\number}, etc. This might cause runaways, 24810if a delimited macro parameter isn't matched, and runaways would be 24811extremely confusing since we are calling on \TeX's scanner in the middle 24812of a \.{\\shipout} command. Therefore we will put a dummy control sequence as 24813a ``stopper,'' right after the token list. This control sequence is 24814artificially defined to be \.{\\outer}. 24815@:end_write_}{\.{\\endwrite}@> 24816 24817@<Initialize table...@>= 24818text(end_write):="endwrite"; eq_level(end_write):=level_one; 24819eq_type(end_write):=outer_call; equiv(end_write):=null; 24820 24821@ @<Declare procedures needed in |hlist_out|, |vlist_out|@>= 24822procedure write_out(@!p:pointer); 24823var old_setting:0..max_selector; {holds print |selector|} 24824@!old_mode:integer; {saved |mode|} 24825@!j:small_number; {write stream number} 24826@!q,@!r:pointer; {temporary variables for list manipulation} 24827begin @<Expand macros in the token list 24828 and make |link(def_ref)| point to the result@>; 24829old_setting:=selector; j:=write_stream(p); 24830if write_open[j] then selector:=j 24831else begin {write to the terminal if file isn't open} 24832 if (j=17)and(selector=term_and_log) then selector:=log_only; 24833 print_nl(""); 24834 end; 24835token_show(def_ref); print_ln; 24836flush_list(def_ref); selector:=old_setting; 24837end; 24838 24839@ The final line of this routine is slightly subtle; at least, the author 24840didn't think about it until getting burnt! There is a used-up token list 24841@^Knuth, Donald Ervin@> 24842on the stack, namely the one that contained |end_write_token|. (We 24843insert this artificial `\.{\\endwrite}' to prevent runaways, as explained 24844above.) If it were not removed, and if there were numerous writes on a 24845single page, the stack would overflow. 24846 24847@d end_write_token==cs_token_flag+end_write 24848 24849@<Expand macros in the token list and...@>= 24850q:=get_avail; info(q):=right_brace_token+"}";@/ 24851r:=get_avail; link(q):=r; info(r):=end_write_token; ins_list(q);@/ 24852begin_token_list(write_tokens(p),write_text);@/ 24853q:=get_avail; info(q):=left_brace_token+"{"; ins_list(q); 24854{now we're ready to scan 24855 `\.\{$\langle\,$token list$\,\rangle$\.{\} \\endwrite}'} 24856old_mode:=mode; mode:=0; 24857 {disable \.{\\prevdepth}, \.{\\spacefactor}, \.{\\lastskip}, \.{\\prevgraf}} 24858cur_cs:=write_loc; q:=scan_toks(false,true); {expand macros, etc.} 24859get_token;@+if cur_tok<>end_write_token then 24860 @<Recover from an unbalanced write command@>; 24861mode:=old_mode; 24862end_token_list {conserve stack space} 24863 24864@ @<Recover from an unbalanced write command@>= 24865begin print_err("Unbalanced write command"); 24866@.Unbalanced write...@> 24867help2("On this page there's a \write with fewer real {'s than }'s.")@/ 24868("I can't handle that very well; good luck."); error; 24869repeat get_token; 24870until cur_tok=end_write_token; 24871end 24872 24873@ The |out_what| procedure takes care of outputting whatsit nodes for 24874|vlist_out| and |hlist_out|\kern-.3pt. 24875 24876@<Declare procedures needed in |hlist_out|, |vlist_out|@>= 24877procedure out_what(@!p:pointer); 24878var j:small_number; {write stream number} 24879begin case subtype(p) of 24880open_node,write_node,close_node:@<Do some work that has been queued up 24881 for \.{\\write}@>; 24882special_node:special_out(p); 24883language_node:do_nothing; 24884othercases confusion("ext4") 24885@:this can't happen ext4}{\quad ext4@> 24886endcases; 24887end; 24888 24889@ We don't implement \.{\\write} inside of leaders. (The reason is that 24890the number of times a leader box appears might be different in different 24891implementations, due to machine-dependent rounding in the glue calculations.) 24892@^leaders@> 24893 24894@<Do some work that has been queued up...@>= 24895if not doing_leaders then 24896 begin j:=write_stream(p); 24897 if subtype(p)=write_node then write_out(p) 24898 else begin if write_open[j] then a_close(write_file[j]); 24899 if subtype(p)=close_node then write_open[j]:=false 24900 else if j<16 then 24901 begin cur_name:=open_name(p); cur_area:=open_area(p); 24902 cur_ext:=open_ext(p); 24903 if cur_ext="" then cur_ext:=".tex"; 24904 pack_cur_name; 24905 while not a_open_out(write_file[j]) do 24906 prompt_file_name("output file name",".tex"); 24907 write_open[j]:=true; 24908 end; 24909 end; 24910 end 24911 24912@ The presence of `\.{\\immediate}' causes the |do_extension| procedure 24913to descend to one level of recursion. Nothing happens unless \.{\\immediate} 24914is followed by `\.{\\openout}', `\.{\\write}', or `\.{\\closeout}'. 24915@^recursion@> 24916 24917@<Implement \.{\\immediate}@>= 24918begin get_x_token; 24919if (cur_cmd=extension)and(cur_chr<=close_node) then 24920 begin p:=tail; do_extension; {append a whatsit node} 24921 out_what(tail); {do the action immediately} 24922 flush_node_list(tail); tail:=p; link(p):=null; 24923 end 24924else back_input; 24925end 24926 24927@ The \.{\\language} extension is somewhat different. 24928We need a subroutine that comes into play when a character of 24929a non-|clang| language is being appended to the current paragraph. 24930 24931@<Declare action...@>= 24932procedure fix_language; 24933var @!l:ASCII_code; {the new current language} 24934begin if language<=0 then l:=0 24935else if language>255 then l:=0 24936else l:=language; 24937if l<>clang then 24938 begin new_whatsit(language_node,small_node_size); 24939 what_lang(tail):=l; clang:=l;@/ 24940 what_lhm(tail):=norm_min(left_hyphen_min); 24941 what_rhm(tail):=norm_min(right_hyphen_min); 24942 end; 24943end; 24944 24945@ @<Implement \.{\\setlanguage}@>= 24946if abs(mode)<>hmode then report_illegal_case 24947else begin new_whatsit(language_node,small_node_size); 24948 scan_int; 24949 if cur_val<=0 then clang:=0 24950 else if cur_val>255 then clang:=0 24951 else clang:=cur_val; 24952 what_lang(tail):=clang; 24953 what_lhm(tail):=norm_min(left_hyphen_min); 24954 what_rhm(tail):=norm_min(right_hyphen_min); 24955 end 24956 24957@ @<Finish the extensions@>= 24958for k:=0 to 15 do if write_open[k] then a_close(write_file[k]) 24959 24960@* \[54] System-dependent changes. 24961This section should be replaced, if necessary, by any special 24962modifications of the program 24963that are necessary to make \TeX\ work at a particular installation. 24964It is usually best to design your change file so that all changes to 24965previous sections preserve the section numbering; then everybody's version 24966will be consistent with the published program. More extensive changes, 24967which introduce new sections, can be inserted here; then only the index 24968itself will get a new section number. 24969@^system dependencies@> 24970 24971@* \[55] Index. 24972Here is where you can find all uses of each identifier in the program, 24973with underlined entries pointing to where the identifier was defined. 24974If the identifier is only one letter long, however, you get to see only 24975the underlined entries. {\sl All references are to section numbers instead of 24976page numbers.} 24977 24978This index also lists error messages and other aspects of the program 24979that you might want to look up some day. For example, the entry 24980for ``system dependencies'' lists all sections that should receive 24981special attention from people who are installing \TeX\ in a new 24982operating environment. A list of various things that can't happen appears 24983under ``this can't happen''. Approximately 40 sections are listed under 24984``inner loop''; these account for about 60\pct! of \TeX's running time, 24985exclusive of input and output. 24986