1.if !\n(xx .so tmac.p 2.tr _\(ru 3.nr H1 0 4.NH 5Organization 6.PP 7Most of 8.I px 9is written in the 10.SM "VAX 11/780" 11assembly language, using the 12.UX 13assembler 14.I as. 15Portions of 16.I px 17are also written in the 18.UX 19systems programming language C. 20.I Px 21consists of a main procedure that reads in the interpreter code, 22a main interpreter loop that transfers successively to various 23code segments implementing the abstract machine operations, 24built-in procedures and functions, 25and several routines that support the implementation of the 26Pascal input-output environment. 27.PP 28The interpreter runs at a fraction of the speed of equivalent 29compiled C code, with this fraction varying from 1/5 to 1/15. 30The interpreter occupies 18.5K bytes of instruction space, shared among 31all processes executing Pascal, and has 4.6K bytes of data space (constants, 32error messages, etc.) a copy of which is allocated to each executing process. 33.NH 2 34Format of the object file 35.PP 36.I Px 37normally interprets the code left in an object file by a run of the 38Pascal translator 39.I pi. 40The file where the translator puts the object originally, and the most 41commonly interpreted file, is called 42.I obj. 43In order that all persons using 44.I px 45share a common text image, this executable file is 46a small process that coordinates with the interpreter to start 47execution. 48The interpreter code is placed 49at the end of a special ``header'' file and the size of the initialized 50data area of this header file is expanded to include this code, 51so that during execution it is located at an 52easily determined address in its data space. 53When executed, the object process creates a 54.I pipe , 55creates another process by doing a 56.I fork , 57and arranges that the resulting parent process becomes an instance of 58.I px . 59The child process then writes the interpreter code through 60the pipe that it has to the 61interpreter process parent. 62When this process is complete, the child exits. 63.PP 64The real advantage of this approach is that it does not require modifications 65to the shell, and that the resultant objects are ``true objects'' not 66requiring special treatment. 67A simpler mechanism would be to determine the name of the file that was 68executed and pass this to the interpreter. 69However it is not possible to determine this name 70in all cases.\*(Dd 71.FS 72\*(dd\ For instance, if the 73.I pxref 74program is placed in the directory 75`/usr/bin' 76then when the user types 77``pxref program.p'' 78the first argument to the program, nominally the programs name, is 79``pxref.'' 80While it would be possible to search in the standard place, 81i.e. the current directory, and the system directories 82`/bin' 83and 84`/usr/bin' 85for a corresponding object file, 86this would be expensive and not guaranteed to succeed. 87Several shells exist that allow other directories to be searched 88for commands, and there is, 89in general, 90no way to determine what these directories are. 91.FE 92.NH 2 93General features of object code 94.PP 95Pascal object code is relocatable as all addressing references for 96control transfers within the code are relative. 97The code consists of instructions interspersed with inline data. 98All instructions have a length that is an even number of bytes. 99No variables are kept in the object code area. 100.PP 101The first byte of a Pascal interpreter instruction contains an operation 102code. 103This allows a total of 256 major operation codes, and 232 of these are 104in use in the current 105.I px. 106The second byte of each interpreter instruction is called the 107``sub-operation code'', 108or more commonly the 109.I sub-opcode. 110It contains a small integer that may, for example, be used as a 111block-structure level for the associated operation. 112If the instruction can take a longword constant, 113this constant is often packed into the sub-opcode 114if it fits into 8 bits and is not zero. 115A sub-opcode value of zero specifies that the constant would not 116fit and therefore follows in the next word. 117This is a space optimization, the value of zero for flagging 118the longer case being convenient because it is easy to test. 119.PP 120Other instruction formats are used. 121The branching 122instructions take an offset in the following word, 123operators that load constants onto the stack 124take arbitrarily long inline constant values, 125and many operations deal exclusively with data on the 126interpreter stack, requiring no inline data. 127.NH 2 128Stack structure of the interpreter 129.PP 130The interpreter emulates a stack-structured Pascal machine. 131The ``load'' instructions put values onto the stack, where all 132arithmetic operations take place. 133The ``store'' instructions take values off the stack 134and place them in an address that is also contained on the stack. 135The only way to move data or to compute in the machine is with the stack. 136.PP 137To make the interpreter operations more powerful 138and to thereby increase the interpreter speed, 139the arithmetic operations in the interpreter are ``typed''. 140That is, length conversion of arithmetic values occurs when they are 141used in an operation. 142This eliminates interpreter cycles for length conversion 143and the associated overhead. 144For example, when adding an integer that fits in one byte to one that 145requires four bytes to store, no ``conversion'' operators are required. 146The one byte integer is loaded onto the stack, followed by the four 147byte integer, and then an adding operator is used that has, implicit 148in its definition, the sizes of the arguments. 149.NH 2 150Data types in the interpreter 151.PP 152The interpreter deals with several different fundamental data types. 153In the memory of the machine, 1, 2, and 4 byte integers are supported, 154with only 2 and 4 byte integers being present on the stack. 155The interpreter always converts to 4 byte integers when there is a possibility 156of overflowing the shorter formats. 157This corresponds to the Pascal language definition of overflow in 158arithmetic operations that requires that the result be correct 159if all partial values lie within the bounds of the base integer type: 1604 byte integer values. 161.PP 162Character constants are treated similarly to 1 byte integers for 163most purposes, as are Boolean values. 164All enumerated types are treated as integer values of 165an appropriate length, usually 1 byte. 166The interpreter also has real numbers, occupying 8 bytes of storage, 167and sets and strings of varying length. 168The appropriate operations are included for each data type, such as 169set union and intersection and an operation to write a string. 170.PP 171No special 172.B packed 173data formats are supported by the interpreter. 174The smallest unit of storage occupied by any variable is one byte. 175The built-ins 176.I pack 177and 178.I unpack 179thus degenerate to simple memory to memory transfers with 180no special processing. 181.NH 2 182Runtime environment 183.PP 184The interpreter runtime environment uses a stack data area and a heap 185data area, that are kept at opposite ends of memory 186and grow towards each other. 187All global variables and variables local to procedures and functions 188are kept in the stack area. 189Dynamically allocated variables and buffers for input/output are 190allocated in the heap. 191.PP 192The addressing of block structured variables is done by using 193a fixed display 194that contains the address of its stack frame 195for each statically active block.\*(Dg 196.FS 197\*(dg\ Here ``block'' is being used to mean any 198.I procedure , 199.I function 200or the main program. 201.FE 202This display is referenced by instructions that load and store 203variables and maintained by the operations for 204block entry and exit, and for non-local 205.B goto 206statements. 207.NH 2 208Dp, lc, loop 209.PP 210Three ``global'' variables in the interpreter, in addition to the 211``display'', are the 212.I dp, 213.I lc, 214and the 215.I loop. 216The 217.I dp 218is a pointer to the display entry for the current block; 219the 220.I lc 221is the abstract machine location counter; 222and the 223.I loop 224is a register that holds the address of the main interpreter 225loop so that returning to the loop to fetch the next instruction is 226a fast operation. 227.NH 2 228The stack frame structure 229.PP 230Each active block 231has a stack frame consisting of three parts: 232a block mark, local variables, and temporary storage for partially 233evaluated expressions. 234The stack in the interpreter grows from the high addresses in memory 235to the low addresses, 236so that those parts of the stack frame that are ``on the top'' 237of the stack have the most negative offsets from the display 238entry for the block. 239The major parts of the stack frame are represented in Figure 1.1. 240.so fig1.1.n 241Note that the local variables of each block 242have negative offsets from the corresponding display entry, 243the ``first'' local variable having offset `\-2'. 244.NH 2 245The block mark 246.PP 247The block mark contains the saved information necessary 248to restore the environment when the current block exits. 249It consists of two parts. 250The first and top-most part is saved by the 251.SM CALL 252instruction in the interpreter. 253This information is not present for the main program 254as it is never ``called''. 255The second part of the block mark is created by the 256.SM BEG 257begin block operator that also allocates and clears the 258local variable storage. 259The format of these blocks is represented in Figure 1.2. 260.sp 261.so fig1.2.n 262.PP 263The data saved by the 264.SM CALL 265operator includes the line number 266.I lino 267of the point of call, 268that is printed if the program execution ends abnormally; 269the location counter 270.I lc 271giving the return address; 272and the current display entry address 273.I dp 274at the time of call. 275.PP 276The 277.SM BEG 278begin operator saves the previous display contents at the level 279of this block, so that the display can be restored on block exit. 280A pointer to the beginning line number and the 281name of this block is also saved. 282This information is stored in the interpreter object code in-line after the 283.SM BEG 284operator. 285It is used in printing a post-mortem backtrace. 286The saved file name and buffer reference are necessary because of 287the input/output structure 288(this is discussed in detail in 289sections 3.3 and 3.4). 290The top of stack reference gives the value the stack pointer should 291have when there are no expression temporaries on the stack. 292It is used for a consistency check in the 293.SM LINO 294line number operators in the interpreter, that occurs before 295each statement executed. 296This helps to catch bugs in the interpreter, that often manifest 297themselves by leaving the stack non-empty between statements. 298.PP 299Note that there is no explicit static link here. 300Thus to set up the display correctly after a non-local 301.B goto 302statement one must ``unwind'' 303through all the block marks on the stack to rebuild the display. 304.NH 2 305Arguments and return values 306.PP 307A function returns its value into a space reserved by the calling 308block. 309Arguments to a 310.B function 311are placed on top of this return area. 312For both 313.B procedure 314and 315.B function 316calls, arguments are placed at the end of the expression evaluation area 317of the caller. 318When a 319.B function 320completes, expression evaluation can continue 321after popping the arguments to the 322.B function 323off the stack, 324exactly as if the function value had been ``loaded''. 325The arguments to a 326.B procedure 327are also popped off the stack by the caller 328after its execution ends. 329.KS 330.PP 331As a simple example consider the following stack structure 332for a call to a function 333.I f, 334of the form ``f(a)''. 335.so fig1.3.n 336.KE 337.PP 338If we suppose that 339.I f 340returns a 341.I real 342and that 343.I a 344is an integer, 345the calling sequence for this function would be: 346.DS 347.TS 348lp-2w(8) l. 349PUSH \-8 350RV4:\fIl a\fR 351CALL:\fIl f\fR 352POP 4 353.TE 354.DE 355.ZP 356Here we use the operator 357.SM PUSH 358to clear space for the return value, 359load 360.I a 361on the stack with a ``right value'' operator, 362call the function, 363pop off the argument 364.I a , 365and can then complete evaluation of the containing expression. 366The operations used here will be explained in section 2. 367.PP 368If the function 369.I f 370were given by 371.LS 372 10 \*bfunction\fR f(i: integer): real; 373 11 \*bbegin\fR 374 12 f := i 375 13 \*bend\fR; 376.LE 377then 378.I f 379would have code sequence: 380.DS 381.TS 382lp-2w(8) l. 383BEG:2 0 384 11 385 "f" 386LV:\fIl\fR 40 387RV4:\fIl\fR 32 388AS48 389END 390.TE 391.DE 392.ZP 393Here the 394.SM BEG 395operator takes 9 bytes of inline data. 396The first byte specifies the 397length of the function name. 398The second longword specifies the 399amount of local variable storage, here none. 400The succeeding two lines give the line number of the 401.B begin 402and the name of the block 403for error traceback. 404The 405.SM BEG 406operator places a name pointer in the block mark. 407The body of the 408.B function 409first takes an address of the 410.B function 411result variable 412.I f 413using the address of operator 414.SM LV 415.I a . 416The next operation in the interpretation of this function is the loading 417of the value of 418.I i . 419.I I 420is at the level of the 421.B function 422.I f , 423here symbolically 424.I l, 425and the first variable in the local variable area. 426The 427.B function 428completes by assigning the 4 byte integer on the stack to the 8 byte 429return location, hence the 430.SM AS48 431assignment operator, and then uses the 432.SM END 433operator to exit the current block. 434.NH 2 435The main interpreter loop 436.PP 437The main interpreter loop is simply: 438.DS 439.mD 440iloop: 441 \fBcaseb\fR (lc)+,$0,$255 442 <table of opcode interpreter addresses> 443.DE 444.ZP 445The main opcode is extracted from the first byte of the instruction 446and used to index into the table of opcode interpreter addresses. 447Control is then transferred to the specified location. 448The sub-opcode may be used to index the display, 449as a small constant, 450or to specify one of several relational operators. 451In the cases where a constant is needed, but it 452is not small enough to fit in the byte sub-operator, 453a zero is placed there and the constant follows in the next word. 454Zero is easily tested for, 455as the instruction that fetches the 456sub-opcode sets the condition code flags. 457A construction like: 458.DS 459.mD 460_OPER: 461 \fBcvtbl\fR (lc)+,r0 462 \fBbneq\fR L1 463 \fBcvtwl\fR (lc)+,r0 464L1: ... 465.DE 466is all that is needed to effect this packing of data. 467This technique saves space in the Pascal 468.I obj 469object code. 470.PP 471The address of the instruction at 472.I iloop 473is always contained in the register variable 474.I loop . 475Thus a return to the main interpreter is simply: 476.DS 477 \fBjmp\fR (loop) 478.DE 479that is both quick and occupies little space. 480.NH 2 481Errors 482.PP 483Errors during interpretation fall into three classes: 484.DS 4851) Interpreter detected errors. 4862) Hardware detected errors. 4873) External events. 488.DE 489.PP 490Interpreter detected errors include I/O errors and 491built-in function errors. 492These errors cause a subroutine call to an error routine 493with a single parameter indicating the cause of the error. 494Hardware errors such as range errors and overflows are 495fielded by a special routine that determines the opcode 496that caused the error. 497It then calls the error routine with an appropriate error 498parameter. 499External events include interrupts and system limits such 500as available memory. 501They generate a call to the error routine with an 502appropriate error code. 503The error routine processes the error condition, 504printing an appropriate error message and usually 505a backtrace from the point of the error. 506