xref: /original-bsd/share/doc/papers/px/pxin1.n (revision 50dd0bba)
1.if !\n(xx .so tmac.p
2.tr _\(ru
3.nr H1 0
4.NH
5Organization
6.PP
7Most of
8.I px
9is written in the
10.SM "VAX 11/780"
11assembly language, using the
12.UX
13assembler
14.I as.
15Portions of
16.I px
17are also written in the
18.UX
19systems programming language C.
20.I Px
21consists of a main procedure that reads in the interpreter code,
22a main interpreter loop that transfers successively to various
23code segments implementing the abstract machine operations,
24built-in procedures and functions,
25and several routines that support the implementation of the
26Pascal input-output environment.
27.PP
28The interpreter runs at a fraction of the speed of equivalent
29compiled C code, with this fraction varying from 1/5 to 1/15.
30The interpreter occupies 18.5K bytes of instruction space, shared among
31all processes executing Pascal, and has 4.6K bytes of data space (constants,
32error messages, etc.) a copy of which is allocated to each executing process.
33.NH 2
34Format of the object file
35.PP
36.I Px
37normally interprets the code left in an object file by a run of the
38Pascal translator
39.I pi.
40The file where the translator puts the object originally, and the most
41commonly interpreted file, is called
42.I obj.
43In order that all persons using
44.I px
45share a common text image, this executable file is
46a small process that coordinates with the interpreter to start
47execution.
48The interpreter code is placed
49at the end of a special ``header'' file and the size of the initialized
50data area of this header file is expanded to include this code,
51so that during execution it is located at an
52easily determined address in its data space.
53When executed, the object process creates a
54.I pipe ,
55creates another process by doing a
56.I fork ,
57and arranges that the resulting parent process becomes an instance of
58.I px .
59The child process then writes the interpreter code through
60the pipe that it has to the
61interpreter process parent.
62When this process is complete, the child exits.
63.PP
64The real advantage of this approach is that it does not require modifications
65to the shell, and that the resultant objects are ``true objects'' not
66requiring special treatment.
67A simpler mechanism would be to determine the name of the file that was
68executed and pass this to the interpreter.
69However it is not possible to determine this name
70in all cases.\*(Dd
71.FS
72\*(dd\ For instance, if the
73.I pxref
74program is placed in the directory
75`/usr/bin'
76then when the user types
77``pxref program.p''
78the first argument to the program, nominally the programs name, is
79``pxref.''
80While it would be possible to search in the standard place,
81i.e. the current directory, and the system directories
82`/bin'
83and
84`/usr/bin'
85for a corresponding object file,
86this would be expensive and not guaranteed to succeed.
87Several shells exist that allow other directories to be searched
88for commands, and there is,
89in general,
90no way to determine what these directories are.
91.FE
92.NH 2
93General features of object code
94.PP
95Pascal object code is relocatable as all addressing references for
96control transfers within the code are relative.
97The code consists of instructions interspersed with inline data.
98All instructions have a length that is an even number of bytes.
99No variables are kept in the object code area.
100.PP
101The first byte of a Pascal interpreter instruction contains an operation
102code.
103This allows a total of 256 major operation codes, and 232 of these are
104in use in the current
105.I px.
106The second byte of each interpreter instruction is called the
107``sub-operation code'',
108or more commonly the
109.I sub-opcode.
110It contains a small integer that may, for example, be used as a
111block-structure level for the associated operation.
112If the instruction can take a longword constant,
113this constant is often packed into the sub-opcode
114if it fits into 8 bits and is not zero.
115A sub-opcode value of zero specifies that the constant would not
116fit and therefore follows in the next word.
117This is a space optimization, the value of zero for flagging
118the longer case being convenient because it is easy to test.
119.PP
120Other instruction formats are used.
121The branching
122instructions take an offset in the following word,
123operators that load constants onto the stack
124take arbitrarily long inline constant values,
125and many operations deal exclusively with data on the
126interpreter stack, requiring no inline data.
127.NH 2
128Stack structure of the interpreter
129.PP
130The interpreter emulates a stack-structured Pascal machine.
131The ``load'' instructions put values onto the stack, where all
132arithmetic operations take place.
133The ``store'' instructions take values off the stack
134and place them in an address that is also contained on the stack.
135The only way to move data or to compute in the machine is with the stack.
136.PP
137To make the interpreter operations more powerful
138and to thereby increase the interpreter speed,
139the arithmetic operations in the interpreter are ``typed''.
140That is, length conversion of arithmetic values occurs when they are
141used in an operation.
142This eliminates interpreter cycles for length conversion
143and the associated overhead.
144For example, when adding an integer that fits in one byte to one that
145requires four bytes to store, no ``conversion'' operators are required.
146The one byte integer is loaded onto the stack, followed by the four
147byte integer, and then an adding operator is used that has, implicit
148in its definition, the sizes of the arguments.
149.NH 2
150Data types in the interpreter
151.PP
152The interpreter deals with several different fundamental data types.
153In the memory of the machine, 1, 2, and 4 byte integers are supported,
154with only 2 and 4 byte integers being present on the stack.
155The interpreter always converts to 4 byte integers when there is a possibility
156of overflowing the shorter formats.
157This corresponds to the Pascal language definition of overflow in
158arithmetic operations that requires that the result be correct
159if all partial values lie within the bounds of the base integer type:
1604 byte integer values.
161.PP
162Character constants are treated similarly to 1 byte integers for
163most purposes, as are Boolean values.
164All enumerated types are treated as integer values of
165an appropriate length, usually 1 byte.
166The interpreter also has real numbers, occupying 8 bytes of storage,
167and sets and strings of varying length.
168The appropriate operations are included for each data type, such as
169set union and intersection and an operation to write a string.
170.PP
171No special
172.B packed
173data formats are supported by the interpreter.
174The smallest unit of storage occupied by any variable is one byte.
175The built-ins
176.I pack
177and
178.I unpack
179thus degenerate to simple memory to memory transfers with
180no special processing.
181.NH 2
182Runtime environment
183.PP
184The interpreter runtime environment uses a stack data area and a heap
185data area, that are kept at opposite ends of memory
186and grow towards each other.
187All global variables and variables local to procedures and functions
188are kept in the stack area.
189Dynamically allocated variables and buffers for input/output are
190allocated in the heap.
191.PP
192The addressing of block structured variables is done by using
193a fixed display
194that contains the address of its stack frame
195for each statically active block.\*(Dg
196.FS
197\*(dg\ Here ``block'' is being used to mean any
198.I procedure ,
199.I function
200or the main program.
201.FE
202This display is referenced by instructions that load and store
203variables and maintained by the operations for
204block entry and exit, and for non-local
205.B goto
206statements.
207.NH 2
208Dp, lc, loop
209.PP
210Three ``global'' variables in the interpreter, in addition to the
211``display'', are the
212.I dp,
213.I lc,
214and the
215.I loop.
216The
217.I dp
218is a pointer to the display entry for the current block;
219the
220.I lc
221is the abstract machine location counter;
222and the
223.I loop
224is a register that holds the address of the main interpreter
225loop so that returning to the loop to fetch the next instruction is
226a fast operation.
227.NH 2
228The stack frame structure
229.PP
230Each active block
231has a stack frame consisting of three parts:
232a block mark, local variables, and temporary storage for partially
233evaluated expressions.
234The stack in the interpreter grows from the high addresses in memory
235to the low addresses,
236so that those parts of the stack frame that are ``on the top''
237of the stack have the most negative offsets from the display
238entry for the block.
239The major parts of the stack frame are represented in Figure 1.1.
240.so fig1.1.n
241Note that the local variables of each block
242have negative offsets from the corresponding display entry,
243the ``first'' local variable having offset `\-2'.
244.NH 2
245The block mark
246.PP
247The block mark contains the saved information necessary
248to restore the environment when the current block exits.
249It consists of two parts.
250The first and top-most part is saved by the
251.SM CALL
252instruction in the interpreter.
253This information is not present for the main program
254as it is never ``called''.
255The second part of the block mark is created by the
256.SM BEG
257begin block operator that also allocates and clears the
258local variable storage.
259The format of these blocks is represented in Figure 1.2.
260.sp
261.so fig1.2.n
262.PP
263The data saved by the
264.SM CALL
265operator includes the line number
266.I lino
267of the point of call,
268that is printed if the program execution ends abnormally;
269the location counter
270.I lc
271giving the return address;
272and the current display entry address
273.I dp
274at the time of call.
275.PP
276The
277.SM BEG
278begin operator saves the previous display contents at the level
279of this block, so that the display can be restored on block exit.
280A pointer to the beginning line number and the
281name of this block is also saved.
282This information is stored in the interpreter object code in-line after the
283.SM BEG
284operator.
285It is used in printing a post-mortem backtrace.
286The saved file name and buffer reference are necessary because of
287the input/output structure
288(this is discussed in detail in
289sections 3.3 and 3.4).
290The top of stack reference gives the value the stack pointer should
291have when there are no expression temporaries on the stack.
292It is used for a consistency check in the
293.SM LINO
294line number operators in the interpreter, that occurs before
295each statement executed.
296This helps to catch bugs in the interpreter, that often manifest
297themselves by leaving the stack non-empty between statements.
298.PP
299Note that there is no explicit static link here.
300Thus to set up the display correctly after a non-local
301.B goto
302statement one must ``unwind''
303through all the block marks on the stack to rebuild the display.
304.NH 2
305Arguments and return values
306.PP
307A function returns its value into a space reserved by the calling
308block.
309Arguments to a
310.B function
311are placed on top of this return area.
312For both
313.B procedure
314and
315.B function
316calls, arguments are placed at the end of the expression evaluation area
317of the caller.
318When a
319.B function
320completes, expression evaluation can continue
321after popping the arguments to the
322.B function
323off the stack,
324exactly as if the function value had been ``loaded''.
325The arguments to a
326.B procedure
327are also popped off the stack by the caller
328after its execution ends.
329.KS
330.PP
331As a simple example consider the following stack structure
332for a call to a function
333.I f,
334of the form ``f(a)''.
335.so fig1.3.n
336.KE
337.PP
338If we suppose that
339.I f
340returns a
341.I real
342and that
343.I a
344is an integer,
345the calling sequence for this function would be:
346.DS
347.TS
348lp-2w(8) l.
349PUSH	\-8
350RV4:\fIl	a\fR
351CALL:\fIl	f\fR
352POP	4
353.TE
354.DE
355.ZP
356Here we use the operator
357.SM PUSH
358to clear space for the return value,
359load
360.I a
361on the stack with a ``right value'' operator,
362call the function,
363pop off the argument
364.I a ,
365and can then complete evaluation of the containing expression.
366The operations used here will be explained in section 2.
367.PP
368If the function
369.I f
370were given by
371.LS
372 10 \*bfunction\fR f(i: integer): real;
373 11 \*bbegin\fR
374 12     f := i
375 13 \*bend\fR;
376.LE
377then
378.I f
379would have code sequence:
380.DS
381.TS
382lp-2w(8) l.
383BEG:2	0
384	11
385	"f"
386LV:\fIl\fR	40
387RV4:\fIl\fR	32
388AS48
389END
390.TE
391.DE
392.ZP
393Here the
394.SM BEG
395operator takes 9 bytes of inline data.
396The first byte specifies the
397length of the function name.
398The second longword specifies the
399amount of local variable storage, here none.
400The succeeding two lines give the line number of the
401.B begin
402and the name of the block
403for error traceback.
404The
405.SM BEG
406operator places a name pointer in the block mark.
407The body of the
408.B function
409first takes an address of the
410.B function
411result variable
412.I f
413using the address of operator
414.SM LV
415.I a .
416The next operation in the interpretation of this function is the loading
417of the value of
418.I i .
419.I I
420is at the level of the
421.B function
422.I f ,
423here symbolically
424.I l,
425and the first variable in the local variable area.
426The
427.B function
428completes by assigning the 4 byte integer on the stack to the 8 byte
429return location, hence the
430.SM AS48
431assignment operator, and then uses the
432.SM END
433operator to exit the current block.
434.NH 2
435The main interpreter loop
436.PP
437The main interpreter loop is simply:
438.DS
439.mD
440iloop:
441	\fBcaseb\fR	(lc)+,$0,$255
442	<table of opcode interpreter addresses>
443.DE
444.ZP
445The main opcode is extracted from the first byte of the instruction
446and used to index into the table of opcode interpreter addresses.
447Control is then transferred to the specified location.
448The sub-opcode may be used to index the display,
449as a small constant,
450or to specify one of several relational operators.
451In the cases where a constant is needed, but it
452is not small enough to fit in the byte sub-operator,
453a zero is placed there and the constant follows in the next word.
454Zero is easily tested for,
455as the instruction that fetches the
456sub-opcode sets the condition code flags.
457A construction like:
458.DS
459.mD
460_OPER:
461	\fBcvtbl\fR	(lc)+,r0
462	\fBbneq\fR	L1
463	\fBcvtwl\fR	(lc)+,r0
464L1:	...
465.DE
466is all that is needed to effect this packing of data.
467This technique saves space in the Pascal
468.I obj
469object code.
470.PP
471The address of the instruction at
472.I iloop
473is always contained in the register variable
474.I loop .
475Thus a return to the main interpreter is simply:
476.DS
477	\fBjmp\fR	(loop)
478.DE
479that is both quick and occupies little space.
480.NH 2
481Errors
482.PP
483Errors during interpretation fall into three classes:
484.DS
4851) Interpreter detected errors.
4862) Hardware detected errors.
4873) External events.
488.DE
489.PP
490Interpreter detected errors include I/O errors and
491built-in function errors.
492These errors cause a subroutine call to an error routine
493with a single parameter indicating the cause of the error.
494Hardware errors such as range errors and overflows are
495fielded by a special routine that determines the opcode
496that caused the error.
497It then calls the error routine with an appropriate error
498parameter.
499External events include interrupts and system limits such
500as available memory.
501They generate a call to the error routine with an
502appropriate error code.
503The error routine processes the error condition,
504printing an appropriate error message and usually
505a backtrace from the point of the error.
506