xref: /original-bsd/old/lisp/PSD.doc/franz.n (revision c577960b)
Copyright (c) 1980 Regents of the University of California.
All rights reserved. The Berkeley software License Agreement
specifies the terms and conditions for redistribution.

@(#)franz.n 6.1 (Berkeley) 04/29/86

." franz.n -[Thu Jun 17 11:01:27 1982 by jkf]- ." this file will contain a description of the franz lisp system. ." sort of a systems manual. F\s-2RANZ\s0 L\s-2ISP\s0 .. .tp .(l C .sz +2 The Franz Lisp System .sz -2 by John K. Foderaro .)l .tl ''Abstract''

This document describes the .Fr system written at The University of California at Berkeley. Included are descriptions of the memory management, interpreter and compiler, the conventions used within the C coded kernel and those within the compiled code. This does not duplicate the information found in The Franz Lisp Manual. .++ C '\s-2Draft\s+2'The Franz Lisp System'%' .+c Characteristics of \F\s-2RANZ \s0L\s-2ISP\s0 .ls 1 .pp There is no formal standard for lisp systems, thus each lisp system is almost guaranteed to be different from any other lisp system. In this section we will examine the design decisions which most often characterize a lisp system. This focus does not imply that all other parts of the language are generally agreed upon. In fact, there is nothing sacred to lisp system designers. For example, one usually identifies the symbol .i nil with the empty list, and one usually thinks of lisp as a dynamically scoped language. Both of these conventions are are not true in the lisp dialect .i NIL currently being written at MIT. As another example, one imagines that a lisp system must use garbage collection to reclaim storage. The lisp dialect .i ZetaLisp that is running on Lisp Machines doesn't normally use a garbage collector because of the way in which it allocates its address space. It is faster to reboot the machine than to do a garbage collection. In most lisp dialects the lisp expressions are written in parenthesised form. In .i Portable .i Standard .i Lisp (PSL) at the University of Utah, programs are written primarily in an Algol-like syntax. .pp Thus some of the discussion to follow indicates not so much .i unique charateristics of .Fr but instead, how decisions were made. .sh 2 Scoping and Binding .pp The .Fr interpreter uses .i dynamic .i scoping , that is, after A calls B, B can access all of the variables that A allocated as well as those that A can access, provided B doesn't allocate variables of the same names. There are two popular ways of implementing dynamic scoping: .i deep .i binding and .i shallow .i binding . Note that we will use the terms variable allocation and lambda binding interchangeably in this report. The former term is less language-specific than the latter. .sh +1 Deep Binding .pp In a deep binding implementation, when a variable is allocated, the name of the variable and a space for its value are pushed on the top of a stack. When a program asks for the value of a variable, the lisp system must search down the stack for the first occurrence of the variable. This system offers great flexibility for the programmer since the state of his program can be described by the contents of the stack. It is thus possible to save the context of a program by just saving the stack or in some cases just a pointer to a place in the stack. The problem with deep binding is that it is time-consuming to search down the stack for the value of a variable, and, as a result, most systems modify the deep binding model by giving variables a global value cell and allowing programs to set and access the global cell. Some implementations of Interlisp use deep binding. .sh +0 Shallow Binding .pp In a shallow binding implementation, each variable has a global value cell which contains the current value of the variable. When a variable is allocated inside a program, its name and old value are pushed on a stack called a .i bindstack . The variable is then given a new value. Throughout the lifetime of the allocation, the current value of the variable can be found in the global value cell associated with the variable. When leaving the context of the variable's allocation, the old value is restored from the .i bindstack . A shallow binding scheme makes it much cheaper to access the values of variables, however it is much more difficult to save the state and to restore it. .pp .Fr and most other lisps which are not derived from Interlisp use shallow binding. Some versions of Interlisp use shallow binding. .sh -1 Lisp Compiler .pp Dynamic scoping often is not necessary and it is never cheap, even in a shallow binding implementation. Thus the .Fr compiler prefers to do lexical scoping rather than dynamic scoping. If the user does not specify otherwise, when a function is compiled, all variables allocated will be put on a local stack and will .i not be accessible by any function that this function calls. This convention results in faster code being generated by the compiler, but if the user is not careful, it can result in compiled and interpreted code working differently. In some of the new lisp designs, such as .i NIL and .i Common .i Lisp the semantics of compiled and interpreted code have been unified by transferring the compiler semantics (lexical scoping) to the interpreter. This is a rather large step if dynamic scoping is no longer an option, and it is not clear whether the resulting language should be called 'lisp'. .sh +0 Data Types .pp A complete list of data types in .Fr can be found in the first chapter of the .Fr Manual. The most important ones are described below. .pp Lisp is a list processing language and the basic data type is the list cell. A list cell also goes by the name of cons cell, pair, or dotted pair (dtpr). It contains pointers to two lisp objects, and these pointers can be accessed via the .i car and .i cdr functions. Each pointer requires four (8-bit) bytes and thus the list cell is eight bytes large. The cdr pointer is stored first since this makes it easier to 'cdr down a list' using the addressing modes of the VAX. .pp The symbol cell is used to implement variables. It contains a pointer to a string that is the print name, a cell to store a value, a cell to store the function definition, a cell to store a pointer to the property list and one pointer that the list system uses if it stores the symbol in the oblist. .sh +0 Memory Management .pp A lisp system must be able to determine the type of an object at run time. The method used to determine the type influences the way storage is managed and garbage collection is done. Next, we will look at three possible places to store the type information. .sh +1 Typed data .pp The type of the data object could be placed in the object itself, much as Pascal stores the type of a variant record in the record itself. This could result is a large amount of storage being used to store type information. No lisp system that we know of uses this method exclusively. .sh +0 Typed pointers .pp Every reference to a lisp object could contain the type of the object referenced. This is a good idea in a machine like an IBM 370 where only part of machine word is used by the addressing hardware. Lisps that use typed pointers generally use a .i heap memory management scheme and a .i copying garbage collector. In a heap scheme, all memory is divided by a pointer (called the heap pointer) separating lisp data and free space. When a new lisp object is needed, it is formed at the dividing line and then the heap pointer is incremented past the new object. Garbage collection is done by maintaining two separate spaces for lisp data, only one of which is used at any one time. When one fills up, all active lisp objects are copied to the other space, leaving the first space totally free. Subsequent allocations are done from the second space, until it fills up, at which point all active data in the second space is copied to the first space. The advantage of the copying garbage collector is that the data space is compacted, which will improve virtual memory behavior. The disadvantage is that half the data space is always unused. .pp PSL on a PDP-10 uses this scheme, as does Lisp 370 on an IBM 370. PSL and NIL on the VAX will also use this scheme. Since the VAX hardware uses the entire word for address calculation, PSL and NIL must mask off the type bits of a pointer before using it to reference memory. .sh +0 Typed pages .pp The final alternative is to allocate data objects by pages rather than individually. Each page contains only one type of object and there is a table that keeps track of the type of data object in each page. The address of an object tells which page the object is on and the page number tells which type the object is. Since a whole page of objects is allocated when only one object is necessary, the rest of the objects in the page are put on a free list. This method of allocation is sometimes referred to as .i bibop for BIg Bag Of Pages. The garbage collector for bibop is usually a traditional .i mark and .i sweep algorithm. All active data objects are marked and then each page is swept linearly with the free cells being put on the free list and the used cells left alone. The advantage of mark and sweep over a copying garbage collector is that the mark phase is cheaper because data objects do not have to be copied. Also, all of memory can be used since there is no requirement for two spaces. The disadvantage is that a sweep phase is required in a mark and sweep but one is not required in a copying garbage collector. The sweep phase is expensive because it has to alter most data pages while building a free list. This operation can be expensive in a virtual memory environment. Alternatives to the sweep phase have been proposed in [Foderaro+Fateman Characterization of Vax Macsyma]. .pp .Fr uses a bibop memory allocator and a mark and sweep garbage collector, as does Maclisp (on a PDP-10). The reason that .Fr uses bibop is primarily due to the VAX architecture, which makes typed pointers expensive to implement. Also, typed pointers would make it difficult to pass lisp values to programs written in other languages, some of which may not have the ability to extract the address of a lisp object from a typed pointer. .sh -1 Construction .pp The .Fr system is built by adding lisp code to a C-coded kernel. The C-coded kernel is a working lisp interpreter, although it has few of the functions a lisp system needs. The lisp code provides most of the functions, the top level and error handlers. The lisp compiler is also written in lisp, but is usually run as a separate process. .sh +0 Unique features .pp .Fr can dynamically load and execute functions defined in the other languages which run on the VAX. It uses two different dynamic loaders. One is part of the lisp system and is used for lisp only, it is called .i fasl (for fast loader). The other is the Unix loader, ld, which is used for loading in C, Fortran or Pascal code. Loading code with fasl is much faster than with ld since fasl benefits from the simplicity of a lisp object file. .+c Data Structures .sh 0 _ .nr $1 \n(ch .pp In this chapter we shall examine the data structures which are most important to the .Fr system. First, though we will see how the lisp system uses the address space. .sh 2 Physical layout .pp As a Unix process, lisp has three address regions: text, data and stack. Text begins at location 0 and continues up to 0x3b73c (0x means hex number). The data segment begins on the next page boundary and continues up to a limit set by the configuration parameters (currently 0x2fd000). Lisp does not begin running with such a large data segment, rather it grows when necessary. The stack segment begins at address 0x7fffffff and grows downward to a maximum size of one-half megabyte. .pp The text segment stores the instructions for the C kernel as well as read-only data. The read-only data for lisp are the symbol nil, the i/o ports, and the small integer table. The symbol nil is stored at location 0 which makes it very easy to test whether a value is nil. The problem with storing nil in read-only space is that a special case must be made for nil's property list, which is the only thing in the nil symbol that the user is permitted to alter. .pp The fixnums -1024 through 1023 are stored sequentially, with 0 being stored at 0x1400. .Fr doesn't have any 'immediate' lisp data, that is data whose value is stored as part of the reference to the data. But, by storing some of the fixnums in a known place, we can achieve some of the benefits of immediate data: A program can use .i eq as a test for a fixnum in the range -1024 to 1023. In the majority of cases, when asked to allocate a fixnum, the system can return a pointer into this table and bypass the memory allocator. .sh +0 Typetable .pp .Fr uses the typed pages (or bibop) method of type determination. The .i typetable is an array of bytes ( .i chars in C lingo). This table describes the type of all pages, from page 0 where nil is stored, up to the end of data space. Thus there are many entries that describe the types of the pages which make up the C kernel. In order to reduce the wasted space in the typetable, we could have made the typetable begin typing pages at the start of data space and make a special case of the pages that contain nil and the small integer table. However, the effect of this change would be that type checks (which are done in-line in compiled code) would be longer and slower. Since type checking happens quite frequently, we felt it was better to waste the space in the typetable in order to save execution time and instruction space. .pp Each page on a VAX is 512 bytes, and thus to find the type of an object given the address of it, it is only necessary to shift the address right 9 bits and index the typetable array offset by one. The offset by one is required because the value -4, which is an illegal address, is used as a sentinel value to indicate an illegal value. Thus when we take the type of -4 we will be indexing the table by -1 and we want to retrieve the first byte in the table (which contains the value UNBOUND). The C macro which retrieves the type is this (from file h/global.h): .(b I #define TYPE(a1) ((typetable+1)[(int)(a1) >> 9]) .)b This is code generated by the lisp compiler to check if the type code of an argument (stored at 0(r10)) is a symbol (which is type code 1): .(b I ashl $-9,0(r10),r0 cmpb _typetable+1[r0],$1 .)b .pp The type codes which are stored in the typetable are these: .ts 2i 4i 6i .(b I UNBO -1\tSTRNG 0\tATOM 1\tINT 2 DTPR 3\tDOUB 4\tBCD 5\tPORT 6 ARRAY 7\tOTHER 8\tSDOT 9\tVALUE 10 HUNK2 11\tHUNK4 12\tHUNK8 13\tHUNK16 14 HUNK32 15\tHUNK64 16\tHUNK128 17 .)b The names given above are the C kernel internal names. ATOM is symbol, INT is fixnum, DTPR is list cell, DOUB is flonum, BCD is binary, SDOT is bignum and all the HUNKn types are just known as hunk to the user. .sh +0 Stacks .pp .Fr uses three stacks: the .i C .i runtime stack, the .i namestack and the .i bindstack . The C runtime stack is the stack part of the address space and the other two stacks are stored in the data space. .sh +1 C runtime stack The C-coded kernel uses this stack in the same way as a typical C program use the stack: storing return addresses and non-lisp-object arguments to subroutines, saving registers, and storing local variables within a function. The lisp system stores .i catch .i frames on this stack as well (to be described later). .pp The lisp system assumes that there are no lisp data on the stack and thus the use of this stack by a program is unrestricted. As will be discussed later on, the .b calls and .b ret instructions are used for most subroutine calls and returns. These instructions expect the stack to look a certain way. .sh +0 Namestack .pp The namestack contains only lisp data. It is used to pass arguments to functions and to hold local (lexically scoped) data within lisp functions. It is also used as a temporary storage spot for lisp data which must be protected from garbage collection. .pp A slight digression on the garbage collector: The person who writes code for the lisp system must always be aware of his greatest enemy: the garbage collector. Whenever a function is called that could possibly allocate more lisp data, one must assume that it when it attempts to allocate space, the garbage collector will be invoked and that it will take away everything that isn't protected from garbage collection. The objects that are protected from garbage collection are: the namestack, the bindstack, the oblist (table of interned symbols), and the compiler literals. Objects that are only referred to by values in registers or the C runtime stack will not be seen by the mark phase of the garbage collector and will be swept up during the sweep phase. .pp Back to the namestack: The first free element on the namestack is pointed to by the C variable .i np . This variable is always stored in the VAX register r6. Another pointer into the namestack is the C variable .i lbot . It is always stored in VAX register r7. Its use will be described in the section on calling conventions. .sh +0 Bindstack .pp The bindstack is a stack of lisp object pairs: symbol and value. It is used to implement shallow binding. When a symbol is lambda-bound, the symbol and its current value are put on this stack. Then the symbol can be given a new value. When the context of the lambda binding is over, the symbol and value pair are popped from the stack and the symbol is given its old value. The C variable .i bnp points to the first free entry on the bindstack. In the C code, the following macros lambda-bind a symbol to a value and restore the old value: .(b #define PUSHDOWN(atom,value)\e {bnp->atm=(atom);\e bnp++->val=(atom)->a.clb;\e (atom)->a.clb=value;\e if(bnp>bnplim) binderr();} #define POP\e {--bnp; bnp->atm->a.clb=bnp->val;} .)b .sh -1 Bitmap .pp The bitmap is used in garbage collection to hold the mark bits for the mark and sweep garbage collector. As its names implies, it is viewed as a collection of bits. The minimum size of a lisp data object is 4 bytes, thus there are 128 of those on a VAX page of 512 bytes. Since there are 8 bits in a byte, it takes 16 bytes to obtain 128 bits. Therefore the size of the bitmap in bytes is 16 times the maximum number of pages. Like the typetable, the bitmap keeps track of marks from the bottom of memory, not the bottom of data space. The bitmap and the typetable are static structures. It is their size, which is determined when the C kernel is built, which determines the size to which the data segment can grow. .sh +0 Transfer Tables .pp Transfer tables are used by compiled lisp code as a transfer vector to other functions. A transfer table consists of pairs of entries: symbol and pointer to function. When a compiled lisp function calls another (non-local) function, it calls indirectly through an entry in the transfer table. Depending on the setting of certain status variables, the call may bring control into a function linkage routine or directly to the function desired (if that function is a compiled lisp or C function). .pp Transfer tables serve a number of useful purposes. .np They allow compiled code to call interpreted code or compiled code using the same calling sequence. .np They allow the selection of which function to call to be determined at runtime based on the name of the function. In most other languages, this selection process is done at either compile or load time. .np They allow the user to run in a debugging mode where all calls from compiled code go through the interpreter. Once control reaches the interpreter, it is easier to debug. .np They allow the user to run in a production mode, where all calls from compiled to other compiled code are done without function lookup overhead. .np They allow the user to switch between debugging and production modes at runtime with one function call. .pp Transfer tables will be described further in the section on the lisp compiler. .sh +0 Catch frames .pp Lisp has a number of forms of non-local transfers. Among them are .i throw , .i error , .i return and .i go . If a program is willing to accept a non-local transfer, it puts a .i catch .i frame on the stack which describes which type of transfer it accepts. The catch frame describes the current state of the registers, the variables np, lbot, and bnp, and also contains entries that describe what kinds of non-local transfers the function will accept. After creating a catch frame, the program goes on to evaluate forms. Should the evaluation of one of those forms result in a non-local transfer to the catch frame that this program has set up, the system will restore the namestack and bindstack to the way they were when the program put the catch frame on the stack (by using np and bnp) and will return control to the program (setting the variable retval to describe why it returned). This is described further in the section on interpreter conventions. .pp The C variable .i errp points to the most recent catch frame, and then each catch frame points to the previous catch frame. .sh +0 oblist .pp Normally when symbols are created they are .i interned , that is they are put in a hash table called an oblist (or obarray). The purpose of the oblist is to insure that if you type a symbol's name to the reader, you will always get the same symbol. Also it protects the symbol from garbage collection. The oblist is simply a hash table with buckets, with a hash link being part of the symbol data structure. Currently the hash table is not accessible to a lisp program, but with a little work it could be. .+c Interpreter .sh 0 _ .nr $1 \n(ch .pp The interpreter is composed of these parts: .ip core: The functions eval, apply and funcall. .ip stack management: The code to create catch frames and handle non-local transfers. .ip memory management: The code to allocate and garbage collect memory. .ip the functions: The user callable functions that expect lisp arguments and return lisp values. .pp In the above list, the first three are written mainly in C (with a little assembler) and the last is written mainly in Lisp with a little bit in C and a very small amount in assembler. .pp The core functions are the center of activity in the interpreter. The .i eval function given a lisp expression to evaluate. It must determine if it is a simple expression such as a symbol or number whose value eval can determine immediately, or if it is an function calling expression. If the form is a function call, eval must determine if the arguments should be evaluated and if so eval must recursively call itself to evaluate the arguments and then stack them. If the function called is to be interpreted, eval must lambda-bind the formal variables of the function to the arguments just stacked. If the function being called is compiled, eval just calls the function and lets the function do the lambda binding if it wants to. .pp The .i apply function is passed two lisp objects: a function to call (either a symbol or a functional object) and a list of arguments already evaluated. It will do lambda binding if the function to call is to be interpreted. .pp The .i funcall function is passed any number of lisp objects, the first of which is the function to call, and the rest are the arguments to the function which have already been evaluated and stacked. Funcall will do lambda binding if the function to call is to be interpreted. .pp When compiled lisp code calls a function which must be interpreted, it enters the interpreter through the funcall function. The interpreter may go to compiled code through either eval, apply or funcall, though most often it goes through eval. .sh 2 Conventions .pp These are the conventions used in interpreted and compiled code. .sh +1 C conventions .pp Our conventions are extensions of the C calling conventions, so we describe here the conventions for C. The VAX has 16 general purpose registers. Registers r12 through r15 are reserved for use by the hardware because we use the .i calls subroutine call. Registers r0-r5 can be used by a program without saving them first. The result of a function call is returned in r0. Registers r11-r6 are allocated (from high to low) by the C compiler when a .i register declaration is made in the C code. Registers r11-r6 must be saved upon entry and restored upon exit if they are used within a function. On the VAX it is very easy to preserve registers since it is done automatically by the hardware if the .i calls (or .i callg ) instruction is used. The first short word (two bytes) of a subroutine is a register save mask which tells which registers should be saved (on the C runtime stack) upon entry and restored when a .i ret instruction is done. .sh +0 np and lbot .pp Earlier we mentioned that the C variables np and lbot are always stored in registers r6 and r7. It is very difficult to globally assign a variable to a register in the C language (no external register declarations are permitted) and thus we have to filter the output of the C compiler and convert all occurrences of 'np' to 'r6' and all occurrences of 'lbot' to 'r7'. This is only half the job, though. We also must modify the register save masks for those routines which alter the value of np or lbot to insure that those registers get saved and restored upon function entry and exit. This is done in the C code by putting .(b C Savestack(n) .)b at the beginning of a routine which will alter np or lbot and which also allocates n register variables. Also in that routine, before the function returns, we put .(b C Restorestack() .)b This is not really necessary in the VAX, but it is there for other machine implementations which don't have a .i ret function which restores registers. The calls to Savestack are recognized by a filter which processes the output of the C compiler and fixes up the save masks. .sh +0 Function calling .pp The following text describes what the conventions are for calling compiled lisp functions, whether they were written in lisp or C. We look at it from the viewpoint of the called function. .pp Upon entry to the compiled functon, lbot points to the first argument and np points to the first free spot on the namestack. If np equals lbot then there are no arguments. Recall that np will be in r6 and lbot in r7. The function is free to alter registers r0 through r5 and should return the result in r0. The function may alter registers r6 through r11 as long as their original values are restored when the function returns. The value of np should always point to the first free entry in the namestack. This is all that is required. The extra conventions followed by the lisp compiler in the code it generates are described in the next chapter. .sh +0 Catch frames .pp A catch frame saves the state of execution that a program might want to return to at some later time. A catch frame is stored on the C runtime stack, with the most recent frame pointed to by the C global variable errp. The information saved is .ip args - one or two optional arguments. The lisp compiler always stacks two arguments since it must know exactly how large the frame is. .ip class - the class describes what type of frame this is (described below). .ip retaddr - address to return to if returning from this frame .ip olderrp - pointer to next older catch frame on the stack .ip bnp - value of bnp (bindstack pointer) when the frame was created .ip np - value of np when the frame was created .ip lbot - value of lbot when the frame was created. .ip system dependent - the rest of the information stacked depends on the particular machine. In the case of the VAX, registers r13 through r8 are stacked. (r14 and r15 are the stack pointer and program counter; they are not saved since they can be calculated from the other information). .pp The information in a catch frame is put on the C runtime stack in the precise order given above, and the variable errp points not at the beginning of the entire frame, but to the lbot entry. Thus errp + 4 points to np. The classes of frames are described below. Each class is defined as a constant in the C code (h/frame.h) whose value is a small integer. .ip F_PROG [1] - takes no arguments. This frame marks the beginning of a piece of code which can accept a .i return or a .i go . The interpreter uses this to implement .i prog and .i do and for its primative break loop. The lisp compiler does not use catch frames for progs since it only permits .i returns and .i gos to occur within .i progs or .i dos and thus it can determine how to do the non-local transfer at compile time. .ip F_CATCH [2] - this takes one or two arguments and is used to implement both .i catch and .i errset . In both cases the first argument is the tag which is being caught, which in the case of an .i errset is symbol ER%all. An .i errset also supplies a second argument which is non nil if the error message should be printed. .ip F_RESET [3] - in the C kernel, the reset function is implemented as a non local transfer to a F_RESET catchframe. When the lisp system is built, the reset function is redefined to do a .i throw. Thus this type of catch frame is rarely used. .ip F_EVAL [4] - this has one argument, the form being evaluated. Since stacking this on every eval would be expensive, this type of catch frame is only stacked if a (*rset t) has been done and if the value of the symbol .i *rset is non nil. The form being evaluated is stacked so that the necessary information for the .i evalframe function is available and so that .i freturn can return from the context of any pending evaluation. .ip F_FUNCALL [5] - this is much like F_EVAL, except the one argument it takes is the name of the function to call. It has the same restrictions on when it is stacked as F_EVAL and for the same reasons. .pp In C, a frame is pushed on the stack with Pushframe, with a call of one of these forms: .(b errp = Pushframe(class); errp = Pushframe(class,arg1); errp = Pushframe(class,arg1,arg2); .)b After the call the C variable .i retval contains the value which describes why this function returned. You must remember that it is possible for this one function call to return more than once! The reasons it can return are these (from h/frame.h): .ip C_INITIAL [0] This is the initial call to set up the frame. .ip C_GO [1] This will only happen for F_PROG frames. In this case, the C variable lispretval contains a lisp symbol which is the tag to which control should be transferred. If the tag cannot be found in this prog or do body, the tag should be again thrown up the stack to a next higher prog or do. .ip C_RET [2] This will only happen for F_PROG frames. In this case, lispretval contains the value to return from this prog. .ip C_THROW [3] This will only happen for F_CATCH frames. In this case lispretval contains the value to return from this catch. .ip C_RESET [4] This will only happen for F_RESET frames. It simply means that a reset is being done. .ip C_FRETURN [5] This will only happen for F_EVAL and F_FRETURN frames. The variable lispretval contains the value to return from this call to .i eval or .i funcall . .pp The call to Pushframe is turned into a simple subroutine call (using the .i jsb instruction instead of the .i calls ) by the filters which alter the code coming out of the C compiler. Thus it may be useful to see what stacking a catch frame really looks like. Here is what the lisp compiler generates to stack the frame for (*catch 'tag x) .(b movl 0(r8),r0 #move 'tag to r0 pushl $0 # dummy argument pushl r0 # put tag argument on C runtime stack pushl $2 # push F_CATCH jsb _qpushframe # call Pushframe movl r0,_errp # move result into errp .)b .pp Every function which does a Pushframe() must also do a Popframe() before it returns to its caller. This simply removes the frame from the stack. In C it is written:

.tl ''errp = Popframe()'' in the code generated by the lisp compiler it looks like: .(b movl _errp,r1 # put current errp in r1 movl 12(r1),_errp # put previous errp in errp addl3 $32,r1,sp # pop frame from stack .)b .pp Non-local transfers are done with the Inonlocalgo function. This function always takes three arguments. The first is the return type (one of the types mentioned above which begin with 'C_'). It will be assigned to retval. The second argument is the value to be assigned to lispretval, except in the case of a C_THROW, where the second argument is the tag to throw to and the third argument is the value to assign to lispretval. This function never returns. If it doesn't find a catch frame which matches what it is looking for, it signals an error. The function is called with .i calls and the arguments are stacked on the C runtime stack, not the namestack. .+c Liszt: The Lisp Compiler .sh 0 _ .nr $1 \n(ch .pp The purpose of compiling a lisp function is to create a representation of the function which can be evaluated in less time and perhaps take up less space. There are two approaches to lisp compilers. One is to convert the functions to a compact form, often called .i bytecodes which can be rapidly interpreted. Each bytecode represents a primitive operation in the lisp system. This approach is simple to implement but there is an time penalty in using an interpreter. The other approach is to compiled to machine code. In general, this is not as portable as the bytecode approach but the result generally runs faster. There are two ways of compiling to machine code. One is to place arguments to functions in registers. If there are more arguments than registers, the arguments are put on a stack and a special type of call is made. This method is used in Maclisp. The other method is to assume a stack model, in which arguments to a function are placed on a stack. This is what the .Fr compiler Liszt does. The stack model made it much easier to write parts of the lisp system in the C langauge. It also simplifies the garbage collector since the mark phase doesn't have to locate and peruse all saved registers looking for lisp data. .sh 2 File Compiler .pp When a file of lisp expressions is loaded, the .i load function repeatedly reads and evaluates the forms in the file. Some of these evaluations may result in functions being defined, and others may use the functions just defined or previously defined. The job of the lisp compiler is to create an object file, which, when read in by .i fasl, acts just like it was .i load ed in, except when a function is defined it is defined in machine code, not as a lisp expression. This is quite a bit different from what compilers do in other languages and it is done this way to make it easier to switch between compiled and interpreted code. .sh +0 Differences between compiled and interpreted .pp There are some differences, though, between compiled and interpreted code. Local variables in compiled code are lexically scoped (put on the namestack and inaccessible to other functions) unless the variable has been declared .i special. The declaration: .(b (declare (special x y)) .)b declares both x and y to be special from this point in the file on. The declaration .(b (declare (specials t)) .)b declares all variables to be special. .pp Lisp has a very powerful macro definition system. The compiler will macro expand all it can, whereas the interpreter expands macros when necessary but never replaces a macro call with its expansion. Thus if you redefine a macro, the compiled code that uses it will still work the same way, but the interpreted code will use the new definition. Also, when compiling a file, macro definitions must be done before any call on the macro is encountered during compiling. In the interpreter, macros can be defined or redefined anytime up until they are used. Thus the interpreter is far freer about macro definitions than the compiler. This could cause programs which work interpreted to fail compiled. .sh +0 Top level algorithm .pp The top level algorithm of the compiler is simply this: .np read a lisp expression from the input file .np macro expand the top level of the form as much as possible .np if the form is a function definition (a list beginning with the symbol 'def') then compile it. .np if the form is not a function definition then put it on a list of forms that will be evaluated when the file is .i fasl ed in. .np if not at end of file, go to step 1. .pp In reality, step 3 is a bit more complex. If the definition is of a macro, then the form will be evaluated immediately, thus adding the macro definition to the compiler's environment. If the lisp variable .i macros is t then the macro will also be compiled. There are also some forms like (progn 'compile ...) and (eval-when () ) which determine when the forms get compiled and evaluated. See the lisp manual for details. .sh +0 Expression Compilation .pp Just as the interpreter is centered around the .i eval function, the lisp compiler is centered around the function .i d-exp whose job it is to compile the expression passed to it. The lisp compiler is one pass. Before a call to d-exp returns, all the compiled code for a form has been computed and written to a file. One reason that the lisp compiler can be a single pass compiler is that the assembler which reads the compiler's output is a two pass assembler. .sh +1 global state variables .pp There are a number of variables that maintain the state of the compilation process. These variables are declared special and are thus dynamically scoped and visible to any function within the compiler. When d-exp is called their meanings are: .ip v-form - contains the expression to be compiled. .ip g-loc - tells where the result of evaluating this expression should be put. If g-loc is nil, then the value returned is unimportant and shouldn't be put anywhere. .ip g-cc - controls conditional branches. If g-cc is non nil, then it is a list cell whose value has either a non-null car or non-null cdr but not both. If the car is non-nil then if the value of the expression held in v-form is non-nil, a branch should be done to the symbol stored in (car g-cc). If the cdr is non-nil then if the value of v-form is nil, a branch should be done to the symbol stored in (cdr g-cc). Since at every conditional branch control should either jump or continue, the car and cdr will never both be specified. .ip g-ret - is non nil if the result of evaluating v-form will be returned as the value of the function we are evaluating. This is used to allow liszt to detect when tail recursion removal is possible. .ip g-locs - maintains current information about the state of the stacks: the bindstack (for specials), the namestack (for locals) and the C runtime stack (for catch frames) The form of g-locs is a stack of tokens stored as a list. The meaning of each token is as follows:

nil - this represents an unnamed object on the namestack. This happens when calling functions, when the arguments are stacked prior to a function call.

<symbol> - the given symbol is the name of a local variable on the namestack.

(prog . <n>) - prog statement which binds <n> special variables

(do . <n>) - head of a do statement which binds <n> special variables

(catcherrset . 0) - catch or errset frame on stack

(lambda . <n>) - lambda expression which binds <n> special variables .ip g-labs - this is a stack of labels. There is one entry in g-labs for every entry which is a list in g-locs. the forms of the entries are:

nil - no labels in this form

(endlab (real . gen) (real2 . gen2) ...) - endlab is the label to go to to get out of this body. After this label will be code to unbind specials and pop off locals. The 'real' labels are those found in the code. the gen labels are those generated and put in the assembler code. .sh +0 Function compilation .pp The action d-exp takes when compiling an expression depends on the type of expression. For atoms (symbols and numbers) the compilation is very simple, it is just a matter of putting the value where g-loc specifies and then jumping if specified by g-cc. When the expression is a list, d-exp first macro expands the form and then looks at the first element of the list (if the list has not macro expanded to an atom). If the first element is a symbol then the list is is a function call. If the function is one of the functions which liszt knows how to open compile then liszt will call the particular routine designated to open compile this function. There are two classes of functions within liszt that do open compiling. The first class, the fl-expr class, are distinguished by names which begin with c-. These functions always generate code which returns the result in r0. They are not equipped to obey the contents of g-loc and g-cc. Thus d-exp, after calling one of these functions, must do what is necessary to obey g-loc and g-cc. The other class of open compiling functions, the fl-exprcc class (whose names begin with cc-), handle g-loc and g-cc. As a result these are usually much more complex and generate better code. .sh -1 Register Conventions .pp The register conventions used by liszt are an extension of those used by the C code. .ip r0 - return value placed here .ip r1,r2,r3,r4 - scratch registers. When long strings of .i car's or .i cdr's are done (such as .i caddadr ) these registers are used in a least-recently-used fashion to access down the list. The compiler keeps track of the values in these registers but must assume that they are garbage after a function is called. .ip r5 - fixnum accumulator. There a number of functions which work on fixnum's only and these usually put their result in r5. The assembler code function .i qnewint which returns a pointer to a cell containing a fixnum value (after checking if it is in the fixnum table), expects its argument to be in r5. .ip r6 np .ip r7 lbot. When calling a function, this register is set just before the function call, and after the function call its value is used to reset the value of np in order to pop the arguments off the namestack. .ip r8 the literal table pointer. The compiler generates a table of all the literal lisp data which the compiled code might access. Upon function entry a pointer to the base of this table is put in r8. For example, if we compile (setq x 'foo) then we will need a pointer to the lisp symbol .i foo and if the symbol .i x as been declared special we will also need a pointer to .i x . .ip r10 - upon function entry the value of lbot (r7) is put in r10. This allows us to reference the arguments to our function while using lbot to call other function. .sh +0 Addresses There are four types of addresses used internally in Franz: symbolic, intermediate addresses (iadr), extended intermediate (eiadr) and vax assembler format. .sh +1 Symbolic .pp These are the names of lisp objects, such as `a', `foo', 34, or 3.234234. A name is either a local variable, a special variable or a number. A number is either a small fixnum, large fixnum, bignum or floating point number. .sh +0 Intermediate address (IADR) .pp This type of address is generated from a symbolic address by .i d-loc, .i d-loclit and the routines .i d-simple and .i d-rsimple which call them. The forms are .ip Nil - the location or value of nil. .ip T - the location or value of t. .ip reg - register r0, which is where the result of function calls are stored. .ip stack - the address pointed to by np with np incremented after the value is stored. (i.e (r6)+) .ip unstack - the address one word below np (np is decremented before accessing. (i.e. -(r6)) .ip <symbol> - this is just <symbol>. This allows strange forms to be represented directly. .ip (stack n) - The n'th value on the namestack for this function. The first value 0(r10) is (stack 1). .ip (vstack n) - The cdr of the n'th value on the namestack. That is, (stack 1) is *0(r10) .ip (bind n) - The value of the n'th value in the literal table. If this refers to a symbol, then this is the value of the symbol. If this refers to a list then this this is the cdr of the list (although this is a rare use). (bind 1) corresponds to *0(r8). .ip (lbind n) - The n'th value in the literal table. If this refers to object foo then this is the address of foo in memory. .ip (fixnum n) - This is the address of small fixnum n in memory. A small fixnum is in the range -1024 to 1023. .ip (immed n) - The is the immediate constant n. .sh +0 extended intermediate address (EIADR) .pp This address is generated from an IADR by e-cvt. It symbolically represents a vax address. .ip <atom> - This corresponds to <atom>. .ip (n <atom>) - This corresponds to n(<atom>) (stack n+1) and (lbind n+1) are converted to these forms. .ip (n <atom1> <atom2>) - This corresponds to n(<atom1>)[<atom2>] There is currently no IADR which generates this. .ip (* n <atom>) -This corresponds to *n(<atom>) (vstack n+1) and (bind n+1) are converted to these forms. .ip (+ <atom>) - This corresponds to (<atom>)+. stack is converted to this form. .ip (- <atom>) - This corresponds to -(<atom>) unstack is converted to this form. .ip ($ <numb>) - This corresponds to $<atom> (immed <numb>) is converted to this form. .ip (# <numb>) - This corresponds to $<loc> where <loc> equals the base of the fixnums (0x1400) plus 4 * <numb> (fixnum <numb>) is converted to this form .ip (* # <numb>) - This corresponds to $<numb>. It is generated by d-rsimple occasionally when you ask for the cdr of a number (which you do sometimes when you are compiling fixnum operators). .sh +0 Vax Unix assembler addresses .pp The addresses are printed from a EIADR by e-cvtas. The conversions are shown in the above section. .sh -1 Function calling convention .sh +1 Function linkages .pp The function associated with a symbol is stored in the function definition slot of the symbol. If the function slot contains a list then the function is to be interpreted and its 'car' will be lambda, nlambda, lexpr or macro. If the function is compiled then the function definition slot will contain a binary object. A binary object is composed of two independent parts: the discipline and the entry. The discipline is one of: .ip lambda - a function whose arguments should be evaluated. .ip nlambda - a function whose arguments should not be evaluated but which should be passed as a list .ip macro - a function which should be passed the unevaluated form being evaluated and whose result should be evaluated. .ip \"subroutine\" - a foreign function subroutine .ip \"integer-function\" - a foreign function returning an integer .ip \"real-function" - a foreign function returning a flonum. .pp A lexpr is not in the list as there is no difference to the caller between compiled lambda's and compiled lexprs. .sh +0 Transfer tables Most calls from compiled code to other lisp functions go through transfer tables. A transfer table entry is a pair: symbol, address of routine. When another lisp function is called it uses the .i calls instruction which will indirectly jump through the address in the transfer table. This address may point to the desired function or it may point to the interface routine. If a call ends up in the interface routine, then that routine will trace back through the call stack and eventually reach the tranfer table entry that it was 'called through'. Since the transfer table entry contains a symbol which names the function to be called, the interface routine can determine which routine was to have been called. If that routine is compiled then the interface routine can modify the transfer table so that a call through that table entry will go directly to the desired function. If the routine to be called is interpreted, or if the user has requested that transfer linkages should be disabled, then the interface routine will go through the 'funcall' function in the interpreter to continue the call. .sh +0 calling sequence in the compiler: .pp When transfer tables are used .(b foreach arg compute arg and stack result using (r6)+ for example: movl r0,(r6)+ movab -n(r6),r7 where n = 4*number of args to fcn calls $0,*trantb+m where m is the index of the function in the transfer table. movl r7,r6 restore r6 .)b .pp The compiler supports local functions, which are function accessible only within one file. Because they are not accessible from C code, we can use a very fast call and return sequence when calling them. To call a local function .(b foreach arg compute and stack using (r6)+ jsb LOCALNAME go directly to the function, it will restore r6 before it returns. .)b .pp The beginning of each function looks as follows. First for a non-lexpr function called in the standard way (topsim is the label jumped to when tail merging, it will be unique for each function; the brackets indicate the optional code which exists if the -p switch is given to liszt); .(b .word 0x5c0 # save r6, r7, r8, r10 [ movab mcounts,r0 # if profiling, mcounts replaced by fasl jsb mcount ] movab linker,r8 # set up r8 movl r7,r10 # set up oldlbot movab n(r10),r6 # n = 4*Number of args expected. topsim: .)b .pp Now for lexprs: .(b .word 0x5c0 [ movab mcounts,r0 # if profiling. [mcounts replaced by fasl] jsb mcount ] movab linker,r8 # set up r8 subl3 $4,r7,-(sp) # address one word below bottom of args topsim: movl r6,r10 # first free addr to arg base subl3 r7,r6,r0 # number of args * 4 into r0 movab 0x1400(r0),(r6)+ # stack boxed number of args movl 0(r10),-(sp) # also store on stack so user can't clobber .)b .pp And finally for local functions: .(b movl r10,-(sp) # save r10 movab -n(r6),r10 # set up arg base using arg top topsim: .)b .sh -1 Assembler file format .pp The liszt compiler generates a file which is in Unix assembler format. The Unix assembler converts that file into an object file which fasl then reads. Although the object file generated is a standard Unix object file (as defined in /usr/include/a.out.h), it is not of a format that the Unix ld loader can understand. This is because the requirements of a lisp object file are different from an object file of other languages. The run time semantics of lisp and the fact that lisp data must be protected from garbage collection are the principal differences. The unconventional object file created by the unix assembler is a result of the unconventional assembler input file. Next we will look at what must be put in the assembler file and how it is put there. .pp The assembler file must contain .ip instructions - vax assembler code for the compiled functions. If there aren't any functions compiled, this can be empty. .ip literals - lisp data which is referred to by compiled code .ip transfer table - the names of the functions which correspond to the calls through the transfer table. The instructions simply say 'call indirect through the nth transfer table entry'. .ip function names - the names of the functions which are being defined. .ip load time forms - in order to mimic the .i load function, fasl has to be able to evaluate lisp expressions at fasl time, so we must be able to store lisp expressions in the assembler file and indicate when they should be evaluated. .pp Based on the requirements above, the form of the assembler file is as described below. The assembler builds two segments: text and data. We put all of our information in the text segment. The compiler places some annotation strings in the data segment so that the object file can be identified, however the data segment is ignored by fasl. The format is .ip compiled instructions The instructions for each compiled (non-local) function begins with .(b .globl F00003 F00003: .word 0x5c0 .)b The globl declaration and the fact that the symbol name begins with a F will tell fasl that this is the beginning of a lisp function. The symbols beginning with F must be unique within a file but may be duplicated in other files. The lisp name of the function will appear later. Next the instructions for the function are given. Only a small fixed set of external symbols may be referenced. The list is found in the code for nfasl.c and the common ones will be described below (soon). Labels should be given a name beginning with L and should be unique within the file. .ip table sizes somewhere in the file there should be a pair of 'set' assembler pseudo ops which describe the sizes of the literal table and transfer table. The form is this .(b .set linker_size 4 .set trans_size 3 .)b where linker_size is the number of entries in the literal table which will be required and trans_size is the number of entries in the transfer table which will be required. Those tables will be built by fasl. .ip binder table - this table describes the order that the functions should be defined and forms evaluated. It is a table of longwords beginning at the symbol bind_org and ending when a -1 entry is seen. The meaning of the entries will be described below. .ip lisp expression table - this is a table of null terminated strings beginning at the symbol lit_org and ending at the symbol lit_end. Each string is read by the lisp read function (using the raw readtable). The first linker_size expressions are put in the literal table. The next trans_size expressions are the names of the functions to put in the transfer table. The remaining expressions correspond to the entries in the binder table. The binder entries are processed, one by one. Provided that the binder entry is not -1, an expression is read from the lisp expression table. Then if the binder table entry is 0, that expression is the name of a lambda type lisp function. A binary object is created, the discipline is set to lambda and the function address is set to the lexically next function defined in the file. If the binder entry is 1 then this is an nlambda function, and if the entry is 2 then this is a macro function. If the entry is 99 then the expression just read should be evaluated by the lisp function eval. .pp The lisp compiler normally puts the assembler format output file in /tmp and removes it when it is done. The -S switch will tell liszt to write the assembler file in the same place as the source file, and with the same root name but a .s extension. The -C file will tell the lisp compiler to comment the file as it generates it, making it easier to understand what is going on. Assume that the following is file foo.l: .(b (defun foo (x) (bar y) (baz k)) (print (foo 3)) (def test (nlambda (l) (print 'hithere) (foo 3))) .)b we now compile it with -SC .(b .globl F00007 #(fcn lambda foo) F00007: .word 0x5c0 movab linker,r8 movl r7,r10 movab 4(r10),r6 L00008: movl *0(r8),(r6)+ #(calling bar) #(from y to stack) movab -4(r6),r7 calls $0,*trantb+0 movl r7,r6 movl *4(r8),(r6)+ #(calling baz) #(from k to stack) movab -4(r6),r7 calls $0,*trantb+8 movl r7,r6 ret .globl F00009 #(fcn nlambda test) F00009: .word 0x5c0 movab linker,r8 movl r7,r10 movab 4(r10),r6 L00010: movl 8(r8),(r6)+ #(calling print) #(from 'hithere to stack) movab -4(r6),r7 calls $0,*trantb+16 movl r7,r6 movl $5132,(r6)+ #(calling foo) #(from (fixnum 3) to stack) movab -4(r6),r7 calls $0,*trantb+24 movl r7,r6 ret bind_org: .set linker_size, 3 .set trans_size, 4 .long 0 .long 99 .long 1 .long -1 lit_org: .asciz "y" .asciz "k" .asciz "hithere" .asciz "bar" .asciz "baz" .asciz "print" .asciz "foo" .asciz "foo" .asciz "(print (foo 3))" .asciz "test" lit_end: .data # this is just for documentation .asciz "@(#)Compiled by Lisp Compiler 7.1 on Sun Feb 21 17:51:54 1982" .asciz "@(#)decl.l 1.5 2/10/82" .asciz "@(#)array.l 1.1 9/25/81" .asciz "@(#)datab.l 1.1 9/25/81" .asciz "@(#)expr.l 1.1 9/25/81" .asciz "@(#)io.l 1.1 9/25/81" .asciz "@(#)funa.l 1.3 2/10/82" .asciz "@(#)funb.l 1.2 2/10/82" .asciz "@(#)func.l 1.2 2/10/82" .asciz "@(#)tlev.l 1.4 10/24/81" .asciz "@(#)fixnum.l 1.6 10/21/81" .asciz "@(#)util.l 1.2 10/7/81" .)b .sh +0 functions callable from compiled lisp code .sh +0 Object File Format .sh +0 Fasl