lisp/PSD.doc/franz.n

         Copyright (c) 1980 Regents of the University of California.
 All rights reserved. The Berkeley software License Agreement
 specifies the terms and conditions for redistribution.

 @(#)franz.n 6.1 (Berkeley) 04/29/86

." franz.n -[Thu Jun 17 11:01:27 1982 by jkf]-
." this file will contain a description of the franz lisp system.
." sort of a systems manual.
F\s-2RANZ\s0 L\s-2ISP\s0
..
.tp
.(l C
.sz +2
The Franz Lisp System
.sz -2
by
John K. Foderaro
.)l
.tl ''Abstract''

This document describes the
.Fr
system written at The University of California
at Berkeley.
Included are descriptions of the memory management, interpreter
and compiler, the conventions used within the C coded kernel
and those within the compiled code.
This does not duplicate the information found in The Franz Lisp Manual.
.++ C '\s-2Draft\s+2'The Franz Lisp System'%'
.+c Characteristics of \F\s-2RANZ \s0L\s-2ISP\s0
.ls 1
.pp
There is no formal standard for lisp systems, thus each lisp
system is almost guaranteed to be different from any other lisp system.
In this section we will examine the design decisions which most often
characterize a lisp system.
This
focus does not imply that all other parts of the language are generally
agreed upon.
In fact, there is nothing sacred to lisp system designers.
For example, one usually identifies the symbol
.i nil
with the empty list,
and one usually thinks of lisp as a dynamically scoped language.
Both of these conventions are are not true in the lisp dialect
.i NIL
currently being written at MIT.
As another example, one imagines that a lisp system must
use garbage collection to reclaim storage.
The lisp dialect
.i ZetaLisp
that is running on Lisp Machines doesn't
normally use a garbage collector
because of the way in which it allocates its address space.
It is faster to reboot the machine than to do a garbage collection.
In most lisp dialects the lisp expressions are written in
parenthesised form. In
.i Portable
.i Standard
.i Lisp
(PSL) at the University of Utah, programs are written
primarily
in an Algol-like
syntax.
.pp
Thus some of the discussion to follow indicates not so much
.i unique
charateristics of
.Fr
but instead, how decisions were made.
.sh 2 Scoping and Binding
.pp
The
.Fr
interpreter
uses
.i dynamic
.i scoping ,
that is, after A calls B, B can access all of the
variables that A allocated
as well
as those that A can access, provided B doesn't allocate variables of the same
names.
There are two popular ways of implementing dynamic scoping:
.i deep
.i binding
and
.i shallow
.i binding .
Note that we will use the terms variable allocation and lambda
binding interchangeably in this report.
The former term is less language-specific than the latter.
.sh +1 Deep Binding
.pp
In a deep binding implementation, when a variable is allocated,
the name of the variable and a space for its value are pushed on the
top of a stack.
When a program asks for the value of a variable,
the lisp system must search down the stack for the first occurrence of
the variable.
This system offers great flexibility for the
programmer since the state of his program
can be described by the contents of the stack.
It is thus possible to save the context of a program by just saving
the stack or in some cases just a pointer to
a place in the stack.
The problem with deep binding is that it is time-consuming to search
down the stack for the value of a variable, and,
as a result, most systems
modify the deep binding model by giving variables a global value
cell and allowing programs to set and access the global cell.
Some implementations of Interlisp use deep binding.
.sh +0 Shallow Binding
.pp
In a shallow binding implementation, each variable has a global value cell
which contains the current value of the variable.
When a variable is allocated inside a program, its name
and old value are pushed on a stack called a
.i bindstack .
The variable is then given a new value.
Throughout the lifetime of the allocation, the current value of the
variable can be found in the global value cell associated with the variable.
When leaving the context of the variable's allocation, the old value
is restored from the
.i bindstack .
A shallow binding scheme makes it much cheaper to access
the values of variables, however it is much more difficult to
save the state and to restore it.
.pp
.Fr
and most other lisps which are not derived from Interlisp
use shallow binding.
Some versions of Interlisp use shallow binding.
.sh -1 Lisp Compiler
.pp
Dynamic scoping often is not necessary and it is never cheap, even
in a shallow binding implementation.
Thus the
.Fr
compiler prefers to do lexical scoping rather than dynamic
scoping.
If the user does not specify otherwise, when a
function is compiled, all variables
allocated will be
put on a local stack and will
.i not
be accessible by any
function that this function calls.
This
convention results in faster code being generated by
the compiler, but if the
user is not careful, it can result in compiled and interpreted code
working differently.
In some of the new lisp designs, such as
.i NIL
and
.i Common
.i Lisp
the semantics of compiled and interpreted code have been
unified by
transferring the compiler semantics (lexical scoping) to the interpreter.
This is a rather large step
if dynamic scoping is no
longer an option, and it is not clear whether the resulting
language should be called 'lisp'.
.sh +0 Data Types
.pp
A complete list of data types in
.Fr
can be found in the first chapter of the
.Fr
Manual.
The most important ones are described below.
.pp
Lisp is a list processing language and the basic data type is the list
cell.
A list cell also goes by the name of cons cell, pair, or dotted pair (dtpr).
It contains pointers to two lisp objects, and these pointers can
be accessed via the
.i car
and
.i cdr
functions.
Each pointer requires four (8-bit) bytes and thus the list cell is
eight bytes large.
The cdr pointer is stored first since this makes it easier
to 'cdr down a list' using the addressing modes of the VAX.
.pp
The symbol cell is used to implement variables. It contains a pointer
to a string that is the print name, a cell to store a value, a cell to
store the function definition, a cell to store a pointer to the property
list and one pointer that the list system uses if it stores
the symbol in the oblist.
.sh +0 Memory Management
.pp
A lisp system must be able to determine the type of an
object at run time.
The method used to determine the type influences the
way storage is managed and garbage collection is done.
Next, we will look at three possible places
to store the type information.
.sh +1 Typed data
.pp
The type of the data object could be placed in the object itself, much
as Pascal stores the type of a variant record in the record itself.
This could result is a large amount of storage being used to store
type information.
No lisp system that we know of uses this method exclusively.
.sh +0 Typed pointers
.pp
Every reference to a lisp object could contain the type of the object
referenced.
This is a good idea in a machine like an IBM 370 where only
part of machine word is used by the addressing hardware.
Lisps that use typed pointers generally use a
.i heap
memory management scheme and a
.i copying
garbage collector.
In a heap scheme,
all memory is divided by a pointer (called the
heap pointer) separating lisp
data and free space.
When a new lisp object is needed, it is formed at the
dividing line and then
the heap pointer is incremented
past the new object.
Garbage collection is done by maintaining two separate spaces for lisp
data, only one of which is used at any one time.
When one fills up, all active lisp objects are copied to the
other space, leaving the first space totally free.
Subsequent allocations are done from the second space, until it fills up,
at which point
all active data in the second space is copied to the first space.
The advantage of the copying garbage collector is that the data space
is compacted, which will improve virtual memory behavior.
The disadvantage is that
half the data space is always unused.
.pp
PSL on a PDP-10 uses this scheme, as does Lisp 370 on
an IBM 370.
PSL and NIL on the VAX will also use this scheme.
Since the VAX hardware
uses the entire word for address
calculation,
PSL and NIL
must mask off the type bits
of a pointer before using it to reference memory.
.sh +0 Typed pages
.pp
The final alternative is to allocate data objects by pages rather than
individually.
Each page contains only one type of object and there is a table that
keeps track of the type of data object in each page.
The address of an object tells which page the object is on and
the page number tells which type the object is.
Since a whole page of objects is allocated when only one
object is necessary,
the rest of the objects in the page are put on a free list.
This method of allocation is sometimes referred to as
.i bibop
for BIg Bag Of Pages.
The garbage collector for bibop is usually
a traditional
.i mark
and
.i sweep
algorithm.
All active data objects are marked and
then each page is swept linearly with
the free cells being put on the free list and the used cells
left alone.
The advantage of mark and sweep over a copying garbage collector is that
the mark phase is cheaper because data objects do not have to be copied.
Also, all of memory can be used since there is no requirement for two spaces.
The disadvantage is that a sweep phase is required in a mark and sweep
but one is not required in a copying garbage collector.
The sweep phase is expensive because it has to alter most data pages
while building a free list.
This operation
can be expensive in a virtual memory environment.
Alternatives to the sweep phase have been proposed in
[Foderaro+Fateman Characterization of Vax Macsyma].
.pp
.Fr
uses a bibop memory allocator and a mark and sweep garbage collector,
as does Maclisp (on a PDP-10).
The reason that
.Fr
uses bibop is primarily due to the VAX architecture,
which makes typed pointers
expensive to implement.
Also, typed pointers would make it difficult to pass lisp values
to programs written in other languages, some of which may not have
the ability to extract the address of a lisp object from a typed pointer.
.sh -1 Construction
.pp
The
.Fr
system is built by adding lisp code to a C-coded kernel.
The C-coded kernel is a working lisp interpreter, although it has few of
the functions a lisp system needs.
The lisp code provides most of the functions, the top level and error
handlers.
The lisp compiler is also written in lisp,
but is usually
run as a separate process.
.sh +0 Unique features
.pp
.Fr
can dynamically load and execute functions defined in the other languages
which run on the VAX.
It uses two different dynamic loaders.
One is part of the lisp system and is used for lisp only, it is called
.i fasl
(for fast loader).
The other is the Unix loader, ld, which is used for loading in
C, Fortran or Pascal code.
Loading code with fasl
is much faster than with ld
since fasl benefits from
the simplicity of a lisp object file.
.+c Data Structures
.sh 0 _
.nr $1 \n(ch
.pp
In this chapter we shall examine the data structures which are most
important to the
.Fr
system.
First, though we will see how the lisp system uses the address space.
.sh 2 Physical layout
.pp
As a Unix process, lisp has three address regions: text, data and stack.
Text begins at location 0 and continues up to 0x3b73c (0x means hex number).
The data segment begins on the next page boundary and continues
up to a limit set by the configuration parameters (currently 0x2fd000).
Lisp does not begin running with such a large data segment, rather it grows
when necessary.
The stack segment begins at address 0x7fffffff and grows downward to a
maximum size of one-half megabyte.
.pp
The text segment stores the instructions for the C kernel as well as read-only
data.
The read-only data for lisp are
the symbol nil, the i/o ports, and the small integer
table.
The symbol nil is stored at location 0 which makes it very easy to test
whether a value is nil.
The problem with storing nil in read-only space is that a special case
must be made for nil's property list, which is the only thing in the
nil symbol that the user is permitted to alter.
.pp
The fixnums -1024 through 1023 are stored sequentially, with 0 being
stored at 0x1400.
.Fr
doesn't have any 'immediate' lisp data, that is data whose value is
stored as part of the reference to the data.
But, by storing some of the fixnums in a known place, we can achieve
some of the benefits of immediate data:
A program can use
.i eq
as a test for a fixnum in the range -1024 to 1023.
In the majority of cases, when asked
to allocate a fixnum, the system can return a pointer into this table
and bypass the memory allocator.
.sh +0 Typetable
.pp
.Fr
uses the typed pages (or bibop)
method of type determination.
The
.i typetable
is an array of bytes (
.i chars
in C lingo).
This table describes the type of all pages, from page 0 where nil is stored,
up to the end of data space.
Thus there are many entries that describe the types of the pages which
make up the C kernel.
In order to reduce the wasted space in the typetable,
we could have made the typetable begin typing pages at the start of data
space and make a special case of the pages that contain nil and
the small integer table.
However, the
effect of this change would be that type checks
(which are done in-line in compiled code) would
be longer and slower. Since type checking happens quite
frequently, we felt it was better to waste the space in the
typetable in order to save execution time and instruction space.
.pp
Each page on a VAX is 512 bytes, and thus to find the type of an
object given the address of it,
it is only necessary to shift the address right 9 bits
and index the typetable array offset by one.
The offset by one is required because the value -4, which is an illegal
address, is used as a sentinel value to indicate an illegal value.
Thus when we take the type of -4 we will be indexing the table by -1
and we want to retrieve the first byte in the table (which contains
the value UNBOUND).
The C macro which retrieves the type is this (from file h/global.h):
.(b I

#define TYPE(a1) ((typetable+1)[(int)(a1) >> 9])

.)b
This is code generated by the lisp compiler to check if the type
code of an argument (stored at 0(r10)) is a symbol (which is type
code 1):
.(b I

ashl $-9,0(r10),r0
cmpb _typetable+1[r0],$1

.)b
.pp
The type codes which are stored in the typetable are these:
.ts 2i 4i 6i
.(b I
UNBO -1\tSTRNG 0\tATOM 1\tINT 2
DTPR 3\tDOUB 4\tBCD 5\tPORT 6
ARRAY 7\tOTHER 8\tSDOT 9\tVALUE 10
HUNK2 11\tHUNK4 12\tHUNK8 13\tHUNK16 14
HUNK32 15\tHUNK64 16\tHUNK128 17
.)b
The names given above are the C kernel internal names.
ATOM is symbol, INT is fixnum, DTPR is list cell,
DOUB is flonum, BCD is binary,
SDOT is bignum and all the HUNKn types are just known as hunk to
the user.
.sh +0 Stacks
.pp
.Fr
uses three stacks: the
.i C
.i runtime
stack, the
.i namestack
and the
.i bindstack .
The C runtime stack is the stack part of the address space
and the other two stacks are stored in the data space.
.sh +1 C runtime stack
The C-coded kernel uses this
stack in the same way as a typical C program
use the stack:
storing return addresses
and non-lisp-object arguments to subroutines,
saving registers,
and storing local variables within a function.
The lisp system stores
.i catch
.i frames
on this stack as well (to be described later).
.pp
The lisp system assumes that there are no lisp data on the stack and
thus the use of this stack
by a program is unrestricted.
As will be discussed later on, the
.b calls
and
.b ret
instructions are used for most subroutine calls and returns.
These instructions expect the stack to look a certain way.
.sh +0 Namestack
.pp
The namestack contains only lisp data.
It is used to pass arguments to functions and to hold
local (lexically scoped) data within lisp functions.
It is also used as a temporary storage spot for lisp data
which must be protected from garbage collection.
.pp
A slight digression on the garbage collector:
The person who writes code for the lisp system must always be aware of
his greatest enemy: the garbage collector.
Whenever a function is called that could possibly allocate more lisp
data, one must assume that it
when it attempts to allocate space, the garbage collector
will be invoked and that it will take away everything that isn't protected
from garbage collection.
The objects that are protected from garbage collection are: the
namestack, the bindstack, the oblist (table of interned symbols),
and the compiler literals. Objects that are only
referred to by values in registers or
the C runtime stack will not be seen by the mark phase of the
garbage collector and will be swept up during the sweep phase.
.pp
Back to the namestack:
The first free element on the namestack is pointed to by the
C variable
.i np .
This variable is always stored in the VAX register r6.
Another pointer into the namestack is the C variable
.i lbot .
It is always stored in VAX register r7.
Its use will be described in the section on calling conventions.
.sh +0 Bindstack
.pp
The bindstack is a stack of lisp object pairs: symbol and
value.
It is used to implement shallow binding.
When a symbol is lambda-bound, the symbol and its current value are put
on this stack.
Then the symbol can be given a new value.
When the context of the lambda binding is over, the symbol and value pair
are popped from the stack and the symbol is given its old value.
The C variable
.i bnp
points to the first free entry on the bindstack.
In the C code, the following macros lambda-bind a symbol to a value
and restore the old value:
.(b
#define PUSHDOWN(atom,value)\e
 {bnp->atm=(atom);\e
 bnp++->val=(atom)->a.clb;\e
 (atom)->a.clb=value;\e
 if(bnp>bnplim) binderr();}

#define POP\e
 {--bnp; bnp->atm->a.clb=bnp->val;}
.)b
.sh -1 Bitmap
.pp
The bitmap is used in garbage collection to hold
the mark bits for the mark and sweep garbage collector.
As its names implies, it is viewed as a collection of bits.
The minimum size of a lisp data object is 4 bytes, thus
there are 128 of those on a VAX page of 512 bytes.
Since there are 8 bits in a byte, it takes 16 bytes to obtain 128 bits.
Therefore the size of the bitmap in bytes is 16
times the maximum number of
pages.
Like the typetable, the bitmap keeps track of marks from the
bottom of memory, not the bottom of data space.
The bitmap and the typetable are static structures.
It is their size, which is determined when the C kernel is built,
which determines the size to which the data segment can grow.
.sh +0 Transfer Tables
.pp
Transfer tables are used by compiled lisp code as a transfer vector to
other functions.
A transfer table consists of pairs of entries: symbol and pointer to
function.
When a compiled lisp function calls another (non-local) function, it
calls indirectly through an entry in the transfer table.
Depending on the setting of certain status variables, the call may
bring control into a function linkage routine or directly to
the function desired (if that function is a compiled lisp or C function).
.pp
Transfer tables serve a number of useful purposes.
.np
They allow compiled code to call interpreted code
or compiled code using the same calling sequence.
.np
They allow the selection of which function to call to be determined
at runtime based on the name of the function.
In most other languages, this selection process is done at either
compile or load time.
.np
They allow the user to run in a debugging mode where all calls
from compiled code go through the interpreter.
Once control reaches the interpreter, it is easier to debug.
.np
They allow the user to run in a production mode, where all calls
from compiled to other compiled code are done without function
lookup overhead.
.np
They allow the user to switch between debugging and production modes
at runtime with one function call.
.pp
Transfer tables will be described further in the section on
the lisp compiler.
.sh +0 Catch frames
.pp
Lisp has a number of forms of non-local transfers.
Among them are
.i throw ,
.i error ,
.i return
and
.i go .
If a program is willing to accept a non-local transfer, it puts a
.i catch
.i frame
on the stack which describes which type of transfer it
accepts.
The catch frame describes the current state of the registers,
the variables np, lbot, and bnp, and also contains entries that describe
what kinds of non-local transfers the function will accept.
After creating a catch frame, the program goes on to evaluate forms.
Should the evaluation of one of those forms result in a non-local
transfer to the catch frame that this program has set up, the
system will restore the namestack and bindstack to the way
they were when the program put the catch frame on the stack (by using
np and bnp) and will return control to the program (setting
the variable retval to describe why it returned).
This is described further in the
section on interpreter conventions.
.pp
The C variable
.i errp
points to the most recent catch frame, and then each catch frame points
to the previous catch frame.
.sh +0 oblist
.pp
Normally when symbols are created they are
.i interned ,
that is they are put in a hash table called an oblist (or obarray).
The purpose of the oblist is to insure that if you type a
symbol's name to the reader, you will always get the same symbol.
Also it protects the symbol from garbage collection.
The oblist is simply a hash table with buckets, with a hash link being
part of the symbol data structure.
Currently the hash table is not accessible to a lisp program, but with
a little work it could be.
.+c Interpreter
.sh 0 _
.nr $1 \n(ch
.pp
The interpreter is composed of these parts:
.ip core:
The functions eval, apply and funcall.
.ip stack management:
The code to create catch frames and handle non-local transfers.
.ip memory management:
The code to allocate and garbage collect memory.
.ip the functions:
The user callable functions that expect lisp arguments and return
lisp values.
.pp
In the above list, the first three are written mainly in C (with a little
assembler) and the last is written mainly in Lisp with a little
bit in C
and a very small amount in assembler.
.pp
The core functions are the center of activity in the interpreter.
The
.i eval
function given a lisp expression to evaluate.
It must determine if it is a simple expression such as a symbol or number
whose value eval can determine immediately, or if it is an function
calling expression.
If the form is a function call, eval must determine if the arguments should
be evaluated and if so eval must recursively call itself to evaluate the
arguments and then stack them.
If the function called is to be interpreted, eval must lambda-bind the
formal variables of the function to the arguments just stacked.
If the function being called is compiled, eval just calls the function and
lets the function do the lambda binding if it wants to.
.pp
The
.i apply
function is passed two lisp objects:
a function to call (either a symbol or a functional object)
and a list of arguments already evaluated.
It will do lambda binding if the function to call is to
be interpreted.
.pp
The
.i funcall
function is passed any number of lisp objects,
the first of which is the function to call, and the rest are the
arguments to the function
which have already been evaluated and stacked.
Funcall will do lambda binding if the function to call is
to be interpreted.
.pp
When compiled lisp code calls a function
which must be interpreted, it enters the interpreter through the
funcall function.
The interpreter may go to compiled code through either eval, apply or
funcall, though most often it goes through eval.
.sh 2 Conventions
.pp
These are the conventions used in interpreted and compiled code.
.sh +1 C conventions
.pp
Our conventions are extensions of the C calling conventions, so we
describe here the conventions for C.
The VAX has 16 general
purpose registers.
Registers r12 through r15 are reserved
for use by the hardware because we use the
.i calls
subroutine call.
Registers r0-r5 can be used by a program without saving them first.
The result of a function call is returned in r0.
Registers r11-r6 are allocated (from high to low)
by the C compiler when a
.i register
declaration is made in the C code.
Registers r11-r6 must be
saved upon entry and restored upon exit if they are used within a function.
On the VAX it is very easy to preserve registers
since it is done automatically
by the hardware if the
.i calls
(or
.i callg )
instruction is used.
The first short word (two bytes) of a subroutine is a
register save mask which
tells which registers should be saved (on the C runtime stack) upon
entry and restored when a
.i ret
instruction is done.
.sh +0 np and lbot
.pp
Earlier we mentioned that the C variables np and lbot are
always stored in
registers r6 and r7.
It is very difficult to globally assign a variable to a register
in the C language (no external register declarations
are permitted)
and thus we have to filter
the output of the C compiler and convert all occurrences of 'np' to 'r6'
and all occurrences of 'lbot' to 'r7'.
This is only half the job, though.
We also must modify the register save masks for those routines which
alter the value of np or lbot to insure that those registers get
saved and restored upon function entry and exit.
This is done in the C code by putting
.(b C
Savestack(n)
.)b
at the beginning of a routine which will alter np or lbot and which
also allocates n register variables.
Also in that routine, before the function returns, we put
.(b C
Restorestack()
.)b
This is not really necessary in the VAX, but it is there for other
machine implementations which don't have a
.i ret
function which restores registers.
The calls to Savestack are recognized by a filter which processes
the output
of the C compiler and fixes up
the save masks.
.sh +0 Function calling
.pp
The following text describes what the conventions are for
calling
compiled lisp functions, whether they were written in lisp or C.
We look at it from the viewpoint of the called function.
.pp
Upon entry
to the compiled functon,
lbot points to the first argument and np points to the
first free spot on the namestack.
If np equals lbot then there are no arguments.
Recall that np will be in r6 and lbot in r7.
The function is free to alter registers r0 through r5 and should return
the result in r0.
The function may alter registers r6 through r11 as long as their
original values
are restored when the function returns.
The value of np should always
point to the first free entry in the namestack.
This is all that is required.
The extra conventions followed by
the lisp compiler
in the code it generates are described in the next chapter.
.sh +0 Catch frames
.pp
A catch frame saves the state of execution that a program
might want to return to at some later time.
A catch frame is stored on the C runtime stack, with the most recent
frame pointed to by the C global variable errp.
The information saved is
.ip args
- one or two optional arguments. The lisp compiler always
stacks two
arguments since it must know exactly how large the frame is.
.ip class
- the class describes what type of frame this is (described below).
.ip retaddr
- address to return to if returning from this frame
.ip olderrp
- pointer to next older catch frame on the stack
.ip bnp
- value of bnp (bindstack pointer) when the frame was created
.ip np
- value of np when the frame was created
.ip lbot
- value of lbot when the frame was created.
.ip system dependent
- the rest of the information stacked depends on the particular machine.
In the case of the VAX, registers r13 through r8 are stacked. (r14 and
r15 are the stack pointer and program counter; they are not saved since
they can be calculated from the other information).
.pp
The information in a catch frame is put on the C runtime stack
in the precise order given above, and the variable errp points not
at the beginning of the entire frame, but to the lbot entry.
Thus errp + 4 points to np.
The classes of frames are described below.
Each class is defined as a constant in the C code (h/frame.h) whose value
is a small integer.
.ip F_PROG [1]
- takes no arguments. This frame marks the beginning of a piece of
code which can accept a
.i return
or a
.i go .
The interpreter uses this to implement
.i prog
and
.i do
and for its primative break loop.
The lisp compiler does not use catch frames for progs since it only
permits
.i returns
and
.i gos
to occur within
.i progs
or
.i dos
and thus it can determine how to do the non-local transfer
at compile time.
.ip F_CATCH [2]
- this takes one or two arguments and is used to implement both
.i catch
and
.i errset .
In both cases
the first argument is the tag which is being caught,
which in the case of an
.i errset
is symbol ER%all.
An
.i errset
also
supplies a second argument which is non nil if the error message
should be printed.
.ip F_RESET [3]
- in the C kernel, the reset function
is implemented as a non local transfer to a F_RESET catchframe.
When the lisp system is built, the reset function is redefined to
do a
.i throw.
Thus this type of catch frame is rarely used.
.ip F_EVAL [4]
- this has one argument, the form being evaluated.
Since stacking this
on every eval would be expensive,
this type of catch frame is only stacked if a (*rset t)
has been done and if the value of the symbol
.i *rset
is non nil.
The form being evaluated is stacked so that
the necessary information for the
.i evalframe
function is available and so that
.i freturn
can return from the context of any pending evaluation.
.ip F_FUNCALL [5]
- this is much like F_EVAL,
except the one argument it takes is the
name of the function to call.
It has the same restrictions on when it is stacked as F_EVAL
and for the same reasons.
.pp
In C, a frame is pushed on the stack with Pushframe, with a call
of one of these forms:
.(b
errp = Pushframe(class);
errp = Pushframe(class,arg1);
errp = Pushframe(class,arg1,arg2);
.)b
After the call the C variable
.i retval
contains the value which describes why this function returned.
You must remember that it is possible for this one function call to
return more than once!
The reasons it can return are these (from h/frame.h):
.ip C_INITIAL [0]
This is the initial call to set up the frame.
.ip C_GO [1]
This will only happen for F_PROG frames.
In this case,
the C variable lispretval contains a lisp symbol which is the tag
to which control should be transferred.
If the tag cannot be found in this prog or do body, the
tag should be again thrown up the stack to a next higher prog or do.
.ip C_RET [2]
This will only happen for F_PROG frames.
In this case, lispretval contains the value to return from this prog.
.ip C_THROW [3]
This will only happen for F_CATCH frames.
In this case lispretval contains the value to return from this catch.
.ip C_RESET [4]
This will only happen for F_RESET frames.
It simply means that a reset is being done.
.ip C_FRETURN [5]
This will only happen for F_EVAL and F_FRETURN frames.
The variable lispretval contains the value to return from this
call to
.i eval
or
.i funcall .
.pp
The call to Pushframe is turned into a simple subroutine call (using
the
.i jsb
instruction instead of the
.i calls )
by the filters which alter the code coming out of the C compiler.
Thus it may be useful to see what stacking a catch frame really looks
like.
Here is what the lisp compiler generates
to stack the frame for (*catch 'tag x)
.(b
movl 0(r8),r0 #move 'tag to r0
pushl $0 # dummy argument
pushl r0 # put tag argument on C runtime stack
pushl $2 # push F_CATCH
jsb _qpushframe # call Pushframe
movl r0,_errp # move result into errp
.)b
.pp
Every function which does a Pushframe() must also do a Popframe()
before it returns to its caller.
This simply removes the frame from the stack.
In C it is written:

.tl ''errp = Popframe()''
in the code generated by the lisp compiler it looks like:
.(b
movl _errp,r1 # put current errp in r1
movl 12(r1),_errp # put previous errp in errp
addl3 $32,r1,sp # pop frame from stack
.)b
.pp
Non-local transfers are done with the Inonlocalgo
function. This function always takes three arguments.
The
first is the return type (one of the types mentioned
above which begin with 'C_'). It will be assigned to retval.
The second argument is the value to be assigned to lispretval,
except in the case of a C_THROW, where the second argument
is the tag to throw to and the third argument is the value
to assign to lispretval.
This function never returns.
If it doesn't find a catch frame
which matches what it is looking for, it signals an error.
The function is called with
.i calls
and the arguments are stacked on the C runtime stack, not the
namestack.
.+c Liszt: The Lisp Compiler
.sh 0 _
.nr $1 \n(ch
.pp
The purpose of compiling a lisp function is to create a representation of
the function which can be evaluated in less time and perhaps take
up less space.
There are two approaches to lisp compilers.
One is to convert the functions to a compact form, often called
.i bytecodes
which can be rapidly interpreted.
Each bytecode represents a primitive operation in the lisp system.
This approach is simple to implement but there is an time penalty
in using an interpreter.
The other approach is to compiled to machine code.
In general, this is not as portable as the bytecode approach but the result
generally runs faster.
There are two ways of compiling to machine code.
One is to place arguments to functions in registers.
If there are more arguments than registers, the arguments are put on
a stack and a special type of call is made.
This method is used in Maclisp.
The other method is to assume a stack model, in which
arguments to a function are placed on a stack.
This is what the
.Fr
compiler Liszt does.
The stack model made it much easier to write parts of the lisp
system in the C langauge.
It also simplifies the garbage collector since the mark phase doesn't
have to locate and peruse all saved registers looking for lisp data.
.sh 2 File Compiler
.pp
When a file of lisp expressions is loaded,
the
.i load
function repeatedly reads and evaluates the forms in the
file.
Some of these evaluations may result in functions being defined,
and others may use the functions just defined or previously defined.
The job of the lisp compiler is to create an object file, which,
when read in
by
.i fasl,
acts just like it was
.i load ed
in, except when a function is defined it is defined in machine
code, not as a lisp expression.
This is quite a bit different from what compilers do in other languages
and it is done this way to make it easier to
switch between compiled and
interpreted code.
.sh +0 Differences between compiled and interpreted
.pp
There are some differences, though, between compiled and interpreted code.
Local variables in compiled code are lexically scoped (put on the
namestack and inaccessible to other functions) unless the variable
has been declared
.i special.
The declaration:
.(b
(declare (special x y))
.)b
declares both x and y to be special from this point in the file on.
The declaration
.(b
(declare (specials t))
.)b
declares all variables to be special.
.pp
Lisp has a very powerful macro definition system.
The compiler will macro expand all it can, whereas the interpreter
expands macros when necessary but never replaces a macro call with
its expansion.
Thus if you redefine a macro, the compiled code that uses it will
still work the same way, but the interpreted code will use the
new definition.
Also, when compiling a file, macro definitions must be
done before any call on the macro is encountered during compiling.
In the interpreter,
macros can be defined or redefined anytime up until
they are used.
Thus the interpreter is far freer about macro definitions than
the compiler.
This could cause programs which work interpreted to
fail compiled.
.sh +0 Top level algorithm
.pp
The top level algorithm of the compiler is simply this:
.np
read a lisp expression from the input file
.np
macro expand the top level of the form as much as possible
.np
if the form is a function definition (a list beginning with
the symbol 'def')
then compile it.
.np
if the form is not a function definition then put it on a list of
forms that will be evaluated when the file is
.i fasl ed
in.
.np
if not at end of file, go to step 1.
.pp
In reality, step 3 is a bit more complex.
If the definition is of a macro, then the form will be evaluated
immediately, thus adding the macro definition to the compiler's
environment.
If the lisp variable
.i macros
is t then the macro will also be compiled.
There are also some forms like (progn 'compile ...)
and (eval-when () ) which determine when the
forms get compiled and evaluated.
See the lisp manual for details.
.sh +0 Expression Compilation
.pp
Just as the interpreter is centered around the
.i eval
function, the lisp compiler is centered around the function
.i d-exp
whose job it is to compile the expression passed to it.
The lisp compiler is one pass.
Before a call to d-exp returns,
all the compiled code for a form has been computed and written
to a file.
One reason that the lisp compiler can be a single pass compiler
is that the assembler which reads the compiler's output is
a two pass assembler.
.sh +1 global state variables
.pp
There are a number of variables that maintain the state of
the compilation process.
These variables are declared special and are thus dynamically scoped
and visible to any function within the compiler.
When d-exp
is called their meanings are:
.ip v-form
- contains the expression to be compiled.
.ip g-loc
- tells where the result of evaluating this expression should be put.
If g-loc is nil, then the value returned is unimportant and shouldn't
be put anywhere.
.ip g-cc
- controls conditional branches.
If g-cc is non nil, then it is a list cell whose value has
either a non-null car or non-null cdr but not both.
If the car is non-nil then
if the value of the expression held in
v-form is non-nil, a branch should be done to the symbol
stored in (car g-cc).
If the cdr is non-nil then if the value of v-form
is nil, a branch should be done to the symbol stored in
(cdr g-cc). Since at
every conditional
branch control should either jump or continue, the car and cdr will
never both be specified.
.ip g-ret
- is non nil if the result of evaluating v-form will be returned as the
value of the function we are evaluating.
This is used to allow liszt to detect
when tail recursion removal is possible.
.ip g-locs
- maintains current information about the state of the stacks:
the bindstack (for specials), the namestack (for locals) and the
C runtime stack (for catch frames)
The form of g-locs is a stack of tokens stored as a list.
The meaning of each token is as follows:

nil - this represents an unnamed object on the namestack. This happens
when calling functions, when the arguments are stacked prior
to a function call.

<symbol> - the given symbol is the name of a local variable on the namestack.

(prog . <n>) - prog statement which binds <n> special variables

(do . <n>) - head of a do statement which binds <n> special variables

(catcherrset . 0) - catch or errset frame on stack

(lambda . <n>) - lambda expression which binds <n> special variables
.ip g-labs
- this is a stack of labels.
There is one entry in g-labs for every entry which is a list
in g-locs.
the forms of the entries are:

nil - no labels in this form

(endlab (real . gen) (real2 . gen2) ...) - endlab is the label to go to
to get out of this body. After this label will be code
to unbind specials and pop off locals.
The 'real' labels are those found in the code. the gen
labels are those generated and put in the assembler code.
.sh +0 Function compilation
.pp
The action d-exp takes when compiling an expression depends on the
type of expression.
For atoms (symbols and numbers) the compilation is very simple, it
is just a matter of putting the value where g-loc specifies and
then jumping if specified by g-cc.
When the expression is a list, d-exp first macro
expands the form and then looks at the first element of
the list (if the list has not macro expanded to an atom).
If the first element is a symbol then the list is
is a function call.
If the function is one of the functions which liszt knows how to open compile
then liszt will call the particular routine designated to open
compile this function.
There are two classes of functions within liszt that
do open compiling.
The first class, the fl-expr class, are distinguished by names which
begin with c-.
These functions always generate
code which returns the result in r0.
They are not equipped to obey the contents of g-loc and g-cc.
Thus d-exp, after calling one of these functions, must do what
is necessary to obey g-loc and g-cc.
The other class of open compiling functions, the fl-exprcc class (whose
names begin with cc-),
handle g-loc and g-cc.
As a result these are usually much more complex and generate better code.
.sh -1 Register Conventions
.pp
The register conventions used by liszt are an extension of those
used by the C code.
.ip r0
- return value placed here
.ip r1,r2,r3,r4
- scratch registers.
When long strings of
.i car's
or
.i cdr's
are done (such as
.i caddadr )
these registers are used in a least-recently-used fashion to access down
the list.
The compiler keeps track of the values in these registers but must
assume that they are garbage after a function is called.
.ip r5
- fixnum accumulator.
There a number of functions which work on fixnum's only and these
usually put their result in r5.
The assembler code function
.i qnewint
which returns a pointer to a cell containing a fixnum value (after
checking if it is in the fixnum table), expects its argument to be
in r5.
.ip r6
np
.ip r7
lbot.
When calling a function, this register is set just before the function
call, and after the function call its value is used to reset the value
of np in order to pop the arguments off the namestack.
.ip r8
the literal table pointer.
The compiler generates a table of all the literal lisp data
which the compiled code might access.
Upon function entry a pointer to the base of this table is put in r8.
For example, if we compile (setq x 'foo) then we will
need a pointer to the lisp symbol
.i foo
and if the symbol
.i x
as been declared special we will also need a pointer to
.i x .
.ip r10
- upon function entry the value of lbot (r7) is put in r10.
This allows us to reference the arguments to our function while
using lbot to call other function.
.sh +0 Addresses
There are four types of addresses used internally in Franz:
symbolic, intermediate addresses (iadr), extended intermediate (eiadr)
and vax assembler format.
.sh +1 Symbolic
.pp
These are the names of lisp objects, such as `a', `foo', 34,
or 3.234234.
A name is either a local variable, a special variable or a
number. A number is either a small fixnum, large fixnum,
bignum or floating point number.
.sh +0 Intermediate address (IADR)
.pp
This type of address is generated from a symbolic address by
.i d-loc,
.i d-loclit
and the routines
.i d-simple
and
.i d-rsimple
which
call them. The forms are
.ip Nil
- the location or value of nil.
.ip T
- the location or value of t.
.ip reg
- register r0, which is where the result of function calls
are stored.
.ip stack
- the address pointed to by np with np incremented after
the value is stored. (i.e (r6)+)
.ip unstack
- the address one word below np (np is decremented before
accessing. (i.e. -(r6))
.ip <symbol>
- this is just <symbol>. This allows strange forms to be
represented directly.
.ip (stack n)
- The n'th value on the namestack for this function.
The first value 0(r10) is (stack 1).
.ip (vstack n)
- The cdr of the n'th value on the namestack.
That is, (stack 1) is *0(r10)
.ip (bind n)
- The value of the n'th value in
the literal table. If
this refers to a symbol, then this is the value
of the symbol.
If this refers to a list then this
this is the cdr of the list (although this is a rare
use). (bind 1) corresponds to *0(r8).
.ip (lbind n)
- The n'th value in the literal table. If
this refers to
object foo then this is the address of foo
in memory.
.ip (fixnum n)
- This is the address of small fixnum n in memory.
A small fixnum is in the range -1024 to 1023.
.ip (immed n)
- The is the immediate constant n.
.sh +0 extended intermediate address (EIADR)
.pp
This address is generated from an IADR by e-cvt. It
symbolically represents a vax address.
.ip <atom>
- This corresponds to <atom>.
.ip (n <atom>)
- This corresponds to n(<atom>)
(stack n+1) and (lbind n+1) are converted to these forms.
.ip (n <atom1> <atom2>)
- This corresponds to n(<atom1>)[<atom2>]
There is currently no IADR which generates this.
.ip (* n <atom>)
-This corresponds to *n(<atom>)
(vstack n+1) and (bind n+1) are converted to these forms.
.ip (+ <atom>)
- This corresponds to (<atom>)+.
stack is converted to this form.
.ip (- <atom>)
- This corresponds to -(<atom>)
unstack is converted to this form.
.ip ($ <numb>)
- This corresponds to $<atom>
(immed <numb>) is converted to this form.
.ip (# <numb>)
- This corresponds to $<loc> where <loc> equals
the base of the fixnums (0x1400) plus 4 * <numb>
(fixnum <numb>) is converted to this form
.ip (* # <numb>)
- This corresponds to $<numb>. It is generated
by d-rsimple occasionally when you ask for the
cdr of a number (which you do sometimes when you
are compiling fixnum operators).
.sh +0 Vax Unix assembler addresses
.pp
The addresses are printed from a EIADR by e-cvtas. The conversions
are shown in the above section.
.sh -1 Function calling convention
.sh +1 Function linkages
.pp
The function associated with a symbol is stored in the function
definition slot of the symbol. If the function slot contains a
list then the function is to be interpreted and its 'car' will
be lambda, nlambda, lexpr or macro. If the function is compiled then
the function definition slot will contain a binary object.
A binary object
is composed of two independent parts: the discipline and the entry.
The discipline is one of:
.ip lambda
- a function whose arguments should be evaluated.
.ip nlambda
- a function whose arguments should not be evaluated but
which should be passed as a list
.ip macro
- a function which should be passed the unevaluated form
being evaluated and whose result should be evaluated.
.ip \"subroutine\"
- a foreign function subroutine
.ip \"integer-function\"
- a foreign function returning an integer
.ip \"real-function"
- a foreign function returning a flonum.
.pp
A lexpr is not in the list as there is no difference to the caller
between compiled lambda's and compiled lexprs.
.sh +0 Transfer tables
Most calls from compiled code to other lisp functions go through
transfer tables. A transfer table entry is a pair: symbol, address
of routine.
When another lisp function is called it uses the
.i calls
instruction which will
indirectly jump through the address in the transfer table. This
address may point to the desired function or it may point to the
interface routine. If a call ends up in the interface routine,
then that routine will trace back through the call stack and eventually
reach the tranfer table entry that it was 'called through'. Since the
transfer table entry contains a symbol which names the function
to be called, the interface routine can determine which routine
was to have been called. If that routine is compiled then the
interface routine can modify the transfer table so that a call
through that table entry will go directly to the desired function.
If the routine to be called is interpreted, or if the user has
requested that transfer linkages should be disabled, then the
interface routine will go through the 'funcall' function
in the interpreter to continue the call.
.sh +0 calling sequence in the compiler:
.pp
When transfer tables are used
.(b
 foreach arg
 compute arg and stack result using (r6)+
 for example: movl r0,(r6)+
 movab -n(r6),r7 where n = 4*number of args to fcn
 calls $0,*trantb+m where m is the index of the function
 in the transfer table.
 movl r7,r6 restore r6
.)b
.pp
The compiler supports local functions, which are function accessible
only within one file.
Because they are not accessible from C code, we can use a very fast
call and return sequence when calling them.
To call a local function
.(b
 foreach arg
 compute and stack using (r6)+
 jsb LOCALNAME go directly to the function, it will
 restore r6 before it returns.
.)b
.pp
The beginning of each function looks as follows.
First for a non-lexpr function called in the standard way
(topsim is the label jumped to when tail merging, it will be unique
for each function; the brackets indicate the optional code which
exists if the -p switch is given to liszt);
.(b
 .word 0x5c0 # save r6, r7, r8, r10
 [ movab mcounts,r0 # if profiling, mcounts replaced by fasl
 jsb mcount ]
 movab linker,r8 # set up r8
 movl r7,r10 # set up oldlbot
 movab n(r10),r6 # n = 4*Number of args expected.
topsim:
.)b
.pp
Now for lexprs:
.(b
 .word 0x5c0
 [ movab mcounts,r0 # if profiling. [mcounts replaced by fasl]
 jsb mcount ]
 movab linker,r8 # set up r8
 subl3 $4,r7,-(sp) # address one word below bottom of args
topsim:
 movl r6,r10 # first free addr to arg base
 subl3 r7,r6,r0 # number of args * 4 into r0
 movab 0x1400(r0),(r6)+ # stack boxed number of args
 movl 0(r10),-(sp) # also store on stack so user can't clobber
.)b
.pp
And finally for local functions:
.(b
 movl r10,-(sp) # save r10
 movab -n(r6),r10 # set up arg base using arg top
topsim:
.)b
.sh -1 Assembler file format
.pp
The liszt compiler generates a file which is in Unix assembler format.
The Unix assembler converts that file into an object file which fasl
then reads.
Although the object file generated is a standard Unix object
file (as defined in /usr/include/a.out.h), it is not of a
format that the Unix ld loader can understand.
This is because the requirements of a lisp object file are different
from an object file of other languages.
The
run time semantics of lisp and the fact that lisp data must be protected
from garbage collection are the principal differences.
The unconventional object file created by the unix assembler
is a result of the
unconventional assembler input file.
Next we will look at what must be put in the assembler file and how
it is put there.
.pp
The assembler file must contain
.ip instructions
- vax assembler code for the compiled functions.
If there aren't any functions compiled, this can be empty.
.ip literals
- lisp data which is referred to by compiled code
.ip transfer table
- the names of the functions which correspond to the calls through
the transfer table.
The instructions simply say 'call indirect through the nth transfer
table entry'.
.ip function names
- the names of the functions which are being
defined.
.ip load time forms
- in order to mimic the
.i load
function, fasl has to be able to evaluate lisp expressions at fasl
time, so we must be able to store lisp expressions in the assembler
file and indicate when they should be evaluated.
.pp
Based on the requirements above, the form of the assembler file
is as described below.
The assembler builds two segments: text and data.
We put all of our information in the text segment.
The compiler places some annotation strings in the data segment
so that the object file can be identified, however the data segment
is ignored by fasl.
The format is
.ip compiled instructions
The instructions for each compiled (non-local) function begins with
.(b
 .globl F00003
 F00003:
 .word 0x5c0
.)b
The globl declaration and the fact that the symbol name begins
with a F will tell fasl that this is the beginning of a lisp
function.
The symbols beginning with F must be unique within a file
but may be duplicated in other files.
The lisp name of the function will appear later.
Next the instructions for the function are given.
Only a small fixed set of external symbols may be referenced.
The list is found in the code for nfasl.c and the common
ones will be described below (soon).
Labels should be given a name beginning with L and should
be unique within the file.
.ip table sizes
somewhere in the file there should be a pair of 'set' assembler
pseudo ops which describe the sizes of the literal table and
transfer table.
The form is this
.(b
 .set linker_size 4
 .set trans_size 3
.)b
where linker_size is the number of entries in the literal table
which will be required and trans_size is the number of entries
in the transfer table which will be required.
Those tables will be built by fasl.
.ip binder table
- this table describes the order that the functions should be
defined and forms evaluated.
It is a table of longwords beginning at the symbol bind_org
and ending when a -1 entry is seen.
The meaning of the entries will be described below.
.ip lisp expression table
- this is a table of null terminated strings beginning at the
symbol lit_org and ending at the symbol lit_end.
Each string is read by the lisp read function (using the
raw readtable).
The first linker_size expressions are put in the literal table.
The next trans_size expressions are the names of the
functions to put in the transfer table.
The remaining expressions correspond to the entries in the
binder table.
The binder entries are processed, one by one.
Provided that the binder entry is not -1, an expression
is read from the lisp expression table.
Then if the binder table entry is 0, that expression is the name of
a lambda type lisp function.
A binary object is created, the discipline is set to lambda and
the function address is set to the lexically next function
defined in the
file.
If the binder entry is 1 then this is an nlambda function, and
if the entry is 2 then this is a macro function.
If the entry is 99 then the expression just read should be evaluated
by the lisp function eval.
.pp
The lisp compiler normally puts the assembler format output file in /tmp
and removes it when it is done.
The -S switch will tell liszt to write the
assembler file in the same place as
the source file, and with the same root name but a .s extension.
The -C file will tell the lisp compiler to comment the file as it
generates it, making it easier to understand what is going on.
Assume that the following is file foo.l:
.(b
(defun foo (x) (bar y) (baz k))
(print (foo 3))
(def test (nlambda (l) (print 'hithere) (foo 3)))
.)b
we now compile it with -SC
.(b
 .globl F00007 #(fcn lambda foo)
 F00007:
 .word 0x5c0
 movab linker,r8
 movl r7,r10
 movab 4(r10),r6
 L00008:
 movl *0(r8),(r6)+ #(calling bar)
 #(from y to stack)
 movab -4(r6),r7
 calls $0,*trantb+0
 movl r7,r6
 movl *4(r8),(r6)+ #(calling baz)
 #(from k to stack)
 movab -4(r6),r7
 calls $0,*trantb+8
 movl r7,r6
 ret
 .globl F00009 #(fcn nlambda test)
 F00009:
 .word 0x5c0
 movab linker,r8
 movl r7,r10
 movab 4(r10),r6
 L00010:
 movl 8(r8),(r6)+ #(calling print)
 #(from 'hithere to stack)
 movab -4(r6),r7
 calls $0,*trantb+16
 movl r7,r6
 movl $5132,(r6)+ #(calling foo)
 #(from (fixnum 3) to stack)
 movab -4(r6),r7
 calls $0,*trantb+24
 movl r7,r6
 ret
 bind_org:
 .set linker_size, 3
 .set trans_size, 4
 .long 0
 .long 99
 .long 1
 .long -1
 lit_org:
 .asciz "y"
 .asciz "k"
 .asciz "hithere"
 .asciz "bar"
 .asciz "baz"
 .asciz "print"
 .asciz "foo"
 .asciz "foo"
 .asciz "(print (foo 3))"
 .asciz "test"
 lit_end:
 .data # this is just for documentation
 .asciz "@(#)Compiled by Lisp Compiler 7.1 on Sun Feb 21 17:51:54 1982"
 .asciz "@(#)decl.l 1.5 2/10/82"
 .asciz "@(#)array.l 1.1 9/25/81"
 .asciz "@(#)datab.l 1.1 9/25/81"
 .asciz "@(#)expr.l 1.1 9/25/81"
 .asciz "@(#)io.l 1.1 9/25/81"
 .asciz "@(#)funa.l 1.3 2/10/82"
 .asciz "@(#)funb.l 1.2 2/10/82"
 .asciz "@(#)func.l 1.2 2/10/82"
 .asciz "@(#)tlev.l 1.4 10/24/81"
 .asciz "@(#)fixnum.l 1.6 10/21/81"
 .asciz "@(#)util.l 1.2 10/7/81"
.)b
.sh +0 functions callable from compiled lisp code
.sh +0 Object File Format
.sh +0 Fasl