1Informal Notes
2August 8th, 2001
3Chuck Liang
4
5
6This directory contains all the files for an experimental compiler
7written in Teyjus Lambda Prolog (please use most recent version -
8b31). The source language being compiled is called "bobcat", because
9it syntactically resembles the "tiger" language described in Andrew
10Appel's compiler text. Bobcat however, has a rudimentary form of C++
11style object orientation. The target language is Sparc assembly code.
12Executables are created using cc (or gcc) in its role as an assembler.
13
14 The purpose of this project is to demonstrate and study the use of
15a higher order logic programming language in compiler construction.
16Optimizing compilers take years to develop, and it's not my purpose to
17perfect a product. I want to develop a set of techniques, that
18combined with previous work by various researchers in higher order
19specification, can be used in realistic compilation. For this purpose
20I've chosen to compile a very typical, imperative style language.
21I've also decided to forgo simple abstract machines in favor of a real
22RISC architecture, namely Sparc.
23
24The get a sense of the bobcat language being compiled, look at
25"test3.bob" and "test7.bob". The generated assembly codes are in
26"test3.s" and "test7.s" respectively. Type "gcc test3.s" or "gcc
27test7.s" on a Sparc machine to generate a normal executable a.out,
28then type "./a.out" to run the program. (Use "cc" if gcc is not
29available). Using the C linker has the beneficial effect of allowing
30C functions (printf) to be called from bobcat programs. :)
31
32
33**
34Some of the code presented here slightly varies from those presented
35in the paper "Compiler Construction in Higher Order Logic Programming."
36The compiler is meant to be experimental and will undergo refinements
37and even major overhauls over time.
38**
39
40
41The structure of the modules are as follows. "bobcatgram" accumulates
42"bobabsyn" and "lambdayacc". The other modules all accumulate the one
43listed before it:
44
45
46module "bobabsyn":
47 Defines the higher order abstract syntax for the bobcat language. The
48 .mod file contains copy clauses for substitution. The file also has
49 first-order alternatives (unused) for certain constructs.
50
51 It may seem strange that this module have abs constructors of type
52 (string -> texp) -> texp. This was a somewhat arbitrary decision.
53 I could have used (texp -> texp) -> texp, but the string form
54 was just more convenient at the time, and it stuck. Teyjus allows
55 the introduction of eigenvariables of type string without any problems,
56 and I don't think it makes that much of a difference. After all, the
57 NAMES are what we want to abstract over.
58
59
60module "lambdayacc"
61 A deterministic, bottom-up parser generator. Grammar symbols have type
62 "gs". This module also contains a semi-customizable lexical scanner.
63
64
65module "bobcatgram"
66 The specification of the bobcat grammar and semantic actions of the parser.
67 The sig file contains the declarations of the grammar symbols used.
68
69
70module "bobparser"
71 The parser that's generated automatically from bobcatgram by lambdayacc.
72 bobparser produces a lambda syntax tree (type texp). To get a good idea
73 of what kind of structure is produced, parse one of the simpler bobcat
74 programs with:
75
76 [bobparser] ?- parsefile "test4.bob" A.
77
78 There is no need to regenerate the parser unless changes are made to the
79 "cfg" clause in "bobcatgram". Changes to semantic actions in "bobcatgram",
80 as long as they retain the same types, will not affect the parser.
81
82
83module "kitty"
84 Contains the definition of a continuation passing style intermediate
85 representation (type kexp) and the transformation from the lambda syntax
86 tree to CPS form. This module is probably the most interesting component
87 of the present project.
88
89 To see a sample cps representation of a program, do
90
91 [kitty] ?- parsefile "test4.bob" (program A), formcps kret A B.
92
93 Variable B will be instantiated with the cps form of the program.
94
95 Be aware that the construct "kfix" doesn't exactly define functions,
96 but is used to abstract over names for local functions as well as variable
97 and class definitions. "kfunc" is the construct that defines functions.
98 "kfix" should probably be called "klet".
99
100 Although if I were to do it over I would have cleaned up much of the
101 syntax, essentially this is correct, and relatively quite elegant
102 (especially when compared to the next module). The style of CPS I use
103 is not traditional in the sense that every function is given an extra
104 argument, the continuation function. Instead, I represent every function
105 call f(x) with (basically) a pair (f(x), K), where K is the continuation
106 function that the return value of f will be passed to. In other words,
107 (let v = f(x) in (K v)). There is literature that argues that this way
108 of doing things is better.
109
110 To be more precise, a function call f(a+b) is represented as
111 (opexp "+" a b (kabs u\ (karg u 0 \ (kcall f CN 1 K))))
112 where K is the overall continuation and CN is the class that f belongs.
113 to. CN="." indicates a global function. (in this way type information is
114 carried in my CPS representation; it's necessitated by object orientation).
115
116
117module "absparc"
118 Contains another, very low level intermediate language "Abstract Sparc,"
119 which is machine dependent, but represents Sparc machine instructions in
120 abstract syntax. For example, moving the contents of register "o1" to
121 register "l2" is represented by
122
123 movop (reg "o" 1) (reg "l" 2)
124
125 This gives me another level of flexibility to manipulate
126 the final code output. I perform last-minute optimizations on this
127 representation, such as elementating obviously redundant operations like
128 in [movop A B, movop B A,...]. The mod file contains the code generation
129 (gencode) clauses that transform the cps language to the Abstract
130 Sparc language. These clauses were by far the trickest to write,
131 largely due to the specific characteristics of the Sparc
132 architecture. In some ways CPS can be thought of as a fancy form of
133 reverse-Polish notation, and naturally implemented on a simple
134 stack-based abstract machine. But I decided to skip that and generate
135 machine-dependent code directly.
136
137 This module makes heavy use of cut (!) to model state.
138 This module can be cleaned up surely.
139
140 The most important type of this module is "store". A store can be
141 a register, a memory location ("indirect"), a constant, or a label.
142 The most important predicates are gencode and genloadop. The constructor
143 "assoces" associates expressions with stores containing them. genloadop
144 (generate-load-operand) "loads" an expression into a register,
145 emitting the necessary instructions to do so at the same time.
146
147
148module "realsparc"
149 Contains the straight-forward translation from the abstract Sparc language
150 to real Sparc assembly language instructions as strings, which are
151 then written to a file. This module also contains the toplevel predicate
152 "tigcompile", which is used as in:
153
154 [realsparc] ?- tigcompile "test3.bob" "test3.s".
155
156
157
158module "typecat" (added 8/4/2001)
159 Type checking module for the lambda-syntax tree. This module is currently
160 NOT hooked into the rest of the compiler (doesn't deal with oop yet).
161 In accumulates the bobparser module, and can be used independently as in
162
163 [typecat] ?- parsefile "test3.bob" (program A), typeexp A B.
164
165
166---
167
168List of sample bobcat program sucessfully compiled:
169
170test3.bob: contains demonstration of basic imperative programming techniques,
171 including recursive functions and loops.
172test4.bob: demonstrates mutually recursive functions
173test5.bob: demonstrates arrays (static arrays). Arrays need to be redone.
174test6.bob: demonstrates static scoping with nested lets
175test7.bob: demonstrates object orientation with two classes.
176
177
178Known limitation of bobcat:
179
180Being experimental and under development, there are many restrictions in
181what can and can't be compiled:
182
183Type checking is not implemented, though some type information is used
184during compilation. There is thus little error reportage.
185
186Functions can have only a limited number of local variables, since stack
187frames of fixed size (200 bytes) are created. This can be solved of course,
188by counting the number of local variables of a function. I regarged this
189as mundane and thus didn't implement it (at least not yet).
190
191Be aware: I have not tested functions that take more than 5 parameters: I'm
192not sure if they'll work. The Sparc architecture has only a limited number
193of registers for parameter passing; more would require using the stack.
194
195A let statement can't just define a function without defining any variables
196first (stupid I know. Sorry).
197
198A function's parameters can't be changed. This can be called a "feature"
199since it also appears in other languages (Ada, Eiffel). Functions can be
200compiled more efficiently with this restriction. Be aware however, that the
201restriction is not enforced. no error is reported: it'll just crash!
202
203A class can only contain simple variables (no arrays, other objects).
204This restrictions was actually due to not enough type information used
205during compilation. An array of objects also can not be created yet.
206
207All variables in a class need to be defined inside the "private" section,
208and all functions need to be in the "public" section. The only way to
209access an object's internal state is through the functions.
210This restriction at least, is not necessarily a bad one.
211
212