1# Compiler
2
3I developed a compiler for Easy-ISLisp (called the "FAST" project).
4I released it in August 2017.
5Here I describe the specifications, constraints, problems, etc.
6
7# Basic idea
8
9I referred to GCL (GNU Common Lisp) created by Mr. Hagiya, Mr. Yuasa and others.
10The Easy-ISLisp (EISL) compiler converts ISLisp code to equivalent C code and gcc or clang generate an object file.
11By dynamically linking this object file, EISL loads the compiled code.
12The internal definitions of functions use GCC/clang extensions.
13Therefore, the scope of labels and flet syntax is limited.
14
15# Usage
16EISL starts compiling with (compile-file filename).
17The filename is given as a string. The compile-file function creates a C file based on a Lisp file.
18It also launches GCC/clang and creates a shared object.
19EISL loads this with the load function like a normal text Lisp file.
20If the file extension is ".o", the load function handles dynamic linking.
21
22# Immediate value of small integer
23The interpreter of EISL created cell objects for everything, including small integers.
24For this reason, GC started soon and it was difficult to speed up.
25The compiler instantiates integers from -999999999 to 999999999 to prevent cell consumption.
26As a result, the calculation of the Takeuchi function and the Fibonacci sequence can be considerably speeded up.
27
28```
29> (load "tarai.o")
30T
31> (time (tarai 12 6 0))
32Elapsed Time(second)=0.068000
33<undef>
34> (time (fib 40))
35Elapsed Time(second)=1.194000
36<undef>
37>
38```
39
40# Tail recursive optimization
41The ISLisp standard does not require tail-recursive optimization.
42However, like Scheme, EISL generates code that replaces tail recursion with repetition.
43It doesn't consume the stack in following code, and it's reasonably fast.
44
45```
46(defun foo (n)
47  (if (= n 0)
48      t
49      (foo (- n 1))))
50
51> (load "test.o")
52T
53> (foo 1000000000)
54T
55> (time (foo 1000000000))
56Elapsed Time(second)=1.789000
57<undef>
58>
59```
60
61# Extensions for the compiler
62
63The following non-standard functions have been implemented for use by the compiler:
64
65| Function                  | Description |
66| ------------------------- | ----------- |
67| (subrp x)                 | t if x is a built-in function, nil otherwise                                                             |
68| (macrop x)                | t if x is a macro, nil otherwise                                                                         |
69| (system str)              | Str Executes the string on the OS. This is for starting GCC.                                             |
70| (freedll)                 | Unlinks the last dynamically linked file.                                                                |
71| (fixnump x)               | t for small integers, nil otherwise                                                                      |
72| (longnump x)              | t for LONGNUM, nil otherwise |
73| (bignump x)               | t for BIGNUM, nil otherwise |
74| (readed-array-list x)     | Convert an array of constants like # 2a((1 2) (3 4)) to a list ((1 2) (3 4)). |
75| (ignore-toplevel-check x) | Passing t as an argument removes top-level checks such as defclass, and passing nil restores and checks. |
76| (self-introduction)       | Returns the symbol depending the kind of OS, because the compiler changes its behavior depending on the type of OS. |
77| (macroexpand-1 x)         | Expand macro once. |
78| (macrexpand-all x)        | Expand macro all. |
79| (get-method x)            | Get all methods of the generic function with name x. |
80| (get-method-body x)       | Get the entity of method x. |
81| (get-method-priority x)   | Gets the priority of method x. This is an integer value and one of AROUND=11, BEFORE=12, PRIORITY=13, AFTER=14. |
82
83These are used to compile generic functions.
84The interpreter evaluates the generic function once organizes and saves all methods,
85and compiler extracts methods and converts it to SUBR.
86
87```lisp
88(format stream string)
89```
90has extended behavior when there are two single quotes in a string.
91Special character controls such as ~% between consecutive single quotes are ignored.
92Also, two consecutive single quotes are converted to one double quote.
93This is useful for converting Lisp code to C code by pooling into a string stream and outputting to a file.
94
95```
96> (format (standard-output) "''hello~% world~A ''")
97"hello~% world~A "NIL
98>
99```
100
101# Constraints of labels and flet
102In C language, it is not possible to define a local function inside a function.
103GCC makes this possible with extensions. I used this extension to gain speed while ensuring static scope operation.
104However, the scope is different from that of Lisp.
105Note that clang does not support this feature at all.
106
107# Implementation of lambda
108`lambda` is supposed to define an anonymous function, but C functions must have a name.
109The name is the file name supplied to compile-file + a natural number.
110Free variables must be retained for static scope.
111This was inspired by the GCL method.
112A list of free variables that must be retained is linked to the function name symbol of lambda.
113Access from the lambda body, is done with the `nth` function.
114
115# Constraints of lambda
116You can nest lambda expressions up to three deep.
117Any more than that will result in an error.
118
119Below is the quoted code from M.Hiroi's page.
120This is correct code that conforms to the ISLisp specification.
121
122```lisp
123(defun id-search (start goal)
124  (labels ((dfs (limit path)
125             (if (= limit (length path))
126                 (if (eq (car path) goal)
127                     (print (reverse path)))
128               (for-each
129                (lambda (x)
130                  (if (not (member x path))
131                      (dfs limit (cons x path))))
132                (cdr (assoc (car path) adjacent))))))
133    (for ((limit 1 (+ limit 1)))
134         ((= limit 7))
135         (format (standard-output) "----- ~D -----~%" limit)
136         (dfs limit (list start)))))
137```
138
139The EISL compiler cannot compile this code.
140`lambda` generates a C function with a name at the top level.
141The locally defined function `dfs` function is called from the lambda.
142The `dfs` function generated as a C local definition function cannot be referenced from the C function corresponding to the generated lambda.
143
144Therefore, in such a case, please write without using labels as shown below.
145
146```lisp
147(defun id-search (start goal)
148  (for ((limit 1 (+ limit 1)))
149       ((= limit 7))
150       (format (standard-output) "----- ~D -----~%" limit)
151       (dfs limit (list start))))
152
153(defun dfs (limit path)
154  (if (= limit (length path))
155      (if (eq (car path) goal)
156          (print (reverse path)))
157      (for-each
158        (lambda (x)
159          (if (not (member x path))
160              (dfs limit (cons x path))))
161        (cdr (assoc (car path) adjacent)))))
162```
163
164# Constraint on generic functions
165### This constraint has been removed from ver2.0.
166
167Below is the quoted code from M. Hiroi's page.
168This is correct code that conforms to the ISLisp specification.
169
170```lisp
171(defgeneric hash-func (k))
172
173(defmethod hash-func ((s <string>))
174  (for ((i 0 (+ i 1))
175        (a 0))
176       ((>= i (length s)) a)
177       (setq a (+ (* a 8) (convert (elt s i) <integer>)))))
178```
179
180In the method definition, the argument name must work correctly even if you give it a different name in the generic function.
181The interpreter evaluates this correctly.
182But the compiler has the restriction that the argument names must be the same for simplicity.
183I think it can be done by α conversion, but I decided to take a short-cut. It should be written as follows.
184
185```lisp
186(defgeneric hash-func (k))
187
188(defmethod hash-func ((k <string>))
189  (for ((i 0 (+ i 1))
190        (a 0))
191       ((>= i (length k)) a)
192       (setq a (+ (* a 8) (convert (elt k i) <integer>))
193
194```
195
196# Some More Limitations
197
198There are a few tickets that were logged for compiler limitations, but the outcome was that it was better to have such a limitation and a simpler system.
199In particular:
200
201* Issue #121. Memory consumption is significant, but modern computers can afford it. This was trying to figure out what the lower limit was. From cursory testing, a heap size of 5M (rather than the default 20M) cells was enough to run the benchmark suite at least. This is configured by `CELLSIZE` in *ffi.h*. However, reducing this could lead to weird failures. Using the "DEBUG=1" build might help debug these quicker.
202* Issue #123. The compiler is not able to handle some of subtleties of generic functions. It may be simpler to rely on functional interfaces to compiled code and have a Lispier interface in an interpreted file (e.g. the split between *ndbm.lsp* and *persist.lsp*).
203
204None of this should take away from the compiler's usefulness.
205For some applications, it may even be more useful to gain access to C libraries rather than any speed-up.
206
207# Calling C from Lisp
208
209This is only possible in compiled code.
210EISL extends the ISLisp standard to allow insertion of C statements into the compiler output.
211In this way you can use the many libraries that present a C interface.
212Below is a simple sample.
213
214```
215(c-include "<stdio.h>")
216
217(defun 1+ (n)
218  (c-lang "res=N+1"))
219
220> (compile-file "test.lsp")
221initialize
222pass1
223pass2
224compiling 1+
225finalize
226invoke GCC
227T
228> (load "test.o")
229T
230> (1+ 3)
2314
232>
233```
234
235In this way, the description in C can be mixed.
236You can use this to call the Wiring PI of Raspberry Pi or write a function that calls socket as a Lisp function.
237It does not rely on CFFI, so it can be easily linked to C.
238
239The prepared functions are as follows.
240
241| Function       | Description                                            |
242| -------------- | ------------------------------------------------------ |
243| (c-include x [platform])  | Insert #include. e.g. (c-include "stdio.h"). Optional `platform` only inserts for a particular platform. |
244| (c-define x y) | Insert #define. e.g. (c-define "MAXINT" "999999999")   |
245| (c-lang x)     | Insert a c language source. e.g. (c-lang "a = a + 1;") |
246| (c-option x [platform])   | Add a compile option. e.g. (c-option "-lwinmm"). Optional `platform` only adds for a particular platform. |
247
248`platform` above is an unquoted bareword linux, macos or openbsd.
249It does not have be quoted.
250
251These functions are ignored by the interpreter.
252
253# Further explanation
254
255* All variable names are converted to uppercase. Therefore, the n and m variables are N and M in C code.
256* Small integers are immediate values for efficiency.
257  Setting the second from most significant bit internally tags a value as a small integer.
258  In C code we must remove this bit using bitwise `& INT_MASK`.
259  When the C code is completed, you can return an immediate value
260  using bitwise `| INT_FLAG` which sets the second from most significant bit to 1.
261  The values INT_MASK and INT_FLAG are defined in fast.h.
262* The return value of an S-expression is a variable "res" in C.
263  A value should be assigned to res.
264
265e.g.
266
267```lisp
268(c-include "<stdio.h>")
269
270(defun ash (n m)
271  (if (>= m 0)
272      (c-lang "res = INT_FLAG | ((INT_MASK & N) << (INT_MASK & M));")
273      (let ((m1 (- m)))
274        (c-lang "res = INT_FLAG | ((INT_MASK & N) >> (INT_MASK & M1));"))))
275```
276
277# Type inference
278
279The EISL compiler has a type inferencer. This was introduced experimentally in ver0.85.
280
281# Examples
282
283### ex1
284
285In this example, the function foo gives a number to length, which causes an error at runtime.
286This is checked by type inference and a warning message is issued.
287The code is generated because it can be compiled.
288
289```
290(defun foo (x)
291  (length (+ x 1)))
292
293> (compile-file "tarai.lsp" 'type)
294initialize
295pass1
296inferencing FOO
297("subr type mismatch" ((LENGTH (+ X 1))))
298pass2
299compiling FOO
300finalize
301invoke GCC
302T
303>
304```
305
306### ex2
307Takeuchi function code.
308Even if you use type inference, it  can only infer that the argument and return value are integers. This is a limitation of the type inferencer.
309Since the Takeuchi function requires a huge amount of recursive calculation
310it can be calculated in a  practical time only when the argument is a small integer.
311Therefore, you can generate efficient code by telling the compiler that compiler should generate code for small integers and not consider that it will become a BIGNUM.
312"the" syntax has no effect on the interpreter, but the compiler executes type inferences based on this additional data.
313
314```
315(defun tarai(x y z)
316  (the <fixnum> x)(the <fixnum> y)(the <fixnum> z)
317  (if (<= x y)
318      y
319      (tarai (tarai (- x 1) y z)
320             (tarai (- y 1) z x)
321             (tarai (- z 1) x y))))
322
323> (compile-file "tarai.lsp" 'type)
324initialize
325pass1
326inferencing TARAI
327pass2
328compiling TARAI
329finalize
330invoke GCC
331T
332>
333
334```
335
336### ex3
337The Ackermann function also requires a huge amount of recursive calculations and produces large numbers.
338However, I think that the limit of calculation on a personal computer within practical time is about ack(4, 1).
339In this case, the calculation is possible within the small integer range.
340Specifying this is also possible by adding type information using "the" syntax.
341
342```lisp
343(defun ack (m n)
344  (the <fixnum> m)(the <fixnum> n)
345  (cond ((= m 0)(+ n 1))
346        ((= n 0)(ack (- m 1) 1))
347        (t (ack (- m 1) (ack m (- n 1))))))
348
349```
350
351### ex4
352The Fibonacci number is an integer, but here is an example of calculating with a floating point number.
353The type inferencer predicts that the arguments and return values are floating point numbers from constants such as 1.0.
354The compiler produces code that is specific to floating point numbers.
355No additional information about types is needed in this case.
356
357```lisp
358(defun fib* (n)
359  (cond ((= n 1.0) 1.0)
360        ((= n 2.0) 1.0)
361        (t (+ (fib* (- n 1.0)) (fib* (- n 2.0))))))
362```
363
364# Benchmark
365
366Performance was compared to SBCL, which is a popular Common Lisp compiler.
367SBCL processes type declarations to speed it up.
368
369### ex1
370Takeuchi function
371
372#### SBCL
373
374```
375(declaim (ftype (function (fixnum fixnum fixnum) fixnum) tarai))
376(defun tarai(x y z)
377  (declare (optimize (speed 3) (debug 0) (safety 0))
378    (type fixnum x)(type fixnum y)(type fixnum z))
379  (if (<= x y)
380      y
381      (tarai (tarai (- x 1) y z)
382             (tarai (- y 1) z x)
383             (tarai (- z 1) x y))))
384
385* (time (tarai 12 6 0))
386
387Evaluation took:
388  0.028 seconds of real time
389  0.031250 seconds of total run time (0.031250 user, 0.000000 system)
390  110.71% CPU
391  79,247,451 processor cycles
392  0 bytes consed
393
39412
395```
396
397#### EISL
398
399```
400> (load "tarai.o")
401T
402> (tarai 12 6 0)
40312
404> (time (tarai 12 6 0))
405Elapsed Time(second)=0.019982
406<undef>
407>
408```
409
410### ex2
411Ackermann function
412
413#### SBCL
414
415```
416(declaim (ftype (function (fixnum fixnum) fixnum) ack))
417(defun ack (m n)
418  (declare (optimize (speed 3) (debug 0) (safety 0))
419    (type fixnum m)(type fixnum n))
420  (cond ((= m 0)(+ n 1))
421        ((= n 0)(ack (- m 1) 1))
422        (t (ack (- m 1) (ack m (- n 1))))))
423
424
425* (time (ack 4 1))
426
427Evaluation took:
428  4.558 seconds of real time
429  4.531250 seconds of total run time (4.531250 user, 0.000000 system)
430  99.41% CPU
431  14,552,407,668 processor cycles
432  0 bytes consed
433
43465533
435*
436```
437
438#### EISL
439
440```
441> (load "tarai.o")
442T
443> (ack 4 1)
44465533
445> (time (ack 4 1))
446Elapsed Time(second)=2.382881
447<undef>
448>
449```
450
451### ex3
452Fibonacci function (float number)
453
454#### SBCL
455
456```
457(declaim (ftype (function (float) float) fib*))
458(defun fib* (n)
459  (declare (optimize (speed 3) (debug 0) (safety 0)) (type float n))
460  (if (< n 2.0)
461    n
462    (+ (fib* (- n 2.0)) (fib* (- n 1.0)))))
463
464* (time (fib* 40.0))
465
466Evaluation took:
467  6.479 seconds of real time
468  6.468750 seconds of total run time (6.406250 user, 0.062500 system)
469  [ Run times consist of 0.156 seconds GC time, and 6.313 seconds non-GC time. ]
470  99.85% CPU
471  20,682,846,107 processor cycles
472  3,973,926,912 bytes consed
473
4741.0233415e8
475*
476```
477
478#### EISL
479
480```
481> (fib* 40.0)
482102334155.0
483> (time (fib* 40.0))
484Elapsed Time(second)=0.479320
485<undef>
486>
487```
488