1# Compiler 2 3I developed a compiler for Easy-ISLisp (called the "FAST" project). 4I released it in August 2017. 5Here I describe the specifications, constraints, problems, etc. 6 7# Basic idea 8 9I referred to GCL (GNU Common Lisp) created by Mr. Hagiya, Mr. Yuasa and others. 10The Easy-ISLisp (EISL) compiler converts ISLisp code to equivalent C code and gcc or clang generate an object file. 11By dynamically linking this object file, EISL loads the compiled code. 12The internal definitions of functions use GCC/clang extensions. 13Therefore, the scope of labels and flet syntax is limited. 14 15# Usage 16EISL starts compiling with (compile-file filename). 17The filename is given as a string. The compile-file function creates a C file based on a Lisp file. 18It also launches GCC/clang and creates a shared object. 19EISL loads this with the load function like a normal text Lisp file. 20If the file extension is ".o", the load function handles dynamic linking. 21 22# Immediate value of small integer 23The interpreter of EISL created cell objects for everything, including small integers. 24For this reason, GC started soon and it was difficult to speed up. 25The compiler instantiates integers from -999999999 to 999999999 to prevent cell consumption. 26As a result, the calculation of the Takeuchi function and the Fibonacci sequence can be considerably speeded up. 27 28``` 29> (load "tarai.o") 30T 31> (time (tarai 12 6 0)) 32Elapsed Time(second)=0.068000 33<undef> 34> (time (fib 40)) 35Elapsed Time(second)=1.194000 36<undef> 37> 38``` 39 40# Tail recursive optimization 41The ISLisp standard does not require tail-recursive optimization. 42However, like Scheme, EISL generates code that replaces tail recursion with repetition. 43It doesn't consume the stack in following code, and it's reasonably fast. 44 45``` 46(defun foo (n) 47 (if (= n 0) 48 t 49 (foo (- n 1)))) 50 51> (load "test.o") 52T 53> (foo 1000000000) 54T 55> (time (foo 1000000000)) 56Elapsed Time(second)=1.789000 57<undef> 58> 59``` 60 61# Extensions for the compiler 62 63The following non-standard functions have been implemented for use by the compiler: 64 65| Function | Description | 66| ------------------------- | ----------- | 67| (subrp x) | t if x is a built-in function, nil otherwise | 68| (macrop x) | t if x is a macro, nil otherwise | 69| (system str) | Str Executes the string on the OS. This is for starting GCC. | 70| (freedll) | Unlinks the last dynamically linked file. | 71| (fixnump x) | t for small integers, nil otherwise | 72| (longnump x) | t for LONGNUM, nil otherwise | 73| (bignump x) | t for BIGNUM, nil otherwise | 74| (readed-array-list x) | Convert an array of constants like # 2a((1 2) (3 4)) to a list ((1 2) (3 4)). | 75| (ignore-toplevel-check x) | Passing t as an argument removes top-level checks such as defclass, and passing nil restores and checks. | 76| (self-introduction) | Returns the symbol depending the kind of OS, because the compiler changes its behavior depending on the type of OS. | 77| (macroexpand-1 x) | Expand macro once. | 78| (macrexpand-all x) | Expand macro all. | 79| (get-method x) | Get all methods of the generic function with name x. | 80| (get-method-body x) | Get the entity of method x. | 81| (get-method-priority x) | Gets the priority of method x. This is an integer value and one of AROUND=11, BEFORE=12, PRIORITY=13, AFTER=14. | 82 83These are used to compile generic functions. 84The interpreter evaluates the generic function once organizes and saves all methods, 85and compiler extracts methods and converts it to SUBR. 86 87```lisp 88(format stream string) 89``` 90has extended behavior when there are two single quotes in a string. 91Special character controls such as ~% between consecutive single quotes are ignored. 92Also, two consecutive single quotes are converted to one double quote. 93This is useful for converting Lisp code to C code by pooling into a string stream and outputting to a file. 94 95``` 96> (format (standard-output) "''hello~% world~A ''") 97"hello~% world~A "NIL 98> 99``` 100 101# Constraints of labels and flet 102In C language, it is not possible to define a local function inside a function. 103GCC makes this possible with extensions. I used this extension to gain speed while ensuring static scope operation. 104However, the scope is different from that of Lisp. 105Note that clang does not support this feature at all. 106 107# Implementation of lambda 108`lambda` is supposed to define an anonymous function, but C functions must have a name. 109The name is the file name supplied to compile-file + a natural number. 110Free variables must be retained for static scope. 111This was inspired by the GCL method. 112A list of free variables that must be retained is linked to the function name symbol of lambda. 113Access from the lambda body, is done with the `nth` function. 114 115# Constraints of lambda 116You can nest lambda expressions up to three deep. 117Any more than that will result in an error. 118 119Below is the quoted code from M.Hiroi's page. 120This is correct code that conforms to the ISLisp specification. 121 122```lisp 123(defun id-search (start goal) 124 (labels ((dfs (limit path) 125 (if (= limit (length path)) 126 (if (eq (car path) goal) 127 (print (reverse path))) 128 (for-each 129 (lambda (x) 130 (if (not (member x path)) 131 (dfs limit (cons x path)))) 132 (cdr (assoc (car path) adjacent)))))) 133 (for ((limit 1 (+ limit 1))) 134 ((= limit 7)) 135 (format (standard-output) "----- ~D -----~%" limit) 136 (dfs limit (list start))))) 137``` 138 139The EISL compiler cannot compile this code. 140`lambda` generates a C function with a name at the top level. 141The locally defined function `dfs` function is called from the lambda. 142The `dfs` function generated as a C local definition function cannot be referenced from the C function corresponding to the generated lambda. 143 144Therefore, in such a case, please write without using labels as shown below. 145 146```lisp 147(defun id-search (start goal) 148 (for ((limit 1 (+ limit 1))) 149 ((= limit 7)) 150 (format (standard-output) "----- ~D -----~%" limit) 151 (dfs limit (list start)))) 152 153(defun dfs (limit path) 154 (if (= limit (length path)) 155 (if (eq (car path) goal) 156 (print (reverse path))) 157 (for-each 158 (lambda (x) 159 (if (not (member x path)) 160 (dfs limit (cons x path)))) 161 (cdr (assoc (car path) adjacent))))) 162``` 163 164# Constraint on generic functions 165### This constraint has been removed from ver2.0. 166 167Below is the quoted code from M. Hiroi's page. 168This is correct code that conforms to the ISLisp specification. 169 170```lisp 171(defgeneric hash-func (k)) 172 173(defmethod hash-func ((s <string>)) 174 (for ((i 0 (+ i 1)) 175 (a 0)) 176 ((>= i (length s)) a) 177 (setq a (+ (* a 8) (convert (elt s i) <integer>))))) 178``` 179 180In the method definition, the argument name must work correctly even if you give it a different name in the generic function. 181The interpreter evaluates this correctly. 182But the compiler has the restriction that the argument names must be the same for simplicity. 183I think it can be done by α conversion, but I decided to take a short-cut. It should be written as follows. 184 185```lisp 186(defgeneric hash-func (k)) 187 188(defmethod hash-func ((k <string>)) 189 (for ((i 0 (+ i 1)) 190 (a 0)) 191 ((>= i (length k)) a) 192 (setq a (+ (* a 8) (convert (elt k i) <integer>)) 193 194``` 195 196# Some More Limitations 197 198There are a few tickets that were logged for compiler limitations, but the outcome was that it was better to have such a limitation and a simpler system. 199In particular: 200 201* Issue #121. Memory consumption is significant, but modern computers can afford it. This was trying to figure out what the lower limit was. From cursory testing, a heap size of 5M (rather than the default 20M) cells was enough to run the benchmark suite at least. This is configured by `CELLSIZE` in *ffi.h*. However, reducing this could lead to weird failures. Using the "DEBUG=1" build might help debug these quicker. 202* Issue #123. The compiler is not able to handle some of subtleties of generic functions. It may be simpler to rely on functional interfaces to compiled code and have a Lispier interface in an interpreted file (e.g. the split between *ndbm.lsp* and *persist.lsp*). 203 204None of this should take away from the compiler's usefulness. 205For some applications, it may even be more useful to gain access to C libraries rather than any speed-up. 206 207# Calling C from Lisp 208 209This is only possible in compiled code. 210EISL extends the ISLisp standard to allow insertion of C statements into the compiler output. 211In this way you can use the many libraries that present a C interface. 212Below is a simple sample. 213 214``` 215(c-include "<stdio.h>") 216 217(defun 1+ (n) 218 (c-lang "res=N+1")) 219 220> (compile-file "test.lsp") 221initialize 222pass1 223pass2 224compiling 1+ 225finalize 226invoke GCC 227T 228> (load "test.o") 229T 230> (1+ 3) 2314 232> 233``` 234 235In this way, the description in C can be mixed. 236You can use this to call the Wiring PI of Raspberry Pi or write a function that calls socket as a Lisp function. 237It does not rely on CFFI, so it can be easily linked to C. 238 239The prepared functions are as follows. 240 241| Function | Description | 242| -------------- | ------------------------------------------------------ | 243| (c-include x [platform]) | Insert #include. e.g. (c-include "stdio.h"). Optional `platform` only inserts for a particular platform. | 244| (c-define x y) | Insert #define. e.g. (c-define "MAXINT" "999999999") | 245| (c-lang x) | Insert a c language source. e.g. (c-lang "a = a + 1;") | 246| (c-option x [platform]) | Add a compile option. e.g. (c-option "-lwinmm"). Optional `platform` only adds for a particular platform. | 247 248`platform` above is an unquoted bareword linux, macos or openbsd. 249It does not have be quoted. 250 251These functions are ignored by the interpreter. 252 253# Further explanation 254 255* All variable names are converted to uppercase. Therefore, the n and m variables are N and M in C code. 256* Small integers are immediate values for efficiency. 257 Setting the second from most significant bit internally tags a value as a small integer. 258 In C code we must remove this bit using bitwise `& INT_MASK`. 259 When the C code is completed, you can return an immediate value 260 using bitwise `| INT_FLAG` which sets the second from most significant bit to 1. 261 The values INT_MASK and INT_FLAG are defined in fast.h. 262* The return value of an S-expression is a variable "res" in C. 263 A value should be assigned to res. 264 265e.g. 266 267```lisp 268(c-include "<stdio.h>") 269 270(defun ash (n m) 271 (if (>= m 0) 272 (c-lang "res = INT_FLAG | ((INT_MASK & N) << (INT_MASK & M));") 273 (let ((m1 (- m))) 274 (c-lang "res = INT_FLAG | ((INT_MASK & N) >> (INT_MASK & M1));")))) 275``` 276 277# Type inference 278 279The EISL compiler has a type inferencer. This was introduced experimentally in ver0.85. 280 281# Examples 282 283### ex1 284 285In this example, the function foo gives a number to length, which causes an error at runtime. 286This is checked by type inference and a warning message is issued. 287The code is generated because it can be compiled. 288 289``` 290(defun foo (x) 291 (length (+ x 1))) 292 293> (compile-file "tarai.lsp" 'type) 294initialize 295pass1 296inferencing FOO 297("subr type mismatch" ((LENGTH (+ X 1)))) 298pass2 299compiling FOO 300finalize 301invoke GCC 302T 303> 304``` 305 306### ex2 307Takeuchi function code. 308Even if you use type inference, it can only infer that the argument and return value are integers. This is a limitation of the type inferencer. 309Since the Takeuchi function requires a huge amount of recursive calculation 310it can be calculated in a practical time only when the argument is a small integer. 311Therefore, you can generate efficient code by telling the compiler that compiler should generate code for small integers and not consider that it will become a BIGNUM. 312"the" syntax has no effect on the interpreter, but the compiler executes type inferences based on this additional data. 313 314``` 315(defun tarai(x y z) 316 (the <fixnum> x)(the <fixnum> y)(the <fixnum> z) 317 (if (<= x y) 318 y 319 (tarai (tarai (- x 1) y z) 320 (tarai (- y 1) z x) 321 (tarai (- z 1) x y)))) 322 323> (compile-file "tarai.lsp" 'type) 324initialize 325pass1 326inferencing TARAI 327pass2 328compiling TARAI 329finalize 330invoke GCC 331T 332> 333 334``` 335 336### ex3 337The Ackermann function also requires a huge amount of recursive calculations and produces large numbers. 338However, I think that the limit of calculation on a personal computer within practical time is about ack(4, 1). 339In this case, the calculation is possible within the small integer range. 340Specifying this is also possible by adding type information using "the" syntax. 341 342```lisp 343(defun ack (m n) 344 (the <fixnum> m)(the <fixnum> n) 345 (cond ((= m 0)(+ n 1)) 346 ((= n 0)(ack (- m 1) 1)) 347 (t (ack (- m 1) (ack m (- n 1)))))) 348 349``` 350 351### ex4 352The Fibonacci number is an integer, but here is an example of calculating with a floating point number. 353The type inferencer predicts that the arguments and return values are floating point numbers from constants such as 1.0. 354The compiler produces code that is specific to floating point numbers. 355No additional information about types is needed in this case. 356 357```lisp 358(defun fib* (n) 359 (cond ((= n 1.0) 1.0) 360 ((= n 2.0) 1.0) 361 (t (+ (fib* (- n 1.0)) (fib* (- n 2.0)))))) 362``` 363 364# Benchmark 365 366Performance was compared to SBCL, which is a popular Common Lisp compiler. 367SBCL processes type declarations to speed it up. 368 369### ex1 370Takeuchi function 371 372#### SBCL 373 374``` 375(declaim (ftype (function (fixnum fixnum fixnum) fixnum) tarai)) 376(defun tarai(x y z) 377 (declare (optimize (speed 3) (debug 0) (safety 0)) 378 (type fixnum x)(type fixnum y)(type fixnum z)) 379 (if (<= x y) 380 y 381 (tarai (tarai (- x 1) y z) 382 (tarai (- y 1) z x) 383 (tarai (- z 1) x y)))) 384 385* (time (tarai 12 6 0)) 386 387Evaluation took: 388 0.028 seconds of real time 389 0.031250 seconds of total run time (0.031250 user, 0.000000 system) 390 110.71% CPU 391 79,247,451 processor cycles 392 0 bytes consed 393 39412 395``` 396 397#### EISL 398 399``` 400> (load "tarai.o") 401T 402> (tarai 12 6 0) 40312 404> (time (tarai 12 6 0)) 405Elapsed Time(second)=0.019982 406<undef> 407> 408``` 409 410### ex2 411Ackermann function 412 413#### SBCL 414 415``` 416(declaim (ftype (function (fixnum fixnum) fixnum) ack)) 417(defun ack (m n) 418 (declare (optimize (speed 3) (debug 0) (safety 0)) 419 (type fixnum m)(type fixnum n)) 420 (cond ((= m 0)(+ n 1)) 421 ((= n 0)(ack (- m 1) 1)) 422 (t (ack (- m 1) (ack m (- n 1)))))) 423 424 425* (time (ack 4 1)) 426 427Evaluation took: 428 4.558 seconds of real time 429 4.531250 seconds of total run time (4.531250 user, 0.000000 system) 430 99.41% CPU 431 14,552,407,668 processor cycles 432 0 bytes consed 433 43465533 435* 436``` 437 438#### EISL 439 440``` 441> (load "tarai.o") 442T 443> (ack 4 1) 44465533 445> (time (ack 4 1)) 446Elapsed Time(second)=2.382881 447<undef> 448> 449``` 450 451### ex3 452Fibonacci function (float number) 453 454#### SBCL 455 456``` 457(declaim (ftype (function (float) float) fib*)) 458(defun fib* (n) 459 (declare (optimize (speed 3) (debug 0) (safety 0)) (type float n)) 460 (if (< n 2.0) 461 n 462 (+ (fib* (- n 2.0)) (fib* (- n 1.0))))) 463 464* (time (fib* 40.0)) 465 466Evaluation took: 467 6.479 seconds of real time 468 6.468750 seconds of total run time (6.406250 user, 0.062500 system) 469 [ Run times consist of 0.156 seconds GC time, and 6.313 seconds non-GC time. ] 470 99.85% CPU 471 20,682,846,107 processor cycles 472 3,973,926,912 bytes consed 473 4741.0233415e8 475* 476``` 477 478#### EISL 479 480``` 481> (fib* 40.0) 482102334155.0 483> (time (fib* 40.0)) 484Elapsed Time(second)=0.479320 485<undef> 486> 487``` 488