
This is guile.info, produced by makeinfo version 6.7 from guile.texi.

This manual documents Guile version 3.0.7.

   Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software
Foundation, Inc.

   Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.  A
copy of the license is included in the section entitled “GNU Free
Documentation License.”
INFO-DIR-SECTION The Algorithmic Language Scheme
START-INFO-DIR-ENTRY
* Guile Reference: (guile).     The Guile reference manual.
END-INFO-DIR-ENTRY


File: guile.info,  Node: Copyright Notice,  Next: Class Definition,  Up: GOOPS

8.1 Copyright Notice
====================

The material in this chapter is partly derived from the STk Reference
Manual written by Erick Gallesio, whose copyright notice is as follows.

   Copyright © 1993-1999 Erick Gallesio - I3S-CNRS/ESSI <eg@unice.fr>
Permission to use, copy, modify, distribute,and license this software
and its documentation for any purpose is hereby granted, provided that
existing copyright notices are retained in all copies and that this
notice is included verbatim in any distributions.  No written agreement,
license, or royalty fee is required for any of the authorized uses.
This software is provided “AS IS” without express or implied warranty.

   The material has been adapted for use in Guile, with the author’s
permission.


File: guile.info,  Node: Class Definition,  Next: Instance Creation,  Prev: Copyright Notice,  Up: GOOPS

8.2 Class Definition
====================

A new class is defined with the ‘define-class’ syntax:

     (define-class CLASS (SUPERCLASS ...)
        SLOT-DESCRIPTION ...
        CLASS-OPTION ...)

   CLASS is the class being defined.  The list of SUPERCLASSes specifies
which existing classes, if any, to inherit slots and properties from.
“Slots” hold per-instance(1) data, for instances of that class — like
“fields” or “member variables” in other object oriented systems.  Each
SLOT-DESCRIPTION gives the name of a slot and optionally some
“properties” of this slot; for example its initial value, the name of a
function which will access its value, and so on.  Class options, slot
descriptions and inheritance are discussed more below.

 -- syntax: define-class name (super ...) slot-definition ...
          class-option ...
     Define a class called NAME that inherits from SUPERs, with direct
     slots defined by SLOT-DEFINITIONs and CLASS-OPTIONs.  The newly
     created class is bound to the variable name NAME in the current
     environment.

     Each SLOT-DEFINITION is either a symbol that names the slot or a
     list,

          (SLOT-NAME-SYMBOL . SLOT-OPTIONS)

     where SLOT-NAME-SYMBOL is a symbol and SLOT-OPTIONS is a list with
     an even number of elements.  The even-numbered elements of
     SLOT-OPTIONS (counting from zero) are slot option keywords; the
     odd-numbered elements are the corresponding values for those
     keywords.

     Each CLASS-OPTION is an option keyword and corresponding value.

   As an example, let us define a type for representing a complex number
in terms of two real numbers.(2)  This can be done with the following
class definition:

     (define-class <my-complex> (<number>)
        r i)

   This binds the variable ‘<my-complex>’ to a new class whose instances
will contain two slots.  These slots are called ‘r’ and ‘i’ and will
hold the real and imaginary parts of a complex number.  Note that this
class inherits from ‘<number>’, which is a predefined class.(3)
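
   The new class can be inspected with the GOOPS introspection
procedures ‘class-name’, ‘class-slots’ and ‘slot-definition-name’.  The
following is a minimal sketch, assuming that the ‘(oop goops)’ module is
available; the variable ‘z’ is introduced only for illustration.

```scheme
(use-modules (oop goops))

(define-class <my-complex> (<number>)
  r i)

;; The class name defaults to the first argument of define-class.
(class-name <my-complex>)                              ; ⇒ <my-complex>

;; The slot names, in definition order.
(map slot-definition-name (class-slots <my-complex>))  ; ⇒ (r i)

;; Instances of the new class are also instances of <number>.
(define z (make <my-complex>))
(is-a? z <number>)                                     ; ⇒ #t
```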

   Slot options are described in *note Slot Options::.  The possible
class options are as follows.

 -- class option: #:metaclass metaclass
     The ‘#:metaclass’ class option specifies the metaclass of the class
     being defined.  METACLASS must be a class that inherits from
     ‘<class>’.  For the use of metaclasses, see *note Metaobjects and
     the Metaobject Protocol:: and *note Metaclasses::.

     If the ‘#:metaclass’ option is absent, GOOPS reuses or constructs a
     metaclass for the new class by calling ‘ensure-metaclass’ (*note
     ensure-metaclass: Class Definition Protocol.).

 -- class option: #:name name
     The ‘#:name’ class option specifies the new class’s name.  This
     name is used to identify the class whenever related objects - the
     class itself, its instances and its subclasses - are printed.

     If the ‘#:name’ option is absent, GOOPS uses the first argument to
     ‘define-class’ as the class name.

   ---------- Footnotes ----------

   (1) Usually — but see also the ‘#:allocation’ slot option.

   (2) Of course Guile already provides complex numbers, and ‘<complex>’
is in fact a predefined class in GOOPS; but the definition here is still
useful as an example.

   (3) ‘<number>’ is the direct superclass of the predefined class
‘<complex>’; ‘<complex>’ is the superclass of ‘<real>’, and ‘<real>’ is
the superclass of ‘<integer>’.


File: guile.info,  Node: Instance Creation,  Next: Slot Options,  Prev: Class Definition,  Up: GOOPS

8.3 Instance Creation and Slot Access
=====================================

An instance (or object) of a defined class can be created with ‘make’.
‘make’ takes one mandatory parameter, which is the class of the instance
to create, and a list of optional arguments that will be used to
initialize the slots of the new instance.  For instance the following
form

     (define c (make <my-complex>))

creates a new ‘<my-complex>’ object and binds it to the Scheme variable
‘c’.

 -- generic: make
 -- method: make (class <class>) initarg ...
     Create and return a new instance of class CLASS, initialized using
     INITARG ....

     In theory, INITARG ... can have any structure that is understood by
     whatever methods get applied when the ‘initialize’ generic function
     is applied to the newly allocated instance.

     In practice, specialized ‘initialize’ methods would normally call
     ‘(next-method)’, and so eventually the standard GOOPS ‘initialize’
     methods are applied.  These methods expect INITARGS to be a list
     with an even number of elements, where even-numbered elements
     (counting from zero) are keywords and odd-numbered elements are the
     corresponding values.

     GOOPS processes initialization argument keywords automatically for
     slots whose definition includes the ‘#:init-keyword’ option (*note
     init-keyword: Slot Options.).  Other keyword value pairs can only
     be processed by an ‘initialize’ method that is specialized for the
     new instance’s class.  Any unprocessed keyword value pairs are
     ignored.
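
     As a minimal sketch of this keyword processing, the hypothetical
     class ‘<point>’ below declares init keywords for two slots.  The
     extra ‘#:colour’ pair is an assumption made for the example: it
     matches no slot and no specialized ‘initialize’ method, so it is
     expected to be ignored.

```scheme
(use-modules (oop goops))

;; <point> is a hypothetical class used only for this illustration.
(define-class <point> ()
  (x #:init-keyword #:x)
  (y #:init-keyword #:y))

;; #:x and #:y are consumed by the standard initialize method;
;; #:colour corresponds to nothing and is silently dropped.
(define p (make <point> #:x 1 #:y 2 #:colour 'red))

(slot-ref p 'x)   ; ⇒ 1
(slot-ref p 'y)   ; ⇒ 2
```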

 -- generic: make-instance
 -- method: make-instance (class <class>) initarg ...
     ‘make-instance’ is an alias for ‘make’.

   The slots of the new complex number can be accessed using ‘slot-ref’
and ‘slot-set!’.  ‘slot-set!’ sets the value of an object slot and
‘slot-ref’ retrieves it.

     (slot-set! c 'r 10)
     (slot-set! c 'i 3)
     (slot-ref c 'r) ⇒ 10
     (slot-ref c 'i) ⇒ 3

   The ‘(oop goops describe)’ module provides a ‘describe’ function that
is useful for seeing all the slots of an object; it prints the slots and
their values to standard output.

     (describe c)
     ⊣
     #<<my-complex> 401d8638> is an instance of class <my-complex>
     Slots are:
          r = 10
          i = 3


File: guile.info,  Node: Slot Options,  Next: Slot Description Example,  Prev: Instance Creation,  Up: GOOPS

8.4 Slot Options
================

When specifying a slot (in a ‘(define-class ...)’ form), various options
can be specified in addition to the slot’s name.  Each option is
specified by a keyword.  The list of possible keywords is as follows.

 -- slot option: #:init-value init-value
 -- slot option: #:init-form init-form
 -- slot option: #:init-thunk init-thunk
 -- slot option: #:init-keyword init-keyword
     These options provide various ways to specify how to initialize the
     slot’s value at instance creation time.

     INIT-VALUE specifies a fixed initial slot value (shared across all
     new instances of the class).

     INIT-THUNK specifies a thunk that will provide a default value for
     the slot.  The thunk is called when a new instance is created and
     should return the desired initial slot value.

     INIT-FORM specifies a form that, when evaluated, will return an
     initial value for the slot.  The form is evaluated each time that
     an instance of the class is created, in the lexical environment of
     the containing ‘define-class’ expression.

     INIT-KEYWORD specifies a keyword that can be used to pass an
     initial slot value to ‘make’ when creating a new instance.

     Note that, since an ‘init-value’ value is shared across all
     instances of a class, you should only use it when the initial value
     is an immutable value, like a constant.  If you want to initialize
     a slot with a fresh, independently mutable value, you should use
     ‘init-thunk’ or ‘init-form’ instead.  Consider the following
     example.

          (define-class <chbouib> ()
            (hashtab #:init-value (make-hash-table)))

     Here only one hash table is created and all instances of
     ‘<chbouib>’ have their ‘hashtab’ slot refer to it.  In order to
     have each instance of ‘<chbouib>’ refer to a new hash table, you
     should instead write:

          (define-class <chbouib> ()
            (hashtab #:init-thunk make-hash-table))

     or:

          (define-class <chbouib> ()
            (hashtab #:init-form (make-hash-table)))
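
     The difference can be observed directly.  In this sketch the class
     names ‘<shared>’ and ‘<fresh>’ and the helper ‘same-table?’ are
     hypothetical; they stand for the two variants of ‘<chbouib>’ shown
     above.

```scheme
(use-modules (oop goops))

;; One hash table, created when the class is defined.
(define-class <shared> ()
  (hashtab #:init-value (make-hash-table)))

;; A new hash table for every instance.
(define-class <fresh> ()
  (hashtab #:init-thunk make-hash-table))

;; Do two new instances of CLASS hold the very same hash table?
(define (same-table? class)
  (eq? (slot-ref (make class) 'hashtab)
       (slot-ref (make class) 'hashtab)))

(same-table? <shared>)   ; ⇒ #t
(same-table? <fresh>)    ; ⇒ #f
```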

     If more than one of these options is specified for the same slot,
     the order of precedence, highest first, is

        • ‘#:init-keyword’, if INIT-KEYWORD is present in the options
          passed to ‘make’

        • ‘#:init-thunk’, ‘#:init-form’ or ‘#:init-value’.

     If the slot definition contains more than one initialization option
     of the same precedence, the later ones are ignored.  If a slot is
     not initialized at all, its value is unbound.

     In general, slots that are shared between more than one instance
     are only initialized at new instance creation time if the slot
     value is unbound at that time.  However, if the new instance
     creation specifies a valid init keyword and value for a shared
     slot, the slot is re-initialized regardless of its previous value.

     Note, however, that the power of GOOPS’ metaobject protocol means
     that everything written here may be customized or overridden for
     particular classes!  The slot initializations described here are
     performed by the least specialized method of the generic function
     ‘initialize’, whose signature is

          (define-method (initialize (object <object>) initargs) ...)

     The initialization of instances of any given class can be
     customized by defining an ‘initialize’ method that is specialized
     for that class, and the author of the specialized method may decide
     to call ‘next-method’ - which will result in a call to the next
     less specialized ‘initialize’ method - at any point within the
     specialized code, or maybe not at all.  In general, therefore, the
     initialization mechanisms described here may be modified or
     overridden by more specialized code, or may not be supported at all
     for particular classes.
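
     The following sketch pulls these points together: precedence
     between ‘#:init-keyword’ and ‘#:init-value’, an unbound slot, and a
     specialized ‘initialize’ method that calls ‘next-method’ first.
     The class ‘<account>’ and its slot names are hypothetical.

```scheme
(use-modules (oop goops))

;; <account> is a hypothetical class used only for this illustration.
(define-class <account> ()
  owner                                             ; no initialization option
  (balance #:init-value 0 #:init-keyword #:balance)
  (history #:init-form (list 'created)))

;; Let the standard method fill the slots, then record the opening
;; balance at the front of the history.
(define-method (initialize (acc <account>) initargs)
  (next-method)
  (slot-set! acc 'history
             (cons (list 'opened (slot-ref acc 'balance))
                   (slot-ref acc 'history))))

(define a (make <account>))
(define b (make <account> #:balance 50))

(slot-ref a 'balance)      ; ⇒ 0, from #:init-value
(slot-ref b 'balance)      ; ⇒ 50, the init keyword takes precedence
(slot-bound? a 'owner)     ; ⇒ #f, the slot was never initialized
(slot-ref b 'history)      ; ⇒ ((opened 50) created)
```

     Omitting the ‘(next-method)’ call in this sketch would leave
     ‘balance’ and ‘history’ unbound, because the standard slot
     initialization would never run.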

 -- slot option: #:getter getter
 -- slot option: #:setter setter
 -- slot option: #:accessor accessor
     Given an object OBJ with slots named ‘foo’ and ‘bar’, it is always
     possible to read and write those slots by calling ‘slot-ref’ and
     ‘slot-set!’ with the relevant slot name; for example:

          (slot-ref OBJ 'foo)
          (slot-set! OBJ 'bar 25)

     The ‘#:getter’, ‘#:setter’ and ‘#:accessor’ options, if present,
     tell GOOPS to create generic function and method definitions that
     can be used to get and set the slot value more conveniently.
     GETTER specifies a generic function to which GOOPS will add a
     method for getting the slot value.  SETTER specifies a generic
     function to which GOOPS will add a method for setting the slot
     value.  ACCESSOR specifies an accessor to which GOOPS will add
     methods for both getting and setting the slot value.

     So if a class includes a slot definition like this:

          (c #:getter get-count #:setter set-count #:accessor count)

     GOOPS defines generic function methods such that the slot value can
     be referenced using either the getter or the accessor -

          (let ((current-count (get-count obj))) ...)
          (let ((current-count (count obj))) ...)

     - and set using either the setter or the accessor -

          (set-count obj (+ 1 current-count))
          (set! (count obj) (+ 1 current-count))

     Note that

        • with an accessor, the slot value is set using the generalized
          ‘set!’ syntax

        • in practice, it is unusual for a slot to use all three of
          these options: read-only, write-only and read-write slots
          would typically use only ‘#:getter’, ‘#:setter’ and
          ‘#:accessor’ options respectively.

     The binding of the specified names is done in the environment of
     the ‘define-class’ expression.  If the names are already bound (in
     that environment) to values that cannot be upgraded to generic
     functions, those values are overwritten when the ‘define-class’
     expression is evaluated.  For more detail, see *note
     ensure-generic: Generic Function Internals.
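
     A complete, minimal sketch of the three options on one slot
     follows.  The names ‘<counter>’, ‘get-tally’, ‘set-tally!’ and
     ‘tally’ are hypothetical and were chosen to avoid clashing with
     existing bindings.

```scheme
(use-modules (oop goops))

(define-class <counter> ()
  (c #:init-value 0
     #:getter get-tally
     #:setter set-tally!
     #:accessor tally))

(define obj (make <counter>))

(set-tally! obj 5)                    ; write through the setter
(get-tally obj)                       ; ⇒ 5
(set! (tally obj) (+ 1 (tally obj)))  ; write through the accessor
(tally obj)                           ; ⇒ 6
(slot-ref obj 'c)                     ; ⇒ 6, the same storage throughout
```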

 -- slot option: #:allocation allocation
     The ‘#:allocation’ option tells GOOPS how to allocate storage for
     the slot.  Possible values for ALLOCATION are

        • ‘#:instance’

          Indicates that GOOPS should create separate storage for this
          slot in each new instance of the containing class (and its
          subclasses).  This is the default.

        • ‘#:class’

          Indicates that GOOPS should create storage for this slot that
          is shared by all instances of the containing class (and its
          subclasses).  In other words, a slot in class C with
          allocation ‘#:class’ is shared by all INSTANCEs for which
          ‘(is-a? INSTANCE C)’.  This permits defining a kind of global
          variable which can be accessed only by (in)direct instances of
          the class which defines the slot.

        • ‘#:each-subclass’

          Indicates that GOOPS should create storage for this slot that
          is shared by all _direct_ instances of the containing class,
          and that whenever a subclass of the containing class is
          defined, GOOPS should create a new storage for the slot that
          is shared by all _direct_ instances of the subclass.  In other
          words, a slot with allocation ‘#:each-subclass’ is shared by
          all instances with the same ‘class-of’.

        • ‘#:virtual’

          Indicates that GOOPS should not allocate storage for this
          slot.  The slot definition must also include the ‘#:slot-ref’
          and ‘#:slot-set!’ options to specify how to reference and set
          the value for this slot.  For an example, see *note Slot
          Description Example::.

     Slot allocation options are processed when defining a new class by
     the generic function ‘compute-get-n-set’, which is specialized by
     the class’s metaclass.  Hence new types of slot allocation can be
     implemented by defining a new metaclass and a method for
     ‘compute-get-n-set’ that is specialized for the new metaclass.  For
     an example of how to do this, see *note Customizing Class
     Definition::.
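
     The sharing behaviour of ‘#:class’ and ‘#:each-subclass’ can be
     sketched as follows.  The classes ‘<widget>’ and ‘<button>’ and
     their slots are hypothetical, and the values in the comments are
     what the descriptions above imply.  The shared slots are given no
     initialization options, so they are only ever set explicitly.

```scheme
(use-modules (oop goops))

(define-class <widget> ()
  (theme  #:allocation #:class)          ; one cell for the whole hierarchy
  (serial #:allocation #:each-subclass)  ; one cell per class
  (label  #:init-keyword #:label))       ; ordinary per-instance slot

(define-class <button> (<widget>))

(define w1 (make <widget> #:label "w1"))
(define w2 (make <widget> #:label "w2"))
(define b1 (make <button> #:label "b1"))

(slot-set! w1 'theme 'dark)
(slot-ref w2 'theme)    ; ⇒ dark
(slot-ref b1 'theme)    ; ⇒ dark, shared with subclass instances too

(slot-set! w1 'serial 100)
(slot-set! b1 'serial 200)
(slot-ref w2 'serial)   ; ⇒ 100, shared among direct <widget> instances
(slot-ref b1 'serial)   ; ⇒ 200, <button> has its own storage
```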

 -- slot option: #:slot-ref getter
 -- slot option: #:slot-set! setter
     The ‘#:slot-ref’ and ‘#:slot-set!’ options must be specified if the
     slot allocation is ‘#:virtual’, and are ignored otherwise.

     GETTER should be a closure taking a single INSTANCE parameter that
     returns the current slot value.  SETTER should be a closure taking
     two parameters - INSTANCE and NEW-VAL - that sets the slot value to
     NEW-VAL.


File: guile.info,  Node: Slot Description Example,  Next: Methods and Generic Functions,  Prev: Slot Options,  Up: GOOPS

8.5 Illustrating Slot Description
=================================

To illustrate slot description, we can redefine the ‘<my-complex>’ class
seen before.  A definition could be:

     (define-class <my-complex> (<number>)
        (r #:init-value 0 #:getter get-r #:setter set-r! #:init-keyword #:r)
        (i #:init-value 0 #:getter get-i #:setter set-i! #:init-keyword #:i))

With this definition, the ‘r’ and ‘i’ slots are set to 0 by default, and
can be initialized to other values by calling ‘make’ with the ‘#:r’ and
‘#:i’ keywords.  Also the generic functions ‘get-r’, ‘set-r!’, ‘get-i’
and ‘set-i!’ are automatically defined to read and write the slots.

     (define c1 (make <my-complex> #:r 1 #:i 2))
     (get-r c1) ⇒ 1
     (set-r! c1 12)
     (get-r c1) ⇒ 12
     (define c2 (make <my-complex> #:r 2))
     (get-r c2) ⇒ 2
     (get-i c2) ⇒ 0

   Accessors can both read and write a slot.  So, another definition of
the ‘<my-complex>’ class, using the ‘#:accessor’ option, could be:

     (define-class <my-complex> (<number>)
        (r #:init-value 0 #:accessor real-part #:init-keyword #:r)
        (i #:init-value 0 #:accessor imag-part #:init-keyword #:i))

With this definition, the ‘r’ slot can be read with:
     (real-part c)
and set with:
     (set! (real-part c) new-value)

   Suppose now that we want to manipulate complex numbers with both
rectangular and polar coordinates.  One solution could be to have a
definition of complex numbers which uses one particular representation
and some conversion functions to pass from one representation to the
other.  A better solution is to use virtual slots, like this:

     (define-class <my-complex> (<number>)
        ;; True slots use rectangular coordinates
        (r #:init-value 0 #:accessor real-part #:init-keyword #:r)
        (i #:init-value 0 #:accessor imag-part #:init-keyword #:i)
        ;; Virtual slots do the conversion on access
        (m #:accessor magnitude #:init-keyword #:magn
           #:allocation #:virtual
           #:slot-ref (lambda (o)
                        (let ((r (slot-ref o 'r)) (i (slot-ref o 'i)))
                          (sqrt (+ (* r r) (* i i)))))
           #:slot-set! (lambda (o m)
                         (let ((a (slot-ref o 'a)))
                           (slot-set! o 'r (* m (cos a)))
                           (slot-set! o 'i (* m (sin a))))))
        (a #:accessor angle #:init-keyword #:angle
           #:allocation #:virtual
           #:slot-ref (lambda (o)
                        (atan (slot-ref o 'i) (slot-ref o 'r)))
           #:slot-set! (lambda (o a)
                         (let ((m (slot-ref o 'm)))
                           (slot-set! o 'r (* m (cos a)))
                           (slot-set! o 'i (* m (sin a)))))))


   In this class definition, the magnitude ‘m’ and angle ‘a’ slots are
virtual, and are calculated, when referenced, from the normal (i.e.
‘#:allocation #:instance’) slots ‘r’ and ‘i’, by calling the function
defined in the relevant ‘#:slot-ref’ option.  Correspondingly, writing
‘m’ or ‘a’ leads to calling the function defined in the ‘#:slot-set!’
option.  Thus the following expression

     (slot-set! c 'a 3)

sets the angle of the complex number ‘c’.

     (define c (make <my-complex> #:r 12 #:i 20))
     (real-part c) ⇒ 12
     (angle c) ⇒ 1.03037682652431
     (slot-set! c 'i 10)
     (set! (real-part c) 1)
     (describe c)
     ⊣
     #<<my-complex> 401e9b58> is an instance of class <my-complex>
     Slots are:
          r = 1
          i = 10
          m = 10.0498756211209
          a = 1.47112767430373

   Since initialization keywords have been defined for the four slots,
we can now write versions of the standard Scheme procedures
‘make-rectangular’ and ‘make-polar’ for this class.

     (define make-rectangular
        (lambda (x y) (make <my-complex> #:r x #:i y)))

     (define make-polar
        (lambda (x y) (make <my-complex> #:magn x #:angle y)))


File: guile.info,  Node: Methods and Generic Functions,  Next: Inheritance,  Prev: Slot Description Example,  Up: GOOPS

8.6 Methods and Generic Functions
=================================

A GOOPS method is like a Scheme procedure except that it is specialized
for a particular set of argument classes, and will only be used when the
actual arguments in a call match the classes in the method definition.

     (define-method (+ (x <string>) (y <string>))
       (string-append x y))

     (+ "abc" "de") ⇒ "abcde"

   A method is not formally associated with any single class (as it is
in many other object oriented languages), because a method can be
specialized for a combination of several classes.  If you’ve studied
object orientation in non-Lispy languages, you may remember discussions
such as whether a method to stretch a graphical image around a surface
should be a method of the image class, with a surface as a parameter, or
a method of the surface class, with an image as a parameter.  In GOOPS
you’d just write

     (define-method (stretch (im <image>) (sf <surface>))
       ...)

and the question of which class the method is more associated with does
not need answering.

   There can simultaneously be several methods with the same name but
different sets of specializing argument classes; for example:

     (define-method (+ (x <string>) (y <string>)) ...)
     (define-method (+ (x <matrix>) (y <matrix>)) ...)
     (define-method (+ (f <fish>) (b <bicycle>)) ...)
     (define-method (+ (a <foo>) (b <bar>) (c <baz>)) ...)

A generic function is a container for the set of such methods that a
program intends to use.

   If you look at a program’s source code, and see ‘(+ x y)’ somewhere
in it, conceptually what is happening is that the program at that point
calls a generic function (in this case, the generic function bound to
the identifier ‘+’).  When that happens, Guile works out which of the
generic function’s methods is the most appropriate for the arguments
that the function is being called with; then it evaluates the method’s
code with the arguments as formal parameters.  This happens every time
that a generic function call is evaluated — it isn’t assumed that a
given source code call will end up invoking the same method every time.
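
   The sketch below illustrates that the method is chosen afresh on
each call.  The classes ‘<image>’, ‘<surface>’ and ‘<sphere>’, the
generic function ‘stretch’ and the procedure ‘wrap’ are hypothetical;
‘wrap’ contains a single call to ‘stretch’, yet it ends up running
different methods depending on the classes of its arguments.

```scheme
(use-modules (oop goops))

(define-class <image> ())
(define-class <surface> ())
(define-class <sphere> (<surface>))

(define-method (stretch (im <image>) (sf <surface>)) 'flat-stretch)
(define-method (stretch (im <image>) (sf <sphere>))  'spherical-stretch)

;; One call site; dispatch happens every time it is evaluated.
(define (wrap im sf) (stretch im sf))

(wrap (make <image>) (make <surface>))   ; ⇒ flat-stretch
(wrap (make <image>) (make <sphere>))    ; ⇒ spherical-stretch
```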

   Defining an identifier as a generic function is done with the
‘define-generic’ macro.  Definition of a new method is done with the
‘define-method’ macro.  Note that ‘define-method’ automatically does a
‘define-generic’ if the identifier concerned is not already a generic
function, so often an explicit ‘define-generic’ call is not needed.

 -- syntax: define-generic symbol
     Create a generic function with name SYMBOL and bind it to the
     variable SYMBOL.  If SYMBOL was previously bound to a Scheme
     procedure (or procedure-with-setter), the old procedure (and
     setter) is incorporated into the new generic function as its
     default procedure (and setter).  Any other previous value,
     including an existing generic function, is discarded and replaced
     by a new, empty generic function.

 -- syntax: define-method (generic parameter ...) body ...
     Define a method for the generic function or accessor GENERIC with
     parameters PARAMETERs and body BODY ....

     GENERIC is a generic function.  If GENERIC is a variable which is
     not yet bound to a generic function object, the expansion of
     ‘define-method’ will include a call to ‘define-generic’.  If
     GENERIC is ‘(setter GENERIC-WITH-SETTER)’, where
     GENERIC-WITH-SETTER is a variable which is not yet bound to a
     generic-with-setter object, the expansion will include a call to
     ‘define-accessor’.

     Each PARAMETER must be either a symbol or a two-element list
     ‘(SYMBOL CLASS)’.  The symbols refer to variables in the body forms
     that will be bound to the parameters supplied by the caller when
     calling this method.  The CLASSes, if present, specify the possible
     combinations of parameters to which this method can be applied.

     BODY ... are the body forms of the method definition.
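
   The way ‘define-generic’ keeps a previously bound procedure as the
default can be sketched as follows.  The name ‘describe-thing’ is
hypothetical, and the example assumes that it is evaluated at top level.

```scheme
(use-modules (oop goops))

;; An ordinary Scheme procedure.
(define (describe-thing x)
  (list 'something x))

;; Upgrade it; the old procedure becomes the default.
(define-generic describe-thing)

;; Add a method specialized for numbers.
(define-method (describe-thing (n <number>))
  (list 'number n))

(describe-thing 42)                ; ⇒ (number 42)
(describe-thing "text")            ; ⇒ (something "text")
(is-a? describe-thing <generic>)   ; ⇒ #t
```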

   ‘define-method’ expressions look a little like Scheme procedure
definitions of the form

     (define (name formals ...) . body)

   The important difference is that each formal parameter, apart from
the possible “rest” argument, can be qualified by a class name: ‘FORMAL’
becomes ‘(FORMAL CLASS)’.  The meaning of this qualification is that the
method being defined will only be applicable in a particular generic
function invocation if the corresponding argument is an instance of
‘CLASS’ (or one of its subclasses).  If more than one of the formal
parameters is qualified in this way, then the method will only be
applicable if each of the corresponding arguments is an instance of its
respective qualifying class.

   Note that unqualified formal parameters act as though they are
qualified by the class ‘<top>’, which GOOPS uses to mean the superclass
of all valid Scheme types, including both primitive types and GOOPS
classes.

   For example, if a generic function method is defined with PARAMETERs
‘(s1 <square>)’ and ‘(n <number>)’, that method is only applicable to
invocations of its generic function that have two arguments, where the
first argument is an instance of the ‘<square>’ class and the second
argument is a number.
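
   A minimal sketch of such a method, together with a less specific one
whose second parameter is unqualified, might look like this.  The class
‘<square>’, its ‘side’ slot and the generic function ‘scale’ are
hypothetical.

```scheme
(use-modules (oop goops))

(define-class <square> ()
  (side #:init-keyword #:side))

;; Applicable only when the second argument is a number.
(define-method (scale (s1 <square>) (n <number>))
  (* n (slot-ref s1 'side)))

;; The unqualified parameter n behaves as if written (n <top>).
(define-method (scale (s1 <square>) n)
  'not-a-number)

(define sq (make <square> #:side 2))

(scale sq 3)      ; ⇒ 6
(scale sq "3")    ; ⇒ not-a-number
```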

* Menu:

* Accessors::
* Extending Primitives::
* Merging Generics::
* Next-method::
* Generic Function and Method Examples::
* Handling Invocation Errors::


File: guile.info,  Node: Accessors,  Next: Extending Primitives,  Up: Methods and Generic Functions

8.6.1 Accessors
---------------

An accessor is a generic function that can also be used with the
generalized ‘set!’ syntax (*note Procedures with Setters::).  Guile will
handle a call like

     (set! (accessor args...) value)

by calling the most specialized method of ‘accessor’ that matches the
classes of ‘args’ and ‘value’.  ‘define-accessor’ is used to bind an
identifier to an accessor.

 -- syntax: define-accessor symbol
     Create an accessor with name SYMBOL and bind it to the variable
     SYMBOL.  If SYMBOL was previously bound to a Scheme procedure (or
     procedure-with-setter), the old procedure (and setter) is
     incorporated into the new accessor as its default procedure (and
     setter).  Any other previous value, including an existing generic
     function or accessor, is discarded and replaced by a new, empty
     accessor.
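
   As a sketch of an accessor defined by hand rather than through the
‘#:accessor’ slot option, the following defines separate reading and
writing methods.  The class ‘<cell>’, the accessor ‘contents’ and the
variable ‘a-cell’ are hypothetical.

```scheme
(use-modules (oop goops))

(define-class <cell> ()
  (v #:init-value 0))

(define-accessor contents)

;; Reading method.
(define-method (contents (c <cell>))
  (slot-ref c 'v))

;; Writing method, added to the setter of the accessor.
(define-method ((setter contents) (c <cell>) new-value)
  (slot-set! c 'v new-value))

(define a-cell (make <cell>))

(contents a-cell)            ; ⇒ 0
(set! (contents a-cell) 5)
(contents a-cell)            ; ⇒ 5
```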


File: guile.info,  Node: Extending Primitives,  Next: Merging Generics,  Prev: Accessors,  Up: Methods and Generic Functions

8.6.2 Extending Primitives
--------------------------

Many of Guile’s primitive procedures can be extended by giving them a
generic function definition that operates in conjunction with their
normal C-coded implementation.  When a primitive is extended in this
way, it behaves like a generic function with the C-coded implementation
as its default method.

   This extension happens automatically if a method is defined (by a
‘define-method’ call) for a variable whose current value is a primitive.
But it can also be forced by calling ‘enable-primitive-generic!’.

 -- primitive procedure: enable-primitive-generic! primitive
     Force the creation of a generic function definition for PRIMITIVE.

   Once the generic function definition for a primitive has been
created, it can be retrieved using ‘primitive-generic-generic’.

 -- primitive procedure: primitive-generic-generic primitive
     Return the generic function definition of PRIMITIVE.

     ‘primitive-generic-generic’ raises an error if PRIMITIVE is not a
     primitive with generic capability.
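
   The following sketch extends the primitive ‘+’ to a hypothetical
two-dimensional vector class ‘<vec2>’, whose getters ‘vx’ and ‘vy’ are
also hypothetical.  Ordinary numeric addition is expected to keep using
the C-coded implementation.

```scheme
(use-modules (oop goops))

(define-class <vec2> ()
  (x #:init-keyword #:x #:getter vx)
  (y #:init-keyword #:y #:getter vy))

;; Defining a method on + extends the primitive automatically.
(define-method (+ (a <vec2>) (b <vec2>))
  (make <vec2>
    #:x (+ (vx a) (vx b))
    #:y (+ (vy a) (vy b))))

(define v (+ (make <vec2> #:x 1 #:y 2)
             (make <vec2> #:x 10 #:y 20)))

(vx v)      ; ⇒ 11
(vy v)      ; ⇒ 22
(+ 1 2)     ; ⇒ 3

;; The generic function behind the extended primitive.
(is-a? (primitive-generic-generic +) <generic>)   ; ⇒ #t
```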


File: guile.info,  Node: Merging Generics,  Next: Next-method,  Prev: Extending Primitives,  Up: Methods and Generic Functions

8.6.3 Merging Generics
----------------------

GOOPS generic functions and accessors often have short, generic names.
For example, if a vector package provides an accessor for the X
coordinate of a vector, that accessor may just be called ‘x’.  It
doesn’t need to be called, for example, ‘vector:x’, because GOOPS will
work out, when it sees code like ‘(x OBJ)’, that the vector-specific
method of ‘x’ should be called if OBJ is a vector.

   That raises the question, though, of what happens when different
packages define a generic function with the same name.  Suppose we work
with a graphical package which needs to use two independent vector
packages for 2D and 3D vectors respectively.  If both packages export
‘x’, what does the code using those packages end up with?

   *note duplicate binding handlers: Creating Guile Modules. explains
how this is resolved for conflicting bindings in general.  For generics,
there is a special duplicates handler, ‘merge-generics’, which tells the
module system to merge generic functions with the same name.  Here is an
example:

     (define-module (math 2D-vectors)
       #:use-module (oop goops)
       #:export (x y ...))

     (define-module (math 3D-vectors)
       #:use-module (oop goops)
       #:export (x y z ...))

     (define-module (my-module)
       #:use-module (oop goops)
       #:use-module (math 2D-vectors)
       #:use-module (math 3D-vectors)
       #:duplicates (merge-generics))

   The generic function ‘x’ in ‘(my-module)’ will now incorporate all of
the methods of ‘x’ from both imported modules.

   To be precise, there will now be three distinct generic functions
named ‘x’: ‘x’ in ‘(math 2D-vectors)’, ‘x’ in ‘(math 3D-vectors)’, and
‘x’ in ‘(my-module)’; and these functions share their methods in an
interesting and dynamic way.

   To explain, let’s call the imported generic functions (in ‘(math
2D-vectors)’ and ‘(math 3D-vectors)’) the “ancestors”, and the merged
generic function (in ‘(my-module)’), the “descendant”.  The general rule
is that for any generic function G, the applicable methods are selected
from the union of the methods of G’s descendant functions, the methods
of G itself and the methods of G’s ancestor functions.

   Thus ancestor functions effectively share methods with their
descendants, and vice versa.  In the example above, ‘x’ in ‘(math
2D-vectors)’ will share the methods of ‘x’ in ‘(my-module)’ and vice
versa.(1)  Sharing is dynamic, so adding another new method to a
descendant implies adding it to that descendant’s ancestors too.

   ---------- Footnotes ----------

   (1) But note that ‘x’ in ‘(math 2D-vectors)’ doesn’t share methods
with ‘x’ in ‘(math 3D-vectors)’, so modularity is still preserved.


File: guile.info,  Node: Next-method,  Next: Generic Function and Method Examples,  Prev: Merging Generics,  Up: Methods and Generic Functions

8.6.4 Next-method
-----------------

When you call a generic function with a particular set of arguments,
GOOPS builds a list of all the methods that are applicable to those
arguments and orders them by how closely the method definitions match
the actual argument types.  It then calls the method at the top of this
list.  If the selected method’s code wants to call on to the next method
in this list, it can do so by using ‘next-method’.

     (define-method (Test (a <integer>)) (cons 'integer (next-method)))
     (define-method (Test (a <number>))  (cons 'number  (next-method)))
     (define-method (Test a)             (list 'top))

   With these definitions,

     (Test 1)   ⇒ (integer number top)
     (Test 1.0) ⇒ (number top)
     (Test #t)  ⇒ (top)

   ‘next-method’ is always called as just ‘(next-method)’.  The
arguments for the next method call are always implicit, and always the
same as for the original method call.
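
   The ordered list of applicable methods can also be examined
directly.  The sketch below assumes the metaobject protocol procedures
‘compute-applicable-methods’, ‘sort-applicable-methods’ and
‘method-specializers’ from ‘(oop goops)’; the helper
‘applicable-specializers’ is hypothetical.

```scheme
(use-modules (oop goops))

(define-method (Test (a <integer>)) (cons 'integer (next-method)))
(define-method (Test (a <number>))  (cons 'number  (next-method)))
(define-method (Test a)             (list 'top))

;; Names of the classes that each applicable method is specialized
;; on, most specific method first.
(define (applicable-specializers gf . args)
  (map (lambda (m) (map class-name (method-specializers m)))
       (sort-applicable-methods
        gf (compute-applicable-methods gf args) args)))

(applicable-specializers Test 1)    ; ⇒ ((<integer>) (<number>) (<top>))
(applicable-specializers Test 1.0)  ; ⇒ ((<number>) (<top>))
```

   Each ‘(next-method)’ call moves one step down such a list, which is
why ‘(Test 1)’ returns ‘(integer number top)’.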

   If you want to call on to a method with the same name but with a
different set of arguments (as you might with overloaded methods in C++,
for example), you do not use ‘next-method’, but instead simply write the
new call as usual:

     (define-method (Test (a <number>) min max)
       (if (and (>= a min) (<= a max))
           (display "Number is in range\n"))
       (Test a))

     (Test 2 1 10)
     ⊣
     Number is in range
     ⇒
     (integer number top)

   (You should be careful in this case that the ‘Test’ calls do not lead
to an infinite recursion, but this consideration is just the same as in
Scheme code in general.)


File: guile.info,  Node: Generic Function and Method Examples,  Next: Handling Invocation Errors,  Prev: Next-method,  Up: Methods and Generic Functions

8.6.5 Generic Function and Method Examples
------------------------------------------

Consider the following definitions:

     (define-generic G)
     (define-method (G (a <integer>) b) 'integer)
     (define-method (G (a <real>) b) 'real)
     (define-method (G a b) 'top)

   The ‘define-generic’ call defines G as a generic function.  The next
three lines define methods for G.  Each method uses a sequence of
“parameter specializers” that specify when the given method is
applicable.  A specializer indicates the class that the corresponding
argument must belong to (directly or indirectly) for the method to be
applicable.  If no specializer is given, the system defaults it to
‘<top>’.  Thus, the first method definition is equivalent to

     (define-method (G (a <integer>) (b <top>)) 'integer)

   Now, let’s look at some possible calls to the generic function G:

     (G 2 3)    ⇒ integer
     (G 2 #t)   ⇒ integer
     (G 1.2 'a) ⇒ real
     (G #t #f)  ⇒ top
     (G 1 2 3)  ⇒ error (since no method exists for 3 parameters)

   The methods above use only one specializer per parameter list.  But
in general, any or all of a method’s parameters may be specialized.
Suppose we define now:

     (define-method (G (a <integer>) (b <number>))  'integer-number)
     (define-method (G (a <integer>) (b <real>))    'integer-real)
     (define-method (G (a <integer>) (b <integer>)) 'integer-integer)
     (define-method (G a (b <number>))              'top-number)

With these definitions:

     (G 1 2)   ⇒ integer-integer
     (G 1 1.0) ⇒ integer-real
     (G 1 #t)  ⇒ integer
     (G 'a 1)  ⇒ top-number

   As a further example we shall continue to define operations on the
‘<my-complex>’ class.  Suppose that we want to use it to implement
complex numbers completely.  For instance a definition for the addition
of two complex numbers could be

     (define-method (new-+ (a <my-complex>) (b <my-complex>))
       (make-rectangular (+ (real-part a) (real-part b))
                         (+ (imag-part a) (imag-part b))))

   To be sure that the ‘+’ used in the method ‘new-+’ is the standard
addition we can do:

     (define-generic new-+)

     (let ((+ +))
       (define-method (new-+ (a <my-complex>) (b <my-complex>))
         (make-rectangular (+ (real-part a) (real-part b))
                           (+ (imag-part a) (imag-part b)))))

   The ‘define-generic’ ensures here that ‘new-+’ will be defined in the
global environment.  Once this is done, we can add methods to the
generic function ‘new-+’ that close over the local binding of ‘+’.  A
complete definition of the ‘new-+’ methods is shown in *note Figure 8.1:
fig:newplus.

     (define-generic new-+)

     (let ((+ +))

       (define-method (new-+ (a <real>) (b <real>)) (+ a b))

       (define-method (new-+ (a <real>) (b <my-complex>))
         (make-rectangular (+ a (real-part b)) (imag-part b)))

       (define-method (new-+ (a <my-complex>) (b <real>))
         (make-rectangular (+ (real-part a) b) (imag-part a)))

       (define-method (new-+ (a <my-complex>) (b <my-complex>))
         (make-rectangular (+ (real-part a) (real-part b))
                           (+ (imag-part a) (imag-part b))))

       (define-method (new-+ (a <number>))  a)

       (define-method (new-+) 0)

       (define-method (new-+ . args)
         (new-+ (car args)
           (apply new-+ (cdr args)))))

     (set! + new-+)

Figure 8.1: Extending ‘+’ to handle complex numbers

   We take advantage here of the fact that generic functions are not
obliged to have a fixed number of parameters.  The first four methods
implement dyadic addition.  The fifth method says that the addition of a
single element is this element itself.  The sixth method says that
addition with no parameters always returns 0 (as is also true for the
primitive ‘+’).  The last method takes an arbitrary number of
parameters(1).  This method acts as a kind of ‘reduce’: it calls the
dyadic addition on the _car_ of the list and on the result of applying
itself to the rest of the list.  Finally, the ‘set!’ rebinds the ‘+’
symbol to our extended addition.

   To conclude our implementation (integration?)  of complex numbers, we
could redefine standard Scheme predicates in the following manner:

     (define-method (complex? (c <my-complex>)) #t)
     (define-method (complex? c)                #f)

     (define-method (number? (n <number>)) #t)
     (define-method (number? n)            #f)
     ...

   Standard primitives in which complex numbers are involved could also
be redefined in the same manner.

   ---------- Footnotes ----------

   (1) The parameter list for a ‘define-method’ follows the conventions
used for Scheme procedures.  In particular it can use the dot notation
or a symbol to denote an arbitrary number of parameters.


File: guile.info,  Node: Handling Invocation Errors,  Prev: Generic Function and Method Examples,  Up: Methods and Generic Functions

8.6.6 Handling Invocation Errors
--------------------------------

If a generic function is invoked with a combination of parameters for
which there is no applicable method, GOOPS raises an error.

 -- generic: no-method
 -- method: no-method (gf <generic>) args
     When an application invokes a generic function, and no methods at
     all have been defined for that generic function, GOOPS calls the
     ‘no-method’ generic function.  The default method calls
     ‘goops-error’ with an appropriate message.

 -- generic: no-applicable-method
 -- method: no-applicable-method (gf <generic>) args
     When an application applies a generic function to a set of
     arguments, and no methods have been defined for those argument
     types, GOOPS calls the ‘no-applicable-method’ generic function.
     The default method calls ‘goops-error’ with an appropriate message.

 -- generic: no-next-method
 -- method: no-next-method (gf <generic>) args
     When a generic function method calls ‘(next-method)’ to invoke the
     next less specialized method for that generic function, and no less
     specialized methods have been defined for the current generic
     function arguments, GOOPS calls the ‘no-next-method’ generic
     function.  The default method calls ‘goops-error’ with an
     appropriate message.
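
   As a minimal sketch of how such an error surfaces, the following
uses a hypothetical generic ‘shout’ and a hypothetical helper ‘try’
(neither is part of GOOPS).  It assumes the default methods are in
place, so that the error is raised through ‘goops-error’ and can be
caught under the ‘goops-error’ key.

```scheme
(use-modules (oop goops))

;; Hypothetical generic with a single method, for strings only.
(define-generic shout)
(define-method (shout (s <string>)) (string-upcase s))

;; Hypothetical helper: run THUNK and return its value, or the
;; error key if GOOPS raised an error through goops-error.
(define (try thunk)
  (catch 'goops-error
    thunk
    (lambda (key . args) key)))

;; A method applies to a string argument.
(define ok (try (lambda () (shout "hi"))))
;; ok ⇒ "HI"

;; No method applies to an integer, so no-applicable-method runs.
(define failed (try (lambda () (shout 42))))
;; failed ⇒ goops-error
```

   Catching the key is only one option; an application could instead
add its own methods to these generic functions.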


File: guile.info,  Node: Inheritance,  Next: Introspection,  Prev: Methods and Generic Functions,  Up: GOOPS

8.7 Inheritance
===============

Here are some class definitions to help illustrate inheritance:

     (define-class A () a)
     (define-class B () b)
     (define-class C () c)
     (define-class D (A B) d a)
     (define-class E (A C) e c)
     (define-class F (D E) f)

   ‘A’, ‘B’, ‘C’ have a null list of superclasses.  In this case, the
system will replace the null list by a list which only contains
‘<object>’, the root of all the classes defined by ‘define-class’.  ‘D’,
‘E’, ‘F’ use multiple inheritance: each class inherits from two
previously defined classes.  Those class definitions define a hierarchy
which is shown in *note Figure 8.2: fig:hier.  In this figure, the class
‘<top>’ is also shown; this class is the superclass of all Scheme
objects.  In particular, ‘<top>’ is the superclass of all standard
Scheme types.

          <top>
          / \\\_____________________
         /   \\___________          \
        /     \           \          \
    <object>  <pair>  <procedure>  <number>
    /  |  \                           |
   /   |   \                          |
  A    B    C                      <complex>
  |\__/__   |                         |
   \ /   \ /                          |
    D     E                         <real>
     \   /                            |
       F                              |
                                   <integer>

Figure 8.2: A class hierarchy.

   When a class has superclasses, its set of slots is calculated by
taking the union of its own slots and those of all its superclasses.
Thus each instance of D will have three slots, ‘a’, ‘b’ and ‘d’.  The
slots of a class can be discovered using the ‘class-slots’ primitive.
For instance,

     (class-slots A) ⇒ ((a))
     (class-slots E) ⇒ ((a) (e) (c))
     (class-slots F) ⇒ ((e) (c) (b) (d) (a) (f))

The ordering of the returned slots is not significant.
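
   Because the printed form of a slot definition and the order of the
list may vary, a sketch like the following extracts just the slot names
with ‘slot-definition-name’ and sorts them.  The helper ‘slot-names’ is
hypothetical, introduced only for this illustration.

```scheme
(use-modules (oop goops))

(define-class A () a)
(define-class B () b)
(define-class D (A B) d a)

;; Hypothetical helper: the slot names of CLASS as a sorted list of
;; strings, so the result does not depend on the order in which
;; class-slots returns the definitions.
(define (slot-names class)
  (sort (map (lambda (def)
               (symbol->string (slot-definition-name def)))
             (class-slots class))
        string<?))

;; D declares d and a, and inherits a from A and b from B.
(define d-slots (slot-names D))
;; d-slots ⇒ ("a" "b" "d")
```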

* Menu:

* Class Precedence List::
* Sorting Methods::


File: guile.info,  Node: Class Precedence List,  Next: Sorting Methods,  Up: Inheritance

8.7.1 Class Precedence List
---------------------------

What happens when a class inherits from two or more superclasses that
have a slot with the same name but incompatible definitions — for
example, different init values or slot allocations?  We need a rule for
deciding which slot definition the derived class ends up with, and this
rule is provided by the class’s “Class Precedence List”.(1)

   Another problem arises when invoking a generic function, and there is
more than one method that could apply to the call arguments.  Here we
need a way of ordering the applicable methods, so that Guile knows which
method to use first, which to use next if that method calls
‘next-method’, and so on.  One of the ingredients for this ordering is
determining, for each given call argument, which of the specializing
classes, from each applicable method’s definition, is the most specific
for that argument; and here again the class precedence list helps.

   If inheritance were restricted such that each class could only have
one superclass — which is known as “single” inheritance — class ordering
would be easy.  The rule would be simply that a subclass is considered
more specific than its superclass.

   With multiple inheritance, ordering is less obvious, and we have to
impose an arbitrary rule to determine precedence.  Suppose we have

     (define-class X ()
        (x #:init-value 1))

     (define-class Y ()
        (x #:init-value 2))

     (define-class Z (X Y)
        (...))

Clearly the ‘Z’ class is more specific than ‘X’ or ‘Y’, for instances of
‘Z’.  But which is more specific out of ‘X’ and ‘Y’ — and hence, for the
definitions above, which ‘#:init-value’ will take effect when creating
an instance of ‘Z’?  The rule in GOOPS is that the superclasses listed
earlier are more specific than those listed later.  Hence ‘X’ is more
specific than ‘Y’, and the ‘#:init-value’ for slot ‘x’ in instances of
‘Z’ will be 1.

   Hence there is a linear ordering for a class and all its
superclasses, from most specific to least specific, and this ordering is
called the Class Precedence List of the class.
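
   The ‘X’, ‘Y’ and ‘Z’ example can be checked directly.  In this
sketch ‘Z’ is assumed to add no slots of its own, since the definition
above leaves its slot list unspecified.

```scheme
(use-modules (oop goops))

(define-class X () (x #:init-value 1))
(define-class Y () (x #:init-value 2))
;; Assumption: Z declares no slots of its own.
(define-class Z (X Y))

;; X is listed before Y in Z's superclasses, so it comes first.
(define z-cpl (map class-name (class-precedence-list Z)))
;; z-cpl ⇒ (Z X Y <object> <top>)

;; The slot definition from X wins, so x is initialized to 1.
(define z-x (slot-ref (make Z) 'x))
;; z-x ⇒ 1
```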

   In fact the rules above are not quite enough to always determine a
unique order, but they give an idea of how things work.  For example,
for the ‘F’ class shown in *note Figure 8.2: fig:hier, the class
precedence list is

     (F D E A C B <object> <top>)

In cases where there is any ambiguity (like this one), it is a bad idea
for programmers to rely on exactly what the order is.  If the order for
some superclasses is important, it can be expressed directly in the
class definition.

   The precedence list of a class can be obtained by calling
‘class-precedence-list’.  This function returns an ordered list whose
first element is the most specific class.  For instance:

     (class-precedence-list B) ⇒ (#<<class> B 401b97c8>
                                          #<<class> <object> 401e4a10>
                                          #<<class> <top> 4026a9d8>)

Or for a more immediately readable result:

     (map class-name (class-precedence-list B)) ⇒ (B <object> <top>)

   ---------- Footnotes ----------

   (1) This section is an adaptation of material from Jeff Dalton’s
(J.Dalton@ed.ac.uk) ‘Brief introduction to CLOS’.


File: guile.info,  Node: Sorting Methods,  Prev: Class Precedence List,  Up: Inheritance

8.7.2 Sorting Methods
---------------------

Now, with the idea of the class precedence list, we can state precisely
how the possible methods are sorted when more than one of the methods of
a generic function are applicable to the call arguments.

   The rules are that
   • the applicable methods are sorted in order of specificity, and the
     most specific method is used first, then the next if that method
     calls ‘next-method’, and so on

   • a method M1 is more specific than another method M2 if the first
     specializing class that differs, between the definitions of M1 and
     M2, is more specific, in M1’s definition, for the corresponding
     actual call argument, than the specializing class in M2’s
     definition

   • a class C1 is more specific than another class C2, for an object of
     actual class C, if C1 comes before C2 in C’s class precedence list.
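
   The rules above can be sketched with two small experiments.  The
class names ‘<x>’, ‘<y>’, ‘<z>’ and the generics ‘lineage’ and ‘pick’
are hypothetical names chosen for this illustration.

```scheme
(use-modules (oop goops))

(define-class <x> ())
(define-class <y> ())
(define-class <z> (<x> <y>))

;; Each method records its own specializer, then defers to the
;; next most specific method.
(define-method (lineage (o <x>)) (cons 'x (next-method)))
(define-method (lineage (o <y>)) (cons 'y (next-method)))
(define-method (lineage o)       (list 'top))

;; <x> comes before <y> in the class precedence list of <z>.
(define z-order (lineage (make <z>)))
;; z-order ⇒ (x y top)

;; Both methods apply to two integers.  The first specializers
;; already differ, and <integer> is more specific than <number>.
(define-method (pick (a <integer>) (b <number>)) 'integer-first)
(define-method (pick (a <number>) (b <integer>)) 'number-first)

(define picked (pick 1 2))
;; picked ⇒ integer-first
```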


File: guile.info,  Node: Introspection,  Next: GOOPS Error Handling,  Prev: Inheritance,  Up: GOOPS

8.8 Introspection
=================

“Introspection”, or “reflection”, means being able to obtain information
dynamically about GOOPS objects.  It is perhaps best illustrated by
considering an object oriented language that does not provide any
introspection, namely C++.

   Nothing in C++ allows a running program to obtain answers to the
following types of question:

   • What are the data members of this object or class?

   • What classes does this class inherit from?

   • Is this method call virtual or non-virtual?

   • If I invoke ‘Employee::adjustHoliday()’, what class contains the
     ‘adjustHoliday()’ method that will be applied?

   In C++, answers to such questions can only be determined by looking
at the source code, if you have access to it.  GOOPS, on the other hand,
includes procedures that allow answers to these questions — or their
GOOPS equivalents — to be obtained dynamically, at run time.

* Menu:

* Classes::
* Instances::
* Slots::
* Generic Functions::
* Accessing Slots::


File: guile.info,  Node: Classes,  Next: Instances,  Up: Introspection

8.8.1 Classes
-------------

A GOOPS class is itself an instance of the ‘<class>’ class, or of a
subclass of ‘<class>’.  The definition of the ‘<class>’ class has slots
that are used to describe the properties of a class, including the
following.

 -- primitive procedure: class-name class
     Return the name of class CLASS.  This is the value of CLASS’s
     ‘name’ slot.

 -- primitive procedure: class-direct-supers class
     Return a list containing the direct superclasses of CLASS.  This is
     the value of CLASS’s ‘direct-supers’ slot.

 -- primitive procedure: class-direct-slots class
     Return a list containing the slot definitions of the direct slots
     of CLASS.  This is the value of CLASS’s ‘direct-slots’ slot.

 -- primitive procedure: class-direct-subclasses class
     Return a list containing the direct subclasses of CLASS.  This is
     the value of CLASS’s ‘direct-subclasses’ slot.

 -- primitive procedure: class-direct-methods class
     Return a list of all the generic function methods that use CLASS as
     a formal parameter specializer.  This is the value of CLASS’s
     ‘direct-methods’ slot.

 -- primitive procedure: class-precedence-list class
     Return the class precedence list for class CLASS (*note Class
     Precedence List::).  This is the value of CLASS’s ‘cpl’ slot.

 -- primitive procedure: class-slots class
     Return a list containing the slot definitions for all CLASS’s
     slots, including any slots that are inherited from superclasses.
     This is the value of CLASS’s ‘slots’ slot.

 -- procedure: class-subclasses class
     Return a list of all subclasses of CLASS.

 -- procedure: class-methods class
     Return a list of all methods that use CLASS or a subclass of CLASS
     as one of their formal parameter specializers.
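
   As a brief illustration, the following sketch applies some of these
procedures to two hypothetical classes, ‘<animal>’ and ‘<dog>’.  The
results are mapped through ‘class-name’ and ‘slot-definition-name’ so
that they are easy to read.

```scheme
(use-modules (oop goops))

;; Hypothetical classes used only for this illustration.
(define-class <animal> () name)
(define-class <dog> (<animal>) breed)

(define dog-name (class-name <dog>))
;; dog-name ⇒ <dog>

(define dog-supers (map class-name (class-direct-supers <dog>)))
;; dog-supers ⇒ (<animal>)

(define animal-subs
  (map class-name (class-direct-subclasses <animal>)))
;; animal-subs ⇒ (<dog>)

;; Only the slots declared by <dog> itself, not the inherited ones.
(define dog-direct-slots
  (map slot-definition-name (class-direct-slots <dog>)))
;; dog-direct-slots ⇒ (breed)
```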


File: guile.info,  Node: Instances,  Next: Slots,  Prev: Classes,  Up: Introspection

8.8.2 Instances
---------------

 -- primitive procedure: class-of value
     Return the GOOPS class of any Scheme VALUE.

 -- primitive procedure: instance? object
     Return ‘#t’ if OBJECT is any GOOPS instance, otherwise ‘#f’.

 -- procedure: is-a? object class
     Return ‘#t’ if OBJECT is an instance of CLASS or one of its
     subclasses.

   You can use the ‘is-a?’ predicate to ask whether any given value
belongs to a given class, or ‘class-of’ to discover the class of a given
value.  Note that when GOOPS is loaded (by code using the ‘(oop goops)’
module) built-in classes like ‘<string>’, ‘<list>’ and ‘<number>’ are
automatically set up, corresponding to all Guile Scheme types.

     (is-a? 2.3 <number>) ⇒ #t
     (is-a? 2.3 <real>) ⇒ #t
     (is-a? 2.3 <string>) ⇒ #f
     (is-a? '("a" "b") <string>) ⇒ #f
     (is-a? '("a" "b") <list>) ⇒ #t
     (is-a? (car '("a" "b")) <string>) ⇒ #t
     (is-a? <string> <class>) ⇒ #t
     (is-a? <class> <string>) ⇒ #f

     (class-of 2.3) ⇒ #<<class> <real> 908c708>
     (class-of #(1 2 3)) ⇒ #<<class> <vector> 908cd20>
     (class-of <string>) ⇒ #<<class> <class> 8bd3e10>
     (class-of <class>) ⇒ #<<class> <class> 8bd3e10>


File: guile.info,  Node: Slots,  Next: Generic Functions,  Prev: Instances,  Up: Introspection

8.8.3 Slots
-----------

 -- procedure: class-slot-definition class slot-name
     Return the slot definition for the slot named SLOT-NAME in class
     CLASS.  SLOT-NAME should be a symbol.

 -- procedure: slot-definition-name slot-def
     Extract and return the slot name from SLOT-DEF.

 -- procedure: slot-definition-options slot-def
     Extract and return the slot options from SLOT-DEF.

 -- procedure: slot-definition-allocation slot-def
     Extract and return the slot allocation option from SLOT-DEF.  This
     is the value of the ‘#:allocation’ keyword (*note allocation: Slot
     Options.), or ‘#:instance’ if the ‘#:allocation’ keyword is absent.

 -- procedure: slot-definition-getter slot-def
     Extract and return the slot getter option from SLOT-DEF.  This is
     the value of the ‘#:getter’ keyword (*note getter: Slot Options.),
     or ‘#f’ if the ‘#:getter’ keyword is absent.

 -- procedure: slot-definition-setter slot-def
     Extract and return the slot setter option from SLOT-DEF.  This is
     the value of the ‘#:setter’ keyword (*note setter: Slot Options.),
     or ‘#f’ if the ‘#:setter’ keyword is absent.

 -- procedure: slot-definition-accessor slot-def
     Extract and return the slot accessor option from SLOT-DEF.  This is
     the value of the ‘#:accessor’ keyword (*note accessor: Slot
     Options.), or ‘#f’ if the ‘#:accessor’ keyword is absent.

 -- procedure: slot-definition-init-value slot-def
     Extract and return the slot init-value option from SLOT-DEF.  This
     is the value of the ‘#:init-value’ keyword (*note init-value: Slot
     Options.), or the unbound value if the ‘#:init-value’ keyword is
     absent.

 -- procedure: slot-definition-init-form slot-def
     Extract and return the slot init-form option from SLOT-DEF.  This
     is the value of the ‘#:init-form’ keyword (*note init-form: Slot
     Options.), or the unbound value if the ‘#:init-form’ keyword is
     absent.

 -- procedure: slot-definition-init-thunk slot-def
     Extract and return the slot init-thunk option from SLOT-DEF.  This
     is the value of the ‘#:init-thunk’ keyword (*note init-thunk: Slot
     Options.), or ‘#f’ if the ‘#:init-thunk’ keyword is absent.

 -- procedure: slot-definition-init-keyword slot-def
     Extract and return the slot init-keyword option from SLOT-DEF.
     This is the value of the ‘#:init-keyword’ keyword (*note
     init-keyword: Slot Options.), or ‘#f’ if the ‘#:init-keyword’
     keyword is absent.

 -- procedure: slot-init-function class slot-name
     Return the initialization function for the slot named SLOT-NAME in
     class CLASS.  SLOT-NAME should be a symbol.

     The returned initialization function incorporates the effects of
     the standard ‘#:init-thunk’, ‘#:init-form’ and ‘#:init-value’ slot
     options.  These initializations can be overridden by the
     ‘#:init-keyword’ slot option or by a specialized ‘initialize’
     method, so, in general, the function returned by
     ‘slot-init-function’ may be irrelevant.  For a fuller discussion,
     see *note init-value: Slot Options.
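
   A short sketch of these accessors, using a hypothetical ‘<point>’
class whose ‘x’ slot is given an init value and an init keyword:

```scheme
(use-modules (oop goops))

;; Hypothetical class used only for this illustration.
(define-class <point> ()
  (x #:init-value 0 #:init-keyword #:x)
  (y #:init-value 0 #:init-keyword #:y))

;; Look up the definition of slot x, then query its options.
(define x-def (class-slot-definition <point> 'x))

(define x-name (slot-definition-name x-def))
;; x-name ⇒ x
(define x-keyword (slot-definition-init-keyword x-def))
;; x-keyword ⇒ #:x
(define x-init (slot-definition-init-value x-def))
;; x-init ⇒ 0
;; No #:allocation option was given, so the default applies.
(define x-alloc (slot-definition-allocation x-def))
;; x-alloc ⇒ #:instance
```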


File: guile.info,  Node: Generic Functions,  Next: Accessing Slots,  Prev: Slots,  Up: Introspection

8.8.4 Generic Functions
-----------------------

A generic function is an instance of the ‘<generic>’ class, or of a
subclass of ‘<generic>’.  The definition of the ‘<generic>’ class has
slots that are used to describe the properties of a generic function.

 -- primitive procedure: generic-function-name gf
     Return the name of generic function GF.

 -- primitive procedure: generic-function-methods gf
     Return a list of the methods of generic function GF.  This is the
     value of GF’s ‘methods’ slot.

   Similarly, a method is an instance of the ‘<method>’ class, or of a
subclass of ‘<method>’; and the definition of the ‘<method>’ class has
slots that are used to describe the properties of a method.

 -- primitive procedure: method-generic-function method
     Return the generic function that METHOD belongs to.  This is the
     value of METHOD’s ‘generic-function’ slot.

 -- primitive procedure: method-specializers method
     Return a list of METHOD’s formal parameter specializers.  This is
     the value of METHOD’s ‘specializers’ slot.

 -- primitive procedure: method-procedure method
     Return the procedure that implements METHOD.  This is the value of
     METHOD’s ‘procedure’ slot.

 -- generic: method-source
 -- method: method-source (m <method>)
     Return an expression that prints to show the definition of method
     M.

          (define-generic cube)

          (define-method (cube (n <number>))
            (* n n n))

          (map method-source (generic-function-methods cube))
          ⇒
          ((method ((n <number>)) (* n n n)))
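
   The other procedures in this section can be explored on the same
‘cube’ generic.  This is a sketch only; the variable names are chosen
for the illustration.

```scheme
(use-modules (oop goops))

(define-generic cube)
(define-method (cube (n <number>)) (* n n n))

;; The only method defined on cube so far.
(define cube-method (car (generic-function-methods cube)))

(define gf-name (generic-function-name cube))
;; gf-name ⇒ cube

;; A method knows which generic function it belongs to.
(define same-gf? (eq? (method-generic-function cube-method) cube))
;; same-gf? ⇒ #t

(define spec-names
  (map class-name (method-specializers cube-method)))
;; spec-names ⇒ (<number>)
```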


File: guile.info,  Node: Accessing Slots,  Prev: Generic Functions,  Up: Introspection

8.8.5 Accessing Slots
---------------------

Any slot, regardless of its allocation, can be queried, referenced and
set using the following four primitive procedures.

 -- primitive procedure: slot-exists? obj slot-name
     Return ‘#t’ if OBJ has a slot with name SLOT-NAME, otherwise ‘#f’.

 -- primitive procedure: slot-bound? obj slot-name
     Return ‘#t’ if the slot named SLOT-NAME in OBJ has a value,
     otherwise ‘#f’.

     ‘slot-bound?’ calls the generic function ‘slot-missing’ if OBJ does
     not have a slot called SLOT-NAME (*note slot-missing: Accessing
     Slots.).

 -- primitive procedure: slot-ref obj slot-name
     Return the value of the slot named SLOT-NAME in OBJ.

     ‘slot-ref’ calls the generic function ‘slot-missing’ if OBJ does
     not have a slot called SLOT-NAME (*note slot-missing: Accessing
     Slots.).

     ‘slot-ref’ calls the generic function ‘slot-unbound’ if the named
     slot in OBJ does not have a value (*note slot-unbound: Accessing
     Slots.).

 -- primitive procedure: slot-set! obj slot-name value
     Set the value of the slot named SLOT-NAME in OBJ to VALUE.

     ‘slot-set!’ calls the generic function ‘slot-missing’ if OBJ does
     not have a slot called SLOT-NAME (*note slot-missing: Accessing
     Slots.).

   GOOPS stores information about slots in classes.  Internally, all of
these procedures work by looking up the slot definition for the slot
named SLOT-NAME in the class ‘(class-of OBJ)’, and then using the slot
definition’s “getter” and “setter” closures to get and set the slot
value.
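
   The four procedures can be sketched on a hypothetical ‘<account>’
class, in which the ‘owner’ slot starts out unbound and the ‘balance’
slot has an init value:

```scheme
(use-modules (oop goops))

;; Hypothetical class used only for this illustration.
(define-class <account> ()
  (owner #:init-keyword #:owner)
  (balance #:init-value 0))

(define acct (make <account>))

(define has-balance? (slot-exists? acct 'balance))
;; has-balance? ⇒ #t
(define has-limit? (slot-exists? acct 'limit))
;; has-limit? ⇒ #f

;; owner was given neither an init value nor an init argument.
(define owner-before (slot-bound? acct 'owner))
;; owner-before ⇒ #f
(slot-set! acct 'owner "alice")
(define owner-after (slot-bound? acct 'owner))
;; owner-after ⇒ #t

(define balance (slot-ref acct 'balance))
;; balance ⇒ 0
```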

   The next four procedures differ from the previous ones in that they
take the class as an explicit argument, rather than assuming ‘(class-of
OBJ)’.  Therefore they allow you to apply the “getter” and “setter”
closures of a slot definition in one class to an instance of a different
class.

 -- primitive procedure: slot-exists-using-class? class obj slot-name
     Return ‘#t’ if CLASS has a slot definition for a slot with name
     SLOT-NAME, otherwise ‘#f’.

 -- primitive procedure: slot-bound-using-class? class obj slot-name
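     Return ‘#t’ if the slot named SLOT-NAME has a value when read from
     OBJ using CLASS’s slot definition, that is, if applying
     ‘slot-ref-using-class’ to the same arguments would not call the
     generic function ‘slot-unbound’; otherwise ‘#f’.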

     ‘slot-bound-using-class?’ calls the generic function ‘slot-missing’
     if CLASS does not have a slot definition for a slot called
     SLOT-NAME (*note slot-missing: Accessing Slots.).

 -- primitive procedure: slot-ref-using-class class obj slot-name
     Apply the “getter” closure for the slot named SLOT-NAME in CLASS to
     OBJ, and return its result.

     ‘slot-ref-using-class’ calls the generic function ‘slot-missing’ if
     CLASS does not have a slot definition for a slot called SLOT-NAME
     (*note slot-missing: Accessing Slots.).

     ‘slot-ref-using-class’ calls the generic function ‘slot-unbound’ if
     the application of the “getter” closure to OBJ returns an unbound
     value (*note slot-unbound: Accessing Slots.).

 -- primitive procedure: slot-set-using-class! class obj slot-name value
     Apply the “setter” closure for the slot named SLOT-NAME in CLASS to
     OBJ and VALUE.

     ‘slot-set-using-class!’ calls the generic function ‘slot-missing’
     if CLASS does not have a slot definition for a slot called
     SLOT-NAME (*note slot-missing: Accessing Slots.).

   Slots whose allocation is per-class rather than per-instance can be
referenced and set without needing to specify any particular instance.

 -- procedure: class-slot-ref class slot-name
     Return the value of the slot named SLOT-NAME in class CLASS.  The
     named slot must have ‘#:class’ or ‘#:each-subclass’ allocation
     (*note allocation: Slot Options.).

     If there is no such slot with ‘#:class’ or ‘#:each-subclass’
     allocation, ‘class-slot-ref’ calls the ‘slot-missing’ generic
     function with arguments CLASS and SLOT-NAME.  Otherwise, if the
     slot value is unbound, ‘class-slot-ref’ calls the ‘slot-unbound’
     generic function, with the same arguments.

 -- procedure: class-slot-set! class slot-name value
     Set the value of the slot named SLOT-NAME in class CLASS to VALUE.
     The named slot must have ‘#:class’ or ‘#:each-subclass’ allocation
     (*note allocation: Slot Options.).

     If there is no such slot with ‘#:class’ or ‘#:each-subclass’
     allocation, ‘class-slot-set!’ calls the ‘slot-missing’ generic
     function with arguments CLASS and SLOT-NAME.
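
   As a sketch, a hypothetical ‘<counter>’ class with a ‘#:class’
allocated slot shows that the value set through the class is the one
seen by every instance:

```scheme
(use-modules (oop goops))

;; Hypothetical class whose count slot is shared by all instances.
(define-class <counter> ()
  (count #:allocation #:class #:init-value 0))

(define c1 (make <counter>))
(define c2 (make <counter>))

;; Set the shared slot through the class, with no instance involved.
(class-slot-set! <counter> 'count 5)

(define via-class (class-slot-ref <counter> 'count))
;; via-class ⇒ 5
(define via-c1 (slot-ref c1 'count))
;; via-c1 ⇒ 5
(define via-c2 (slot-ref c2 'count))
;; via-c2 ⇒ 5
```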

   When a ‘slot-ref’ or ‘slot-set!’ call specifies a non-existent slot
name, or tries to reference a slot whose value is unbound, GOOPS calls
one of the following generic functions.

 -- generic: slot-missing
 -- method: slot-missing (class <class>) slot-name
 -- method: slot-missing (class <class>) (object <object>) slot-name
 -- method: slot-missing (class <class>) (object <object>) slot-name
          value
     When an application attempts to reference or set a class or
     instance slot by name, and the slot name is invalid for the
     specified CLASS or OBJECT, GOOPS calls the ‘slot-missing’ generic
     function.

     The default methods all call ‘goops-error’ with an appropriate
     message.

 -- generic: slot-unbound
 -- method: slot-unbound (object <object>)
 -- method: slot-unbound (class <class>) slot-name
 -- method: slot-unbound (class <class>) (object <object>) slot-name
     When an application attempts to reference a class or instance slot,
     and the slot’s value is unbound, GOOPS calls the ‘slot-unbound’
     generic function.

     The default methods all call ‘goops-error’ with an appropriate
     message.
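
   An application can add its own methods to these generic functions.
The sketch below assumes that ‘slot-ref’ returns whatever the
applicable ‘slot-unbound’ method returns, and uses a hypothetical
‘<config>’ class for which an unbound slot reads as the symbol ‘unset’
instead of raising an error.

```scheme
(use-modules (oop goops))

;; Hypothetical class used only for this illustration.
(define-class <config> ()
  (timeout #:init-keyword #:timeout))

;; More specific than the default method because of the <config>
;; specializer, so it is chosen for <config> instances.
(define-method (slot-unbound (c <class>) (obj <config>) slot-name)
  'unset)

(define missing (slot-ref (make <config>) 'timeout))
;; missing ⇒ unset
(define present (slot-ref (make <config> #:timeout 30) 'timeout))
;; present ⇒ 30
```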


File: guile.info,  Node: GOOPS Error Handling,  Next: GOOPS Object Miscellany,  Prev: Introspection,  Up: GOOPS

8.9 Error Handling
==================

The procedure ‘goops-error’ is called to raise an appropriate error by
the default methods of the following generic functions:

   • ‘slot-missing’ (*note slot-missing: Accessing Slots.)

   • ‘slot-unbound’ (*note slot-unbound: Accessing Slots.)

   • ‘no-method’ (*note no-method: Handling Invocation Errors.)

   • ‘no-applicable-method’ (*note no-applicable-method: Handling
     Invocation Errors.)

   • ‘no-next-method’ (*note no-next-method: Handling Invocation
     Errors.)

   If you customize these functions for particular classes or
metaclasses, you may still want to use ‘goops-error’ to signal any error
conditions that you detect.

 -- procedure: goops-error format-string arg ...
     Raise an error with key ‘goops-error’ and error message constructed
     from FORMAT-STRING and ARG ....  Error message formatting is as
     done by ‘scm-error’.
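
   A minimal sketch of raising and catching such an error follows.  The
procedure ‘check-positive’ is hypothetical; only ‘goops-error’ and the
‘goops-error’ key come from the description above.

```scheme
(use-modules (oop goops))

;; Hypothetical validation procedure that reports failure
;; through goops-error.
(define (check-positive n)
  (if (> n 0)
      n
      (goops-error "expected a positive number, got ~S" n)))

;; The handler receives the error key as its first argument.
(define caught
  (catch 'goops-error
    (lambda () (check-positive -1))
    (lambda (key . args) key)))
;; caught ⇒ goops-error
```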


File: guile.info,  Node: GOOPS Object Miscellany,  Next: The Metaobject Protocol,  Prev: GOOPS Error Handling,  Up: GOOPS

8.10 GOOPS Object Miscellany
============================

Here we cover some points about GOOPS objects that aren’t substantial
enough to merit sections on their own.

Object Equality
---------------

When GOOPS is loaded, ‘eqv?’, ‘equal?’ and ‘=’ become generic functions,
and you can define methods for them, specialized for your own classes,
so as to control what the various kinds of equality mean for your
classes.

   For example, the ‘assoc’ procedure, for looking up an entry in an
alist, is specified as using ‘equal?’ to determine when the car of an
entry in the alist is the same as the key parameter that ‘assoc’ is
called with.  Hence, if you had defined a new class, and wanted to use
instances of that class as the keys in an alist, you could define a
method for ‘equal?’, for your class, to control ‘assoc’’s lookup
precisely.
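
   The following sketch illustrates the idea with a hypothetical
‘<point>’ class.  It assumes, as described above, that ‘assoc’ reaches
the ‘equal?’ method when it compares two instances.

```scheme
(use-modules (oop goops))

;; Hypothetical class used only for this illustration.
(define-class <point> ()
  (x #:init-keyword #:x)
  (y #:init-keyword #:y))

;; Two points are equal? when their coordinates are equal?.
(define-method (equal? (a <point>) (b <point>))
  (and (equal? (slot-ref a 'x) (slot-ref b 'x))
       (equal? (slot-ref a 'y) (slot-ref b 'y))))

(define table
  (list (cons (make <point> #:x 1 #:y 2) 'home)))

;; The key is a different instance with the same coordinates.
(define found (assoc (make <point> #:x 1 #:y 2) table))
(define not-found (assoc (make <point> #:x 9 #:y 9) table))
```

   With this method in place, ‘found’ is expected to be the ‘home’
entry and ‘not-found’ to be ‘#f’.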

Cloning Objects
---------------

 -- generic: shallow-clone
 -- method: shallow-clone (self <object>)
     Return a “shallow” clone of SELF.  The default method makes a
     shallow clone by allocating a new instance and copying slot values
     from self to the new instance.  Each slot value is copied either as
     an immediate value or by reference.

 -- generic: deep-clone
 -- method: deep-clone (self <object>)
     Return a “deep” clone of SELF.  The default method makes a deep
     clone by allocating a new instance and copying or cloning slot
     values from self to the new instance.  If a slot value is an
     instance (satisfies ‘instance?’), it is cloned by calling
     ‘deep-clone’ on that value.  Other slot values are copied either as
     immediate values or by reference.
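
   The difference between the two default methods shows up when a slot
holds another instance.  This sketch uses two hypothetical classes,
‘<engine>’ and ‘<car>’:

```scheme
(use-modules (oop goops))

;; Hypothetical classes used only for this illustration.
(define-class <engine> () (power #:init-keyword #:power))
(define-class <car> () (engine #:init-keyword #:engine))

(define original (make <car> #:engine (make <engine> #:power 90)))
(define shallow (shallow-clone original))
(define deep (deep-clone original))

;; The shallow clone shares the original's engine instance.
(define shares-engine?
  (eq? (slot-ref shallow 'engine) (slot-ref original 'engine)))
;; shares-engine? ⇒ #t

;; The deep clone has its own engine with the same slot values.
(define own-engine?
  (not (eq? (slot-ref deep 'engine) (slot-ref original 'engine))))
;; own-engine? ⇒ #t
(define deep-power (slot-ref (slot-ref deep 'engine) 'power))
;; deep-power ⇒ 90
```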

Write and Display
-----------------

 -- primitive generic: write object port
 -- primitive generic: display object port
     When GOOPS is loaded, ‘write’ and ‘display’ become generic
     functions with special methods for printing

        • objects - instances of the class ‘<object>’

        • foreign objects - instances of the class ‘<foreign-object>’

        • classes - instances of the class ‘<class>’

        • generic functions - instances of the class ‘<generic>’

        • methods - instances of the class ‘<method>’.

     ‘write’ and ‘display’ print non-GOOPS values in the same way as the
     Guile primitive ‘write’ and ‘display’ functions.

   In addition to the cases mentioned, you can of course define ‘write’
and ‘display’ methods for your own classes, to customize how instances
of those classes are printed.
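
   For instance, a ‘write’ method for a hypothetical ‘<point>’ class
might look like the sketch below.  The printed format is an arbitrary
choice made for this illustration.

```scheme
(use-modules (oop goops))

;; Hypothetical class used only for this illustration.
(define-class <point> ()
  (x #:init-keyword #:x)
  (y #:init-keyword #:y))

;; Print <point> instances as #<point X Y>.
(define-method (write (p <point>) port)
  (format port "#<point ~a ~a>" (slot-ref p 'x) (slot-ref p 'y)))

;; Capture what write sends to the port as a string.
(define printed
  (call-with-output-string
    (lambda (port)
      (write (make <point> #:x 1 #:y 2) port))))
```

   With this method in place, ‘printed’ is expected to be the string
‘#<point 1 2>’.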


File: guile.info,  Node: The Metaobject Protocol,  Next: Redefining a Class,  Prev: GOOPS Object Miscellany,  Up: GOOPS

8.11 The Metaobject Protocol
============================

At this point, we’ve said about as much as can be said about GOOPS
without having to confront the idea of the metaobject protocol.  There
are a couple more topics that could be discussed in isolation first —
class redefinition, and changing the class of existing instances — but
in practice developers using them will be advanced enough to want to
understand the metaobject protocol too, and will probably be using the
protocol to customize exactly what happens during these events.

   So let’s plunge in.  GOOPS is based on a “metaobject protocol” (aka
“MOP”) derived from the ones used in CLOS (the Common Lisp Object
System), tiny-clos (a small Scheme implementation of a subset of CLOS
functionality) and STklos.

   The MOP underlies many possible GOOPS customizations — such as
defining an ‘initialize’ method to customize the initialization of
instances of an application-defined class — and an understanding of the
MOP makes it much easier to explain such customizations in a precise
way.  And at a deeper level, understanding the MOP is a key part of
understanding GOOPS, and of taking full advantage of GOOPS’ power, by
customizing the behaviour of GOOPS itself.

* Menu:

* Metaobjects and the Metaobject Protocol::
* Metaclasses::
* MOP Specification::
* Instance Creation Protocol::
* Class Definition Protocol::
* Customizing Class Definition::
* Method Definition::
* Method Definition Internals::
* Generic Function Internals::
* Generic Function Invocation::


File: guile.info,  Node: Metaobjects and the Metaobject Protocol,  Next: Metaclasses,  Up: The Metaobject Protocol

8.11.1 Metaobjects and the Metaobject Protocol
----------------------------------------------

The building blocks of GOOPS are classes, slot definitions, instances,
generic functions and methods.  A class is a grouping of inheritance
relations and slot definitions.  An instance is an object with slots
that are allocated following the rules implied by its class’s
superclasses and slot definitions.  A generic function is a collection
of methods and rules for determining which of those methods to apply
when the generic function is invoked.  A method is a procedure and a set
of specializers that specify the type of arguments to which the
procedure is applicable.

   Of these entities, GOOPS represents classes, generic functions and
methods as “metaobjects”.  In other words, the values in a GOOPS program
that describe classes, generic functions and methods, are themselves
instances (or “objects”) of special GOOPS classes that encapsulate the
behaviour, respectively, of classes, generic functions, and methods.

   (The other two entities are slot definitions and instances.  Slot
definitions are not strictly instances, but every slot definition is
associated with a GOOPS class that specifies the behaviour of the slot
as regards accessibility and protection from garbage collection.
Instances are of course objects in the usual sense, and there is no
benefit from thinking of them as metaobjects.)

   The “metaobject protocol” (or “MOP”) is the specification of the
generic functions which determine the behaviour of these metaobjects and
the circumstances in which these generic functions are invoked.

   For a concrete example of what this means, consider how GOOPS
calculates the set of slots for a class that is being defined using
‘define-class’.  The desired set of slots is the union of the new
class’s direct slots and the slots of all its superclasses.  But
‘define-class’ itself does not perform this calculation.  Instead, there
is a method of the ‘initialize’ generic function that is specialized for
instances of type ‘<class>’, and it is this method that performs the
slot calculation.

   ‘initialize’ is a generic function which GOOPS calls whenever a new
instance is created, immediately after allocating memory for a new
instance, in order to initialize the new instance’s slots.  The sequence
of steps is as follows.

   • ‘define-class’ uses ‘make’ to make a new instance of the ‘<class>’
     class, passing as initialization arguments the superclasses, slot
     definitions and class options that were specified in the
     ‘define-class’ form.

   • ‘make’ allocates memory for the new instance, and invokes the
     ‘initialize’ generic function to initialize the new instance’s
     slots.

   • The ‘initialize’ generic function applies the method that is
     specialized for instances of type ‘<class>’, and this method
     performs the slot calculation.

   In other words, rather than being hardcoded in ‘define-class’, the
default behaviour of class definition is encapsulated by generic
function methods that are specialized for the class ‘<class>’.

   It is possible to create a new class that inherits from ‘<class>’,
which is called a “metaclass”, and to write a new ‘initialize’ method
that is specialized for instances of the new metaclass.  Then, if the
‘define-class’ form includes a ‘#:metaclass’ class option whose value is
the new metaclass, the class that is defined by the ‘define-class’ form
will be an instance of the new metaclass rather than of the default
‘<class>’, and will be defined in accordance with the new ‘initialize’
method.  Thus the default slot calculation, as well as any other aspect
of the new class’s relationship with its superclasses, can be modified
or overridden.
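
   The mechanism just described can be sketched as follows.  This is
only an illustration under stated assumptions: the metaclass
‘<logged-class>’, the variable ‘defined-classes’ and the class
‘<widget>’ are hypothetical names, and the ‘initialize’ method merely
records each class after letting the default method do its work.

```scheme
(use-modules (oop goops))

;; A metaclass is an ordinary class that inherits from <class>.
;; <logged-class> is a hypothetical name used for illustration.
(define-class <logged-class> (<class>))

;; Names of all classes defined with the <logged-class> metaclass.
(define defined-classes '())

;; Specialized on the metaclass, so it runs when a *class* that is an
;; instance of <logged-class> is being initialized.
(define-method (initialize (class <logged-class>) initargs)
  ;; Let the default method for <class> compute the CPL, slots, etc.
  (next-method)
  ;; Then perform the extra, application-specific step.
  (set! defined-classes (cons (class-name class) defined-classes)))

(define-class <widget> ()
  (label #:init-keyword #:label)
  #:metaclass <logged-class>)
```

   Because the method calls ‘(next-method)’ first, the default slot
calculation still happens; omitting that call would replace the default
behaviour rather than extend it.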

   In a similar way, the behaviour of generic functions can be modified
or overridden by creating a new class that inherits from the standard
generic function class ‘<generic>’, writing appropriate methods that are
specialized to the new class, and creating new generic functions that
are instances of the new class.

   The same is true for method metaobjects.  And the same basic
mechanism allows the application class author to write an ‘initialize’
method that is specialized to their application class, to initialize
instances of that class.

   Such is the power of the MOP.  Note that ‘initialize’ is just one of a
large number of generic functions that can be customized to modify the
behaviour of application objects and classes and of GOOPS itself.  Each
following section covers a particular area of GOOPS functionality, and
describes the generic functions that are relevant for customization of
that area.


File: guile.info,  Node: Metaclasses,  Next: MOP Specification,  Prev: Metaobjects and the Metaobject Protocol,  Up: The Metaobject Protocol

8.11.2 Metaclasses
------------------

A “metaclass” is the class of an object which represents a GOOPS class.
Put more succinctly, a metaclass is a class’s class.

   Most GOOPS classes have the metaclass ‘<class>’ and, by default, any
new class that is created using ‘define-class’ has the metaclass
‘<class>’.

   But what does this really mean?  To find out, let’s look in more
detail at what happens when a new class is created using ‘define-class’:

     (define-class <my-class> (<object>) . slots)

Guile expands this to something like:

     (define <my-class> (class (<object>) . slots))

which in turn expands to:

     (define <my-class>
       (make <class> #:dsupers (list <object>) #:slots slots))

   As this expansion makes clear, the resulting value of ‘<my-class>’ is
an instance of the class ‘<class>’ with slot values specifying the
superclasses and slot definitions for the class ‘<my-class>’.
(‘#:dsupers’ and ‘#:slots’ are initialization keywords for the
‘direct-supers’ and ‘direct-slots’ slots of the ‘<class>’ class.)

   Now suppose that you want to define a new class with a metaclass
other than the default ‘<class>’.  This is done by writing:

     (define-class <my-class2> (<object>)
        slot ...
        #:metaclass <my-metaclass>)

and Guile expands _this_ to something like:

     (define <my-class2>
       (make <my-metaclass> #:dsupers (list <object>) #:slots slots))

   In this case, the value of ‘<my-class2>’ is an instance of the more
specialized class ‘<my-metaclass>’.  Note that ‘<my-metaclass>’ itself
must previously have been defined as a subclass of ‘<class>’.  For a
full discussion of when and how it is useful to define new metaclasses,
see *note MOP Specification::.

   Now let’s make an instance of ‘<my-class2>’:

     (define my-object (make <my-class2> ...))

   All of the following statements are correct expressions of the
relationships between ‘my-object’, ‘<my-class2>’, ‘<my-metaclass>’ and
‘<class>’.

   • ‘my-object’ is an instance of the class ‘<my-class2>’.

   • ‘<my-class2>’ is an instance of the class ‘<my-metaclass>’.

   • ‘<my-metaclass>’ is an instance of the class ‘<class>’.

   • The class of ‘my-object’ is ‘<my-class2>’.

   • The class of ‘<my-class2>’ is ‘<my-metaclass>’.

   • The class of ‘<my-metaclass>’ is ‘<class>’.
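
   These relationships can be checked at the REPL with ‘class-of’ and
‘is-a?’.  The sketch below assumes a trivial ‘<my-metaclass>’ with no
behaviour of its own, and a single hypothetical slot named ‘value’.

```scheme
(use-modules (oop goops))

;; A metaclass with no customizations, defined as a subclass of <class>.
(define-class <my-metaclass> (<class>))

(define-class <my-class2> (<object>)
  (value #:init-keyword #:value)   ; `value' is an illustrative slot
  #:metaclass <my-metaclass>)

(define my-object (make <my-class2> #:value 42))

;; Each element corresponds to one of the statements listed above.
(define relationships
  (list (is-a? my-object <my-class2>)
        (is-a? <my-class2> <my-metaclass>)
        (is-a? <my-metaclass> <class>)
        (eq? (class-of my-object) <my-class2>)
        (eq? (class-of <my-class2>) <my-metaclass>)
        (eq? (class-of <my-metaclass>) <class>)))
```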


File: guile.info,  Node: MOP Specification,  Next: Instance Creation Protocol,  Prev: Metaclasses,  Up: The Metaobject Protocol

8.11.3 MOP Specification
------------------------

The aim of the MOP specification in this chapter is to specify all the
customizable generic function invocations that can be made by the
standard GOOPS syntax, procedures and methods, and to explain the
protocol for customizing such invocations.

   A generic function invocation is customizable if the types of the
arguments to which it is applied are not completely determined by the
lexical context in which the invocation appears.  For example, the
‘(initialize INSTANCE INITARGS)’ invocation in the default
‘make-instance’ method is customizable, because the type of the
‘INSTANCE’ argument is determined by the class that was passed to
‘make-instance’.

   (Whereas — to give a counter-example — the ‘(make <generic> #:name
',name)’ invocation in ‘define-generic’ is not customizable, because all
of its arguments have lexically determined types.)

   When using this rule to decide whether a given generic function
invocation is customizable, we ignore arguments that are expected to be
handled in method definitions as a single “rest” list argument.

   For each customizable generic function invocation, the “invocation
protocol” is explained by specifying

   • what, conceptually, the applied method is intended to do

   • what assumptions, if any, the caller makes about the applied
     method’s side effects

   • what the caller expects to get as the applied method’s return
     value.


File: guile.info,  Node: Instance Creation Protocol,  Next: Class Definition Protocol,  Prev: MOP Specification,  Up: The Metaobject Protocol

8.11.4 Instance Creation Protocol
---------------------------------

‘make <class> . INITARGS’ (method)

   • ‘allocate-instance CLASS INITARGS’ (generic)

     The applied ‘allocate-instance’ method should allocate storage for
     a new instance of class CLASS and return the uninitialized
     instance.

   • ‘initialize INSTANCE INITARGS’ (generic)

     INSTANCE is the uninitialized instance returned by
     ‘allocate-instance’.  The applied method should initialize the new
     instance in whatever sense is appropriate for its class.  The
     method’s return value is ignored.

   ‘make’ itself is a generic function.  Hence the ‘make’ invocation
itself can be customized in the case where the new instance’s metaclass
is more specialized than the default ‘<class>’, by defining a ‘make’
method that is specialized to that metaclass.
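
   As one possible illustration, a ‘make’ method specialized to a
metaclass could return the same instance on every call.  This is a
sketch, not an established GOOPS facility: ‘<singleton-class>’,
‘singleton-instances’ and ‘<config>’ are hypothetical names, and storing
instances in a hash table is merely one design choice.

```scheme
(use-modules (oop goops))

;; Hypothetical metaclass whose classes have at most one instance.
(define-class <singleton-class> (<class>))

;; Maps each singleton class to its single instance (illustrative).
(define singleton-instances (make-hash-table))

;; Applies when the first argument is a class whose metaclass is
;; <singleton-class>; other classes still use the default method.
(define-method (make (class <singleton-class>) . initargs)
  (or (hashq-ref singleton-instances class)
      ;; (next-method) runs the default allocate/initialize sequence.
      (let ((instance (next-method)))
        (hashq-set! singleton-instances class instance)
        instance)))

(define-class <config> ()
  (verbose? #:init-keyword #:verbose? #:init-value #f)
  #:metaclass <singleton-class>)
```

   With these definitions, two calls to ‘(make <config>)’ are expected
to return the same object; note that initialization arguments passed to
later calls are ignored in this sketch.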

   Normally, however, the method for classes with metaclass ‘<class>’
will be applied.  This method calls two generic functions:

   • ‘(allocate-instance CLASS INITARGS)’

   • ‘(initialize INSTANCE INITARGS)’

   ‘allocate-instance’ allocates storage for and returns the new
instance, uninitialized.  You might customize ‘allocate-instance’, for
example, if you wanted to provide a GOOPS wrapper around some other
object programming system.

   To do this, you would create a specialized metaclass, which would act
as the metaclass for all classes and instances from the other system.
Then define an ‘allocate-instance’ method, specialized to that
metaclass, which calls a Guile primitive C function (or FFI code), which
in turn allocates the new instance using the interface of the other
object system.

   In this case, for a complete system, you would also need to customize
a number of other generic functions like ‘make’ and ‘initialize’, so
that GOOPS knows how to make classes from the other system, access
instance slots, and so on.

   ‘initialize’ initializes the instance that is returned by
‘allocate-instance’.  The standard GOOPS methods perform initializations
appropriate to the instance class.

   • At the least specialized level, the method for instances of type
     ‘<object>’ performs internal GOOPS instance initialization, and
     initializes the instance’s slots according to the slot definitions
     and any slot initialization keywords that appear in INITARGS.

   • The method for instances of type ‘<class>’ calls ‘(next-method)’,
     then performs the class initializations described in *note Class
     Definition Protocol::.

   • and so on for generic functions, methods, operator classes ...

   Similarly, you can customize the initialization of instances of any
application-defined class by defining an ‘initialize’ method specialized
to that class.

   Imagine a class whose instances’ slots need to be initialized at
instance creation time by querying a database.  Although it might be
possible to achieve this with a combination of ‘#:init-thunk’ keywords
and closures in the slot definitions, it may be neater to write an
‘initialize’ method for the class that queries the database once and
initializes all the dependent slot values according to the results.
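
   A minimal sketch of this pattern follows.  The procedure
‘fetch-record’ is a hypothetical stand-in for a real database query, and
the class ‘<user>’ and its accessors are likewise illustrative.

```scheme
(use-modules (oop goops))

;; Hypothetical stand-in for a database lookup; a real program would
;; query its data store here.
(define (fetch-record id)
  `((name . "Ada") (email . "ada@example.org")))

(define-class <user> ()
  (id    #:init-keyword #:id #:getter user-id)
  (name  #:accessor user-name)
  (email #:accessor user-email))

(define-method (initialize (u <user>) initargs)
  ;; First let the default method handle #:id from INITARGS.
  (next-method)
  ;; Then query once and fill in all the dependent slots.
  (let ((record (fetch-record (user-id u))))
    (set! (user-name u) (assq-ref record 'name))
    (set! (user-email u) (assq-ref record 'email))))

(define u (make <user> #:id 7))
```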


File: guile.info,  Node: Class Definition Protocol,  Next: Customizing Class Definition,  Prev: Instance Creation Protocol,  Up: The Metaobject Protocol

8.11.5 Class Definition Protocol
--------------------------------

Here is a summary diagram of the syntax, procedures and generic
functions that may be involved in class definition.

‘define-class’ (syntax)

   • ‘class’ (syntax)

        • ‘make-class’ (procedure)

             • ‘ensure-metaclass’ (procedure)

             • ‘make METACLASS ...’ (generic)

                  • ‘allocate-instance’ (generic)

                  • ‘initialize’ (generic)

                       • ‘compute-cpl’ (generic)

                            • ‘compute-std-cpl’ (procedure)

                       • ‘compute-slots’ (generic)

                       • ‘compute-get-n-set’ (generic)

                       • ‘compute-getter-method’ (generic)

                       • ‘compute-setter-method’ (generic)

   • ‘class-redefinition’ (generic)

        • ‘remove-class-accessors’ (generic)

        • ‘update-direct-method!’ (generic)

        • ‘update-direct-subclass!’ (generic)

   Wherever a step above is marked as “generic”, it can be customized,
and the detail shown below it is only “correct” insofar as it describes
what the default method of that generic function does.  For example, if
you write an ‘initialize’ method, for some metaclass, that does not call
‘next-method’ and does not call ‘compute-cpl’, then ‘compute-cpl’ will
not be called when a class is defined with that metaclass.

   A ‘(define-class ...)’ form (*note Class Definition::) expands to an
expression which

   • checks that it is being evaluated only at top level

   • defines any accessors that are implied by the SLOT-DEFINITIONs

   • uses ‘class’ to create the new class

   • checks for a previous class definition for NAME and, if found,
     handles the redefinition by invoking ‘class-redefinition’ (*note
     Redefining a Class::).

 -- syntax: class name (super ...) slot-definition ... class-option ...
     Return a newly created class that inherits from SUPERs, with direct
     slots defined by SLOT-DEFINITIONs and CLASS-OPTIONs.  For the
     format of SLOT-DEFINITIONs and CLASS-OPTIONs, see *note
     define-class: Class Definition.

‘class’ expands to an expression which

   • processes the class and slot definition options to check that they
     are well-formed, to convert the ‘#:init-form’ option to an
     ‘#:init-thunk’ option, to supply a default environment parameter
     (the current top-level environment) and to evaluate all the bits
     that need to be evaluated

   • calls ‘make-class’ to create the class with the processed and
     evaluated parameters.

 -- procedure: make-class supers slots class-option ...
     Return a newly created class that inherits from SUPERS, with direct
     slots defined by SLOTS and CLASS-OPTIONs.  For the format of SLOTS
     and CLASS-OPTIONs, see *note define-class: Class Definition, except
     note that for ‘make-class’, SLOTS is a separate list of slot
     definitions.

‘make-class’

   • adds ‘<object>’ to the SUPERS list if SUPERS is empty or if none of
     the classes in SUPERS have ‘<object>’ in their class precedence
     list

   • defaults the ‘#:environment’, ‘#:name’ and ‘#:metaclass’ options,
     if they are not specified by CLASS-OPTIONs, to the current
     top-level environment, the unbound value, and ‘(ensure-metaclass
     SUPERS)’ respectively

   • checks for duplicate classes in SUPERS and duplicate slot names in
     SLOTS, and signals an error if there are any duplicates

   • calls ‘make’, passing the metaclass as the first parameter and all
     other parameters as option keywords with values.
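
   The following sketch creates one class with the ‘class’ syntax and
one with ‘make-class’.  The class names are hypothetical, and the form
of the SLOTS list passed to ‘make-class’ (a list of ‘(NAME OPTION ...)’
lists) is an assumption based on what ‘class’ is described as passing
along; ‘#:name’ is given explicitly because it otherwise defaults to the
unbound value.

```scheme
(use-modules (oop goops))

;; Using the `class' syntax: slot definitions are written unquoted.
(define <anonymous-point>
  (class (<object>)
    (x #:init-keyword #:x #:init-value 0)
    (y #:init-keyword #:y #:init-value 0)
    #:name '<anonymous-point>))

;; Using the `make-class' procedure: supers and slots are ordinary
;; lists, so every part must already be evaluated.
(define <built-point>
  (make-class (list <object>)
              (list (list 'x #:init-keyword #:x #:init-value 0)
                    (list 'y #:init-keyword #:y #:init-value 0))
              #:name '<built-point>))

(define p1 (make <anonymous-point> #:x 3))
(define p2 (make <built-point> #:y 4))
```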

 -- procedure: ensure-metaclass supers env
     Return a metaclass suitable for a class that inherits from the list
     of classes in SUPERS.  The returned metaclass is the union by
     inheritance of the metaclasses of the classes in SUPERS.

     In the simplest case, where all the SUPERS are straightforward
     classes with metaclass ‘<class>’, the returned metaclass is just
     ‘<class>’.

     For a more complex example, suppose that SUPERS contained one class
     with metaclass ‘<operator-class>’ and one with metaclass
     ‘<foreign-object-class>’.  Then the returned metaclass would be a
     class that inherits from both ‘<operator-class>’ and
     ‘<foreign-object-class>’.

     If SUPERS is the empty list, ‘ensure-metaclass’ returns the default
     GOOPS metaclass ‘<class>’.

     GOOPS keeps a list of the metaclasses created by
     ‘ensure-metaclass’, so that each required type of metaclass only
     has to be created once.

     The ‘env’ parameter is ignored.
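
   The effect of this metaclass computation can be observed without
calling ‘ensure-metaclass’ directly.  In the sketch below (all class
names are hypothetical), ‘<derived>’ does not specify ‘#:metaclass’, so
its metaclass is derived from that of its superclass.

```scheme
(use-modules (oop goops))

(define-class <my-metaclass> (<class>))

;; <base> names its metaclass explicitly.
(define-class <base> ()
  #:metaclass <my-metaclass>)

;; <derived> gives no #:metaclass option; the metaclass is computed
;; from the metaclasses of its superclasses.
(define-class <derived> (<base>))

;; A class whose superclasses all have metaclass <class>.
(define-class <plain> ())
```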

 -- generic: make metaclass initarg ...
     METACLASS is the metaclass of the class being defined, either taken
     from the ‘#:metaclass’ class option or computed by
     ‘ensure-metaclass’.  The applied method must create and return the
     fully initialized class metaobject for the new class definition.

   The ‘(make METACLASS INITARG ...)’ invocation is a particular case of
the instance creation protocol covered in the previous section.  It will
create a class metaobject with metaclass METACLASS.  By default, this
metaobject will be initialized by the ‘initialize’ method that is
specialized for instances of type ‘<class>’.

   The ‘initialize’ method for classes (signature ‘(initialize <class>
initargs)’) calls the following generic functions.

   • ‘compute-cpl CLASS’ (generic)

     The applied method should compute and return the class precedence
     list for CLASS as a list of class metaobjects.  When ‘compute-cpl’
     is called, the following CLASS metaobject slots have all been
     initialized: ‘name’, ‘direct-supers’, ‘direct-slots’,
     ‘direct-subclasses’ (empty), ‘direct-methods’.  The value returned
     by ‘compute-cpl’ will be stored in the ‘cpl’ slot.

   • ‘compute-slots CLASS’ (generic)

     The applied method should compute and return the slots (union of
     direct and inherited) for CLASS as a list of slot definitions.
     When ‘compute-slots’ is called, all the CLASS metaobject slots
     mentioned for ‘compute-cpl’ have been initialized, plus the
     following: ‘cpl’, ‘redefined’ (‘#f’), ‘environment’.  The value
     returned by ‘compute-slots’ will be stored in the ‘slots’ slot.

   • ‘compute-get-n-set CLASS SLOT-DEF’ (generic)

     ‘initialize’ calls ‘compute-get-n-set’ for each slot computed by
     ‘compute-slots’.  The applied method should compute and return a
     pair of closures that, respectively, get and set the value of the
     specified slot.  The get closure should have arity 1 and expect a
     single argument that is the instance whose slot value is to be
     retrieved.  The set closure should have arity 2 and expect two
     arguments, where the first argument is the instance whose slot
     value is to be set and the second argument is the new value for
     that slot.  The closures should be returned in a two element list:
     ‘(list GET SET)’.

     The closures returned by ‘compute-get-n-set’ are stored as part of
     the value of the CLASS metaobject’s ‘getters-n-setters’ slot.
     Specifically, the value of this slot is a list with the same number
     of elements as there are slots in the class, and each element looks
     either like

          (SLOT-NAME-SYMBOL INIT-FUNCTION . INDEX)

     or like

          (SLOT-NAME-SYMBOL INIT-FUNCTION GET SET)

     Where the get and set closures are replaced by INDEX, the slot is
     an instance slot and INDEX is the slot’s index in the underlying
     structure: GOOPS knows how to get and set the value of such slots
     and so does not need specially constructed get and set closures.
     Otherwise, GET and SET are the closures returned by
     ‘compute-get-n-set’.

     The structure of the ‘getters-n-setters’ slot value is important
     when understanding the next customizable generic functions that
     ‘initialize’ calls...

   • ‘compute-getter-method CLASS GNS’ (generic)

     ‘initialize’ calls ‘compute-getter-method’ for each of the class’s
     slots (as determined by ‘compute-slots’) that includes a ‘#:getter’
     or ‘#:accessor’ slot option.  GNS is the element of the CLASS
     metaobject’s ‘getters-n-setters’ slot that specifies how the slot
     in question is referenced and set, as described above under
     ‘compute-get-n-set’.  The applied method should create and return a
     method that is specialized for instances of type CLASS and uses the
     get closure to retrieve the slot’s value.  ‘initialize’ uses
     ‘add-method!’ to add the returned method to the generic function
     named by the slot definition’s ‘#:getter’ or ‘#:accessor’ option.

   • ‘compute-setter-method CLASS GNS’ (generic)

     ‘compute-setter-method’ is invoked with the same arguments as
     ‘compute-getter-method’, for each of the class’s slots that
     includes a ‘#:setter’ or ‘#:accessor’ slot option.  The applied
     method should create and return a method that is specialized for
     instances of type CLASS and uses the set closure to set the slot’s
     value.  ‘initialize’ then uses ‘add-method!’ to add the returned
     method to the generic function named by the slot definition’s
     ‘#:setter’ or ‘#:accessor’ option.


File: guile.info,  Node: Customizing Class Definition,  Next: Method Definition,  Prev: Class Definition Protocol,  Up: The Metaobject Protocol

8.11.6 Customizing Class Definition
-----------------------------------

If the metaclass of the new class is something more specialized than the
default ‘<class>’, then the type of CLASS in the calls above is more
specialized than ‘<class>’, and hence it becomes possible to define
generic function methods, specialized for the new class’s metaclass,
that can modify or override the default behaviour of ‘initialize’,
‘compute-cpl’ or ‘compute-get-n-set’.

   ‘compute-cpl’ computes the class precedence list (“CPL”) for the new
class (*note Class Precedence List::), and returns it as a list of class
objects.  The CPL is important because it defines a superclass ordering
that is used, when a generic function is invoked upon an instance of the
class, to decide which of the available generic function methods is the
most specific.  Hence ‘compute-cpl’ could be customized in order to
modify the CPL ordering algorithm for all classes with a special
metaclass.

   The default CPL algorithm is encapsulated by the ‘compute-std-cpl’
procedure, which is called by the default ‘compute-cpl’ method.

 -- procedure: compute-std-cpl class
     Compute and return the class precedence list for CLASS according to
     the algorithm described in *note Class Precedence List::.
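
   The result of the default computation can be inspected with
‘class-precedence-list’.  The sketch below uses a hypothetical
diamond-shaped hierarchy and does not customize ‘compute-cpl’.

```scheme
(use-modules (oop goops))

;; A diamond: <d> inherits from <b> and <c>, which both inherit from <a>.
(define-class <a> ())
(define-class <b> (<a>))
(define-class <c> (<a>))
(define-class <d> (<b> <c>))

;; The CPL is stored in the class metaobject when the class is defined;
;; mapping class-name over it gives a readable list of symbols.
(define d-cpl-names
  (map class-name (class-precedence-list <d>)))
```

   Here ‘<d>’ comes first, its direct superclasses precede their shared
superclass ‘<a>’, and ‘<object>’ and ‘<top>’ come last.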

   ‘compute-slots’ computes and returns a list of all slot definitions
for the new class.  By default, this list includes the direct slot
definitions from the ‘define-class’ form, plus the slot definitions that
are inherited from the new class’s superclasses.  The default
‘compute-slots’ method uses the CPL computed by ‘compute-cpl’ to
calculate this union of slot definitions, with the rule that slots
inherited from superclasses are shadowed by direct slots with the same
name.  One possible reason for customizing ‘compute-slots’ would be to
implement an alternative resolution strategy for slot name conflicts.
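
   The default shadowing rule can be seen with ‘class-slots’.  This
sketch uses hypothetical classes and relies only on the default
‘compute-slots’ method.

```scheme
(use-modules (oop goops))

(define-class <base> ()
  (x #:init-value 1)
  (y #:init-value 2))

;; <derived> redefines x, inherits y, and adds z.
(define-class <derived> (<base>)
  (x #:init-value 10)
  (z #:init-value 3))

;; Names of all slots (direct and inherited) of <derived>.
(define derived-slot-names
  (map slot-definition-name (class-slots <derived>)))

(define obj (make <derived>))
```

   The slot ‘x’ appears only once in ‘derived-slot-names’, and a new
instance takes its initial value from the direct definition in
‘<derived>’.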

   ‘compute-get-n-set’ computes the low-level closures that will be used
to get and set the value of a particular slot, and returns them in a
list with two elements.

   The closures returned depend on how storage for that slot is
allocated.  The standard ‘compute-get-n-set’ method, specialized for
classes of type ‘<class>’, handles the standard GOOPS values for the
‘#:allocation’ slot option (*note allocation: Slot Options.).  By
defining a new ‘compute-get-n-set’ method for a more specialized
metaclass, it is possible to support new types of slot allocation.

   Suppose you wanted to create a large number of instances of some
class with a slot that should be shared between some but not all
instances of that class - say every 10 instances should share the same
slot storage.  The following example shows how to implement and use a
new type of slot allocation to do this.

     (define-class <batched-allocation-metaclass> (<class>))

     (let ((batch-allocation-count 0)
           (batch-get-n-set #f))
       (define-method (compute-get-n-set
                          (class <batched-allocation-metaclass>) s)
         (case (slot-definition-allocation s)
           ((#:batched)
            ;; If we've already used the same slot storage for 10 instances,
            ;; reset variables.
            (if (= batch-allocation-count 10)
                (begin
                  (set! batch-allocation-count 0)
                  (set! batch-get-n-set #f)))
            ;; If we don't have a current pair of get and set closures,
            ;; create one.  make-closure-variable returns a pair of closures
            ;; around a single Scheme variable - see goops.scm for details.
            (or batch-get-n-set
                (set! batch-get-n-set (make-closure-variable)))
            ;; Increment the batch allocation count.
            (set! batch-allocation-count (+ batch-allocation-count 1))
            batch-get-n-set)

           ;; Call next-method to handle standard allocation types.
           (else (next-method)))))

     (define-class <class-using-batched-slot> ()
       ...
       (c #:allocation #:batched)
       ...
       #:metaclass <batched-allocation-metaclass>)

   The usage of ‘compute-getter-method’ and ‘compute-setter-method’ is
described in *note Class Definition Protocol::.

   ‘compute-cpl’ and ‘compute-get-n-set’ are called by the standard
‘initialize’ method for classes whose metaclass is ‘<class>’.  But
‘initialize’ itself can also be modified, by defining an ‘initialize’
method specialized to the new class’s metaclass.  Such a method could
completely override the standard behaviour, by not calling
‘(next-method)’ at all, but more typically it would perform additional
class initialization steps before and/or after calling ‘(next-method)’
for the standard behaviour.


File: guile.info,  Node: Method Definition,  Next: Method Definition Internals,  Prev: Customizing Class Definition,  Up: The Metaobject Protocol

8.11.7 Method Definition
------------------------

‘define-method’ (syntax)

   • ‘add-method! TARGET METHOD’ (generic)

‘define-method’ invokes the ‘add-method!’ generic function to handle
adding the new method to a variety of possible targets.  GOOPS includes
methods to handle TARGET as

   • a generic function (the most common case)

   • a procedure

   • a primitive generic (*note Extending Primitives::)

   By defining further methods for ‘add-method!’, you can theoretically
handle adding methods to further types of target.
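
   As a sketch of the less common targets, the following adds a method
to the primitive ‘+’.  The class ‘<vec2>’ and its getters are
hypothetical; the example assumes that ‘+’ has generic capability, as
described in *note Extending Primitives::.

```scheme
(use-modules (oop goops))

;; <vec2>, vec-x and vec-y are illustrative names.
(define-class <vec2> ()
  (x #:init-keyword #:x #:getter vec-x)
  (y #:init-keyword #:y #:getter vec-y))

;; The target here is the primitive `+', not a GOOPS generic function.
;; Inside the body, `+' on numbers still uses the primitive behaviour.
(define-method (+ (a <vec2>) (b <vec2>))
  (make <vec2>
    #:x (+ (vec-x a) (vec-x b))
    #:y (+ (vec-y a) (vec-y b))))

(define v (+ (make <vec2> #:x 1 #:y 2)
             (make <vec2> #:x 3 #:y 4)))
```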


File: guile.info,  Node: Method Definition Internals,  Next: Generic Function Internals,  Prev: Method Definition,  Up: The Metaobject Protocol

8.11.8 Method Definition Internals
----------------------------------

‘define-method’:

   • checks the form of the first parameter, and applies the following
     steps to the accessor’s setter if it has the ‘(setter ...)’ form

   • interpolates a call to ‘define-generic’ or ‘define-accessor’ if a
     generic function is not already defined with the supplied name

   • calls ‘method’ with the PARAMETERs and BODY, to make a new method
     instance

   • calls ‘add-method!’ to add this method to the relevant generic
     function.

 -- syntax: method (parameter ...) body ...
     Make a method whose specializers are defined by the classes in
     PARAMETERs and whose procedure definition is constructed from the
     PARAMETER symbols and BODY forms.

     The PARAMETER and BODY parameters should be as for ‘define-method’
     (*note define-method: Methods and Generic Functions.).

‘method’:

   • extracts formals and specializing classes from the PARAMETERs,
     defaulting the class for unspecialized parameters to ‘<top>’

   • creates a closure using the formals and the BODY forms

   • calls ‘make’ with metaclass ‘<method>’ and the specializers and
     closure using the ‘#:specializers’ and ‘#:procedure’ keywords.

 -- procedure: make-method specializers procedure
     Make a method using SPECIALIZERS and PROCEDURE.

     SPECIALIZERS should be a list of classes that specifies the
     parameter combinations to which this method will be applicable.

     PROCEDURE should be the closure that will be applied to the generic
     function parameters when this method is invoked.

‘make-method’ is a simple wrapper around ‘make’ with metaclass
‘<method>’.

 -- generic: add-method! target method
     Generic function for adding method METHOD to TARGET.

 -- method: add-method! (generic <generic>) (method <method>)
     Add method METHOD to the generic function GENERIC.

 -- method: add-method! (proc <procedure>) (method <method>)
     If PROC is a procedure with generic capability (*note
     generic-capability?: Extending Primitives.), upgrade it to a
     primitive generic and add METHOD to its generic function
     definition.

 -- method: add-method! (pg <primitive-generic>) (method <method>)
     Add method METHOD to the generic function definition of PG.

     Implementation: ‘(add-method! (primitive-generic-generic pg)
     method)’.

 -- method: add-method! (whatever <top>) (method <method>)
     Raise an error indicating that WHATEVER is not a valid generic
     function.
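
   The steps that ‘define-method’ performs can also be written out by
hand.  In this sketch, ‘area’, ‘<square>’ and ‘side’ are hypothetical
names; ‘define-generic’, ‘method’ and ‘add-method!’ are the GOOPS forms
described above.

```scheme
(use-modules (oop goops))

(define-class <square> ()
  (side #:init-keyword #:side #:getter side))

;; Step 1: make sure a generic function exists under the chosen name.
(define-generic area)

;; Step 2: build a method object.  The parameter list has the same
;; form as in define-method, minus the generic function's name.
(define square-area
  (method ((s <square>))
    (* (side s) (side s))))

;; Step 3: attach the method to the generic function.
(add-method! area square-area)

(define result (area (make <square> #:side 3)))
```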


File: guile.info,  Node: Generic Function Internals,  Next: Generic Function Invocation,  Prev: Method Definition Internals,  Up: The Metaobject Protocol

8.11.9 Generic Function Internals
---------------------------------

‘define-generic’ calls ‘ensure-generic’ to upgrade a pre-existing
procedure value, or ‘make’ with metaclass ‘<generic>’ to create a new
generic function.

   ‘define-accessor’ calls ‘ensure-accessor’ to upgrade a pre-existing
procedure value, or ‘make-accessor’ to create a new accessor.

 -- procedure: ensure-generic old-definition [name]
     Return a generic function with name NAME, if possible by using or
     upgrading OLD-DEFINITION.  If unspecified, NAME defaults to ‘#f’.

     If OLD-DEFINITION is already a generic function, it is returned
     unchanged.

     If OLD-DEFINITION is a Scheme procedure or procedure-with-setter,
     ‘ensure-generic’ returns a new generic function that uses
     OLD-DEFINITION for its default procedure and setter.

     Otherwise ‘ensure-generic’ returns a new generic function with no
     defaults and no methods.

 -- procedure: make-generic [name]
     Return a new generic function with name ‘(car NAME)’.  If
     unspecified, NAME defaults to ‘#f’.

   ‘ensure-generic’ calls ‘make’ with metaclass ‘<generic>’ or
‘<generic-with-setter>’, depending on the previous value of the variable
that it is trying to upgrade.

   ‘make-generic’ is a simple wrapper for ‘make’ with metaclass
‘<generic>’.

 -- procedure: ensure-accessor proc [name]
     Return an accessor with name NAME, if possible by using or
     upgrading PROC.  If unspecified, NAME defaults to ‘#f’.

     If PROC is already an accessor, it is returned unchanged.

     If PROC is a Scheme procedure, procedure-with-setter or generic
     function, ‘ensure-accessor’ returns an accessor that reuses the
     reusable elements of PROC.

     Otherwise ‘ensure-accessor’ returns a new accessor with no defaults
     and no methods.

 -- procedure: make-accessor [name]
     Return a new accessor with name ‘(car NAME)’.  If unspecified, NAME
     defaults to ‘#f’.

   ‘ensure-accessor’ calls ‘make’ with metaclass
‘<generic-with-setter>’, and also calls ‘ensure-generic’,
‘make-accessor’ and (tail recursively) ‘ensure-accessor’.

   ‘make-accessor’ calls ‘make’ twice, first with metaclass ‘<generic>’
to create a generic function for the setter, then with metaclass
‘<generic-with-setter>’ to create the accessor, passing the setter
generic function as the value of the ‘#:setter’ keyword.


File: guile.info,  Node: Generic Function Invocation,  Prev: Generic Function Internals,  Up: The Metaobject Protocol

8.11.10 Generic Function Invocation
-----------------------------------

There is a detailed and customizable protocol involved in the process of
invoking a generic function — i.e., in the process of deciding which of
the generic function’s methods are applicable to the current arguments,
and which one of those to apply.  Here is a summary diagram of the
generic functions involved.

‘apply-generic’ (generic)

   • ‘no-method’ (generic)

   • ‘compute-applicable-methods’ (generic)

   • ‘sort-applicable-methods’ (generic)

        • ‘method-more-specific?’ (generic)

   • ‘apply-methods’ (generic)

        • ‘apply-method’ (generic)

        • ‘no-next-method’ (generic)

   • ‘no-applicable-method’

   We do not yet have full documentation for these.  Please refer to the
code (‘oop/goops.scm’) for details.


File: guile.info,  Node: Redefining a Class,  Next: Changing the Class of an Instance,  Prev: The Metaobject Protocol,  Up: GOOPS

8.12 Redefining a Class
=======================

Suppose that a class ‘<my-class>’ is defined using ‘define-class’ (*note
define-class: Class Definition.), with slots that have accessor
functions, and that an application has created several instances of
‘<my-class>’ using ‘make’ (*note make: Instance Creation.).  What then
happens if ‘<my-class>’ is redefined by calling ‘define-class’ again?

* Menu:

* Redefinable Classes::
* Default Class Redefinition Behaviour::
* Customizing Class Redefinition::


File: guile.info,  Node: Redefinable Classes,  Next: Default Class Redefinition Behaviour,  Up: Redefining a Class

8.12.1 Redefinable Classes
--------------------------

Whether a class can be redefined is a choice for the class author to
make.  By default, classes in GOOPS are _not_ redefinable.  A
redefinable class is an instance of ‘<redefinable-class>’; that is to
say, a class with ‘<redefinable-class>’ as its metaclass.  Accordingly,
to define a redefinable class, add ‘#:metaclass <redefinable-class>’ to
its class definition:

     (define-class <foo> ()
       #:metaclass <redefinable-class>)

   Note that any subclass of ‘<foo>’ is also redefinable, without the
need to explicitly pass the ‘#:metaclass’ argument, so you only need to
specify ‘#:metaclass’ for the roots of your application’s class
hierarchy.

     (define-class <bar> (<foo>))
     (class-of <bar>) ⇒ <redefinable-class>

   Note that prior to Guile 3.0, all GOOPS classes were redefinable in
theory.  In practice, attempting to, for example, redefine ‘<class>’
itself would almost certainly not do what you want.  Still, redefinition
is an interesting capability when building long-lived resilient systems,
so GOOPS does offer this facility.


File: guile.info,  Node: Default Class Redefinition Behaviour,  Next: Customizing Class Redefinition,  Prev: Redefinable Classes,  Up: Redefining a Class

8.12.2 Default Class Redefinition Behaviour
-------------------------------------------

When a class is defined using ‘define-class’ and the class name was
previously defined, by default the new binding just replaces the old
binding.  This is the normal behavior for ‘define’.  However, if both the
old and new bindings are redefinable classes (instances of
‘<redefinable-class>’), then the class will be updated in place, and its
instances lazily migrated over.

   The way that the class is updated and the way that the instances
migrate over are of course part of the meta-object protocol.  However,
the default behavior usually suffices, and it goes as follows.

   • All existing direct instances of ‘<my-class>’ are converted to be
     instances of the new class.  This is achieved by preserving the
     values of slots that exist in both the old and new definitions, and
     initializing the values of new slots in the usual way (*note make:
     Instance Creation.).

   • All existing subclasses of ‘<my-class>’ are redefined, as though
     the ‘define-class’ expressions that defined them were re-evaluated
     following the redefinition of ‘<my-class>’, and the class
     redefinition process described here is applied recursively to the
     redefined subclasses.

   • Once all of its instances and subclasses have been updated, the
     class metaobject previously bound to the variable ‘<my-class>’ is
     no longer needed and so can be allowed to be garbage collected.

   To keep things tidy, GOOPS also needs to do a little housekeeping on
methods that are associated with the redefined class.

   • Slot accessor methods for slots in the old definition should be
     removed from their generic functions.  They will be replaced by
     accessor methods for the slots of the new class definition.

   • Any generic function method that uses the old ‘<my-class>’
     metaobject as one of its formal parameter specializers must be
     updated to refer to the new ‘<my-class>’ metaobject.  (Whenever a
     new generic function method is defined, ‘define-method’ adds the
     method to a list stored in the class metaobject for each class used
     as a formal parameter specializer, so it is easy to identify all
     the methods that must be updated when a class is redefined.)

   If this class redefinition strategy strikes you as rather
counter-intuitive, bear in mind that it is derived from similar
behaviour in other object systems such as CLOS, and that experience in
those systems has shown it to be very useful in practice.

   Also bear in mind that, like most of GOOPS’ default behaviour, it can
be customized...


File: guile.info,  Node: Customizing Class Redefinition,  Prev: Default Class Redefinition Behaviour,  Up: Redefining a Class

8.12.3 Customizing Class Redefinition
-------------------------------------

When ‘define-class’ notices that a class is being redefined, it
constructs the new class metaobject as usual, then invokes the
‘class-redefinition’ generic function with the old and new classes as
arguments.  Therefore, if the old or new classes have metaclasses other
than the default ‘<redefinable-class>’, class redefinition behaviour can
be customized by defining a ‘class-redefinition’ method that is
specialized for the relevant metaclasses.

 -- generic: class-redefinition
     Handle the class redefinition from OLD to NEW, and return the new
     class metaobject that should be bound to the variable specified by
     ‘define-class’’s first argument.

 -- method: class-redefinition (old <top>) (new <class>)
     Not all classes are redefinable, and not all previous bindings are
     classes.  *Note Redefinable Classes::.  This default method just
     returns NEW.

 -- method: class-redefinition (old <redefinable-class>) (new
          <redefinable-class>)
     This method implements GOOPS’ default class redefinition behaviour,
     as described in *note Default Class Redefinition Behaviour::.
     Returns the metaobject for the new class definition.

   The ‘class-redefinition’ method for classes with metaclass
‘<redefinable-class>’ calls the following generic functions, which could
of course be individually customized.

 -- generic: remove-class-accessors! old
     The default ‘remove-class-accessors!’ method removes the accessor
     methods of the old class from all classes which they specialize.

 -- generic: update-direct-method! method old new
     The default ‘update-direct-method!’ method substitutes the new
     class for the old in all methods specialized to the old class.

 -- generic: update-direct-subclass! subclass old new
     The default ‘update-direct-subclass!’ method invokes
     ‘class-redefinition’ recursively to handle the redefinition of
     subclasses.

   An alternative class redefinition strategy could be to leave all
existing instances as instances of the old class, while accepting that
the old class is now “nameless”, since its name has been taken over by
the new definition.  In this strategy, any existing subclasses could
also be left as they are, on the understanding that they inherit from a
nameless superclass.

   This strategy is easily implemented in GOOPS, by defining a new
metaclass, that will be used as the metaclass for all classes to which
the strategy should apply, and then defining a ‘class-redefinition’
method that is specialized for this metaclass:

     (define-class <can-be-nameless> (<redefinable-class>))

     (define-method (class-redefinition (old <can-be-nameless>)
                                        (new <class>))
       new)

   When customization can be as easy as this, aren’t you glad that GOOPS
implements the far more difficult strategy as its default!


File: guile.info,  Node: Changing the Class of an Instance,  Prev: Redefining a Class,  Up: GOOPS

8.13 Changing the Class of an Instance
======================================

When a redefinable class is redefined, any existing instance of the
redefined class will be modified for the new class definition before the
next time that any of the instance’s slots is referenced or set.  GOOPS
modifies each instance by calling the generic function ‘change-class’.

   More generally, you can change the class of an existing instance at
any time by invoking the generic function ‘change-class’ with two
arguments: the instance and the new class.

   The default method for ‘change-class’ decides how to implement the
change of class by looking at the slot definitions for the instance’s
existing class and for the new class.  If the new class has slots with
the same name as slots in the existing class, the values for those slots
are preserved.  Slots that are present only in the existing class are
discarded.  Slots that are present only in the new class are initialized
using the corresponding slot definition’s init function (*note
slot-init-function: Classes.).

 -- generic: change-class instance new-class

 -- method: change-class (obj <object>) (new <redefinable-class>)
     Modify instance OBJ to make it an instance of class NEW.  OBJ
     itself must already be an instance of a redefinable class.

     The value of each of OBJ’s slots is preserved only if a similarly
     named slot exists in NEW; any other slot values are discarded.

     The slots in NEW that do not correspond to any of OBJ’s
     pre-existing slots are initialized according to NEW’s slot
     definitions’ init functions.

   The default ‘change-class’ method also invokes another generic
function, ‘update-instance-for-different-class’, as the last thing that
it does before returning.  The applied
‘update-instance-for-different-class’ method can make any further
adjustments to NEW-INSTANCE that are required to complete or modify the
change of class.  The return value from the applied method is ignored.

 -- generic: update-instance-for-different-class old-instance
          new-instance
     A generic function that can be customized to put finishing touches
     to an instance whose class has just been changed.  The default
     ‘update-instance-for-different-class’ method does nothing.

   Customized change of class behaviour can be implemented by defining
‘change-class’ methods that are specialized either by the class of the
instances to be modified or by the metaclass of the new class.


File: guile.info,  Node: Guile Implementation,  Next: GNU Free Documentation License,  Prev: GOOPS,  Up: Top

9 Guile Implementation
**********************

At some point, after one has been programming in Scheme for some time,
another level of Scheme comes into view: its implementation.  Knowledge
of how Scheme can be implemented turns out to be necessary to become an
expert hacker.  As Peter Norvig notes in his retrospective on PAIP(1),
“The expert Lisp programmer eventually develops a good ‘efficiency
model’.”

   By this Norvig means that over time, the Lisp hacker eventually
develops an understanding of how much her code “costs” in terms of space
and time.

   This chapter describes Guile as an implementation of Scheme: its
history, how it represents and evaluates its data, and its compiler.
This knowledge can help you to make that step from being one who is
merely familiar with Scheme to being a real hacker.

* Menu:

* History::                          A brief history of Guile.
* Data Representation::              How Guile represents Scheme data.
* A Virtual Machine for Guile::      How compiled procedures work.
* Compiling to the Virtual Machine:: Not as hard as you might think.

   ---------- Footnotes ----------

   (1) PAIP is the common abbreviation for ‘Paradigms of Artificial
Intelligence Programming’, an old but still useful text on Lisp.
Norvig’s retrospective sums up the lessons of PAIP, and can be found at
<http://norvig.com/Lisp-retro.html>.


File: guile.info,  Node: History,  Next: Data Representation,  Up: Guile Implementation

9.1 A Brief History of Guile
============================

Guile is an artifact of historical processes, both as code and as a
community of hackers.  It is sometimes useful to know this history when
hacking the source code, to know about past decisions and future
directions.

   Of course, the real history of Guile is written by the hackers
hacking and not the writers writing, so we round out the section with a
note on current status and future directions.

* Menu:

* The Emacs Thesis::
* Early Days::
* A Scheme of Many Maintainers::
* A Timeline of Selected Guile Releases::
* Status::


File: guile.info,  Node: The Emacs Thesis,  Next: Early Days,  Up: History

9.1.1 The Emacs Thesis
----------------------

The story of Guile is the story of bringing the development experience
of Emacs to the mass of programs on a GNU system.

   Emacs, when it was first created in its GNU form in 1984, was a new
take on the problem of “how to make a program”.  The Emacs thesis is
that it is delightful to create composite programs based on an
orthogonal kernel written in a low-level language together with a
powerful, high-level extension language.

   Extension languages foster extensible programs, programs which adapt
readily to different users and to changing times.  Proof of this can be
seen in Emacs’ current and continued existence, spanning more than a
quarter-century.

   Besides providing for modification of a program by others, extension
languages are good for _intension_ as well.  Programs built in “the
Emacs way” are pleasurable and easy for their authors to flesh out with
the features that they need.

   After the Emacs experience was appreciated more widely, a number of
hackers started to consider how to spread this experience to the rest of
the GNU system.  It was clear that the easiest way to Emacsify a program
would be to embed a shared language implementation into it.


File: guile.info,  Node: Early Days,  Next: A Scheme of Many Maintainers,  Prev: The Emacs Thesis,  Up: History

9.1.2 Early Days
----------------

Tom Lord was the first to fully concentrate his efforts on an embeddable
language runtime, which he named “GEL”, the GNU Extension Language.

   GEL was the product of converting SCM, Aubrey Jaffer’s implementation
of Scheme, into something more appropriate to embedding as a library.
(SCM was itself based on an implementation by George Carrette, SIOD.)

   Lord managed to convince Richard Stallman to dub GEL the official
extension language for the GNU project.  It was a natural fit, given
that Scheme was a cleaner, more modern Lisp than Emacs Lisp.  Part of
the argument was that eventually when GEL became more capable, it could
gain the ability to execute other languages, especially Emacs Lisp.

   Due to a naming conflict with another programming language, Jim
Blandy suggested a new name for GEL: “Guile”.  Besides being a recursive
acronym, “Guile” craftily follows the naming of its ancestors,
“Planner”, “Conniver”, and “Schemer”.  (The latter was truncated to
“Scheme” due to a 6-character file name limit on an old operating
system.)  Finally, “Guile” suggests “guy-ell”, or “Guy L. Steele”, who,
together with Gerald Sussman, originally discovered Scheme.

   Around the same time that Guile (then GEL) was readying itself for
public release, another extension language was gaining in popularity,
Tcl.  Many developers found advantages in Tcl because of its shell-like
syntax and its well-developed graphical widgets library, Tk.  Also, at
the time there was a large marketing push promoting Tcl as a “universal
extension language”.

   Richard Stallman, as the primary author of GNU Emacs, had a
particular vision of what extension languages should be, and Tcl did not
seem to him to be as capable as Emacs Lisp.  He posted a criticism to
the comp.lang.tcl newsgroup, sparking one of the internet’s legendary
flamewars.  As part of these discussions, retrospectively dubbed the
“Tcl Wars”, he announced the Free Software Foundation’s intent to
promote Guile as the extension language for the GNU project.

   It is a common misconception that Guile was created as a reaction to
Tcl.  While it is true that the public announcement of Guile happened at
the same time as the “Tcl wars”, Guile was created out of a condition
that existed outside the polemic.  Indeed, the need for a powerful
language to bridge the gap between extension of existing applications
and a more fully dynamic programming environment is still with us today.


File: guile.info,  Node: A Scheme of Many Maintainers,  Next: A Timeline of Selected Guile Releases,  Prev: Early Days,  Up: History

9.1.3 A Scheme of Many Maintainers
----------------------------------

Surveying the field, it seems that Scheme implementations correspond
with their maintainers in an N-to-1 relationship.  That is to say,
people who implement Schemes might do so on a number of occasions, but
the lifetime of a given Scheme is tied to the maintainership of one
individual.

   Guile is atypical in this regard.

   Tom Lord maintained Guile for its first year and a half or so,
corresponding to the end of 1994 through the middle of 1996.  The
releases made in this time constitute an arc from SCM as a standalone
program to Guile as a reusable, embeddable library, but passing through
an explosion of features: embedded Tcl and Tk, a toolchain for compiling
and disassembling Java, addition of a C-like syntax, creation of a
module system, and a start at a rich POSIX interface.

   Only some of those features remain in Guile.  There were ongoing
tensions between providing a small, embeddable language, and one which
had all of the features (e.g. a graphical toolkit) that a modern Emacs
might need.  In the end, as Guile gained in uptake, the development team
decided to focus on depth, documentation and orthogonality rather than
on breadth.  This has been the focus of Guile ever since, although there
is a wide range of third-party libraries for Guile.

   Jim Blandy presided over that period of stabilization, in the three
years until the end of 1999, when he too moved on to other projects.
Since then, Guile has had a group maintainership.  The first group was
Maciej Stachowiak, Mikael Djurfeldt, and Marius Vollmer, with Vollmer
staying on the longest.  By late 2007, Marius had mostly moved on to
other things, so Neil Jerram and Ludovic Courtès stepped up to take on
the primary maintenance responsibility.  Neil and Ludovic were joined by
Andy Wingo in late 2009, allowing Neil to step away, and Mark Weaver
joined shortly thereafter.  After spending more than 5 years in the
role, Mark stepped down as well, leaving Ludovic and Andy as the current
co-maintainers of Guile as of January 2020.

   Of course, a large part of the actual work on Guile has come from
other contributors too numerous to mention, but without whom the world
would be a poorer place.


File: guile.info,  Node: A Timeline of Selected Guile Releases,  Next: Status,  Prev: A Scheme of Many Maintainers,  Up: History

9.1.4 A Timeline of Selected Guile Releases
-------------------------------------------

guile-i — 4 February 1995
     SCM, turned into a library.

guile-ii — 6 April 1995
     A low-level module system was added.  Tcl/Tk support was added,
     allowing extension of Scheme by Tcl or vice versa.  POSIX support
     was improved, and there was an experimental stab at Java
     integration.

guile-iii — 18 August 1995
     The C-like syntax, ctax, was improved, but mostly this release
     featured a start at the task of breaking Guile into pieces.

1.0 — 5 January 1997
     ‘#f’ was distinguished from ‘'()’.  User-level, cooperative
     multi-threading was added.  Source-level debugging became more
     useful, and programmer’s and user’s manuals were begun.  The module
     system gained a high-level interface, which is still used today in
     more or less the same form.

1.1 — 16 May 1997
1.2 — 24 June 1997
     Support for Tcl/Tk and ctax was split off into separate packages,
     and has remained there since.  Guile became more compatible with
     SCSH, and more useful as a UNIX scripting language.  Libguile could
     now be built as a shared library, and third-party extensions
     written in C became loadable via dynamic linking.

1.3.0 — 19 October 1998
     Command-line editing became much more pleasant through the use of
     the readline library.  The initial support for internationalization
     via multi-byte strings was removed; 10 years were to pass before
     proper internationalization would land again.  Initial Emacs Lisp
     support landed, ports gained better support for file descriptors,
     and fluids were added.

1.3.2 — 20 August 1999
1.3.4 — 25 September 1999
1.4 — 21 June 2000
     A long list of lispy features was added: hooks, Common Lisp’s
     ‘format’, optional and keyword procedure arguments, ‘getopt-long’,
     sorting, random numbers, and many other fixes and enhancements.
     Guile also gained an interactive debugger, interactive help, and
     better backtraces.

1.6 — 6 September 2002
     Guile gained support for the R5RS standard, and added a number of
     SRFI modules.  The module system was expanded with programmatic
     support for identifier selection and renaming.  The GOOPS object
     system was merged into Guile core.

1.8 — 20 February 2006
     Guile’s arbitrary-precision arithmetic switched to use the GMP
     library, and added support for exact rationals.  Guile’s embedded
     user-space threading was removed in favor of POSIX pre-emptive
     threads, providing true multiprocessing.  Gettext support was
     added, and Guile’s C API was cleaned up and orthogonalized in a
     massive way.

2.0 — 16 February 2010
     A virtual machine was added to Guile, along with the associated
     compiler and toolchain.  Support for internationalization was
     finally reimplemented, in terms of Unicode, locales, and
     libunistring.  Running Guile instances became controllable and
     debuggable from within Emacs, via Geiser.  Guile caught up to
     features found in a number of other Schemes: SRFI-18 threads,
     module-hygienic macros, a profiler, tracer, and debugger, SSAX XML
     integration, bytevectors, a dynamic FFI, delimited continuations,
     module versions, and partial support for R6RS.

2.2 — 15 March 2017
     The virtual machine introduced in 2.0 was completely rewritten,
     along with much of the compiler and toolchain.  This speeds up many
     Guile programs as well as reducing startup time and memory usage.
     Guile’s POSIX multithreading was improved, stacks became
     dynamically expandable, and the ports facility gained support for
     non-blocking I/O.

3.0 – January 2020
     Guile gained support for native code generation via a simple
     just-in-time (JIT) compiler, further improving the speed of its
     virtual machine.  The compiler itself gained a number of new
     optimizations: inlining of top-level bindings, better closure
     optimization, and better unboxing of integer and floating-point
     values.  R7RS support was added, and R6RS support improved.  The
     exception facility (throw and catch) was rewritten in terms of
     SRFI-34 exception handlers.


File: guile.info,  Node: Status,  Prev: A Timeline of Selected Guile Releases,  Up: History

9.1.5 Status, or: Your Help Needed
----------------------------------

Guile has achieved much of what it set out to achieve, but there is much
remaining to do.

   There is still the old problem of bringing existing applications into
a more Emacs-like experience.  Guile has had some successes in this
respect, but still most applications in the GNU system are without Guile
integration.

   Getting Guile to those applications takes an investment: the
“hacktivation energy” needed to wire Guile into a program, which only
pays off once the integration is good enough to enable new kinds of
behavior.  This would be a great way for new hackers to contribute: take
an application that you use and that you know well, think of something
that it can’t yet do, and figure out a way to integrate Guile and
implement that task in Guile.

   With time, perhaps this exposure can reverse itself, whereby programs
can run under Guile instead of vice versa, eventually resulting in the
Emacsification of the entire GNU system.  Indeed, this is the reason for
the naming of the many Guile modules that live in the ‘ice-9’ namespace,
a nod to the fictional substance in Kurt Vonnegut’s novel, Cat’s Cradle,
capable of acting as a seed crystal to crystallize the mass of software.

   Implicit in this whole discussion is the idea that dynamic languages
are somehow better than languages like C.  While languages like C have
their place, Guile’s take on this question is that yes, Scheme is more
expressive than C, and more fun to write.  This realization carries an
imperative with it to write as much code in Scheme as possible rather
than in other languages.

   These days it is possible to write extensible applications almost
entirely from high-level languages, through byte-code and native
compilation, speed gains in the underlying hardware, and foreign call
interfaces in the high-level language.  Smalltalk systems are like this,
as are Common Lisp-based systems.  While there already are a number of
pure-Guile applications out there, in the past users have still needed
to drop down to C for some tasks: interfacing to system libraries that
don’t have prebuilt Guile interfaces, and for some tasks requiring high
performance.  With the arrival of native code generation via a JIT
compiler in Guile 3.0, most of these older applications can now be
updated to move more C code to Scheme.

   Still, even with an all-Guile application, sometimes you want to
provide an opportunity for users to extend your program from a language
with a syntax that is closer to C, or to Python.  Another interesting
idea to consider is compiling e.g. Python to Guile.  It’s not that
far-fetched of an idea: see for example IronPython or JRuby.

   Also, there’s Emacs itself.  Guile’s Emacs Lisp support has reached
an excellent level of correctness, robustness, and speed.  However there
is still work to do to finish its integration into Emacs itself.  This
will give lots of exciting things to Emacs: native threads, a real
object system, more sophisticated types, cleaner syntax, and access to
all of the Guile extensions.

   Finally, so much of the world’s computation is performed in web
browsers that it makes sense to ask ourselves what the
Guile-on-the-web-client story is.  With the advent of WebAssembly, there
may finally be a reasonable compilation target that’s present on almost
all user-exposed devices.  Especially with the upcoming proposals to
allow for tail calls, delimited continuations, and GC-managed objects,
Scheme might once again have a place in the web browser.  Get to it!


File: guile.info,  Node: Data Representation,  Next: A Virtual Machine for Guile,  Prev: History,  Up: Guile Implementation

9.2 Data Representation
=======================

Scheme is a latently-typed language; this means that the system cannot,
in general, determine the type of a given expression at compile time.
Types only become apparent at run time.  Variables do not have fixed
types; a variable may hold a pair at one point, an integer at the next,
and a thousand-element vector later.  Instead, values, not variables,
have fixed types.

   In order to implement standard Scheme functions like ‘pair?’ and
‘string?’ and provide garbage collection, the representation of every
value must contain enough information to accurately determine its type
at run time.  Often, Scheme systems also use this information to
determine whether a program has attempted to apply an operation to an
inappropriately typed value (such as taking the ‘car’ of a string).

   Because variables, pairs, and vectors may hold values of any type,
Scheme implementations use a uniform representation for values — a
single type large enough to hold either a complete value or a pointer to
a complete value, along with the necessary typing information.

   The following sections will present a simple typing system, and then
make some refinements to correct its major weaknesses.  We then conclude
with a discussion of specific choices that Guile has made regarding
garbage collection and data representation.

* Menu:

* A Simple Representation::
* Faster Integers::
* Cheaper Pairs::
* Conservative GC::
* The SCM Type in Guile::


File: guile.info,  Node: A Simple Representation,  Next: Faster Integers,  Up: Data Representation

9.2.1 A Simple Representation
-----------------------------

The simplest way to represent Scheme values in C would be to represent
each value as a pointer to a structure containing a type indicator,
followed by a union carrying the real value.  Assuming that ‘SCM’ is the
name of our universal type, we can write:

     enum type { integer, pair, string, vector, ... };

     typedef struct value *SCM;

     struct value {
       enum type type;
       union {
         int integer;
         struct { SCM car, cdr; } pair;
         struct { int length; char *elts; } string;
         struct { int length; SCM  *elts; } vector;
         ...
       } value;
     };
   with the ellipses replaced with code for the remaining Scheme types.

   This representation is sufficient to implement all of Scheme’s
semantics.  If X is an ‘SCM’ value:
   • To test if X is an integer, we can write ‘X->type == integer’.
   • To find its value, we can write ‘X->value.integer’.
   • To test if X is a vector, we can write ‘X->type == vector’.
   • If we know X is a vector, we can write ‘X->value.vector.elts[0]’ to
     refer to its first element.
   • If we know X is a pair, we can write ‘X->value.pair.car’ to extract
     its car.
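
   As a sketch of how these accessors fit together, the following fills
in the elided parts of the declaration with only the four types shown,
and adds two constructors.  The names ‘make_integer’ and ‘make_pair’ are
illustrative assumptions rather than part of any real implementation,
and allocation failure is handled only by an assertion.

```c
#include <assert.h>
#include <stdlib.h>

enum type { integer, pair, string, vector };

typedef struct value *SCM;

struct value {
  enum type type;
  union {
    int integer;
    struct { SCM car, cdr; } pair;
    struct { int length; char *elts; } string;
    struct { int length; SCM  *elts; } vector;
  } value;
};

/* Hypothetical constructor: every integer costs one heap allocation. */
static SCM
make_integer (int n)
{
  SCM x = malloc (sizeof *x);
  assert (x != NULL);
  x->type = integer;
  x->value.integer = n;
  return x;
}

/* Hypothetical constructor for a pair holding CAR and CDR. */
static SCM
make_pair (SCM car, SCM cdr)
{
  SCM x = malloc (sizeof *x);
  assert (x != NULL);
  x->type = pair;
  x->value.pair.car = car;
  x->value.pair.cdr = cdr;
  return x;
}
```

   With these definitions, ‘make_pair (make_integer (1), make_integer
(2))’ performs three allocations, and reading the car’s value takes two
dependent memory references.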


File: guile.info,  Node: Faster Integers,  Next: Cheaper Pairs,  Prev: A Simple Representation,  Up: Data Representation

9.2.2 Faster Integers
---------------------

Unfortunately, the above representation has a serious disadvantage.  In
order to return an integer, an expression must allocate a ‘struct
value’, initialize it to represent that integer, and return a pointer to
it.  Furthermore, fetching an integer’s value requires a memory
reference, which is much slower than a register reference on most
processors.  Since integers are extremely common, this representation is
too costly, in both time and space.  Integers should be very cheap to
create and manipulate.

   One possible solution comes from the observation that, on many
architectures, heap-allocated data (i.e., what you get when you call
‘malloc’) must be aligned on an eight-byte boundary.  (Whether or not
the machine actually requires it, we can write our own allocator for
‘struct value’ objects that assures this is true.)  In this case, the
lower three bits of the structure’s address are known to be zero.

   This gives us the room we need to provide an improved representation
for integers.  We make the following rules:
   • If the lower three bits of an ‘SCM’ value are zero, then the SCM
     value is a pointer to a ‘struct value’, and everything proceeds as
     before.
   • Otherwise, the ‘SCM’ value represents an integer, whose value
     appears in its upper bits.

   Here is C code implementing this convention:
     enum type { pair, string, vector, ... };

     typedef struct value *SCM;

     struct value {
       enum type type;
       union {
         struct { SCM car, cdr; } pair;
         struct { int length; char *elts; } string;
         struct { int length; SCM  *elts; } vector;
         ...
       } value;
     };

     #define POINTER_P(x) (((int) (x) & 7) == 0)
     #define INTEGER_P(x) (! POINTER_P (x))

     #define GET_INTEGER(x)  ((int) (x) >> 3)
     #define MAKE_INTEGER(x) ((SCM) (((x) << 3) | 1))

   Notice that ‘integer’ no longer appears as an element of ‘enum type’,
and the union has lost its ‘integer’ member.  Instead, we use the
‘POINTER_P’ and ‘INTEGER_P’ macros to make a coarse classification of
values into integers and non-integers, and do further type testing as
before.

   Here’s how we would answer the questions posed above (again, assume X
is an ‘SCM’ value):
   • To test if X is an integer, we can write ‘INTEGER_P (X)’.
   • To find its value, we can write ‘GET_INTEGER (X)’.
   • To test if X is a vector, we can write:
            POINTER_P (X) && X->type == vector
     Given the new representation, we must make sure X is truly a
     pointer before we dereference it to determine its complete type.
   • If we know X is a vector, we can write ‘X->value.vector.elts[0]’ to
     refer to its first element, as before.
   • If we know X is a pair, we can write ‘X->value.pair.car’ to extract
     its car, just as before.

   This representation allows us to operate more efficiently on integers
than the first.  For example, if X and Y are known to be integers, we
can compute their sum as follows:
     MAKE_INTEGER (GET_INTEGER (X) + GET_INTEGER (Y))
   Now, integer math requires no allocation or memory references.  Most
real Scheme systems actually implement addition and other operations
using an even more efficient algorithm, but this essay isn’t about
bit-twiddling.  (Hint: how do you decide when to overflow to a bignum?
How would you do it in assembly?)
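
   The macros above cast pointers to ‘int’ and shift signed values,
neither of which is portable.  The following sketch expresses the same
convention with ‘intptr_t’ and ‘uintptr_t’ from ‘<stdint.h>’; the
function names are illustrative assumptions.  It still assumes that
right-shifting a negative value is an arithmetic shift, which C leaves
implementation-defined, and it ignores overflow of the reduced integer
range.

```c
#include <assert.h>
#include <stdint.h>

typedef struct value *SCM;   /* left opaque; only the tag bits matter */

static int
pointer_p (SCM x)
{
  return ((uintptr_t) x & 7) == 0;
}

static int
integer_p (SCM x)
{
  return !pointer_p (x);
}

/* Store N in the upper bits and set the low tag bit. */
static SCM
make_integer (intptr_t n)
{
  return (SCM) (((uintptr_t) n << 3) | 1);
}

/* Recover the integer; assumes an arithmetic right shift. */
static intptr_t
get_integer (SCM x)
{
  return (intptr_t) (uintptr_t) x >> 3;
}

/* Add two tagged integers without allocating or touching memory. */
static SCM
add_integers (SCM x, SCM y)
{
  assert (integer_p (x) && integer_p (y));
  return make_integer (get_integer (x) + get_integer (y));
}
```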


File: guile.info,  Node: Cheaper Pairs,  Next: Conservative GC,  Prev: Faster Integers,  Up: Data Representation

9.2.3 Cheaper Pairs
-------------------

However, there is yet another issue to confront.  Most Scheme heaps
contain more pairs than any other type of object; Jonathan Rees said at
one point that pairs occupy 45% of the heap in his Scheme
implementation, Scheme 48.  However, our representation above spends
three ‘SCM’-sized words per pair — one for the type, and two for the CAR
and CDR.  Is there any way to represent pairs using only two words?

   Let us refine the convention we established earlier.  Let us assert
that:
   • If the bottom three bits of an ‘SCM’ value are ‘#b000’, then it is
     a pointer, as before.
   • If the bottom three bits are ‘#b001’, then the upper bits are an
     integer.  This is a bit more restrictive than before.
   • If the bottom three bits are ‘#b010’, then the value, with the
     bottom three bits masked out, is the address of a pair.

   Here is the new C code:
     enum type { string, vector, ... };

     typedef struct value *SCM;

     struct value {
       enum type type;
       union {
         struct { int length; char *elts; } string;
         struct { int length; SCM  *elts; } vector;
         ...
       } value;
     };

     struct pair {
       SCM car, cdr;
     };

     #define POINTER_P(x) (((int) (x) & 7) == 0)

     #define INTEGER_P(x)  (((int) (x) & 7) == 1)
     #define GET_INTEGER(x)  ((int) (x) >> 3)
     #define MAKE_INTEGER(x) ((SCM) (((x) << 3) | 1))

     #define PAIR_P(x) (((int) (x) & 7) == 2)
     #define GET_PAIR(x) ((struct pair *) ((int) (x) & ~7))

   Notice that ‘enum type’ and ‘struct value’ now only contain
provisions for vectors and strings; both integers and pairs have become
special cases.  The code above also assumes that an ‘int’ is large
enough to hold a pointer, which isn’t generally true.

   Our list of examples is now as follows:
   • To test if X is an integer, we can write ‘INTEGER_P (X)’; this is
     as before.
   • To find its value, we can write ‘GET_INTEGER (X)’, as before.
   • To test if X is a vector, we can write:
            POINTER_P (X) && X->type == vector
     We must still make sure that X is a pointer to a ‘struct value’
     before dereferencing it to find its type.
   • If we know X is a vector, we can write ‘X->value.vector.elts[0]’ to
     refer to its first element, as before.
   • We can write ‘PAIR_P (X)’ to determine if X is a pair, and then
     write ‘GET_PAIR (X)->car’ to refer to its car.

   This change in representation reduces our heap size by 15%, since
each pair shrinks from three words to two and pairs were taken to occupy
45% of the heap.  It also makes it cheaper to decide if a value is a
pair, because no memory references are necessary; it suffices to check
the bottom three bits of the ‘SCM’ value.  This may be significant when
traversing lists, a common activity in a Scheme system.

   Again, most real Scheme systems use a slightly different
implementation; for example, if GET_PAIR subtracts off the low bits of
‘x’, instead of masking them off, the optimizer will often be able to
combine that subtraction with the addition of the offset of the
structure member we are referencing, making a modified pointer as fast
to use as an unmodified pointer.
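
   The two ways of removing the tag can be compared in a short sketch.
The names ‘cons’, ‘get_pair’ and ‘get_pair_sub’ are illustrative
assumptions; the code uses ‘uintptr_t’ rather than ‘int’ for the casts,
and it assumes (and asserts) that ‘malloc’ returns memory aligned on an
eight-byte boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct value *SCM;

struct pair {
  SCM car, cdr;
};

#define PAIR_TAG 2

static int
pair_p (SCM x)
{
  return ((uintptr_t) x & 7) == PAIR_TAG;
}

/* Masking variant, as in GET_PAIR above. */
static struct pair *
get_pair (SCM x)
{
  return (struct pair *) ((uintptr_t) x & ~(uintptr_t) 7);
}

/* Subtracting variant: valid only when X is known to carry PAIR_TAG.
   A compiler can fold the constant into the member offset.  */
static struct pair *
get_pair_sub (SCM x)
{
  return (struct pair *) ((uintptr_t) x - PAIR_TAG);
}

/* Hypothetical allocator: two words per pair, tag kept in the pointer. */
static SCM
cons (SCM car, SCM cdr)
{
  struct pair *p = malloc (sizeof *p);
  assert (p != NULL);
  assert (((uintptr_t) p & 7) == 0);   /* relies on 8-byte alignment */
  p->car = car;
  p->cdr = cdr;
  return (SCM) ((uintptr_t) p | PAIR_TAG);
}
```

   Both accessors yield the same address for a correctly tagged pair;
they differ only when the tag is something other than ‘PAIR_TAG’, which
is why the subtracting form must be guarded by ‘pair_p’.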


File: guile.info,  Node: Conservative GC,  Next: The SCM Type in Guile,  Prev: Cheaper Pairs,  Up: Data Representation

9.2.4 Conservative Garbage Collection
-------------------------------------

Aside from the latent typing, the major source of constraints on a
Scheme implementation’s data representation is the garbage collector.
The collector must be able to traverse every live object in the heap, to
determine which objects are not live, and thus collectable.

   There are many ways to implement this.  Guile’s garbage collection is
built on a library, the Boehm-Demers-Weiser conservative garbage
collector (BDW-GC). The BDW-GC “just works”, for the most part.  But
since it is interesting to know how these things work, we include here a
high-level description of what the BDW-GC does.

   Garbage collection has two logical phases: a “mark” phase, in which
the set of live objects is enumerated, and a “sweep” phase, in which
objects not traversed in the mark phase are collected.  Correct
functioning of the collector depends on being able to traverse the
entire set of live objects.

   In the mark phase, the collector scans the system’s global variables
and the local variables on the stack to determine which objects are
immediately accessible by the C code.  It then scans those objects to
find the objects they point to, and so on.  The collector logically sets
a “mark bit” on each object it finds, so each object is traversed only
once.

   When the collector can find no unmarked objects pointed to by marked
objects, it assumes that any objects that are still unmarked will never
be used by the program (since there is no path of dereferences from any
global or local variable that reaches them) and deallocates them.

   In the above paragraphs, we did not specify how the garbage collector
finds the global and local variables; as usual, there are many different
approaches.  Frequently, the programmer must maintain a list of pointers
to all global variables that refer to the heap, and another list
(adjusted upon entry to and exit from each function) of local variables,
for the collector’s benefit.

   The list of global variables is usually not too difficult to
maintain, since global variables are relatively rare.  However, an
explicitly maintained list of local variables (in the author’s personal
experience) is a nightmare to maintain.  Thus, the BDW-GC uses a
technique called “conservative garbage collection”, to make the local
variable list unnecessary.

   The trick to conservative collection is to treat the C stack as an
ordinary range of memory, and assume that _every_ word on the C stack is
a pointer into the heap.  Thus, the collector marks all objects whose
addresses appear anywhere in the C stack, without knowing for sure how
that word is meant to be interpreted.
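
   The idea can be illustrated with a toy model.  In the sketch below
the heap is a small fixed array, each object has a single outgoing
reference, and the scanned range is an ordinary array standing in for
the C stack.  All of the names are illustrative assumptions, and nothing
here reflects how the BDW-GC is actually implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEAP_OBJECTS 4

/* A toy heap object: a mark bit and one outgoing reference. */
struct object {
  int marked;
  struct object *next;
};

static struct object heap[HEAP_OBJECTS];

/* Return the heap object whose address equals WORD, or NULL.  A real
   collector would also have to cope with interior pointers.  */
static struct object *
find_object (uintptr_t word)
{
  for (size_t i = 0; i < HEAP_OBJECTS; i++)
    if (word == (uintptr_t) &heap[i])
      return &heap[i];
  return NULL;
}

/* Mark OBJ and everything reachable from it, visiting each object once. */
static void
mark (struct object *obj)
{
  while (obj != NULL && !obj->marked)
    {
      obj->marked = 1;
      obj = obj->next;
    }
}

/* Treat every word in [WORDS, WORDS + N) as a potential pointer. */
static void
mark_conservatively (const uintptr_t *words, size_t n)
{
  assert (words != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    mark (find_object (words[i]));
}
```

   A word that merely happens to equal an object’s address keeps that
object, and everything it refers to, alive; this is the source of the
occasional retention discussed below.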

   In addition to the stack, the BDW-GC will also scan static data
sections.  This means that global variables are also scanned when
looking for live Scheme objects.

   Obviously, such a system will occasionally retain objects that are
actually garbage, and should be freed.  In practice, this is not a
problem, as the set of conservatively-scanned locations is fixed; the
Scheme stack is maintained apart from the C stack, and is scanned
precisely (as opposed to conservatively).  The GC-managed heap is also
partitioned into parts that can contain pointers (such as vectors) and
parts that can’t (such as bytevectors), limiting the potential for
confusing a raw integer with a pointer to a live object.

   Interested readers should see the BDW-GC web page at
<http://www.hboehm.info/gc/>, for more information on conservative GC in
general and the BDW-GC implementation in particular.


File: guile.info,  Node: The SCM Type in Guile,  Prev: Conservative GC,  Up: Data Representation

9.2.5 The SCM Type in Guile
---------------------------

Guile classifies Scheme objects into two kinds: those that fit entirely
within an ‘SCM’, and those that require heap storage.

   The former class are called “immediates”.  The class of immediates
includes small integers, characters, boolean values, the empty list, the
mysterious end-of-file object, and some others.

   The remaining types are called, not surprisingly, “non-immediates”.
They include pairs, procedures, strings, vectors, and all other data
types in Guile.  For non-immediates, the ‘SCM’ word contains a pointer
to data on the heap, and further information about the object in
question is stored in that data.

   This section describes how the ‘SCM’ type is actually represented and
used at the C level.  Interested readers should see ‘libguile/scm.h’ for
an exposition of how Guile stores type information.

   In fact, there are two basic C data types to represent objects in
Guile: ‘SCM’ and ‘scm_t_bits’.

* Menu:

* Relationship Between SCM and scm_t_bits::
* Immediate Objects::
* Non-Immediate Objects::
* Allocating Heap Objects::
* Heap Object Type Information::
* Accessing Heap Object Fields::


File: guile.info,  Node: Relationship Between SCM and scm_t_bits,  Next: Immediate Objects,  Up: The SCM Type in Guile

9.2.5.1 Relationship Between ‘SCM’ and ‘scm_t_bits’
...................................................

A variable of type ‘SCM’ is guaranteed to hold a valid Scheme object.  A
variable of type ‘scm_t_bits’, on the other hand, may hold a
representation of a ‘SCM’ value as a C integral type, but may also hold
any C value, even if it does not correspond to a valid Scheme object.

   For a variable X of type ‘SCM’, the Scheme object’s type information
is stored in a form that is not directly usable.  To be able to work on
the type encoding of the Scheme value, the ‘SCM’ variable has to be
transformed into the corresponding representation as a ‘scm_t_bits’
variable Y by using the ‘SCM_UNPACK’ macro.  Once this has been done,
the type of the Scheme object X can be derived from the content of the
bits of the ‘scm_t_bits’ value Y, in the way illustrated by the example
earlier in this chapter (*note Cheaper Pairs::).  Conversely, a valid
bit encoding of a Scheme value as a ‘scm_t_bits’ variable can be
transformed into the corresponding ‘SCM’ value using the ‘SCM_PACK’
macro.
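
   The division of labour between the two types can be modelled in a
few lines.  The sketch below is a toy, not Guile’s definitions (those
live in ‘libguile/scm.h’ and differ in detail): ‘toy_scm’, ‘toy_bits’,
‘TOY_PACK’ and ‘TOY_UNPACK’ are hypothetical stand-ins for ‘SCM’,
‘scm_t_bits’, ‘SCM_PACK’ and ‘SCM_UNPACK’, and the tags are the ones
from the earlier example.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t toy_bits;                       /* like scm_t_bits */
typedef struct toy_scm { toy_bits n; } toy_scm;   /* like SCM */

#define TOY_UNPACK(x) ((x).n)
#define TOY_PACK(b)   ((toy_scm) { (toy_bits) (b) })

/* Type tests operate on the unpacked bits, never on toy_scm itself. */
static int
toy_integer_p (toy_scm x)
{
  return (TOY_UNPACK (x) & 7) == 1;
}

static int
toy_pair_p (toy_scm x)
{
  return (TOY_UNPACK (x) & 7) == 2;
}

/* Round trip: packing and unpacking leave the bits unchanged. */
static void
toy_demo (void)
{
  toy_bits y = ((toy_bits) 5 << 3) | 1;
  toy_scm x = TOY_PACK (y);
  assert (TOY_UNPACK (x) == y);
  assert (toy_integer_p (x) && !toy_pair_p (x));
}
```

   Because ‘toy_scm’ is a structure, the compiler rejects arithmetic or
bit tests applied to it directly, so every inspection of the encoding
has to go through the unpacking macro.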


File: guile.info,  Node: Immediate Objects,  Next: Non-Immediate Objects,  Prev: Relationship Between SCM and scm_t_bits,  Up: The SCM Type in Guile

9.2.5.2 Immediate Objects
.........................

A Scheme object may either be an immediate, i.e. carrying all necessary
information by itself, or it may contain a reference to a “heap object”
which is, as the name implies, data on the heap.  Although in general it
should be irrelevant for user code whether an object is an immediate or
not, within Guile’s own code the distinction is sometimes of importance.
Thus, the following low level macro is provided:

 -- Macro: int SCM_IMP (SCM X)
     A Scheme object is an immediate if it fulfills the ‘SCM_IMP’
     predicate, otherwise it holds an encoded reference to a heap
     object.  The result of the predicate is delivered as a C style
     boolean value.  User code and code that extends Guile should
     normally not be required to use this macro.

Summary:
   • Given a Scheme object X of unknown type, check first with ‘SCM_IMP
     (X)’ if it is an immediate object.
   • If so, all of the type and value information can be determined from
     the ‘scm_t_bits’ value that is delivered by ‘SCM_UNPACK (X)’.

   There are a number of special values in Scheme, most of them
documented elsewhere in this manual.  It’s not quite the right place to
put them, but for now, here’s a list of the C names given to some of
these values:

 -- Macro: SCM SCM_EOL
     The Scheme empty list object, or “End Of List” object, usually
     written in Scheme as ‘'()’.

 -- Macro: SCM SCM_EOF_VAL
     The Scheme end-of-file value.  It has no standard written
     representation, for obvious reasons.

 -- Macro: SCM SCM_UNSPECIFIED
     The value returned by some (but not all) expressions that the
     Scheme standard says return an “unspecified” value.

     This is sort of a weirdly literal way to take things, but the
     standard read-eval-print loop prints nothing when the expression
     returns this value, so it’s not a bad idea to return this when you
     can’t think of anything else helpful.

 -- Macro: SCM SCM_UNDEFINED
     The “undefined” value.  Its most important property is that it is
     not equal to any valid Scheme value.  This is put to various
     internal uses by C code interacting with Guile.

     For example, when you write a C function that is callable from
     Scheme and which takes optional arguments, the interpreter passes
     ‘SCM_UNDEFINED’ for any arguments you did not receive.

     We also use this to mark unbound variables.

 -- Macro: int SCM_UNBNDP (SCM X)
     Return true if X is ‘SCM_UNDEFINED’.  Note that this is not a check
     to see if X is ‘SCM_UNBOUND’.  History will not be kind to us.


File: guile.info,  Node: Non-Immediate Objects,  Next: Allocating Heap Objects,  Prev: Immediate Objects,  Up: The SCM Type in Guile

9.2.5.3 Non-Immediate Objects
.............................

A Scheme object of type ‘SCM’ that does not fulfill the ‘SCM_IMP’
predicate holds an encoded reference to a heap object.  This reference
can be decoded to a C pointer to a heap object using the
‘SCM_UNPACK_POINTER’ macro.  The encoding of a pointer to a heap object
into a ‘SCM’ value is done using the ‘SCM_PACK_POINTER’ macro.

   Before Guile 2.0, Guile had a custom garbage collector that allocated
heap objects in units of 2-word “cells”.  With the move to the BDW-GC
collector in Guile 2.0, Guile can allocate heap objects of any size, and
the concept of a cell is now obsolete.  Still, we mention it here as the
name still appears in various low-level interfaces.

 -- Macro: scm_t_bits * SCM_UNPACK_POINTER (SCM X)
 -- Macro: scm_t_cell * SCM2PTR (SCM X)
     Extract and return the heap object pointer from a non-immediate
     ‘SCM’ object X.  The name ‘SCM2PTR’ is deprecated but still common.

 -- Macro: SCM SCM_PACK_POINTER (scm_t_bits * X)
 -- Macro: SCM PTR2SCM (scm_t_cell * X)
     Return a ‘SCM’ value that encodes a reference to the heap object
     pointer X.  The name ‘PTR2SCM’ is deprecated but still common.

   Note that it is also possible to transform a non-immediate ‘SCM’
value by using ‘SCM_UNPACK’ into a ‘scm_t_bits’ variable.  However, the
result of ‘SCM_UNPACK’ may not be used as a pointer to a heap object:
only ‘SCM_UNPACK_POINTER’ is guaranteed to transform a ‘SCM’ object into
a valid pointer to a heap object.  Also, it is not allowed to apply
‘SCM_PACK_POINTER’ to anything that is not a valid pointer to a heap
object.

Summary:
   • Only use ‘SCM_UNPACK_POINTER’ on ‘SCM’ values for which ‘SCM_IMP’
     is false!
   • Don’t use ‘(scm_t_cell *) SCM_UNPACK (X)’!  Use ‘SCM_UNPACK_POINTER
     (X)’ instead!
   • Don’t use ‘SCM_PACK_POINTER’ for anything but a heap object
     pointer!


File: guile.info,  Node: Allocating Heap Objects,  Next: Heap Object Type Information,  Prev: Non-Immediate Objects,  Up: The SCM Type in Guile

9.2.5.4 Allocating Heap Objects
...............................

Heap objects are heap-allocated data pointed to by a non-immediate ‘SCM’
value.  The first word of the heap object should contain a type code.
The object may be any number of words in length, and is generally
scanned by the garbage collector for additional pointers unless the
object was allocated using a “pointerless” allocation function.

   You should generally not need these functions, unless you are
implementing a new data type, and thoroughly understand the code in
‘<libguile/scm.h>’.

   If you just want to allocate pairs, use ‘scm_cons’.

 -- Function: SCM scm_words (scm_t_bits word_0, uint32_t n_words)
     Allocate a new heap object containing N_WORDS words, initialize the
     first slot to WORD_0, and return a non-immediate ‘SCM’ value
     encoding a pointer to the object.  Typically WORD_0 will contain
     the type tag.

   There are also deprecated but common variants of ‘scm_words’ that use
the term “cell” to indicate 2-word objects.

 -- Function: SCM scm_cell (scm_t_bits word_0, scm_t_bits word_1)
     Allocate a new 2-word heap object, initialize the two slots with
     WORD_0 and WORD_1, and return it.  Just like calling ‘scm_words
     (WORD_0, 2)’, then initializing the second slot to WORD_1.

     Note that WORD_0 and WORD_1 are of type ‘scm_t_bits’.  If you want
     to pass a ‘SCM’ object, you need to use ‘SCM_UNPACK’.

 -- Function: SCM scm_double_cell (scm_t_bits word_0, scm_t_bits word_1,
          scm_t_bits word_2, scm_t_bits word_3)
     Like ‘scm_cell’, but allocates a 4-word heap object.
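
   The relationship between the general allocator and the 2-word
variant can be sketched with a toy model.  The functions ‘toy_words’ and
‘toy_cell’ below are hypothetical stand-ins that use ‘calloc’ instead of
GC-managed memory and return a raw pointer instead of an ‘SCM’; they
show only the shape of the interface, not Guile’s implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t toy_bits;

/* Allocate N_WORDS zeroed words and store WORD_0 in the first slot. */
static toy_bits *
toy_words (toy_bits word_0, uint32_t n_words)
{
  toy_bits *obj;
  assert (n_words >= 1);
  obj = calloc (n_words, sizeof *obj);
  assert (obj != NULL);
  obj[0] = word_0;
  return obj;
}

/* Two-word object: toy_words followed by initializing slot 1. */
static toy_bits *
toy_cell (toy_bits word_0, toy_bits word_1)
{
  toy_bits *obj = toy_words (word_0, 2);
  obj[1] = word_1;
  return obj;
}
```

   Unlike objects obtained from the real functions, memory returned by
this model is not garbage collected and must be released with ‘free’.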


File: guile.info,  Node: Heap Object Type Information,  Next: Accessing Heap Object Fields,  Prev: Allocating Heap Objects,  Up: The SCM Type in Guile

9.2.5.5 Heap Object Type Information
....................................

Heap objects contain a type tag and are followed by a number of
word-sized slots.  The interpretation of the object contents depends on
the type of the object.

 -- Macro: scm_t_bits SCM_CELL_TYPE (SCM X)
     Extract the first word of the heap object pointed to by X.  This
     value holds the information about the cell type.

 -- Macro: void SCM_SET_CELL_TYPE (SCM X, scm_t_bits T)
     For a non-immediate Scheme object X, write the value T into the
     first word of the heap object referenced by X.  The value T must
     hold a valid cell type.


File: guile.info,  Node: Accessing Heap Object Fields,  Prev: Heap Object Type Information,  Up: The SCM Type in Guile

9.2.5.6 Accessing Heap Object Fields
....................................

For a non-immediate Scheme object X, the object type can be determined
by using the ‘SCM_CELL_TYPE’ macro described in the previous section.
For each different type of heap object it is known which fields hold
tagged Scheme objects and which fields hold untagged raw data.  To
access the different fields appropriately, the following macros are
provided.

 -- Macro: scm_t_bits SCM_CELL_WORD (SCM X, unsigned int N)
 -- Macro: scm_t_bits SCM_CELL_WORD_0 (X)
 -- Macro: scm_t_bits SCM_CELL_WORD_1 (X)
 -- Macro: scm_t_bits SCM_CELL_WORD_2 (X)
 -- Macro: scm_t_bits SCM_CELL_WORD_3 (X)
     Deliver the field N of the heap object referenced by the
     non-immediate Scheme object X as raw untagged data.  Only use this
     macro for fields containing untagged data; don’t use it for fields
     containing tagged ‘SCM’ objects.

 -- Macro: SCM SCM_CELL_OBJECT (SCM X, unsigned int N)
 -- Macro: SCM SCM_CELL_OBJECT_0 (SCM X)
 -- Macro: SCM SCM_CELL_OBJECT_1 (SCM X)
 -- Macro: SCM SCM_CELL_OBJECT_2 (SCM X)
 -- Macro: SCM SCM_CELL_OBJECT_3 (SCM X)
     Deliver the field N of the heap object referenced by the
     non-immediate Scheme object X as a Scheme object.  Only use this
     macro for fields containing tagged ‘SCM’ objects; don’t use it for
     fields containing untagged data.

 -- Macro: void SCM_SET_CELL_WORD (SCM X, unsigned int N, scm_t_bits W)
 -- Macro: void SCM_SET_CELL_WORD_0 (X, W)
 -- Macro: void SCM_SET_CELL_WORD_1 (X, W)
 -- Macro: void SCM_SET_CELL_WORD_2 (X, W)
 -- Macro: void SCM_SET_CELL_WORD_3 (X, W)
     Write the raw value W into field number N of the heap object
     referenced by the non-immediate Scheme value X.  Values that are
     written into heap objects as raw values should only be read later
     using the ‘SCM_CELL_WORD’ macros.

 -- Macro: void SCM_SET_CELL_OBJECT (SCM X, unsigned int N, SCM O)
 -- Macro: void SCM_SET_CELL_OBJECT_0 (SCM X, SCM O)
 -- Macro: void SCM_SET_CELL_OBJECT_1 (SCM X, SCM O)
 -- Macro: void SCM_SET_CELL_OBJECT_2 (SCM X, SCM O)
 -- Macro: void SCM_SET_CELL_OBJECT_3 (SCM X, SCM O)
     Write the Scheme object O into field number N of the heap object
     referenced by the non-immediate Scheme value X.  Values that are
     written into heap objects as objects should only be read using the
     ‘SCM_CELL_OBJECT’ macros.

Summary:
   • For a non-immediate Scheme object X of unknown type, get the type
     information by using ‘SCM_CELL_TYPE (X)’.
   • As soon as the type information is available, only use the
     appropriate access methods to read and write data to the different
     heap object fields.
   • Note that field 0 stores the cell type information.  Generally
     speaking, other data associated with a heap object is stored
     starting from field 1.


File: guile.info,  Node: A Virtual Machine for Guile,  Next: Compiling to the Virtual Machine,  Prev: Data Representation,  Up: Guile Implementation

9.3 A Virtual Machine for Guile
===============================

Enough about data—how does Guile run code?

   Code is a grammatical production of a language.  Sometimes these
languages are implemented using interpreters: programs that run
along-side the program being interpreted, dynamically translating the
high-level code to low-level code.  Sometimes these languages are
implemented using compilers: programs that translate high-level programs
to equivalent low-level code, and pass on that low-level code to some
other language implementation.  Each of these languages can be thought
of as a virtual machine: it offers programs an abstract machine on which
to run.

   Guile implements a number of interpreters and compilers on different
language levels.  For example, there is an interpreter for the Scheme
language that is itself implemented as a Scheme program compiled to a
bytecode for a low-level virtual machine shipped with Guile.  That
virtual machine is implemented by both an interpreter—a C program that
interprets the bytecodes—and a compiler—a C program that dynamically
translates bytecode programs to native machine code(1).

   This section describes the language implemented by Guile’s bytecode
virtual machine, as well as some examples of translations of Scheme
programs to Guile’s VM.

* Menu:

* Why a VM?::
* VM Concepts::
* Stack Layout::
* Variables and the VM::
* VM Programs::
* Object File Format::
* Instruction Set::
* Just-In-Time Native Code::

   ---------- Footnotes ----------

   (1) Even the lowest-level machine code can be thought of as being
interpreted by the CPU, and indeed is often implemented by compiling
machine instructions to “micro-operations”.


File: guile.info,  Node: Why a VM?,  Next: VM Concepts,  Up: A Virtual Machine for Guile

9.3.1 Why a VM?
---------------

For a long time, Guile only had a Scheme interpreter, implemented in C.
Guile’s interpreter operated directly on the S-expression representation
of Scheme source code.

   But while the interpreter was highly optimized and hand-tuned, it
still performed many needless computations during the course of
evaluating a Scheme expression.  For example, application of a function
to arguments needlessly consed up the arguments in a list.  Evaluation
of an expression like ‘(f x y)’ always had to figure out whether F was a
procedure, or a special form like ‘if’, or something else.  The
interpreter represented the lexical environment as a heap data
structure, so every evaluation caused allocation, which was of course
slow.  Et cetera.

   The solution to the slow-interpreter problem was to compile the
higher-level language, Scheme, into a lower-level language for which all
of the checks and dispatching have already been done—the code is instead
stripped to the bare minimum needed to “do the job”.

   The question becomes then, what low-level language to choose?  There
are many options.  We could compile to native code directly, but that
poses portability problems for Guile, as it is a highly cross-platform
project.

   So we want the performance gains that compilation provides, but we
also want to maintain the portability benefits of a single code path.
The obvious solution is to compile to a virtual machine that is present
on all Guile installations.

   The easiest (and most fun) way to depend on a virtual machine is to
implement the virtual machine within Guile itself.  Guile contains a
bytecode interpreter (written in C) and a Scheme to bytecode compiler
(written in Scheme).  This way the virtual machine provides what Scheme
needs (tail calls, multiple values, ‘call/cc’) and can provide optimized
inline instructions for Guile as well (GC-managed allocations, type
checks, etc.).

   Guile also includes a just-in-time (JIT) compiler to translate
bytecode to native code.  Because Guile embeds a portable code
generation library (<https://gitlab.com/wingo/lightening>), we keep the
benefits of portability while also benefitting from fast native code.
To avoid too much time spent in the JIT compiler itself, Guile is tuned
to only emit machine code for bytecode that is called often.

   The rest of this section describes the VM that Guile implements, and
the compiled procedures that run on it.

   Before moving on, though, we should note that though we spoke of the
interpreter in the past tense, Guile still has an interpreter.  The
difference is that before, it was Guile’s main Scheme implementation,
and so was implemented in highly optimized C; now, it is actually
implemented in Scheme, and compiled down to VM bytecode, just like any
other program.  (There is still a C interpreter around, used to
bootstrap the compiler, but it is not normally used at runtime.)

   The upside of implementing the interpreter in Scheme is that we
preserve tail calls and multiple-value handling between interpreted and
compiled code, and with the advent of the JIT compiler in Guile 3.0 we
reach the speed of the old hand-tuned C implementation; it’s the best of
both worlds.

   Also note that this decision to implement a bytecode compiler does
not preclude ahead-of-time native compilation.  More possibilities are
discussed in *note Extending the Compiler::.


File: guile.info,  Node: VM Concepts,  Next: Stack Layout,  Prev: Why a VM?,  Up: A Virtual Machine for Guile

9.3.2 VM Concepts
-----------------

The bytecode in a Scheme procedure is interpreted by a virtual machine
(VM). Each thread has its own instantiation of the VM. The virtual
machine executes the sequence of instructions in a procedure.

   Each VM instruction starts by indicating which operation it is, and
then follows by encoding its source and destination operands.  Each
procedure declares that it has some number of local variables, including
the function arguments.  These local variables form the available
operands of the procedure, and are accessed by index.

   The local variables for a procedure are stored on a stack.  Calling a
procedure typically enlarges the stack, and returning from a procedure
shrinks it.  Stack memory is exclusive to the virtual machine that owns
it.

   In addition to their stacks, virtual machines also have access to the
global memory (modules, global bindings, etc) that is shared among other
parts of Guile, including other VMs.

   The registers that a VM has are as follows:

   • ip - Instruction pointer
   • sp - Stack pointer
   • fp - Frame pointer

   In other architectures, the instruction pointer is sometimes called
the “program counter” (pc).  This set of registers is pretty typical for
virtual machines; their exact meanings in the context of Guile’s VM are
described in the next section.


File: guile.info,  Node: Stack Layout,  Next: Variables and the VM,  Prev: VM Concepts,  Up: A Virtual Machine for Guile

9.3.3 Stack Layout
------------------

The stack of Guile’s virtual machine is composed of “frames”.  Each
frame corresponds to the application of one compiled procedure, and
contains storage space for arguments, local variables, and some
bookkeeping information (such as what to do after the frame is
finished).

   While the compiler is free to do whatever it wants to, as long as the
semantics of a computation are preserved, in practice every time you
call a function, a new frame is created.  (The notable exception of
course is the tail call case, *note Tail Calls::.)

   The structure of the top stack frame is as follows:

        | ...previous frame locals...  |
        +==============================+ <- fp + 3
        | Dynamic link                 |
        +------------------------------+
        | Virtual return address (vRA) |
        +------------------------------+
        | Machine return address (mRA) |
        +==============================+ <- fp
        | Local 0                      |
        +------------------------------+
        | Local 1                      |
        +------------------------------+
        | ...                          |
        +------------------------------+
        | Local N-1                    |
        \------------------------------/ <- sp

   In the above drawing, the stack grows downward.  At the beginning of
a function call, the procedure being applied is in local 0, followed by
the arguments from local 1.  After the procedure checks that it is being
passed a compatible set of arguments, the procedure allocates some
additional space in the frame to hold variables local to the function.

   Note that once a value in a local variable slot is no longer needed,
Guile is free to re-use that slot.  This applies to the slots that were
initially used for the callee and arguments, too.  For this reason,
backtraces in Guile aren’t always able to show all of the arguments: it
could be that the slot corresponding to that argument was re-used by
some other variable.

   The “virtual return address” is the ‘ip’ that was in effect before
this program was applied.  When we return from this activation frame, we
will jump back to this ‘ip’.  Likewise, the “dynamic link” is the offset
of the ‘fp’ that was in effect before this program was applied, relative
to the current ‘fp’.

   There are two return addresses: the virtual return address (vRA), and
the machine return address (mRA).  The vRA is always present and
indicates a bytecode address.  The mRA is only present when a call is
made from a function with machine code (e.g., a function that has been
JIT-compiled).

   To prepare for a non-tail application, Guile’s VM will emit code that
shuffles the function to apply and its arguments into appropriate stack
slots, with three free slots below them.  The call then initializes
those free slots to hold the machine return address (or NULL), the
virtual return address, and the offset to the previous frame pointer
(‘fp’).  It then gets the ‘ip’ for the function being called and adjusts
‘fp’ to point to the new call frame.

   In this way, the dynamic link links the current frame to the previous
frame.  Computing a stack trace involves traversing these frames.

   Each stack local in Guile is 64 bits wide, even on 32-bit
architectures.  This allows Guile to preserve its uniform treatment of
stack locals while allowing for unboxed arithmetic on 64-bit integers
and floating-point numbers.  *Note Instruction Set::, for more on
unboxed arithmetic.

   As an implementation detail, we actually store the dynamic link as an
offset and not an absolute value because the stack can move at runtime
as it expands or during partial continuation calls.  If it were an
absolute value, we would have to walk the frames, relocating frame
pointers.
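
   The following sketch models this arrangement in Scheme.  It is a toy
model and not Guile’s implementation: the stack is a plain vector
indexed by address, and the names ‘previous-fp’ and ‘frame-pointers’
are made up for illustration.  As in the drawing above, the dynamic
link is assumed to sit at ‘fp’ + 2.

     ;; Toy model: STACK is a vector indexed by address; the stack
     ;; grows toward lower addresses, so callers live at higher ones.
     (define (previous-fp stack fp)
       ;; The dynamic link is an offset relative to the current fp.
       (+ fp (vector-ref stack (+ fp 2))))

     (define (frame-pointers stack fp top)
       ;; Walk from the innermost frame outward until reaching TOP.
       (let loop ((fp fp) (acc '()))
         (if (>= fp top)
             (reverse acc)
             (loop (previous-fp stack fp) (cons fp acc)))))

     (define stack (make-vector 16 #f))
     (vector-set! stack 14 4)   ; frame at fp 12 links to address 16
     (vector-set! stack 8 6)    ; frame at fp 6 links to fp 12
     (frame-pointers stack 6 16)   ; ⇒ (6 12)

   Because only offsets are stored, moving the live part of the vector
to a different position would leave every link valid; only the starting
‘fp’ would need adjusting.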


File: guile.info,  Node: Variables and the VM,  Next: VM Programs,  Prev: Stack Layout,  Up: A Virtual Machine for Guile

9.3.4 Variables and the VM
--------------------------

Consider the following Scheme code as an example:

       (define (foo a)
         (lambda (b) (vector foo a b)))

   Within the lambda expression, ‘foo’ is a top-level variable, ‘a’ is a
lexically captured variable, and ‘b’ is a local variable.

   Another way to refer to ‘a’ and ‘b’ is to say that ‘a’ is a “free”
variable, since it is not defined within the lambda, and ‘b’ is a
“bound” variable.  These are the terms used in the “lambda calculus”, a
mathematical notation for describing functions.  The lambda calculus is
useful because it is a language in which to reason precisely about
functions and variables.  It is especially good at describing scope
relations, and it is for that reason that we mention it here.

   Guile allocates all variables on the stack.  When a lexically
enclosed procedure with free variables—a “closure”—is created, it copies
those variables into its free variable vector.  References to free
variables are then redirected through the free variable vector.

   If a variable is ever ‘set!’, however, it will need to be
heap-allocated instead of stack-allocated, so that different closures
that capture the same variable can see the same value.  Also, this
allows continuations to capture a reference to the variable, instead of
to its value at one point in time.  For these reasons, ‘set!’ variables
are allocated in “boxes”—actually, in variable cells.  *Note
Variables::, for more information.  References to ‘set!’ variables are
indirected through the boxes.

   Thus, perhaps counterintuitively, the operation that would seem
“closer to the metal”, namely ‘set!’, actually forces an extra memory
allocation and indirection.  Sometimes Guile’s optimizer can remove this
allocation, but not always.

   Going back to our example, ‘b’ may be allocated on the stack, as it
is never mutated.

   ‘a’ may also be allocated on the stack, as it too is never mutated.
Within the enclosed lambda, its value will be copied into (and
referenced from) the free variables vector.

   ‘foo’ is a top-level variable, because ‘foo’ is not lexically bound
in this example.
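
   The effect of boxing can be imitated by hand.  The sketch below is
only an illustration of the idea, not what the compiler emits: it
contrasts two closures sharing a ‘set!’ variable with the same code
written against an explicit variable object (‘make-variable’,
‘variable-ref’ and ‘variable-set!’; *note Variables::).  The procedure
names are invented for the example.

     ;; N is mutated, so both closures must share one location.
     (define (make-counter)
       (let ((n 0))
         (cons (lambda () (set! n (+ n 1)) n)    ; increment
               (lambda () n))))                  ; read

     ;; Roughly the same thing with the box made explicit.  Each
     ;; closure captures the box, which itself is never mutated.
     (define (make-counter/boxed)
       (let ((n (make-variable 0)))
         (cons (lambda ()
                 (variable-set! n (+ (variable-ref n) 1))
                 (variable-ref n))
               (lambda () (variable-ref n)))))

     (define c (make-counter/boxed))
     ((car c))
     ((car c))
     ((cdr c))   ; ⇒ 2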


File: guile.info,  Node: VM Programs,  Next: Object File Format,  Prev: Variables and the VM,  Up: A Virtual Machine for Guile

9.3.5 Compiled Procedures are VM Programs
-----------------------------------------

By default, when you enter expressions at Guile’s REPL, they are first
compiled to bytecode.  Then that bytecode is executed to produce a
value.  If the expression evaluates to a procedure, the result of this
process is a compiled procedure.

   A compiled procedure is a compound object consisting of its bytecode
and a reference to any captured lexical variables.  In addition, when a
procedure is compiled, it has associated metadata written to side
tables, for instance a line number mapping, or its docstring.  You can
pick apart these pieces with the accessors in ‘(system vm program)’.
*Note Compiled Procedures::, for a full API reference.
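
   As a hedged illustration, the following session pokes at a closure
with a few of those accessors.  It assumes that ‘program?’,
‘program-num-free-variables’ and ‘program-free-variable-ref’ are
exported by ‘(system vm program)’ in your version of Guile; the results
depend on how the compiler chose to represent the closure, so none are
shown.

     (use-modules (system vm program))

     (define (foo a)
       (lambda (b) (vector foo a b)))

     (define closure (foo 42))

     (program? closure)                    ; a compiled procedure?
     (program-num-free-variables closure)  ; number of captured values
     ;; Inspect the first captured variable, if there is one.
     (when (> (program-num-free-variables closure) 0)
       (program-free-variable-ref closure 0))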

   A procedure may reference data that was statically allocated when the
procedure was compiled.  For example, a pair of immediate objects (*note
Immediate Objects::) can be allocated directly in the memory segment
that contains the compiled bytecode, and accessed directly by the
bytecode.

   Another use for statically allocated data is to serve as a cache for
bytecode.  Top-level variable lookups are handled in this way; the
first time a top-level binding is referenced, the resolved variable will
be stored in a cache.  Thereafter all access to the variable goes
through the cache cell.  The variable’s value may change in the future,
but the variable itself will not.
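
   The caching scheme can be sketched in Scheme itself.  This is a
simplified model, not the code Guile generates: ‘make-cached-ref’ is a
hypothetical helper, and a closed-over variable stands in for the
statically allocated cache cell.

     (define (make-cached-ref module name)
       (let ((cache #f))                  ; stands in for the cache cell
         (lambda ()
           (unless cache
             ;; First reference: resolve the variable object once.
             (let ((var (module-variable module name)))
               (unless (and var (variable-bound? var))
                 (error "Unbound variable:" name))
               (set! cache var)))
           ;; Every reference: dereference the cached variable.
           (variable-ref cache))))

     (define answer 41)
     (define ref-answer (make-cached-ref (current-module) 'answer))
     (ref-answer)        ; ⇒ 41
     (set! answer 42)
     (ref-answer)        ; ⇒ 42; the value changed, the variable did not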

   We can see how these concepts tie together by disassembling the ‘foo’
function we defined earlier to see what is going on:

     scheme@(guile-user)> (define (foo a) (lambda (b) (vector foo a b)))
     scheme@(guile-user)> ,x foo
     Disassembly of #<procedure foo (a)> at #xf1da30:

        0    (instrument-entry 164)                                at (unknown file):5:0
        2    (assert-nargs-ee/locals 2 1)    ;; 3 slots (1 arg)
        3    (allocate-words/immediate 2 3)                        at (unknown file):5:16
        4    (load-u64 0 0 65605)
        7    (word-set!/immediate 2 0 0)
        8    (load-label 0 7)                ;; anonymous procedure at #xf1da6c
       10    (word-set!/immediate 2 1 0)
       11    (scm-set!/immediate 2 2 1)
       12    (reset-frame 1)                 ;; 1 slot
       13    (handle-interrupts)
       14    (return-values)

     ----------------------------------------
     Disassembly of anonymous procedure at #xf1da6c:

        0    (instrument-entry 183)                                at (unknown file):5:16
        2    (assert-nargs-ee/locals 2 3)    ;; 5 slots (1 arg)
        3    (static-ref 2 152)              ;; #<variable 112e530 value: #<procedure foo (a)>>
        5    (immediate-tag=? 2 7 0)         ;; heap-object?
        7    (je 19)                         ;; -> L2
        8    (static-ref 2 119)              ;; #<directory (guile-user) ca9750>
       10    (static-ref 1 127)              ;; foo
       12    (call-scm<-scm-scm 2 2 1 40)
       14    (immediate-tag=? 2 7 0)         ;; heap-object?
       16    (jne 8)                         ;; -> L1
       17    (scm-ref/immediate 0 2 1)
       18    (immediate-tag=? 0 4095 2308)   ;; undefined?
       20    (je 4)                          ;; -> L1
       21    (static-set! 2 134)             ;; #<variable 112e530 value: #<procedure foo (a)>>
       23    (j 3)                           ;; -> L2
     L1:
       24    (throw/value 1 151)             ;; #(unbound-variable #f "Unbound variable: ~S")
     L2:
       26    (scm-ref/immediate 2 2 1)
       27    (allocate-words/immediate 1 4)                        at (unknown file):5:28
       28    (load-u64 0 0 781)
       31    (word-set!/immediate 1 0 0)
       32    (scm-set!/immediate 1 1 2)
       33    (scm-ref/immediate 4 4 2)
       34    (scm-set!/immediate 1 2 4)
       35    (scm-set!/immediate 1 3 3)
       36    (mov 4 1)
       37    (reset-frame 1)                 ;; 1 slot
       38    (handle-interrupts)
       39    (return-values)

   The first thing to notice is that the bytecode is at a fairly low
level.  When a program is compiled from Scheme to bytecode, it is
expressed in terms of more primitive operations.  As such, there can be
more instructions than you might expect.

   The first chunk of instructions is the outer ‘foo’ procedure.  It is
followed by the code for the contained closure.  The code can look
daunting at first glance, but with practice it quickly becomes
comprehensible, and indeed being able to read bytecode is an important
step to understanding the low-level performance of Guile programs.

   The ‘foo’ function begins with a prelude.  The ‘instrument-entry’
bytecode increments a counter associated with the function.  If the
counter reaches a certain threshold, Guile will emit machine code
(“JIT-compile”) for ‘foo’.  Emitting machine code is fairly cheap but it
does take time, so it’s not something you want to do for every function.
Using a per-function counter and a global threshold allows Guile to
spend time JIT-compiling only the “hot” functions.
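
   The counting scheme amounts to the following, shown here as a toy
model in Scheme.  The threshold value, ‘make-instrumented’ and the
COMPILE! callback are all invented for the example; it makes no claim
about how Guile stores its counters.

     (define jit-threshold 3)             ; hypothetical global threshold

     (define (make-instrumented proc compile!)
       (let ((counter 0)
             (compiled? #f))
         (lambda args
           (unless compiled?
             (set! counter (+ counter 1))
             (when (>= counter jit-threshold)
               (set! compiled? #t)
               (compile! proc)))          ; only hot procedures get here
           (apply proc args))))

     (define compiled '())
     (define hot-square
       (make-instrumented (lambda (x) (* x x))
                          (lambda (proc)
                            (set! compiled (cons proc compiled)))))
     (hot-square 2)
     (hot-square 3)
     (length compiled)   ; ⇒ 0, still below the threshold
     (hot-square 4)
     (length compiled)   ; ⇒ 1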

   Next in the prelude is an argument-checking instruction, which checks
that it was called with only 1 argument (plus the callee function itself
makes 2) and then reserves stack space for an additional 1 local.

   Then from ‘ip’ 3 to 11, we allocate a new closure by allocating a
three-word object, initializing its first word to store a type tag,
setting its second word to its code pointer, and finally at ‘ip’ 11,
storing local value 1 (the ‘a’ argument) into the third word (the first
free variable).

   Before returning, ‘foo’ “resets the frame” to hold only one local
(the return value), runs any pending interrupts (*note Asyncs::) and
then returns.

   Note that local variables in Guile’s virtual machine are usually
addressed relative to the stack pointer, which leads to a pleasantly
efficient ‘sp[N]’ access.  However it can make the disassembly hard to
read, because the ‘sp’ can change during the function, and because
incoming arguments are relative to the ‘fp’, not the ‘sp’.

   To know what ‘fp’-relative slot corresponds to an ‘sp’-relative
reference, scan up in the disassembly until you get to a “N slots”
annotation; in our case, 3, indicating that the frame has space for 3
slots.  Thus a zero-indexed ‘sp’-relative slot of 2 corresponds to the
‘fp’-relative slot of 0, which initially held the value of the closure
being called.  This means that Guile doesn’t need the value of the
closure to compute its result, and so slot 0 was free for re-use, in
this case for the result of making a new closure.
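
   The conversion is a one-line computation.  The helper below is not
part of Guile; it merely restates the rule, assuming NSLOTS is taken
from the nearest preceding “N slots” annotation.

     (define (sp-slot->fp-slot nslots sp-slot)
       (- nslots 1 sp-slot))

     (sp-slot->fp-slot 3 2)   ; ⇒ 0, the closure slot in foo
     (sp-slot->fp-slot 3 1)   ; ⇒ 1, the a argument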

   A closure is code with data.  As you can see, making the closure
involved making an object (‘ip’ 3), putting a code pointer in it (‘ip’ 8
and 10), and putting in the closure’s free variable (‘ip’ 11).

   The second stanza disassembles the code for the closure.  After the
prelude, all of the code between ‘ip’ 5 and 24 is related to loading the
top-level variable ‘foo’ into slot 2.  This lookup happens only once, and
is associated with a cache; after the first run, the value in the cache
will be a bound variable, and the code will jump from ‘ip’ 7 to 26.  On
the first run, Guile gets the module associated with the function, calls
out to a run-time routine to look up the variable, and checks that the
variable is bound before initializing the cache.  Either way, ‘ip’ 26
dereferences the variable into local 2.

   What follows is the allocation and initialization of the vector
return value.  The instruction at ‘ip’ 27 does the allocation, and the
following two instructions initialize the type-and-length tag for the
object’s first word.  The instruction at ‘ip’ 32 sets word 1 of the
object (the first vector slot) to the value of ‘foo’; ‘ip’ 33 fetches
the closure variable for ‘a’, which ‘ip’ 34 then stores in the second
vector slot; and finally, at ‘ip’ 35, local ‘b’ is stored to the third
vector slot.  This is followed by the return sequence.


File: guile.info,  Node: Object File Format,  Next: Instruction Set,  Prev: VM Programs,  Up: A Virtual Machine for Guile

9.3.6 Object File Format
------------------------

To compile a file to disk, we need a format in which to write the
compiled code to disk, and later load it into Guile.  A good “object
file format” has a number of characteristics:

   • Above all else, it should be very cheap to load a compiled file.
   • It should be possible to statically allocate constants in the file.
     For example, a bytevector literal in source code can be emitted
     directly into the object file.
   • The compiled file should enable maximum code and data sharing
     between different processes.
   • The compiled file should contain debugging information, such as
     line numbers, but that information should be separated from the
     code itself.  It should be possible to strip debugging information
     if space is tight.

   These characteristics are not specific to Scheme.  Indeed, mainstream
languages like C and C++ have solved this issue many times in the past.
Guile builds on their work by adopting ELF, the object file format of
GNU and other Unix-like systems, as its object file format.  Although
Guile uses ELF on all platforms, we do not use platform support for ELF.
Guile implements its own linker and loader.  The advantage of using ELF
is not sharing code, but sharing ideas.  ELF is simply a well-designed
object file format.

   An ELF file has two meta-tables describing its contents.  The first
meta-table is for the loader, and is called the “program table” or
sometimes the “segment table”.  The program table divides the file into
big chunks that should be treated differently by the loader.  Mostly the
difference between these “segments” is their permissions.

   Typically all segments of an ELF file are marked as read-only, except
the part that represents modifiable static data or static data that
needs load-time initialization.  Loading an ELF file is as simple as
mmapping the thing into memory with read-only permissions, then using
the segment table to mark a small sub-region of the file as writable.
This writable section is typically added to the root set of the garbage
collector as well.

   One ELF segment is marked as “dynamic”, meaning that it has data of
interest to the loader.  Guile uses this segment to record the Guile
version corresponding to this file.  There is also an entry in the
dynamic segment that points to the address of an initialization thunk
that is run to perform any needed link-time initialization.  (This is
like dynamic relocations for normal ELF shared objects, except that we
compile the relocations as a procedure instead of having the loader
interpret a table of relocations.)  Finally, the dynamic segment marks
the location of the “entry thunk” of the object file.  This thunk is
returned to the caller of ‘load-thunk-from-memory’ or
‘load-thunk-from-file’.  When called, it will execute the “body” of the
compiled expression.
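
   As a sketch of how the entry thunk is obtained in practice, the
following compiles a small file and loads it back.  It assumes that
‘compile-file’ from ‘(system base compile)’ returns the name of the
‘.go’ file it wrote and that ‘load-thunk-from-file’ is exported by
‘(system vm loader)’; the file name is hypothetical.

     (use-modules (system base compile)
                  (system vm loader))

     (define source-file "/tmp/hello.scm")       ; hypothetical path

     ;; Write a one-expression program to compile.
     (call-with-output-file source-file
       (lambda (port)
         (write '(display "hello from an object file\n") port)))

     (define go-file (compile-file source-file))
     (define entry-thunk (load-thunk-from-file go-file))
     (entry-thunk)       ; runs the body of the compiled file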

   The other meta-table in an ELF file is the “section table”.  Whereas
the program table divides an ELF file into big chunks for the loader,
the section table specifies small sections for use by introspective
tools such as debuggers.  One segment (program table entry) typically
contains many sections.  There may be sections outside of any segment,
as well.

   Typical sections in a Guile ‘.go’ file include:

‘.rtl-text’
     Bytecode.
‘.data’
     Data that needs initialization, or which may be modified at
     runtime.
‘.rodata’
     Statically allocated data that needs no run-time initialization,
     and which therefore can be shared between processes.
‘.dynamic’
     The dynamic section, discussed above.
‘.symtab’
‘.strtab’
     A table mapping addresses in the ‘.rtl-text’ to procedure names.
     ‘.strtab’ is used by ‘.symtab’.
‘.guile.procprops’
‘.guile.arities’
‘.guile.arities.strtab’
‘.guile.docstrs’
‘.guile.docstrs.strtab’
     Side tables of procedure properties, arities, and docstrings.
‘.guile.frame-maps’
     Side table of frame maps, describing the set of live slots for
     every return point in the program text, and whether those slots
     are pointers or not.  Used by the garbage collector.
‘.debug_info’
‘.debug_abbrev’
‘.debug_str’
‘.debug_loc’
‘.debug_line’
     Debugging information, in DWARF format.  See the DWARF
     specification, for more information.
‘.shstrtab’
     Section name string table.

   For more information, see the elf(5) man page.  See the DWARF
specification (http://dwarfstd.org/) for more on the DWARF debugging
format.  Or if you are an adventurous explorer, try running ‘readelf’ or
‘objdump’ on compiled ‘.go’ files.  It’s good times!


File: guile.info,  Node: Instruction Set,  Next: Just-In-Time Native Code,  Prev: Object File Format,  Up: A Virtual Machine for Guile

9.3.7 Instruction Set
---------------------

There are currently about 150 instructions in Guile’s virtual machine.
These instructions represent atomic units of a program’s execution.
Ideally, they perform one task without conditional branches, then
dispatch to the next instruction in the stream.

   Instructions themselves are composed of 1 or more 32-bit units.  The
low 8 bits of the first word indicate the opcode, and the rest of the
instruction describes the operands.  There are a number of different
ways operands can be encoded.

‘sN’
     An unsigned N-bit integer, indicating the ‘sp’-relative index of a
     local variable.
‘fN’
     An unsigned N-bit integer, indicating the ‘fp’-relative index of a
     local variable.  Used when a continuation accepts a variable number
     of values, to shuffle received values into known locations in the
     frame.
‘cN’
     An unsigned N-bit integer, indicating a constant value.
‘l24’
     An offset from the current ‘ip’, in 32-bit units, as a signed
     24-bit value.  Indicates a bytecode address, for a relative jump.
‘zi16’
‘i16’
‘i32’
     An immediate Scheme value (*note Immediate Objects::), encoded
     directly in 16 or 32 bits.  ‘zi16’ is sign-extended; the others are
     zero-extended.
‘a32’
‘b32’
     An immediate Scheme value, encoded as a pair of 32-bit words.
     ‘a32’ and ‘b32’ values always go together on the same opcode, and
     indicate the high and low bits, respectively.  Normally only used
     on 64-bit systems.
‘n32’
     A statically allocated non-immediate.  The address of the
     non-immediate is encoded as a signed 32-bit integer, and indicates
     a relative offset in 32-bit units.  Think of it as ‘SCM x = ip +
     offset’.
‘r32’
     Indirect Scheme value, like ‘n32’ but indirected.  Think of it as
     ‘SCM *x = ip + offset’.
‘l32’
‘lo32’
     An ip-relative address, as a signed 32-bit integer.  Could indicate
     a bytecode address, as in ‘make-closure’, or a non-immediate
     address, as with ‘static-patch!’.

     ‘l32’ and ‘lo32’ are the same from the perspective of the virtual
     machine.  The difference is that an assembler might want to allow
     an ‘lo32’ address to be specified as a label and then some number
     of words offset from that label, for example when patching a field
     of a statically allocated object.
‘v32:x8-l24’
     Almost all VM instructions have a fixed size.  The ‘jtable’
     instruction used to perform optimized ‘case’ branches is an
     exception, which uses a ‘v32’ trailing word to indicate the number
     of additional words in the instruction, which themselves are
     encoded as ‘x8-l24’ values.
‘b1’
     A boolean value: 1 for true, otherwise 0.
‘xN’
     An ignored sequence of N bits.
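
   To make the ‘l24’ encoding concrete, here is a small decoding
sketch.  It assumes the operand occupies the upper 24 bits of a word
whose low 8 bits hold the opcode; the helper names and the opcode value
are invented.

     (define (sign-extend-24 n)
       ;; Interpret the low 24 bits of N as a two's complement number.
       (if (logbit? 23 n)
           (- n (ash 1 24))
           n))

     (define (jump-target ip word)
       ;; Offsets count 32-bit units from the current ip.
       (+ ip (sign-extend-24 (ash word -8))))

     (define opcode 1)                            ; hypothetical opcode
     (jump-target 7 (logior opcode (ash 19 8)))         ; ⇒ 26
     (jump-target 23 (logior opcode (ash #xfffffd 8)))  ; ⇒ 20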

   An instruction is specified by giving its name, then describing its
operands.  The operands are packed by 32-bit words, with earlier
operands occupying the lower bits.

   For example, consider the following instruction specification:

 -- Instruction: call f24:PROC x8:_ c24:NLOCALS

   The first word in the instruction will start with the 8-bit value
corresponding to the CALL opcode in the low bits, followed by PROC as a
24-bit value.  The second word starts with 8 dead bits, followed by
NLOCALS as a 24-bit immediate value.
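
   The packing can be spelled out with ordinary bit operations.  In
this sketch the opcode number is made up and ‘encode-call’ is not a
Guile procedure; only the layout follows the description above.

     (define (pack-8+24 low8 high24)
       ;; Earlier operands occupy the lower bits of the word.
       (logior (logand low8 #xff)
               (ash (logand high24 #xffffff) 8)))

     (define (encode-call opcode proc nlocals)
       (list (pack-8+24 opcode proc)      ; opcode, then f24:PROC
             (pack-8+24 0 nlocals)))      ; x8:_, then c24:NLOCALS

     (define (word-low8 word) (logand word #xff))
     (define (word-high24 word) (logand (ash word -8) #xffffff))

     (define words (encode-call 3 7 4))   ; 3 is a hypothetical opcode
     words                                ; ⇒ (1795 1024)
     (word-high24 (car words))            ; ⇒ 7
     (word-high24 (cadr words))           ; ⇒ 4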

   For instructions with operands that encode references to the stack,
the interpretation of those stack values is up to the instruction
itself.  Most instructions expect their operands to be tagged SCM values
(‘scm’ representation), but some instructions expect unboxed integers
(‘u64’ and ‘s64’ representations) or floating-point numbers (‘f64’
representation).  It is assumed that the bits for a ‘u64’ value are the
same as those for an ‘s64’ value, and that ‘s64’ values are stored in
two’s complement.

   Instructions have static types: they must receive their operands in
the format they expect.  It’s up to the compiler to ensure this is the
case.

   Unless otherwise mentioned, all operands and results are in the ‘scm’
representation.

* Menu:

* Call and Return Instructions::
* Function Prologue Instructions::
* Shuffling Instructions::
* Trampoline Instructions::
* Non-Local Control Flow Instructions::
* Instrumentation Instructions::
* Intrinsic Call Instructions::
* Constant Instructions::
* Memory Access Instructions::
* Atomic Memory Access Instructions::
* Tagging and Untagging Instructions::
* Integer Arithmetic Instructions::
* Floating-Point Arithmetic Instructions::
* Comparison Instructions::
* Branch Instructions::
* Raw Memory Access Instructions::


File: guile.info,  Node: Call and Return Instructions,  Next: Function Prologue Instructions,  Up: Instruction Set

9.3.7.1 Call and Return Instructions
....................................

As described earlier (*note Stack Layout::), Guile’s calling convention
is that arguments are passed and values returned on the stack.

   For calls, both in tail position and in non-tail position, we require
that the procedure and the arguments already be shuffled into place
before the call instruction.  “Into place” for a tail call means that
the procedure should be in slot 0, relative to the ‘fp’, and the
arguments should follow.  For a non-tail call, if the procedure is in
‘fp’-relative slot N, the arguments should follow from slot N+1, and
there should be three free slots between N-1 and N-3 in which to save
the mRA, vRA, and ‘fp’.

   Returning values is similar.  Multiple-value returns should have
values already shuffled down to start from ‘fp’-relative slot 0 before
emitting ‘return-values’.

   In both calls and returns, the ‘sp’ is used to indicate to the callee
or caller the number of arguments or return values, respectively.  After
receiving return values, it is the caller’s responsibility to “restore
the frame” by resetting the ‘sp’ to its former value.

 -- Instruction: call f24:PROC x8:_ c24:NLOCALS
     Call a procedure.  PROC is the local corresponding to a procedure.
     The three values below PROC will be overwritten by the saved call
     frame data.  The new frame will have space for NLOCALS locals: one
     for the procedure, and the rest for the arguments which should
     already have been pushed on.

     When the call returns, execution proceeds with the next
     instruction.  There may be any number of values on the return
     stack; the precise number can be had by subtracting the address of
     PROC-1 from the post-call ‘sp’.

 -- Instruction: call-label f24:PROC x8:_ c24:NLOCALS l32:LABEL
     Call a procedure in the same compilation unit.

     This instruction is just like ‘call’, except that instead of
     dereferencing PROC to find the call target, the call target is
     known to be at LABEL, a signed 32-bit offset in 32-bit units from
     the current ‘ip’.  Since PROC is not dereferenced, it may be some
     other representation of the closure.

 -- Instruction: tail-call x24:_
     Tail-call a procedure.  Requires that the procedure and all of the
     arguments have already been shuffled into position, and that the
     frame has already been reset to the number of arguments to the
     call.

 -- Instruction: tail-call-label x24:_ l32:LABEL
     Tail-call a known procedure.  As ‘call’ is to ‘call-label’,
     ‘tail-call’ is to ‘tail-call-label’.

 -- Instruction: return-values x24:_
     Return a number of values from a call frame.  The return values
     should have already been shuffled down to a contiguous array
     starting at slot 0, and the frame already reset.

 -- Instruction: receive f12:DST f12:PROC x8:_ c24:NLOCALS
     Receive a single return value from a call whose procedure was in
     PROC, asserting that the call actually returned at least one value.
     Afterwards, resets the frame to NLOCALS locals.

 -- Instruction: receive-values f24:PROC b1:ALLOW-EXTRA? x7:_
          c24:NVALUES
     Receive a return of multiple values from a call whose procedure was
     in PROC.  If fewer than NVALUES values were returned, signal an
     error.  Unless ALLOW-EXTRA? is true, require that the number of
     return values equals NVALUES exactly.  After ‘receive-values’ has
     run, the values can be copied down via ‘mov’, or used in place.


File: guile.info,  Node: Function Prologue Instructions,  Next: Shuffling Instructions,  Prev: Call and Return Instructions,  Up: Instruction Set

9.3.7.2 Function Prologue Instructions
......................................

A function call in Guile is very cheap: the VM simply hands control to
the procedure.  The procedure itself is responsible for asserting that
it has been passed an appropriate number of arguments.  This strategy
allows arbitrarily complex argument parsing idioms to be developed,
without harming the common case.

   For example, only calls to keyword-argument procedures “pay” for the
cost of parsing keyword arguments.  (At the time of this writing,
calling procedures with keyword arguments is typically two to four times
as costly as calling procedures with a fixed set of arguments.)

 -- Instruction: assert-nargs-ee c24:EXPECTED
 -- Instruction: assert-nargs-ge c24:EXPECTED
 -- Instruction: assert-nargs-le c24:EXPECTED
     If the number of actual arguments is not ‘==’, ‘>=’, or ‘<=’
     EXPECTED, respectively, signal an error.

     The number of arguments is determined by subtracting the stack
     pointer from the frame pointer (‘fp - sp’).  *Note Stack Layout::,
     for more details on stack frames.  Note that EXPECTED includes the
     procedure itself.

 -- Instruction: arguments<=? c24:EXPECTED
     Set the ‘LESS_THAN’, ‘EQUAL’, or ‘NONE’ comparison result values if
     the number of arguments is respectively less than, equal to, or
     greater than EXPECTED.

 -- Instruction: positional-arguments<=? c24:NREQ x8:_ c24:EXPECTED
     Set the ‘LESS_THAN’, ‘EQUAL’, or ‘NONE’ comparison result values if
     the number of positional arguments is respectively less than, equal
     to, or greater than EXPECTED.  The first NREQ arguments are
     positional arguments, as are the subsequent arguments that are not
     keywords.

   The ‘arguments<=?’ and ‘positional-arguments<=?’ instructions are
used to implement multiple arities, as in ‘case-lambda’.  *Note
Case-lambda::, for more information.  *Note Branch Instructions::, for
more on comparison results.
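
   For instance, a procedure such as the following has two arities.
How the compiler actually dispatches between them is up to the
compiler; the example only shows the source-level construct.

     (define area
       (case-lambda
         ((side) (* side side))
         ((width height) (* width height))))

     (area 3)     ; ⇒ 9
     (area 3 4)   ; ⇒ 12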

 -- Instruction: bind-kwargs c24:NREQ c8:FLAGS c24:NREQ-AND-OPT x8:_
          c24:NTOTAL n32:KW-OFFSET
     FLAGS is a bitfield, whose lowest bit is ALLOW-OTHER-KEYS, second
     bit is HAS-REST, and whose following six bits are unused.

     Find the last positional argument, and shuffle all the rest above
     NTOTAL.  Initialize the intervening locals to ‘SCM_UNDEFINED’.
     Then load the constant at KW-OFFSET words from the current IP, and
     use it and the ALLOW-OTHER-KEYS flag to bind keyword arguments.  If
     HAS-REST, collect all shuffled arguments into a list, and store it
     in NREQ-AND-OPT.  Finally, clear the arguments that we shuffled up.

     The parsing is driven by a keyword arguments association list,
     looked up using KW-OFFSET.  The alist is a list of pairs of the
     form ‘(KW . INDEX)’, mapping keyword arguments to their local slot
     indices.  Unless ‘allow-other-keys’ is set, the parser will signal
     an error if an unknown key is found.

     A macro-mega-instruction.
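
   The alist-driven part of this process can be modeled as follows.
The model is deliberately simplified and all names in it are
hypothetical: the frame is a vector, the arguments that were shuffled
up are a list, and error reporting is reduced to ‘error’.

     (define (bind-keywords! frame kw-alist args allow-other-keys?)
       (let loop ((args args))
         (cond
          ((null? args) frame)
          ((and (keyword? (car args)) (pair? (cdr args)))
           (let ((entry (assq (car args) kw-alist)))
             (cond
              (entry (vector-set! frame (cdr entry) (cadr args)))
              ((not allow-other-keys?)
               (error "Unrecognized keyword" (car args))))
             (loop (cddr args))))
          (else (error "Invalid keyword argument list" args)))))

     ;; Slot 0 holds the procedure, slot 1 a required argument, and
     ;; slots 2 and 3 the defaults for #:port and #:secure?.
     (define frame (vector 'proc "example.org" 80 #f))
     (bind-keywords! frame '((#:port . 2) (#:secure? . 3))
                     '(#:port 8080) #f)
     frame   ; ⇒ #(proc "example.org" 8080 #f)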

 -- Instruction: bind-optionals f24:NLOCALS
     Expand the current frame to have at least NLOCALS locals, filling
     in any fresh values with ‘SCM_UNDEFINED’.  If the frame has more
     than NLOCALS locals, it is left as it is.

 -- Instruction: bind-rest f24:DST
     Collect any arguments at or above DST into a list, and store that
     list at DST.
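
   The behavior of these two instructions can be pictured with lists.
This is a model with invented names, in which the symbol ‘undefined’
stands in for ‘SCM_UNDEFINED’ and DST is assumed not to exceed the
number of locals:

     (define (bind-optionals locals nlocals)
       ;; Pad the frame up to NLOCALS; longer frames are left alone.
       (let ((missing (- nlocals (length locals))))
         (if (> missing 0)
             (append locals (make-list missing 'undefined))
             locals)))

     (define (bind-rest locals dst)
       ;; Replace everything at or above DST with a single list.
       (append (list-head locals dst)
               (list (list-tail locals dst))))

     (bind-optionals '(f 1) 4)       ; ⇒ (f 1 undefined undefined)
     (bind-rest '(f 1 2 3 4) 2)      ; ⇒ (f 1 (2 3 4))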

 -- Instruction: alloc-frame c24:NLOCALS
     Ensure that there is space on the stack for NLOCALS local
     variables.  The value of any new local is undefined.

 -- Instruction: reset-frame c24:NLOCALS
     Like ‘alloc-frame’, but doesn’t check that the stack is big enough,
     and doesn’t initialize values to ‘SCM_UNDEFINED’.  Used to reset
     the frame size to something less than the size that was previously
     set via ‘alloc-frame’.

 -- Instruction: assert-nargs-ee/locals c12:EXPECTED c12:NLOCALS
     Equivalent to a sequence of ‘assert-nargs-ee’ and ‘alloc-frame’.
     The number of locals reserved is EXPECTED + NLOCALS.


File: guile.info,  Node: Shuffling Instructions,  Next: Trampoline Instructions,  Prev: Function Prologue Instructions,  Up: Instruction Set

9.3.7.3 Shuffling Instructions
..............................

These instructions are used to move around values on the stack.

 -- Instruction: mov s12:DST s12:SRC
 -- Instruction: long-mov s24:DST x8:_ s24:SRC
     Copy a value from one local slot to another.

     As discussed previously, procedure arguments and local variables
     are allocated to local slots.  Guile’s compiler tries to avoid
     shuffling variables around to different slots, which often makes
     ‘mov’ instructions redundant.  However there are some cases in
     which shuffling is necessary, and in those cases, ‘mov’ is the
     thing to use.

 -- Instruction: long-fmov f24:DST x8:_ f24:SRC
     Copy a value from one local slot to another, but addressing slots
     relative to the ‘fp’ instead of the ‘sp’.  This is used when
     shuffling values into place after multiple-value returns.

 -- Instruction: push s24:SRC
     Bump the stack pointer by one word, and fill it with the value from
     slot SRC.  The offset to SRC is calculated before the stack pointer
     is adjusted.

   The ‘push’ instruction is used when another instruction is unable to
address an operand because the operand is encoded with fewer than 24
bits.  In that case, Guile’s assembler will transparently emit code that
temporarily pushes any needed operands onto the stack, emits the
original instruction to address those now-near variables, then shuffles
the result (if any) back into place.

 -- Instruction: pop s24:DST
     Pop the stack pointer, storing the value that was there in slot
     DST.  The offset to DST is calculated after the stack pointer is
     adjusted.

 -- Instruction: drop c24:COUNT
     Pop the stack pointer by COUNT words, discarding any values that
     were stored there.

 -- Instruction: shuffle-down f12:FROM f12:TO
     Shuffle down values from FROM to TO, reducing the frame size by
     FROM-TO slots.  Part of the internal implementation of
     ‘call-with-values’, ‘values’, and ‘apply’.

 -- Instruction: expand-apply-argument x24:_
     Take the last local in a frame and expand it out onto the stack, as
     for the last argument to ‘apply’.
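
   What ‘expand-apply-argument’ does to a frame can be pictured with
lists, again as a model with an invented name rather than as the real
implementation:

     (define (expand-last-local locals)
       ;; The last local is a list; splice its elements into place.
       (let ((n (- (length locals) 1)))
         (append (list-head locals n) (list-ref locals n))))

     (expand-last-local (list + 1 2 '(3 4)))
     ;; yields a frame holding +, 1, 2, 3 and 4, which is what
     ;; (apply + 1 2 '(3 4)) needs in order to compute 10.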


File: guile.info,  Node: Trampoline Instructions,  Next: Non-Local Control Flow Instructions,  Prev: Shuffling Instructions,  Up: Instruction Set

9.3.7.4 Trampoline Instructions
...............................

Though most applicable objects in Guile are procedures implemented in
bytecode, not all are.  There are primitives, continuations, and other
procedure-like objects that have their own calling convention.  Instead
of adding special cases to the ‘call’ instruction, Guile wraps these
other applicable objects in VM trampoline procedures, then provides
special support for these objects in bytecode.

   Trampoline procedures are typically generated by Guile at runtime,
for example in response to a call to ‘scm_c_make_gsubr’.  As such, a
compiler probably shouldn’t emit code with these instructions.  However,
it’s still interesting to know how these things work, so we document
these trampoline instructions here.

 -- Instruction: subr-call c24:IDX
     Call a subr, passing all locals in this frame as arguments, and
     storing the results on the stack, ready to be returned.

 -- Instruction: foreign-call c12:CIF-IDX c12:PTR-IDX
     Call a foreign function.  Fetch the CIF and foreign pointer from
     CIF-IDX and PTR-IDX closure slots of the callee.  Arguments are
     taken from the stack, and results placed on the stack, ready to be
     returned.

 -- Instruction: builtin-ref s12:DST c12:IDX
     Load a builtin stub by index into DST.


File: guile.info,  Node: Non-Local Control Flow Instructions,  Next: Instrumentation Instructions,  Prev: Trampoline Instructions,  Up: Instruction Set

9.3.7.5 Non-Local Control Flow Instructions
...........................................

 -- Instruction: capture-continuation s24:DST
     Capture the current continuation, and write it to DST.  Part of the
     implementation of ‘call/cc’.

 -- Instruction: continuation-call c24:CONTREGS
     Return to a continuation, nonlocally.  The arguments to the
     continuation are taken from the stack.  CONTREGS is a free variable
     containing the reified continuation.

 -- Instruction: abort x24:_
     Abort to a prompt handler.  The tag is expected in slot 1, and the
     rest of the values in the frame are returned to the prompt handler.
     This corresponds to a tail application of ‘abort-to-prompt’.

     If no prompt can be found in the dynamic environment with the given
     tag, an error is signalled.  Otherwise all arguments are passed to
     the prompt’s handler, along with the captured continuation, if
     necessary.

     If the prompt’s handler can be proven to not reference the captured
     continuation, no continuation is allocated.  This decision happens
     dynamically, at run-time; the general case is that the continuation
     may be captured, and thus resumed.  A reinstated continuation will
     have its arguments pushed on the stack from slot 0, as if from a
     multiple-value return, and control resumes in the caller.  Thus to
     the calling function, a call to ‘abort-to-prompt’ looks like any
     other function call.

 -- Instruction: compose-continuation c24:CONT
     Compose a partial continuation with the current continuation.  The
     arguments to the continuation are taken from the stack.  CONT is a
     free variable containing the reified continuation.

 -- Instruction: prompt s24:TAG b1:ESCAPE-ONLY? x7:_ f24:PROC-SLOT x8:_
          l24:HANDLER-OFFSET
     Push a new prompt on the dynamic stack, with a tag from TAG and a
     handler at HANDLER-OFFSET words from the current IP.

     If an abort is made to this prompt, control will jump to the
     handler.  The handler will expect a multiple-value return as if
     from a call with the procedure at PROC-SLOT, with the reified
     partial continuation as the first argument, followed by the values
     returned to the handler.  If control returns to the handler, the
     prompt is already popped off by the abort mechanism.  (Guile’s
     ‘prompt’ implements Felleisen’s “–F–” operator.)

     If ESCAPE-ONLY? is nonzero, the prompt will be marked as
     escape-only, which allows an abort to this prompt to avoid reifying
     the continuation.

     *Note Prompts::, for more information on prompts.

 -- Instruction: throw s12:KEY s12:ARGS
     Raise an error by throwing to KEY and ARGS.  ARGS should be a list.

 -- Instruction: throw/value s24:VALUE n32:KEY-SUBR-AND-MESSAGE
 -- Instruction: throw/value+data s24:VALUE n32:KEY-SUBR-AND-MESSAGE
     Raise an error, indicating VALUE as the bad value.
     KEY-SUBR-AND-MESSAGE should be a vector, where the first element is
     the symbol to which to throw, the second is the procedure in which
     to signal the error (a string) or ‘#f’, and the third is a format
     string for the message, with one template.  These instructions do
     not fall through.

     Both of these instructions throw to a key with four arguments: the
     procedure that indicates the error (or ‘#f’), the format string, a
     list with VALUE, and either ‘#f’ or the list with VALUE as the last
     argument, respectively.


File: guile.info,  Node: Instrumentation Instructions,  Next: Intrinsic Call Instructions,  Prev: Non-Local Control Flow Instructions,  Up: Instruction Set

9.3.7.6 Instrumentation Instructions
....................................

 -- Instruction: instrument-entry x24:_ n32:DATA
 -- Instruction: instrument-loop x24:_ n32:DATA
     Increase the execution counter for this function and potentially
     tier up to the next JIT level.  DATA is an offset to a structure
     recording execution counts and the next-level JIT code
     corresponding to this function.  The increment values are currently
     30 for ‘instrument-entry’ and 2 for ‘instrument-loop’.

     ‘instrument-entry’ will also run the apply hook, if VM hooks are
     enabled.

 -- Instruction: handle-interrupts x24:_
     Handle pending asynchronous interrupts (asyncs).  *Note Asyncs::.
     The compiler inserts ‘handle-interrupts’ instructions before any
     call, return, or loop back-edge.

 -- Instruction: return-from-interrupt x24:_
     A special instruction to return from a call and also pop off the
     stack frame from the call.  Used when returning from asynchronous
     interrupts.


File: guile.info,  Node: Intrinsic Call Instructions,  Next: Constant Instructions,  Prev: Instrumentation Instructions,  Up: Instruction Set

9.3.7.7 Intrinsic Call Instructions
...................................

Guile’s instruction set is low-level.  This is good because the separate
components of, say, a ‘vector-ref’ operation might be able to be
optimized out, leaving only the operations that need to be performed at
run-time.

   However, some macro-operations may need to perform large amounts of
computation at run-time to handle all the edge cases, and their
micro-operation components may not be amenable to optimization.
Residualizing code for the entire macro-operation would lead to code
bloat with no benefit.

   In this kind of case, Guile’s VM calls out to “intrinsics”:
run-time routines written in the host language (currently C, possibly
more in the future if Guile gains more run-time targets like
WebAssembly).  There is one instruction for each intrinsic prototype;
the intrinsic is specified by index in the instruction.
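
   The idea of one instruction per prototype, with the routine selected
by index, can be sketched in C as a table of function pointers.  This is
a simplified model under stated assumptions: ‘toy_scm’ stands in for the
real ‘SCM’ type, and the table contents and all ‘toy_’ names are
invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Guile's SCM type; real SCM values are tagged words. */
typedef uintptr_t toy_scm;

/* One prototype: an SCM-returning routine taking two SCM arguments. */
typedef toy_scm (*toy_scm_from_scm_scm) (toy_scm a, toy_scm b);

static toy_scm toy_add (toy_scm a, toy_scm b) { return a + b; }
static toy_scm toy_sub (toy_scm a, toy_scm b) { return a - b; }

/* Hypothetical table of routines sharing this prototype, indexed by IDX. */
static const toy_scm_from_scm_scm toy_table[] = { toy_add, toy_sub };

/* Model of an instruction shaped like 'call-scm<-scm-scm DST A B IDX':
   the prototype is fixed by the opcode, the routine by IDX. */
void toy_call_scm_from_scm_scm (toy_scm *locals, size_t dst,
                                size_t a, size_t b, uint32_t idx)
{
  assert (idx < sizeof toy_table / sizeof toy_table[0]);
  locals[dst] = toy_table[idx] (locals[a], locals[b]);
}
```

   Adding a new routine with an existing prototype then only needs a new
table entry, not a new opcode.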

 -- Instruction: call-thread x24:_ c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing the
     current ‘scm_thread*’ as the argument.

 -- Instruction: call-thread-scm s24:A c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing the
     current ‘scm_thread*’ and the ‘scm’ local A as arguments.

 -- Instruction: call-thread-scm-scm s12:A s12:B c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing the
     current ‘scm_thread*’ and the ‘scm’ locals A and B as arguments.

 -- Instruction: call-scm-sz-u32 s8:A s8:B s8:C c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing the
     locals A, B, and C as arguments.  A is a ‘scm’ value, while B and C
     are raw ‘u64’ values which fit into ‘size_t’ and ‘uint32_t’ types,
     respectively.

 -- Instruction: call-scm<-thread s24:DST c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing the
     current ‘scm_thread*’ as the argument.  Place the result in DST.

 -- Instruction: call-scm<-u64 s12:DST s12:A c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing ‘u64’
     local A as the argument.  Place the result in DST.

 -- Instruction: call-scm<-s64 s12:DST s12:A c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing ‘s64’
     local A as the argument.  Place the result in DST.

 -- Instruction: call-scm<-scm s12:DST s12:A c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing ‘scm’
     local A as the argument.  Place the result in DST.

 -- Instruction: call-u64<-scm s12:DST s12:A c32:IDX
     Call the ‘uint64_t’-returning intrinsic with index IDX, passing
     ‘scm’ local A as the argument.  Place the ‘u64’ result in DST.

 -- Instruction: call-s64<-scm s12:DST s12:A c32:IDX
     Call the ‘int64_t’-returning intrinsic with index IDX, passing
     ‘scm’ local A as the argument.  Place the ‘s64’ result in DST.

 -- Instruction: call-f64<-scm s12:DST s12:A c32:IDX
     Call the ‘double’-returning intrinsic with index IDX, passing
     ‘scm’ local A as the argument.  Place the ‘f64’ result in DST.

 -- Instruction: call-scm<-scm-scm s8:DST s8:A s8:B c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing ‘scm’
     locals A and B as arguments.  Place the ‘scm’ result in DST.

 -- Instruction: call-scm<-scm-uimm s8:DST s8:A c8:B c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing ‘scm’
     local A and ‘uint8_t’ immediate B as arguments.  Place the ‘scm’
     result in DST.

 -- Instruction: call-scm<-thread-scm s12:DST s12:A c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing the
     current ‘scm_thread*’ and ‘scm’ local A as arguments.  Place the
     ‘scm’ result in DST.

 -- Instruction: call-scm<-scm-u64 s8:DST s8:A s8:B c32:IDX
     Call the ‘SCM’-returning intrinsic with index IDX, passing ‘scm’
     local A and ‘u64’ local B as arguments.  Place the ‘scm’ result in
     DST.

 -- Instruction: call-scm-scm s12:A s12:B c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing ‘scm’
     locals A and B as arguments.

 -- Instruction: call-scm-scm-scm s8:A s8:B s8:C c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing ‘scm’
     locals A, B, and C as arguments.

 -- Instruction: call-scm-uimm-scm s8:A c8:B s8:C c32:IDX
     Call the ‘void’-returning intrinsic with index IDX, passing ‘scm’
     local A, ‘uint8_t’ immediate B, and ‘scm’ local C as arguments.

   There are corresponding macro-instructions for specific intrinsics.
These are equivalent to ‘call-INTRINSIC-KIND’ instructions with the
appropriate intrinsic IDX arguments.

 -- Macro Instruction: add dst a b
 -- Macro Instruction: add/immediate dst a b/imm
     Add ‘SCM’ values A and B and place the result in DST.
 -- Macro Instruction: sub dst a b
 -- Macro Instruction: sub/immediate dst a b/imm
     Subtract ‘SCM’ value B from A and place the result in DST.
 -- Macro Instruction: mul dst a b
     Multiply ‘SCM’ values A and B and place the result in DST.
 -- Macro Instruction: div dst a b
     Divide ‘SCM’ value A by B and place the result in DST.
 -- Macro Instruction: quo dst a b
     Compute the quotient of ‘SCM’ values A and B and place the result
     in DST.
 -- Macro Instruction: rem dst a b
     Compute the remainder of ‘SCM’ values A and B and place the result
     in DST.
 -- Macro Instruction: mod dst a b
     Compute the modulo of ‘SCM’ value A by B and place the result in
     DST.
 -- Macro Instruction: logand dst a b
     Compute the bitwise ‘and’ of ‘SCM’ values A and B and place the
     result in DST.
 -- Macro Instruction: logior dst a b
     Compute the bitwise inclusive ‘or’ of ‘SCM’ values A and B and
     place the result in DST.
 -- Macro Instruction: logxor dst a b
     Compute the bitwise exclusive ‘or’ of ‘SCM’ values A and B and
     place the result in DST.
 -- Macro Instruction: logsub dst a b
     Compute the bitwise ‘and’ of ‘SCM’ value A and the bitwise ‘not’ of
     B and place the result in DST.
 -- Macro Instruction: lsh dst a b
 -- Macro Instruction: lsh/immediate dst a b/imm
     Shift ‘SCM’ value A left by ‘u64’ value B bits and place the result
     in DST.
 -- Macro Instruction: rsh dst a b
 -- Macro Instruction: rsh/immediate dst a b/imm
     Shift ‘SCM’ value A right by ‘u64’ value B bits and place the
     result in DST.
 -- Macro Instruction: scm->f64 dst src
     Convert SRC to an unboxed ‘f64’ and place the result in DST, or
     raise an error if SRC is not a real number.
 -- Macro Instruction: scm->u64 dst src
     Convert SRC to an unboxed ‘u64’ and place the result in DST, or
     raise an error if SRC is not an integer within range.
 -- Macro Instruction: scm->u64/truncate dst src
     Convert SRC to an unboxed ‘u64’ and place the result in DST,
     truncating to the low 64 bits, or raise an error if SRC is not an
     integer.
 -- Macro Instruction: scm->s64 dst src
     Convert SRC to an unboxed ‘s64’ and place the result in DST, or
     raise an error if SRC is not an integer within range.
 -- Macro Instruction: u64->scm dst src
     Convert U64 value SRC to a Scheme integer in DST.
 -- Macro Instruction: s64->scm dst src
     Convert S64 value SRC to a Scheme integer in DST.
 -- Macro Instruction: string-set! str idx ch
     Set the character at index IDX (a ‘u64’) of string STR to CH (a
     ‘u64’ that is a valid character value).
 -- Macro Instruction: string->number dst src
     Call ‘string->number’ on SRC and place the result in DST.
 -- Macro Instruction: string->symbol dst src
     Call ‘string->symbol’ on SRC and place the result in DST.
 -- Macro Instruction: symbol->keyword dst src
     Call ‘symbol->keyword’ on SRC and place the result in DST.
 -- Macro Instruction: class-of dst src
     Set DST to the GOOPS class of SRC.
 -- Macro Instruction: wind winder unwinder
     Push wind and unwind procedures onto the dynamic stack.  Note that
     neither is actually called; the compiler should emit calls to
     WINDER and UNWINDER for the normal dynamic-wind control flow.  Also
     note that the compiler should have inserted checks that WINDER and
     UNWINDER are thunks, if it could not prove that to be the case.
     *Note Dynamic Wind::.
 -- Macro Instruction: unwind
     Exit from the dynamic extent of an expression, popping the top
     entry off of the dynamic stack.
 -- Macro Instruction: push-fluid fluid value
     Dynamically bind VALUE to FLUID by creating a with-fluids object,
     pushing that object on the dynamic stack.  *Note Fluids and Dynamic
     States::.
 -- Macro Instruction: pop-fluid
     Leave the dynamic extent of a ‘with-fluid*’ expression, restoring
     the fluid to its previous value.  ‘push-fluid’ should always be
     balanced with ‘pop-fluid’.
 -- Macro Instruction: fluid-ref dst fluid
     Place the value associated with the fluid FLUID in DST.
 -- Macro Instruction: fluid-set! fluid value
     Set the value of the fluid FLUID to VALUE.
 -- Macro Instruction: push-dynamic-state state
     Save the current set of fluid bindings on the dynamic stack and
     instate the bindings from STATE instead.  *Note Fluids and Dynamic
     States::.
 -- Macro Instruction: pop-dynamic-state
     Restore a saved set of fluid bindings from the dynamic stack.
     ‘push-dynamic-state’ should always be balanced with
     ‘pop-dynamic-state’.
 -- Macro Instruction: resolve-module dst name public?
     Look up the module named NAME, resolve its public interface if the
     immediate operand PUBLIC? is true, then place the result in DST.
 -- Macro Instruction: lookup dst mod sym
     Look up SYM in module MOD, placing the resulting variable (or ‘#f’
     if not found) in DST.
 -- Macro Instruction: define! dst mod sym
     Look up SYM in module MOD, placing the resulting variable in DST,
     creating the variable if needed.
 -- Macro Instruction: current-module dst
     Set DST to the current module.
 -- Macro Instruction: $car dst src
 -- Macro Instruction: $cdr dst src
 -- Macro Instruction: $set-car! x val
 -- Macro Instruction: $set-cdr! x val
 -- Macro Instruction: $variable-ref dst src
 -- Macro Instruction: $variable-set! x val
 -- Macro Instruction: $vector-length dst x
 -- Macro Instruction: $vector-ref dst x idx
 -- Macro Instruction: $vector-ref/immediate dst x idx/imm
 -- Macro Instruction: $vector-set! x idx v
 -- Macro Instruction: $vector-set!/immediate x idx/imm v
 -- Macro Instruction: $allocate-struct dst vtable nwords
 -- Macro Instruction: $struct-vtable dst src
 -- Macro Instruction: $struct-ref dst src idx
 -- Macro Instruction: $struct-ref/immediate dst src idx/imm
 -- Macro Instruction: $struct-set! x idx v
 -- Macro Instruction: $struct-set!/immediate x idx/imm v
     Intrinsics for use by the baseline compiler.  The usual strategy
     for CPS compilation is to expose the component parts of e.g.
     ‘vector-ref’ so that the compiler can learn from them and eliminate
     needless bits.  However in the non-optimizing baseline compiler,
     that’s just overhead, so we have some intrinsics that encapsulate
     all the usual type checks.


File: guile.info,  Node: Constant Instructions,  Next: Memory Access Instructions,  Prev: Intrinsic Call Instructions,  Up: Instruction Set

9.3.7.8 Constant Instructions
.............................

The following instructions load literal data into a program.  There are
two kinds.

   The first set of instructions loads immediate values.  These
instructions encode the immediate directly into the instruction stream.

 -- Instruction: make-immediate s8:DST zi16:LOW-BITS
     Make an immediate whose low bits are LOW-BITS, sign-extended.

 -- Instruction: make-short-immediate s8:DST i16:LOW-BITS
     Make an immediate whose low bits are LOW-BITS, and whose top bits
     are 0.

 -- Instruction: make-long-immediate s24:DST i32:LOW-BITS
     Make an immediate whose low bits are LOW-BITS, and whose top bits
     are 0.

 -- Instruction: make-long-long-immediate s24:DST a32:HIGH-BITS
          b32:LOW-BITS
     Make an immediate with HIGH-BITS and LOW-BITS.
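
   The difference between the sign-extending and zero-extending
immediate loaders can be sketched in C.  This is an illustrative model
that assumes a 64-bit word; the ‘toy_’ function names are hypothetical,
and the operand fields are passed as already-extracted raw bits.

```c
#include <assert.h>
#include <stdint.h>

/* make-immediate: FIELD holds the raw 16-bit operand, which is
   sign-extended to the full word. */
uint64_t toy_make_immediate (uint32_t field)
{
  int64_t value = (int64_t) field;
  assert (field <= 0xFFFF);     /* the operand is only 16 bits wide */
  if (value >= 0x8000)
    value -= 0x10000;           /* portable sign extension */
  return (uint64_t) value;
}

/* make-short-immediate: same field width, but the top bits are zero. */
uint64_t toy_make_short_immediate (uint32_t field)
{
  assert (field <= 0xFFFF);
  return (uint64_t) field;
}

/* make-long-long-immediate: join two 32-bit halves into one word. */
uint64_t toy_make_long_long_immediate (uint32_t high_bits, uint32_t low_bits)
{
  return ((uint64_t) high_bits << 32) | (uint64_t) low_bits;
}
```

   With the field ‘0xFFFE’, the first function yields a word with all
upper bits set, while the second yields ‘0xFFFE’ itself.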

   Non-immediate constant literals are referenced either directly or
indirectly.  For example, Guile knows at compile-time what the layout of
a string will be like, and arranges to embed that object directly in the
compiled image.  A reference to a string will use ‘make-non-immediate’
to treat a pointer into the compilation unit as a ‘scm’ value directly.

 -- Instruction: make-non-immediate s24:DST n32:OFFSET
     Load a pointer to statically allocated memory into DST.  The
     object’s memory will be found OFFSET 32-bit words away from the
     current instruction pointer.  Whether the object is mutable or
     immutable depends on where it was allocated by the compiler, and
     loaded by the loader.

   Sometimes you need to load up a code pointer into a register; for
this, use ‘load-label’.

 -- Instruction: load-label s24:DST l32:OFFSET
     Load a label OFFSET words away from the current ‘ip’ and write it
     to DST.  OFFSET is a signed 32-bit integer.

   Finally, Guile supports a number of unboxed data types, with their
associated constant loaders.

 -- Instruction: load-f64 s24:DST au32:HIGH-BITS au32:LOW-BITS
     Load a double-precision floating-point value formed by joining
     HIGH-BITS and LOW-BITS, and write it to DST.

 -- Instruction: load-u64 s24:DST au32:HIGH-BITS au32:LOW-BITS
     Load an unsigned 64-bit integer formed by joining HIGH-BITS and
     LOW-BITS, and write it to DST.

 -- Instruction: load-s64 s24:DST au32:HIGH-BITS au32:LOW-BITS
     Load a signed 64-bit integer formed by joining HIGH-BITS and
     LOW-BITS, and write it to DST.
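
   Joining HIGH-BITS and LOW-BITS amounts to rebuilding a 64-bit
pattern and then reading it as the desired unboxed type.  A minimal C
sketch follows, assuming IEEE 754 doubles and two’s complement integers;
the ‘toy_’ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* load-u64: HIGH-BITS forms the upper half, LOW-BITS the lower half. */
uint64_t toy_load_u64 (uint32_t high_bits, uint32_t low_bits)
{
  return ((uint64_t) high_bits << 32) | (uint64_t) low_bits;
}

/* load-f64: reinterpret the joined bits as a double, without any
   numeric conversion. */
double toy_load_f64 (uint32_t high_bits, uint32_t low_bits)
{
  uint64_t bits = toy_load_u64 (high_bits, low_bits);
  double value;
  assert (sizeof value == sizeof bits);
  memcpy (&value, &bits, sizeof value);
  return value;
}

/* load-s64: the same bits, read as a signed integer. */
int64_t toy_load_s64 (uint32_t high_bits, uint32_t low_bits)
{
  uint64_t bits = toy_load_u64 (high_bits, low_bits);
  int64_t value;
  memcpy (&value, &bits, sizeof value);
  return value;
}
```

   For example, the halves ‘0x3FF00000’ and ‘0’ give the double 1.0,
while two halves of ‘0xFFFFFFFF’ give the signed integer -1.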

   Some objects must be unique across the whole system.  This is the
case for symbols and keywords.  For these objects, Guile arranges to
initialize them when the compilation unit is loaded, storing them into a
slot in the image.  References go indirectly through that slot.
‘static-ref’ is used in this case.

 -- Instruction: static-ref s24:DST r32:OFFSET
     Load a SCM value into DST.  The SCM value will be fetched from
     memory, OFFSET 32-bit words away from the current instruction
     pointer.  OFFSET is a signed value.
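
   Offsets counted in 32-bit words from the instruction pointer map
directly onto C pointer arithmetic on ‘uint32_t’.  The sketch below is a
simplified model: ‘toy_scm’ stands in for ‘SCM’, the ‘toy_’ names are
hypothetical, and ‘memcpy’ is used so the example does not depend on the
alignment of the test image.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for SCM. */
typedef uintptr_t toy_scm;

/* IP points at 32-bit instruction words, so adding OFFSET moves by
   OFFSET * 4 bytes; a negative OFFSET points before the instruction. */
const uint32_t *toy_ip_relative (const uint32_t *ip, int32_t offset)
{
  return ip + offset;
}

/* static-ref: fetch a word-sized value stored in the image at ip+OFFSET. */
toy_scm toy_static_ref (const uint32_t *ip, int32_t offset)
{
  toy_scm value;
  memcpy (&value, toy_ip_relative (ip, offset), sizeof value);
  return value;
}

/* Usage: the same slot reached from two different instructions. */
void toy_static_ref_demo (void)
{
  uint32_t image[8] = { 0 };
  toy_scm value = 0x1234;
  memcpy (&image[4], &value, sizeof value);
  assert (toy_static_ref (&image[2], 2) == value);
  assert (toy_static_ref (&image[6], -2) == value);
}
```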

   Fields of non-immediates may need to be fixed up at load time,
because we do not know in advance at what address they will be loaded.
This is the case, for example, for a pair containing a non-immediate in
one of its fields.  ‘static-set!’ and ‘static-patch!’ are used in these
situations.

 -- Instruction: static-set! s24:SRC lo32:OFFSET
     Store a SCM value into memory, OFFSET 32-bit words away from the
     current instruction pointer.  OFFSET is a signed value.

 -- Instruction: static-patch! x24:_ lo32:DST-OFFSET l32:SRC-OFFSET
     Patch a pointer at DST-OFFSET to point to SRC-OFFSET.  Both offsets
     are signed 32-bit values, indicating a memory address as a number
     of 32-bit words away from the current instruction pointer.


File: guile.info,  Node: Memory Access Instructions,  Next: Atomic Memory Access Instructions,  Prev: Constant Instructions,  Up: Instruction Set

9.3.7.9 Memory Access Instructions
..................................

In these instructions, the ‘/immediate’ variants represent their indexes
or counts as immediates; otherwise these values are unboxed u64 locals.

 -- Instruction: allocate-words s12:DST s12:COUNT
 -- Instruction: allocate-words/immediate s12:DST c12:COUNT
     Allocate a fresh GC-traced object consisting of COUNT words and
     store it into DST.

 -- Instruction: scm-ref s8:DST s8:OBJ s8:IDX
 -- Instruction: scm-ref/immediate s8:DST s8:OBJ c8:IDX
     Load the ‘SCM’ object at word offset IDX from local OBJ, and store
     it to DST.

 -- Instruction: scm-set! s8:OBJ s8:IDX s8:VAL
 -- Instruction: scm-set!/immediate s8:OBJ c8:IDX s8:VAL
     Store the ‘scm’ local VAL into object OBJ at word offset IDX.

 -- Instruction: scm-ref/tag s8:DST s8:OBJ c8:TAG
     Load the first word of OBJ, subtract the immediate TAG, and store
     the resulting ‘SCM’ to DST.

 -- Instruction: scm-set!/tag s8:OBJ c8:TAG s8:VAL
     Set the first word of OBJ to the unpacked bits of the ‘scm’ value
     VAL plus the immediate value TAG.

 -- Instruction: word-ref s8:DST s8:OBJ s8:IDX
 -- Instruction: word-ref/immediate s8:DST s8:OBJ c8:IDX
     Load the word at offset IDX from local OBJ, and store it to the
     ‘u64’ local DST.

 -- Instruction: word-set! s8:OBJ s8:IDX s8:VAL
 -- Instruction: word-set!/immediate s8:OBJ c8:IDX s8:VAL
     Store the ‘u64’ local VAL into object OBJ at word offset IDX.

 -- Instruction: pointer-ref/immediate s8:DST s8:OBJ c8:IDX
     Load the pointer at offset IDX from local OBJ, and store it to the
     unboxed pointer local DST.

 -- Instruction: pointer-set!/immediate s8:OBJ c8:IDX s8:VAL
     Store the unboxed pointer local VAL into object OBJ at word offset
     IDX.

 -- Instruction: tail-pointer-ref/immediate s8:DST s8:OBJ c8:IDX
     Compute the address of word offset IDX from local OBJ, and store it
     to DST.


File: guile.info,  Node: Atomic Memory Access Instructions,  Next: Tagging and Untagging Instructions,  Prev: Memory Access Instructions,  Up: Instruction Set

9.3.7.10 Atomic Memory Access Instructions
..........................................

 -- Instruction: current-thread s24:DST
     Write the current thread into DST.

 -- Instruction: atomic-scm-ref/immediate s8:DST s8:OBJ c8:IDX
     Atomically load the ‘SCM’ object at word offset IDX from local OBJ,
     using the sequential consistency memory model.  Store the result to
     DST.

 -- Instruction: atomic-scm-set!/immediate s8:OBJ c8:IDX s8:VAL
     Atomically set the ‘SCM’ object at word offset IDX from local OBJ
     to VAL, using the sequential consistency memory model.

 -- Instruction: atomic-scm-swap!/immediate s24:DST x8:_ s24:OBJ c8:IDX
          s24:VAL
     Atomically swap the ‘SCM’ value stored in object OBJ at word offset
     IDX with VAL, using the sequentially consistent memory model.
     Store the previous value to DST.

 -- Instruction: atomic-scm-compare-and-swap!/immediate s24:DST x8:_
          s24:OBJ c8:IDX s24:EXPECTED x8:_ s24:DESIRED
     Atomically swap the ‘SCM’ value stored in object OBJ at word offset
     IDX with DESIRED, if and only if the value that was there was
     EXPECTED, using the sequentially consistent memory model.  Store
     the value that was previously at IDX from OBJ in DST.
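
   The swap and compare-and-swap semantics correspond closely to the
C11 ‘<stdatomic.h>’ operations, whose default memory order is
sequentially consistent.  The following is a sketch under that analogy,
not Guile’s code; ‘toy_field’ and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one word-sized field of a heap object. */
typedef _Atomic uintptr_t toy_field;

/* Swap: store VAL and return the previous value. */
uintptr_t toy_atomic_swap (toy_field *obj, size_t idx, uintptr_t val)
{
  return atomic_exchange (&obj[idx], val);
}

/* Compare-and-swap: install DESIRED only if the field holds EXPECTED;
   either way, return what was there before. */
uintptr_t toy_atomic_compare_and_swap (toy_field *obj, size_t idx,
                                       uintptr_t expected, uintptr_t desired)
{
  /* On failure, atomic_compare_exchange_strong writes the value it
     found into 'expected', so 'expected' is the old value either way. */
  atomic_compare_exchange_strong (&obj[idx], &expected, desired);
  return expected;
}

/* Usage: the second attempt fails because the field no longer holds 1. */
void toy_atomic_demo (void)
{
  toy_field obj[1];
  atomic_init (&obj[0], 1);
  assert (toy_atomic_compare_and_swap (obj, 0, 1, 2) == 1);
  assert (toy_atomic_compare_and_swap (obj, 0, 1, 3) == 2);
  assert (atomic_load (&obj[0]) == 2);
}
```

   A caller can tell whether the swap happened by comparing the returned
value with EXPECTED.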


File: guile.info,  Node: Tagging and Untagging Instructions,  Next: Integer Arithmetic Instructions,  Prev: Atomic Memory Access Instructions,  Up: Instruction Set

9.3.7.11 Tagging and Untagging Instructions
...........................................

 -- Instruction: tag-char s12:DST s12:SRC
     Make a ‘SCM’ character whose integer value is the ‘u64’ in SRC, and
     store it in DST.

 -- Instruction: untag-char s12:DST s12:SRC
     Extract the integer value from the ‘SCM’ character SRC, and store
     the resulting ‘u64’ in DST.

 -- Instruction: tag-fixnum s12:DST s12:SRC
     Make a ‘SCM’ integer whose value is the ‘s64’ in SRC, and store it
     in DST.

 -- Instruction: untag-fixnum s12:DST s12:SRC
     Extract the integer value from the ‘SCM’ integer SRC, and store the
     resulting ‘s64’ in DST.
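
   Tagging and untagging a fixnum is a shift plus a tag.  The C sketch
below assumes an encoding in which the low two bits of a fixnum hold the
tag value 2 and the remaining bits hold the integer; this encoding is
not stated in this section and is an assumption of the example, as are
the ‘toy_’ names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encoding: value in the bits above a two-bit tag equal to 2. */
#define TOY_FIXNUM_TAG 2

/* tag-fixnum: s64 -> SCM bits.  The value must fit in 62 bits. */
uint64_t toy_tag_fixnum (int64_t value)
{
  assert (value >= -(INT64_C (1) << 61) && value < (INT64_C (1) << 61));
  return ((uint64_t) value << 2) | TOY_FIXNUM_TAG;
}

/* untag-fixnum: SCM bits -> s64, as an arithmetic shift right by 2. */
int64_t toy_untag_fixnum (uint64_t bits)
{
  uint64_t payload = bits >> 2;           /* logical shift drops the tag */
  assert ((bits & 3) == TOY_FIXNUM_TAG);
  if (bits >> 63)
    {
      payload |= UINT64_C (3) << 62;      /* restore the sign bits */
      return -(int64_t) ~payload - 1;     /* portable two's complement */
    }
  return (int64_t) payload;
}
```

   Under this assumed encoding, 0 is represented by the bits 2 and 1 by
the bits 6.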


File: guile.info,  Node: Integer Arithmetic Instructions,  Next: Floating-Point Arithmetic Instructions,  Prev: Tagging and Untagging Instructions,  Up: Instruction Set

9.3.7.12 Integer Arithmetic Instructions
........................................

 -- Instruction: uadd s8:DST s8:A s8:B
 -- Instruction: uadd/immediate s8:DST s8:A c8:B
     Add the ‘u64’ values A and B, and store the ‘u64’ result to DST.
     Overflow will wrap.

 -- Instruction: usub s8:DST s8:A s8:B
 -- Instruction: usub/immediate s8:DST s8:A c8:B
     Subtract the ‘u64’ value B from A, and store the ‘u64’ result to
     DST.  Underflow will wrap.

 -- Instruction: umul s8:DST s8:A s8:B
 -- Instruction: umul/immediate s8:DST s8:A c8:B
     Multiply the ‘u64’ values A and B, and store the ‘u64’ result to
     DST.  Overflow will wrap.
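
   The wrapping behaviour matches unsigned arithmetic on ‘uint64_t’ in
C, which is defined modulo 2^64.  A minimal sketch, with hypothetical
‘toy_’ names:

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 64-bit arithmetic in C wraps modulo 2^64. */
uint64_t toy_uadd (uint64_t a, uint64_t b) { return a + b; }
uint64_t toy_usub (uint64_t a, uint64_t b) { return a - b; }
uint64_t toy_umul (uint64_t a, uint64_t b) { return a * b; }

/* The '/immediate' forms take B from an 8-bit operand field. */
uint64_t toy_uadd_immediate (uint64_t a, uint32_t b_field)
{
  assert (b_field <= 0xFF);
  return toy_uadd (a, (uint64_t) b_field);
}
```

   For instance, adding 1 to the largest ‘u64’ value gives 0, and
subtracting 1 from 0 gives the largest ‘u64’ value.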

 -- Instruction: ulogand s8:DST s8:A s8:B
     Place the bitwise ‘and’ of the ‘u64’ values A and B into the ‘u64’
     local DST.

 -- Instruction: ulogior s8:DST s8:A s8:B
     Place the bitwise inclusive ‘or’ of the ‘u64’ values A and B into
     the ‘u64’ local DST.

 -- Instruction: ulogxor s8:DST s8:A s8:B
     Place the bitwise exclusive ‘or’ of the ‘u64’ values A and B into
     the ‘u64’ local DST.

 -- Instruction: ulogsub s8:DST s8:A s8:B
     Place the bitwise ‘and’ of the ‘u64’ value A and the bitwise ‘not’
     of B into the ‘u64’ local DST.

 -- Instruction: ulsh s8:DST s8:A s8:B
 -- Instruction: ulsh/immediate s8:DST s8:A c8:B
     Shift the unboxed unsigned 64-bit integer in A left by B bits, also
     an unboxed unsigned 64-bit integer.  Truncate to 64 bits and write
     to DST as an unboxed value.  Only the lower 6 bits of B are used.

 -- Instruction: ursh s8:DST s8:A s8:B
 -- Instruction: ursh/immediate s8:DST s8:A c8:B
     Shift the unboxed unsigned 64-bit integer in A right by B bits,
     also an unboxed unsigned 64-bit integer.  Truncate to 64 bits and
     write to DST as an unboxed value.  Only the lower 6 bits of B are
     used.

 -- Instruction: srsh s8:DST s8:A s8:B
 -- Instruction: srsh/immediate s8:DST s8:A c8:B
     Shift the unboxed signed 64-bit integer in A right by B bits, also
     an unboxed signed 64-bit integer.  Truncate to 64 bits and write to
     DST as an unboxed value.  Only the lower 6 bits of B are used.
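
   Using only the lower 6 bits of B means the shift count is always
taken modulo 64.  The C sketch below models this; the ‘toy_’ names are
hypothetical, B is passed as raw 64-bit contents since only its low bits
matter, and the signed shift is written so that it does not rely on how
a C implementation shifts negative values.

```c
#include <assert.h>
#include <stdint.h>

/* ulsh and ursh: mask the count to its lower 6 bits, then shift. */
uint64_t toy_ulsh (uint64_t a, uint64_t b) { return a << (b & 63); }
uint64_t toy_ursh (uint64_t a, uint64_t b) { return a >> (b & 63); }

/* srsh: arithmetic shift right, copying the sign bit into the top. */
int64_t toy_srsh (int64_t a, uint64_t b)
{
  unsigned count = (unsigned) (b & 63);
  if (a >= 0)
    return a >> count;
  return ~(~a >> count);        /* ~a is non-negative here */
}

/* Usage: a count of 64 masks to 0, so the value is unchanged. */
void toy_shift_demo (void)
{
  assert (toy_ulsh (1, 64) == 1);
  assert (toy_ursh (UINT64_MAX, 63) == 1);
  assert (toy_srsh (-8, 1) == -4);
}
```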


File: guile.info,  Node: Floating-Point Arithmetic Instructions,  Next: Comparison Instructions,  Prev: Integer Arithmetic Instructions,  Up: Instruction Set

9.3.7.13 Floating-Point Arithmetic Instructions
...............................................

 -- Instruction: fadd s8:DST s8:A s8:B
     Add the ‘f64’ values A and B, and store the ‘f64’ result to DST.

 -- Instruction: fsub s8:DST s8:A s8:B
     Subtract the ‘f64’ value B from A, and store the ‘f64’ result to
     DST.

 -- Instruction: fmul s8:DST s8:A s8:B
     Multiply the ‘f64’ values A and B, and store the ‘f64’ result to
     DST.

 -- Instruction: fdiv s8:DST s8:A s8:B
     Divide the ‘f64’ value A by B, and store the ‘f64’ result to DST.


File: guile.info,  Node: Comparison Instructions,  Next: Branch Instructions,  Prev: Floating-Point Arithmetic Instructions,  Up: Instruction Set

9.3.7.14 Comparison Instructions
................................

 -- Instruction: u64=? s12:A s12:B
     Set the comparison result to EQUAL if the ‘u64’ values A and B are
     the same, or ‘NONE’ otherwise.

 -- Instruction: u64<? s12:A s12:B
     Set the comparison result to ‘LESS_THAN’ if the ‘u64’ value A is
     less than the ‘u64’ value B, or ‘NONE’ otherwise.

 -- Instruction: s64<? s12:A s12:B
     Set the comparison result to ‘LESS_THAN’ if the ‘s64’ value A is
     less than the ‘s64’ value B, or ‘NONE’ otherwise.

 -- Instruction: s64-imm=? s12:A z12:B
     Set the comparison result to EQUAL if the ‘s64’ value A is equal to
     the immediate ‘s64’ value B, or ‘NONE’ otherwise.

 -- Instruction: u64-imm<? s12:A c12:B
     Set the comparison result to ‘LESS_THAN’ if the ‘u64’ value A is
     less than the immediate ‘u64’ value B, or ‘NONE’ otherwise.

 -- Instruction: imm-u64<? s12:A c12:B
     Set the comparison result to ‘LESS_THAN’ if the ‘u64’ immediate B
     is less than the ‘u64’ value A, or ‘NONE’ otherwise.

 -- Instruction: s64-imm<? s12:A z12:B
     Set the comparison result to ‘LESS_THAN’ if the ‘s64’ value A is
     less than the immediate ‘s64’ value B, or ‘NONE’ otherwise.

 -- Instruction: imm-s64<? s12:A z12:B
     Set the comparison result to ‘LESS_THAN’ if the ‘s64’ immediate B
     is less than the ‘s64’ value A, or ‘NONE’ otherwise.

 -- Instruction: f64=? s12:A s12:B
     Set the comparison result to EQUAL if the f64 value A is equal to
     the f64 value B, or ‘NONE’ otherwise.

 -- Instruction: f64<? s12:A s12:B
     Set the comparison result to ‘LESS_THAN’ if the f64 value A is less
     than the f64 value B, ‘NONE’ if A is greater than or equal to B, or
     ‘INVALID’ otherwise.
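
   The three-way outcome of ‘f64<?’ exists because floating-point
comparisons involving NaN are unordered.  A small C sketch of the rule
follows; the enumeration and function names are hypothetical stand-ins
for the comparison result states named in the text.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical names for the comparison result states. */
typedef enum { TOY_NONE, TOY_EQUAL, TOY_LESS_THAN, TOY_INVALID } toy_compare;

/* f64=?: a NaN operand never compares equal, so the result is NONE. */
toy_compare toy_f64_equal (double a, double b)
{
  return a == b ? TOY_EQUAL : TOY_NONE;
}

/* f64<?: distinguishes "not less" from "unordered". */
toy_compare toy_f64_less (double a, double b)
{
  if (a < b)
    return TOY_LESS_THAN;
  if (a >= b)
    return TOY_NONE;
  return TOY_INVALID;           /* at least one operand is NaN */
}

/* Usage: NaN makes the ordering test invalid rather than false. */
void toy_f64_demo (void)
{
  assert (toy_f64_less (1.0, 2.0) == TOY_LESS_THAN);
  assert (toy_f64_less (1.0, NAN) == TOY_INVALID);
}
```

   Keeping ‘INVALID’ separate from ‘NONE’ lets a later branch decide
how an unordered comparison should behave.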

 -- Instruction: =? s12:A s12:B
     Set the comparison result to EQUAL if the SCM values A and B are
     numerically equal, in the sense of the Scheme ‘=’ operator.  Set to
     ‘NONE’ otherwise.

 -- Instruction: heap-numbers-equal? s12:A s12:B
     Set the comparison result to EQUAL if the SCM values A and B are
     numerically equal, in the sense of Scheme ‘=’.  Set to ‘NONE’
     otherwise.  It is known that both A and B are heap numbers.

 -- Instruction: <? s12:A s12:B
     Set the comparison result to ‘LESS_THAN’ if the SCM value A is less
     than the SCM value B, ‘NONE’ if A is greater than or equal to B, or
     ‘INVALID’ otherwise.

 -- Instruction: immediate-tag=? s24:OBJ c16:MASK c16:TAG
     Set the comparison result to EQUAL if the result of a bitwise ‘and’
     between the bits of ‘scm’ value OBJ and the immediate MASK is TAG,
     or ‘NONE’ otherwise.

 -- Instruction: heap-tag=? s24:OBJ c16:MASK c16:TAG
     Set the comparison result to EQUAL if the result of a bitwise ‘and’
     between the first word of ‘scm’ value OBJ and the immediate MASK is
     TAG, or ‘NONE’ otherwise.
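
   Both tag tests reduce to a mask-and-compare; they differ only in
which word is examined.  The C sketch below illustrates this with
hypothetical ‘toy_’ names, modelling a heap object as a pointer to its
first word.  The mask and tag values passed in are up to the caller and
are not taken from Guile’s actual tag assignments.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { TOY_NONE, TOY_EQUAL } toy_compare;

/* immediate-tag=?: test the bits of the SCM value itself. */
toy_compare toy_immediate_tag_equal (uint64_t obj_bits,
                                     uint32_t mask, uint32_t tag)
{
  assert (mask <= 0xFFFF && tag <= 0xFFFF);   /* 16-bit operands */
  return (obj_bits & mask) == tag ? TOY_EQUAL : TOY_NONE;
}

/* heap-tag=?: OBJ points at a heap object; test its first word. */
toy_compare toy_heap_tag_equal (const uint64_t *obj,
                                uint32_t mask, uint32_t tag)
{
  assert (mask <= 0xFFFF && tag <= 0xFFFF);
  return (obj[0] & mask) == tag ? TOY_EQUAL : TOY_NONE;
}
```

   A type predicate then becomes a fixed choice of MASK and TAG for one
of these two tests.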

 -- Instruction: eq? s12:A s12:B
     Set the comparison result to EQUAL if the SCM values A and B are
     ‘eq?’, or ‘NONE’ otherwise.

 -- Instruction: eq-immediate? s8:A zi16:B
     Set the comparison result to EQUAL if the SCM value A is equal to
     the immediate SCM value B (sign-extended), or ‘NONE’ otherwise.

   There is a set of macro-instructions for ‘immediate-tag=?’ and
‘heap-tag=?’ as well that abstract away the precise type tag values.
*Note The SCM Type in Guile::.

 -- Macro Instruction: fixnum? x
 -- Macro Instruction: heap-object? x
 -- Macro Instruction: char? x
 -- Macro Instruction: eq-false? x
 -- Macro Instruction: eq-nil? x
 -- Macro Instruction: eq-null? x
 -- Macro Instruction: eq-true? x
 -- Macro Instruction: unspecified? x
 -- Macro Instruction: undefined? x
 -- Macro Instruction: eof-object? x
 -- Macro Instruction: null? x
 -- Macro Instruction: false? x
 -- Macro Instruction: nil? x
     Emit an ‘immediate-tag=?’ instruction that will set the comparison
     result to ‘EQUAL’ if X would pass the corresponding predicate (e.g.
     ‘null?’), or ‘NONE’ otherwise.

 -- Macro Instruction: pair? x
 -- Macro Instruction: struct? x
 -- Macro Instruction: symbol? x
 -- Macro Instruction: variable? x
 -- Macro Instruction: vector? x
 -- Macro Instruction: immutable-vector? x
 -- Macro Instruction: mutable-vector? x
 -- Macro Instruction: weak-vector? x
 -- Macro Instruction: string? x
 -- Macro Instruction: heap-number? x
 -- Macro Instruction: hash-table? x
 -- Macro Instruction: pointer? x
 -- Macro Instruction: fluid? x
 -- Macro Instruction: stringbuf? x
 -- Macro Instruction: dynamic-state? x
 -- Macro Instruction: frame? x
 -- Macro Instruction: keyword? x
 -- Macro Instruction: atomic-box? x
 -- Macro Instruction: syntax? x
 -- Macro Instruction: program? x
 -- Macro Instruction: vm-continuation? x
 -- Macro Instruction: bytevector? x
 -- Macro Instruction: weak-set? x
 -- Macro Instruction: weak-table? x
 -- Macro Instruction: array? x
 -- Macro Instruction: bitvector? x
 -- Macro Instruction: smob? x
 -- Macro Instruction: port? x
 -- Macro Instruction: bignum? x
 -- Macro Instruction: flonum? x
 -- Macro Instruction: compnum? x
 -- Macro Instruction: fracnum? x
     Emit a ‘heap-tag=?’ instruction that will set the comparison result
     to ‘EQUAL’ if X would pass the corresponding predicate (e.g.
     ‘pair?’), or ‘NONE’ otherwise.


File: guile.info,  Node: Branch Instructions,  Next: Raw Memory Access Instructions,  Prev: Comparison Instructions,  Up: Instruction Set

9.3.7.15 Branch Instructions
............................

All offsets to branch instructions are 24-bit signed numbers, which
count 32-bit units.  This gives Guile effectively a 26-bit address range
for relative jumps.
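
   The arithmetic behind that range can be checked directly.  The
following is a small sketch rather than anything taken from Guile’s
sources; the variable names are chosen only for this illustration.

```scheme
;; Branch offsets are signed 24-bit counts of 32-bit (4-byte) units.
(define offset-bits 24)
(define unit-bytes 4)

;; Smallest and largest encodable offsets, in 32-bit units.
(define min-offset (- (expt 2 (- offset-bits 1))))       ; -8388608
(define max-offset (- (expt 2 (- offset-bits 1)) 1))     ; 8388607

;; The same limits expressed in bytes.
(define min-byte-displacement (* min-offset unit-bytes)) ; -33554432 = -2^25
(define max-byte-displacement (* max-offset unit-bytes)) ; 33554428 = 2^25 - 4
```

   A byte displacement between -2^25 and 2^25 - 4 is what a signed
26-bit number can express, which is where the 26-bit figure comes from.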

 -- Instruction: j l24:OFFSET
     Add OFFSET to the current instruction pointer.

 -- Instruction: jl l24:OFFSET
     If the last comparison result is ‘LESS_THAN’, add OFFSET, a signed
     24-bit number, to the current instruction pointer.

 -- Instruction: je l24:OFFSET
     If the last comparison result is ‘EQUAL’, add OFFSET, a signed
     24-bit number, to the current instruction pointer.

 -- Instruction: jnl l24:OFFSET
     If the last comparison result is not ‘LESS_THAN’, add OFFSET, a
     signed 24-bit number, to the current instruction pointer.

 -- Instruction: jne l24:OFFSET
     If the last comparison result is not ‘EQUAL’, add OFFSET, a signed
     24-bit number, to the current instruction pointer.

 -- Instruction: jge l24:OFFSET
     If the last comparison result is ‘NONE’, add OFFSET, a signed
     24-bit number, to the current instruction pointer.

     This is intended for use after a ‘<?’ comparison, and is different
     from ‘jnl’ in the way it handles not-a-number (NaN) values: ‘<?’
     sets ‘INVALID’ instead of ‘NONE’ if either value is a NaN. For
     exact numbers, ‘jge’ is the same as ‘jnl’.

 -- Instruction: jnge l24:OFFSET
     If the last comparison result is not ‘NONE’, add OFFSET, a signed
     24-bit number, to the current instruction pointer.

     This is intended for use after a ‘<?’ comparison, and is different
     from ‘jl’ in the way it handles not-a-number (NaN) values: ‘<?’
     sets ‘INVALID’ instead of ‘NONE’ if either value is a NaN. For
     exact numbers, ‘jnge’ is the same as ‘jl’.
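
     The distinction is visible at the Scheme level.  The sketch below
     uses only standard numeric comparisons, and the variable names are
     illustrative; it shows that with a NaN operand, "not less than" and
     "greater than or equal" give different answers.

```scheme
(define nan (/ 0. 0.))           ; a not-a-number value

(define less?      (< nan 1.0))  ; #f: NaN is not less than 1.0
(define at-least?  (>= nan 1.0)) ; #f: nor is it greater than or equal
(define not-less?  (not less?))  ; #t: so (not (< a b)) is not (>= a b)
```

     This is why a compiler would want ‘jge’ rather than ‘jnl’ to
     implement ‘>=’ on values that might be NaN.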

 -- Instruction: jtable s24:IDX v32:LENGTH [x8:_ l24:OFFSET]...
     Branch to an entry in a table, as in C’s ‘switch’ statement.  IDX
     is a ‘u64’ local indicating which entry to branch to.  The
     immediate LENGTH indicates the number of entries in the table, and
     should be greater than or equal to 1.  The last entry in the table
     is the "catch-all" entry.  The OFFSET values are signed 24-bit
     immediates (‘l24’ encoding), indicating a memory address as a
     number of 32-bit words away from the current instruction pointer.
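
   As a rough model of the table lookup described for ‘jtable’, the
following sketch picks the offset for a given index.  The procedure
‘jtable-offset’ is hypothetical and exists only for this illustration;
it assumes that any index past the end of the table selects the final,
catch-all entry.

```scheme
;; Hypothetical model of jtable's target selection.  OFFSETS is a vector
;; of table entries whose last element is the catch-all.
(define (jtable-offset idx offsets)
  (let ((len (vector-length offsets)))
    (vector-ref offsets (min idx (- len 1)))))

(define first-entry (jtable-offset 0 #(10 20 99)))  ; 10
(define second-entry (jtable-offset 1 #(10 20 99))) ; 20
(define fallback (jtable-offset 7 #(10 20 99)))     ; 99, the catch-all
```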


File: guile.info,  Node: Raw Memory Access Instructions,  Prev: Branch Instructions,  Up: Instruction Set

9.3.7.16 Raw Memory Access Instructions
.......................................

Bytevector operations correspond closely to what the current hardware
can do, so it makes sense to inline them to VM instructions, providing a
clear path for eventual native compilation.  Without this, Scheme
programs would need other primitives for accessing raw bytes – but these
primitives are as good as any.

 -- Instruction: u8-ref s8:DST s8:PTR s8:IDX
 -- Instruction: s8-ref s8:DST s8:PTR s8:IDX
 -- Instruction: u16-ref s8:DST s8:PTR s8:IDX
 -- Instruction: s16-ref s8:DST s8:PTR s8:IDX
 -- Instruction: u32-ref s8:DST s8:PTR s8:IDX
 -- Instruction: s32-ref s8:DST s8:PTR s8:IDX
 -- Instruction: u64-ref s8:DST s8:PTR s8:IDX
 -- Instruction: s64-ref s8:DST s8:PTR s8:IDX
 -- Instruction: f32-ref s8:DST s8:PTR s8:IDX
 -- Instruction: f64-ref s8:DST s8:PTR s8:IDX

     Fetch the item at byte offset IDX from the raw pointer local PTR,
     and store it in DST.  All accesses use native endianness.

     The IDX value should be an unboxed unsigned 64-bit integer.

     The results are all written to the stack as unboxed values, either
     as signed 64-bit integers, unsigned 64-bit integers, or IEEE double
     floating point numbers.

 -- Instruction: u8-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: s8-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: u16-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: s16-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: u32-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: s32-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: u64-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: s64-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: f32-set! s8:PTR s8:IDX s8:VAL
 -- Instruction: f64-set! s8:PTR s8:IDX s8:VAL

     Store VAL into memory pointed to by raw pointer local PTR, at byte
     offset IDX.  Multibyte values are written using native endianness.

     The IDX value should be an unboxed unsigned 64-bit integer.

     The VAL values are all unboxed, either as signed 64-bit integers,
     unsigned 64-bit integers, or IEEE double floating point numbers.
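
   For orientation, these are the kind of Scheme-level bytevector
operations that such instructions are meant to support.  The sketch uses
procedures from ‘(rnrs bytevectors)’; which VM instructions the compiler
actually emits for them depends on the Guile version and optimization
settings, so treat the correspondence as an assumption.

```scheme
(use-modules (rnrs bytevectors))

(define bv (make-bytevector 16 0))

;; One byte, written unsigned and read back as signed.
(bytevector-u8-set! bv 0 255)
(define signed-byte (bytevector-s8-ref bv 0))            ; -1

;; A 32-bit unsigned value at byte offset 4, native endianness.
(bytevector-u32-native-set! bv 4 #xdeadbeef)
(define word (bytevector-u32-native-ref bv 4))

;; An IEEE double at byte offset 8, native endianness.
(bytevector-ieee-double-native-set! bv 8 1.5)
(define flo (bytevector-ieee-double-native-ref bv 8))
```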


File: guile.info,  Node: Just-In-Time Native Code,  Prev: Instruction Set,  Up: A Virtual Machine for Guile

9.3.8 Just-In-Time Native Code
------------------------------

The final piece of Guile’s virtual machine is a just-in-time (JIT)
compiler from bytecode instructions to native code.  It is faster to run
a function when its bytecode instructions are compiled to native code,
compared to having the VM interpret the instructions.

   The JIT compiler runs automatically, triggered by counters associated
with each function.  A function’s counter is incremented each time the
function is called and on each loop iteration.  Once the counter passes
a certain value, the function gets JIT-compiled.  *Note Instrumentation
Instructions::, for full details.

   Guile’s JIT compiler is what is known as a “template JIT”. This kind
of JIT is very simple: for each instruction in a function, the JIT
compiler will emit a generic sequence of machine code corresponding to
the instruction kind, specializing that generic template to reference
the specific operands of the instruction being compiled.
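
   The idea can be sketched with a toy model.  Nothing below reflects
Guile’s actual JIT: the instruction names, the pseudo-assembly strings,
and the procedures ‘emit-template’ and ‘template-compile’ are all
invented for illustration.

```scheme
(use-modules (ice-9 match) (srfi srfi-1))

;; Hypothetical templates: each instruction kind maps to a fixed sequence
;; of pseudo-assembly lines; only the operand slots vary.
(define (emit-template instr)
  (match instr
    (('mov dst src)
     (list (format #f "load  r0, [sp + ~a]" src)
           (format #f "store [sp + ~a], r0" dst)))
    (('add dst a b)
     (list (format #f "load  r0, [sp + ~a]" a)
           (format #f "load  r1, [sp + ~a]" b)
           "add   r0, r1"
           (format #f "store [sp + ~a], r0" dst)))))

;; Compile one instruction at a time, with no analysis across instructions.
(define (template-compile instrs)
  (append-map emit-template instrs))

(define code (template-compile '((mov 0 1) (add 2 0 1))))
```

   Here ‘code’ is a flat list of six pseudo-assembly lines: two for the
‘mov’ and four for the ‘add’.  Each instruction is translated without
looking at its neighbors.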

   The strength of a template JIT is principally that it is very fast at
emitting code.  It doesn’t need to do any time-consuming analysis on the
bytecode that it is compiling to do its job.

   A template JIT is also very predictable: the native code emitted by a
template JIT has the same performance characteristics as the
corresponding bytecode, except that it runs faster.  In theory you could
even generate the template-JIT machine code ahead of time, as it doesn’t
depend on any value seen at run-time.

   This predictability makes it possible to reason about the performance
of a system in terms of bytecode, knowing that the conclusions apply to
native code emitted by a template JIT.

   Because the machine code corresponding to an instruction always
performs the same tasks that the interpreter would do for that
instruction, the combination of bytecode and a template JIT also allows
Guile programmers to debug their programs in terms of the bytecode
model.  When a Guile programmer sets a breakpoint, Guile will disable
the JIT for the thread being debugged, falling back to the interpreter
(which has the corresponding code to run the hooks).  *Note VM Hooks::.

   To emit native code, Guile uses a forked version of GNU Lightning.
This "Lightening" effort, spun out as a separate project, aims to build
on the back-end support from GNU Lightning while adapting the API and
behavior of the library to match Guile’s needs.  This code is included
in the Guile source distribution.  For more information, see
<https://gitlab.com/wingo/lightening>.  As of mid-2019, Lightening
supports code generation for the x86-64, ia32, ARMv7, and AArch64
architectures.

   The weaknesses of a template JIT are two-fold.  Firstly, as a simple
back-end that has to run fast, a template JIT doesn’t have time to do
analysis that could help it generate better code, notably global
register allocation and instruction selection.

   However, this is a minor weakness compared to the inability to
perform significant, speculative program transformations.  For example,
Guile could see that in an expression ‘(f x)’, F in practice always
refers to the same function.  An advanced JIT compiler would
speculatively inline F into the call-site, along with a dynamic check to
make sure that the assertion still held.  But as a template JIT doesn’t
pay attention to values only known at run-time, it can’t make this
transformation.

   This limitation is mitigated in part by Guile’s robust ahead-of-time
compiler, which can already perform significant optimizations when it
can prove they will always be valid, and by its low-level bytecode,
which is able to represent the effect of those optimizations (e.g.
elided type-checks).  *Note Compiling to the Virtual Machine::, for more
on Guile’s compiler.

   An ahead-of-time Scheme-to-bytecode strategy, complemented by a
template JIT, also particularly suits the somewhat static nature of
Scheme.  Scheme programmers often write code in a way that makes the
identity of free variable references lexically apparent.  For example,
the ‘(f x)’ expression could appear within a ‘(let ((f (lambda (x) (1+
x)))) ...)’ expression, or we could see that ‘f’ was imported from a
particular module where we know its binding.  Ahead-of-time compilation
techniques can work well for a language like Scheme where there is
little polymorphism and much first-order programming.  They do not work
so well for a language like JavaScript, which is highly mutable at
run-time and difficult to analyze due to method calls (which are
effectively higher-order calls).

   All that said, a template JIT works well for Guile at this point.
It’s only a few thousand lines of maintainable code, it speeds up Scheme
programs, and it keeps the bulk of the Guile Scheme implementation
written in Scheme itself.  The next step is probably to add
ahead-of-time native code emission to the back-end of the compiler
written in Scheme, to take advantage of the opportunity to do global
register allocation and instruction selection.  Once this is working, it
can allow Guile to experiment with speculative optimizations in Scheme
as well.  *Note Extending the Compiler::, for more on future directions.

   Finally, note that there are a few environment variables that can be
tweaked to make JIT compilation happen sooner, later, or never.  *Note
Environment Variables::, for more.


File: guile.info,  Node: Compiling to the Virtual Machine,  Prev: A Virtual Machine for Guile,  Up: Guile Implementation

9.4 Compiling to the Virtual Machine
====================================

Compilers!  The word itself inspires excitement and awe, even among
experienced practitioners.  But a compiler is just a program: an
eminently hackable thing.  This section aims to describe Guile’s
compiler in such a way that interested Scheme hackers can feel
comfortable reading and extending it.

   *Note Read/Load/Eval/Compile::, if you’re lost and you just wanted to
know how to compile your ‘.scm’ file.

* Menu:

* Compiler Tower::
* The Scheme Compiler::
* Tree-IL::
* Continuation-Passing Style::
* Bytecode::
* Writing New High-Level Languages::
* Extending the Compiler::


File: guile.info,  Node: Compiler Tower,  Next: The Scheme Compiler,  Up: Compiling to the Virtual Machine

9.4.1 Compiler Tower
--------------------

Guile’s compiler is quite simple – its _compilers_, to put it more
accurately.  Guile defines a tower of languages, starting at Scheme and
progressively simplifying down to languages that resemble the VM
instruction set (*note Instruction Set::).

   Each language knows how to compile to the next, so each step is
simple and understandable.  Furthermore, this set of languages is not
hardcoded into Guile, so it is possible for the user to add new
high-level languages, new passes, or even different compilation targets.

   Languages are registered in the module, ‘(system base language)’:

     (use-modules (system base language))

   They are registered with the ‘define-language’ form.

 -- Scheme Syntax: define-language [#:name] [#:title] [#:reader]
          [#:printer] [#:parser=#f] [#:compilers='()]
          [#:decompilers='()] [#:evaluator=#f] [#:joiner=#f]
          [#:for-humans?=#t]
          [#:make-default-environment=make-fresh-user-module]
          [#:lowerer=#f] [#:analyzer=#f] [#:compiler-chooser=#f]
     Define a language.

     This syntax defines a ‘<language>’ object, bound to NAME in the
     current environment.  In addition, the language will be added to
     the global language set.  For example, this is the language
     definition for Scheme:

          (define-language scheme
            #:title       "Scheme"
            #:reader      (lambda (port env) ...)
            #:compilers   `((tree-il . ,compile-tree-il))
            #:decompilers `((tree-il . ,decompile-tree-il))
            #:evaluator   (lambda (x module) (primitive-eval x))
            #:printer     write
            #:make-default-environment (lambda () ...))

   The interesting thing about having languages defined this way is that
they present a uniform interface to the read-eval-print loop.  This
allows the user to change the current language of the REPL:

     scheme@(guile-user)> ,language tree-il
     Happy hacking with Tree Intermediate Language!  To switch back, type `,L scheme'.
     tree-il@(guile-user)> ,L scheme
     Happy hacking with Scheme!  To switch back, type `,L tree-il'.
     scheme@(guile-user)>

   Languages can be looked up by name, as they were above.

 -- Scheme Procedure: lookup-language name
     Looks up a language named NAME, autoloading it if necessary.

     Languages are autoloaded by looking for a variable named NAME in a
     module named ‘(language NAME spec)’.

     The language object will be returned, or ‘#f’ if there does not
     exist a language with that name.
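
     A short usage sketch follows.  It assumes the accessors
     ‘language?’, ‘language-name’ and ‘language-title’ are exported by
     ‘(system base language)’ alongside ‘lookup-language’; the variable
     name ‘scheme-lang’ is illustrative.

```scheme
(use-modules (system base language))

;; Look up the Scheme language object by name.
(define scheme-lang (lookup-language 'scheme))

(language? scheme-lang)        ; #t
(language-name scheme-lang)    ; the symbol scheme
(language-title scheme-lang)   ; "Scheme", as shown in the REPL greeting
```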

   When Guile goes to compile Scheme to bytecode, it will ask the Scheme
language to choose a compiler from Scheme to the next language on the
path from Scheme to bytecode.  Performing this computation recursively
builds transformations from a flexible chain of compilers.  The next
link will be obtained by invoking the language’s compiler chooser, or if
not present, from the language’s compilers field.

   A language can specify an analyzer, which is run before a term of
that language is lowered and compiled.  This is where compiler warnings
are issued.

   If a language specifies a lowerer, that procedure is called on
expressions before compilation.  This is where optimizations and
canonicalizations go.

   Finally, a language’s compiler translates a lowered term from one
language to the next one in the chain.

   There is a notion of a “current language”, which is maintained in the
‘current-language’ parameter, defined in the core ‘(guile)’ module.
This language is normally Scheme, and may be rebound by the user.  The
run-time compilation interfaces (*note Read/Load/Eval/Compile::) also
allow you to choose other source and target languages.

   The normal tower of languages when compiling Scheme goes like this:

   • Scheme
   • Tree Intermediate Language (Tree-IL)
   • Continuation-Passing Style (CPS)
   • Bytecode

   As discussed before (*note Object File Format::), bytecode is in ELF
format, ready to be serialized to disk.  But when compiling Scheme at
run time, you want a Scheme value: for example, a compiled procedure.
For this reason, so as not to break the abstraction, Guile defines a
fake language at the bottom of the tower:

   • Value

   Compiling to ‘value’ loads the bytecode into a procedure, turning
cold bytes into warm code.

   Perhaps this strangeness can be explained by example: ‘compile-file’
defaults to compiling to bytecode, because it produces object code that
has to live in the barren world outside the Guile runtime; but ‘compile’
defaults to compiling to ‘value’, as its product re-enters the Guile
world.
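
   The difference between the two targets can be seen by compiling the
same expression both ways.  This sketch assumes that compiling to
‘bytecode’ yields a bytevector holding the ELF image and that
‘load-thunk-from-memory’ from ‘(system vm loader)’ turns such an image
into a thunk; the variable names are illustrative.

```scheme
(use-modules (system base compile)
             (system vm loader)
             (rnrs bytevectors))

;; Compiling to bytecode produces inert bytes.
(define bytes (compile '(+ 32 10) #:to 'bytecode))
(define bytes-are-bytevector? (bytevector? bytes))

;; Loading those bytes gives a thunk that evaluates the expression.
(define thunk (load-thunk-from-memory bytes))
(define result-from-bytes (thunk))

;; Compiling to value performs the load and the call in one step.
(define result-from-value (compile '(+ 32 10) #:to 'value))
```

   Both ‘result-from-bytes’ and ‘result-from-value’ should be 42.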

   Indeed, the process of compilation can circulate through these
different worlds indefinitely, as shown by the following quine:

     ((lambda (x) ((compile x) x)) '(lambda (x) ((compile x) x)))


File: guile.info,  Node: The Scheme Compiler,  Next: Tree-IL,  Prev: Compiler Tower,  Up: Compiling to the Virtual Machine

9.4.2 The Scheme Compiler
-------------------------

The job of the Scheme compiler is to expand all macros and all of Scheme
to its most primitive expressions.  The definition of “primitive
expression” is given by the inventory of constructs provided by Tree-IL,
the target language of the Scheme compiler: procedure calls,
conditionals, lexical references, and so on.  This is described more
fully in the next section.

   The tricky and amusing thing about the Scheme-to-Tree-IL compiler is
that it is completely implemented by the macro expander.  Since the
macro expander has to run over all of the source code already in order
to expand macros, it might as well do the analysis at the same time,
producing Tree-IL expressions directly.

   Because this compiler is actually the macro expander, it is
extensible.  Any macro which the user writes becomes part of the
compiler.

   The Scheme-to-Tree-IL expander may be invoked using the generic
‘compile’ procedure:

     (compile '(+ 1 2) #:from 'scheme #:to 'tree-il)
     ⇒
     #<tree-il (call (toplevel +) (const 1) (const 2))>

   ‘(compile FOO #:from 'scheme #:to 'tree-il)’ is entirely equivalent
to calling the macro expander as ‘(macroexpand FOO 'c '(compile load
eval))’.  *Note Macro Expansion::.  ‘compile-tree-il’, the procedure
dispatched by ‘compile’ to ‘'tree-il’, is a small wrapper around
‘macroexpand’, to make it conform to the general form of compiler
procedures in Guile’s language tower.

   Compiler procedures take three arguments: an expression, an
environment, and a keyword list of options.  They return three values:
the compiled expression, the corresponding environment for the target
language, and a “continuation environment”.  The compiled expression and
environment will serve as input to the next language’s compiler.  The
“continuation environment” can be used to compile another expression
from the same source language within the same module.

   For example, you might compile the expression, ‘(define-module
(foo))’.  This will result in a Tree-IL expression and environment.  But
if you compiled a second expression, you would want to take into account
the compile-time effect of compiling the previous expression, which puts
the user in the ‘(foo)’ module.  That is the purpose of the
“continuation environment”; you would pass it as the environment when
compiling the subsequent expression.
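
   The calling convention can be observed by invoking ‘compile-tree-il’
by hand.  This is a sketch: it assumes ‘compile-tree-il’ is exported
from ‘(language scheme compile-tree-il)’ and that the options argument
may be an empty list.

```scheme
(use-modules (language scheme compile-tree-il)
             (language tree-il))

;; Collect the three return values into a list:
;; compiled expression, target environment, continuation environment.
(define results
  (call-with-values
      (lambda () (compile-tree-il '(+ 1 2) (current-module) '()))
    list))

(define compiled-exp (car results))
(define target-env (cadr results))
(define continuation-env (caddr results))
```

   For Scheme, both environments are modules, and ‘continuation-env’ is
what you would pass as the environment when compiling the next
expression.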

   For Scheme, an environment is a module.  By default, the ‘compile’
and ‘compile-file’ procedures compile in a fresh module, such that
bindings and macros introduced by the expression being compiled are
isolated:

     (eq? (current-module) (compile '(current-module)))
     ⇒ #f

     (compile '(define hello 'world))
     (defined? 'hello)
     ⇒ #f

     (define / *)
     (eq? (compile '/) /)
     ⇒ #f

   Similarly, changes to the ‘current-reader’ fluid (*note
‘current-reader’: Loading.) are isolated:

     (compile '(fluid-set! current-reader (lambda args 'fail)))
     (fluid-ref current-reader)
     ⇒ #f

   Nevertheless, having the compiler and “compilee” share the same name
space can be achieved by explicitly passing ‘(current-module)’ as the
compilation environment:

     (define hello 'world)
     (compile 'hello #:env (current-module))
     ⇒ world


File: guile.info,  Node: Tree-IL,  Next: Continuation-Passing Style,  Prev: The Scheme Compiler,  Up: Compiling to the Virtual Machine

9.4.3 Tree-IL
-------------

Tree Intermediate Language (Tree-IL) is a structured intermediate
language that is close in expressive power to Scheme.  It is an
expanded, pre-analyzed Scheme.

   Tree-IL is “structured” in the sense that its representation is based
on records, not S-expressions.  This gives a rigidity to the language
that ensures that compiling to a lower-level language only requires a
limited set of transformations.  For example, the Tree-IL type ‘<const>’
is a record type with two fields, ‘src’ and ‘exp’.  Instances of this
type are created via ‘make-const’.  Fields of this type are accessed via
the ‘const-src’ and ‘const-exp’ procedures.  There is also a predicate,
‘const?’.  *Note Records::, for more information on records.
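
   For instance, using only the procedures just named, with ‘#f’
standing in for absent source information:

```scheme
(use-modules (language tree-il))

;; Build a <const> record holding the value 3, with no source location.
(define c (make-const #f 3))

(const? c)      ; #t
(const-src c)   ; #f
(const-exp c)   ; 3
```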

   All Tree-IL types have a ‘src’ slot, which holds source location
information for the expression.  This information, if present, will be
residualized into the compiled object code, allowing backtraces to show
source information.  The format of ‘src’ is the same as that returned by
Guile’s ‘source-properties’ function.  *Note Source Properties::, for
more information.

   Although Tree-IL objects are represented internally using records,
there is also an equivalent S-expression external representation for
each kind of Tree-IL. For example, the S-expression representation of
the ‘#<const src: #f exp: 3>’ expression would be:

     (const 3)

   Users may program with this format directly at the REPL:

     scheme@(guile-user)> ,language tree-il
     Happy hacking with Tree Intermediate Language!  To switch back, type `,L scheme'.
     tree-il@(guile-user)> (call (primitive +) (const 32) (const 10))
     ⇒ 42

   The ‘src’ fields are left out of the external representation.

   One may create Tree-IL objects from their external representations
by calling ‘parse-tree-il’, the reader for Tree-IL. If any source
information is attached to the input S-expression, it will be propagated
to the resulting Tree-IL expressions.  This is probably the easiest way
to compile to Tree-IL: just make the appropriate external
representations in S-expression format, and let ‘parse-tree-il’ take
care of the rest.
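
   A minimal round trip might look as follows.  The sketch assumes that
‘unparse-tree-il’, the inverse of ‘parse-tree-il’, is exported from
‘(language tree-il)’, and that ‘compile’ accepts a parsed Tree-IL record
when given ‘#:from 'tree-il’.

```scheme
(use-modules (language tree-il)
             (system base compile))

;; Parse the external representation into Tree-IL records.
(define tree
  (parse-tree-il '(call (primitive +) (const 32) (const 10))))

(call? tree)             ; #t
(unparse-tree-il tree)   ; back to the S-expression form

;; Compile the parsed term the rest of the way down the tower.
(define answer (compile tree #:from 'tree-il #:to 'value))
```

   Here ‘answer’ should be 42, matching the REPL session above.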

 -- Scheme Variable: <void> src
 -- External Representation: (void)
     An empty expression.  In practice, equivalent to Scheme’s ‘(if #f
     #f)’.

 -- Scheme Variable: <const> src exp
 -- External Representation: (const EXP)
     A constant.

 -- Scheme Variable: <primitive-ref> src name
 -- External Representation: (primitive NAME)
     A reference to a “primitive”.  A primitive is a procedure that,
     when compiled, may be open-coded.  For example, ‘cons’ is usually
     recognized as a primitive, so that it compiles down to a single
     instruction.

     Compilation of Tree-IL usually begins with a pass that resolves
     some ‘<module-ref>’ and ‘<toplevel-ref>’ expressions to
     ‘<primitive-ref>’ expressions.  The actual compilation pass has
     special cases for calls to certain primitives, like ‘apply’ or
     ‘cons’.

 -- Scheme Variable: <lexical-ref> src name gensym
 -- External Representation: (lexical NAME GENSYM)
     A reference to a lexically-bound variable.  The NAME is the
     original name of the variable in the source program.  GENSYM is a
     unique identifier for this variable.

 -- Scheme Variable: <lexical-set> src name gensym exp
 -- External Representation: (set! (lexical NAME GENSYM) EXP)
     Sets a lexically-bound variable.

 -- Scheme Variable: <module-ref> src mod name public?
 -- External Representation: (@ MOD NAME)
 -- External Representation: (@@ MOD NAME)
     A reference to a variable in a specific module.  MOD should be the
     name of the module, e.g. ‘(guile-user)’.

     If PUBLIC? is true, the variable named NAME will be looked up in
     MOD’s public interface, and serialized with ‘@’; otherwise it will
     be looked up among the module’s private bindings, and is serialized
     with ‘@@’.

 -- Scheme Variable: <module-set> src mod name public? exp
 -- External Representation: (set! (@ MOD NAME) EXP)
 -- External Representation: (set! (@@ MOD NAME) EXP)
     Sets a variable in a specific module.

 -- Scheme Variable: <toplevel-ref> src name
 -- External Representation: (toplevel NAME)
     References a variable from the current procedure’s module.

 -- Scheme Variable: <toplevel-set> src name exp
 -- External Representation: (set! (toplevel NAME) EXP)
     Sets a variable in the current procedure’s module.

 -- Scheme Variable: <toplevel-define> src name exp
 -- External Representation: (define NAME EXP)
     Defines a new top-level variable in the current procedure’s module.

 -- Scheme Variable: <conditional> src test then else
 -- External Representation: (if TEST THEN ELSE)
     A conditional.  Note that ELSE is not optional.

 -- Scheme Variable: <call> src proc args
 -- External Representation: (call PROC . ARGS)
     A procedure call.

 -- Scheme Variable: <primcall> src name args
 -- External Representation: (primcall NAME . ARGS)
     A call to a primitive.  Equivalent to ‘(call (primitive NAME) .
     ARGS)’.  This construct is often more convenient to generate and
     analyze than ‘<call>’.

     As part of the compilation process, instances of ‘(call (primitive
     NAME) . ARGS)’ are transformed into primcalls.

 -- Scheme Variable: <seq> src head tail
 -- External Representation: (seq HEAD TAIL)
     A sequence.  The semantics is that HEAD is evaluated first, and any
     resulting values are ignored.  Then TAIL is evaluated, in tail
     position.

 -- Scheme Variable: <lambda> src meta body
 -- External Representation: (lambda META BODY)
     A closure.  META is an association list of properties for the
     procedure.  BODY is a single Tree-IL expression of type
     ‘<lambda-case>’.  As the ‘<lambda-case>’ clause can chain to an
     alternate clause, this makes Tree-IL’s ‘<lambda>’ have the
     expressiveness of Scheme’s ‘case-lambda’.

 -- Scheme Variable: <lambda-case> src req opt rest kw inits gensyms
          body alternate
 -- External Representation: (lambda-case ((REQ OPT REST KW INITS
          GENSYMS) BODY) [ALTERNATE])
     One clause of a ‘case-lambda’.  A ‘lambda’ expression in Scheme is
     treated as a ‘case-lambda’ with one clause.

     REQ is a list of the procedure’s required arguments, as symbols.
     OPT is a list of the optional arguments, or ‘#f’ if there are no
     optional arguments.  REST is the name of the rest argument, or
     ‘#f’.

     KW is a list of the form, ‘(ALLOW-OTHER-KEYS? (KEYWORD NAME VAR)
     ...)’, where KEYWORD is the keyword corresponding to the argument
     named NAME, and whose corresponding gensym is VAR, or ‘#f’ if there
     are no keyword arguments.  INITS are Tree-IL expressions
     corresponding to all of the optional and keyword arguments,
     evaluated to bind variables whose value is not supplied by the
     procedure caller.  Each INIT expression is evaluated in the lexical
     context of previously bound variables, from left to right.

     GENSYMS is a list of gensyms corresponding to all arguments: first
     all of the required arguments, then the optional arguments if any,
     then the rest argument if any, then all of the keyword arguments.

     BODY is the body of the clause.  If the procedure is called with an
     appropriate number of arguments, BODY is evaluated in tail
     position.  Otherwise, if there is an ALTERNATE, it should be a
     ‘<lambda-case>’ expression, representing the next clause to try.
     If there is no ALTERNATE, a wrong-number-of-arguments error is
     signaled.
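
     As an illustration, here is the external representation of a
     one-clause procedure equivalent to ‘(lambda (x) x)’.  This is a
     sketch: the gensym name ‘x-1’ is an arbitrary choice, the metadata
     list is left empty, and the accessor names are assumed to follow
     the usual record naming for ‘<lambda>’ and ‘<lambda-case>’.

```scheme
(use-modules (language tree-il)
             (system base compile))

(define tree
  (parse-tree-il
   '(lambda ()                       ; META: no properties
      (lambda-case
       (((x) #f #f #f () (x-1))      ; REQ OPT REST KW INITS GENSYMS
        (lexical x x-1))))))         ; BODY, with no ALTERNATE

(define clause (lambda-body tree))

(lambda-case-req clause)         ; (x)
(lambda-case-opt clause)         ; #f
(lambda-case-gensyms clause)     ; (x-1)
(lambda-case-alternate clause)   ; #f

;; Compile the closure and call it.
(define identity-proc (compile tree #:from 'tree-il #:to 'value))
(define identity-result (identity-proc 42))
```

     Calling ‘identity-proc’ with two arguments would signal a
     wrong-number-of-arguments error, since there is no alternate
     clause.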

 -- Scheme Variable: <let> src names gensyms vals exp
 -- External Representation: (let NAMES GENSYMS VALS EXP)
     Lexical binding, like Scheme’s ‘let’.  NAMES are the original
     binding names, GENSYMS are gensyms corresponding to the NAMES, and
     VALS are Tree-IL expressions for the values.  EXP is a single
     Tree-IL expression.

 -- Scheme Variable: <letrec> src in-order? names gensyms vals exp
 -- External Representation: (letrec NAMES GENSYMS VALS EXP)
 -- External Representation: (letrec* NAMES GENSYMS VALS EXP)
     A version of ‘<let>’ that creates recursive bindings, like Scheme’s
     ‘letrec’, or ‘letrec*’ if IN-ORDER? is true.

 -- Scheme Variable: <prompt> src escape-only? tag body handler
 -- External Representation: (prompt ESCAPE-ONLY? TAG BODY HANDLER)
     A dynamic prompt.  Instates a prompt named TAG, an expression,
     during the dynamic extent of the execution of BODY, also an
     expression.  If an abort occurs to this prompt, control will be
     passed to HANDLER, also an expression, which should be a procedure.
     The first argument to the handler procedure will be the captured
     continuation, followed by all of the values passed to the abort.
     If ESCAPE-ONLY? is true, the handler should be a ‘<lambda>’ with a
     single ‘<lambda-case>’ body expression with no optional or keyword
     arguments, and no alternate, and whose first argument is
     unreferenced.  *Note Prompts::, for more information.

 -- Scheme Variable: <abort> src tag args tail
 -- External Representation: (abort TAG ARGS TAIL)
     An abort to the nearest prompt with the name TAG, an expression.
     ARGS should be a list of expressions to pass to the prompt’s
     handler, and TAIL should be an expression that will evaluate to a
     list of additional arguments.  An abort will save the partial
     continuation, which may later be reinstated, resulting in the
     ‘<abort>’ expression evaluating to some number of values.

   There are two Tree-IL constructs that are not normally produced by
higher-level compilers, but instead are generated during the
source-to-source optimization and analysis passes that the Tree-IL
compiler does.  Users should not generate these expressions directly,
unless they feel very clever, as the default analysis pass will generate
them as necessary.

 -- Scheme Variable: <let-values> src names gensyms exp body
 -- External Representation: (let-values NAMES GENSYMS EXP BODY)
     Like Scheme’s ‘receive’ – binds the values returned by evaluating
     EXP to the ‘lambda’-like bindings described by GENSYMS.  That is to
     say, GENSYMS may be an improper list.

     ‘<let-values>’ is an optimization of a ‘<call>’ to the primitive,
     ‘call-with-values’.

 -- Scheme Variable: <fix> src names gensyms vals body
 -- External Representation: (fix NAMES GENSYMS VALS BODY)
     Like ‘<letrec>’, but only for VALS that are unset ‘lambda’
     expressions.

     ‘fix’ is an optimization of ‘letrec’ (and ‘let’).

   Tree-IL is a convenient compilation target from source languages.  It
can be convenient as a medium for optimization, though CPS is usually
better.  The strength of Tree-IL is that it does not fix order of
evaluation, so it makes some code motion a bit easier.

   Optimization passes performed on Tree-IL currently include:

   • Open-coding (turning toplevel-refs into primitive-refs, and calls
     to primitives to primcalls)
   • Partial evaluation (comprising inlining, copy propagation, and
     constant folding)


File: guile.info,  Node: Continuation-Passing Style,  Next: Bytecode,  Prev: Tree-IL,  Up: Compiling to the Virtual Machine

9.4.4 Continuation-Passing Style
--------------------------------

Continuation-passing style (CPS) is Guile’s principal intermediate
language, bridging the gap between languages for people and languages
for machines.  CPS gives a name to every part of a program: every
control point, and every intermediate value.  This makes it an excellent
medium for reasoning about programs, which is the principal job of a
compiler.

* Menu:

* An Introduction to CPS::
* CPS in Guile::
* Building CPS::
* CPS Soup::
* Compiling CPS::


File: guile.info,  Node: An Introduction to CPS,  Next: CPS in Guile,  Up: Continuation-Passing Style

9.4.4.1 An Introduction to CPS
..............................

Consider the following Scheme expression:

     (begin
       (display "The sum of 32 and 10 is: ")
       (display 42)
       (newline))

   Let us identify all of the sub-expressions in this expression,
annotating them with unique labels:

     (begin
       (display "The sum of 32 and 10 is: ")
       |k1      k2
       k0
       (display 42)
       |k4      k5
       k3
       (newline))
       |k7
       k6

   Each of these labels identifies a point in a program.  One label may
be the continuation of another label.  For example, the continuation of
‘k7’ is ‘k6’.  This is because after evaluating the value of ‘newline’,
performed by the expression labelled ‘k7’, we continue to apply it in
‘k6’.

   Which expression has ‘k0’ as its continuation?  It is either the
expression labelled ‘k1’ or the expression labelled ‘k2’.  Scheme does
not have a fixed order of evaluation of arguments, though it does
guarantee that they are evaluated in some order.  Unlike general Scheme,
continuation-passing style makes evaluation order explicit.  In Guile,
this choice is made by the higher-level language compilers.

   Let us assume a left-to-right evaluation order.  In that case the
continuation of ‘k1’ is ‘k2’, and the continuation of ‘k2’ is ‘k0’.

   With this example established, we are ready to give an example of CPS
in Scheme:

     (lambda (ktail)
       (letrec ((k1 (lambda ()
                      (let ((k2 (lambda (proc)
                                  (let ((k0 (lambda (arg0)
                                              (proc k4 arg0))))
                                    (k0 "The sum of 32 and 10 is: ")))))
                        (k2 display))))
                (k4 (lambda _
                      (let ((k5 (lambda (proc)
                                  (let ((k3 (lambda (arg0)
                                              (proc k7 arg0))))
                                    (k3 42)))))
                        (k5 display))))
                (k7 (lambda _
                      (let ((k6 (lambda (proc)
                                  (proc ktail))))
                        (k6 newline)))))
         (k1)))

   Holy code explosion, Batman!  What’s with all the lambdas?  Indeed,
CPS is by nature much more verbose than “direct-style” intermediate
languages like Tree-IL. At the same time, CPS is simpler than full
Scheme, because it makes things more explicit.

   In the original program, the expression labelled ‘k0’ is in effect
context.  Any values it returns are ignored.  In Scheme, this fact is
implicit.  In CPS, we can see it explicitly by noting that its
continuation, ‘k4’, takes any number of values and ignores them.
Compare this to ‘k2’, which takes a single value; in this way we can say
that ‘k1’ is in a “value” context.  Likewise ‘k6’ is in tail context
with respect to the expression as a whole, because its continuation is
the tail continuation, ‘ktail’.  CPS makes these details manifest, and
gives them names.
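
   The example above is schematic, since the standard ‘display’ and
‘newline’ do not take a continuation argument.  A minimal runnable
sketch follows, assuming two hypothetical wrappers, ‘display/k’ and
‘newline/k’, that accept their continuation as the first argument.  For
brevity it collapses the argument-evaluation steps, keeping only ‘k1’,
‘k4’, and ‘k7’.

```scheme
;; Hypothetical CPS-style wrappers: each takes its continuation K as
;; the first argument and calls it with zero values when done.
(define (display/k k obj) (display obj) (k))
(define (newline/k k) (newline) (k))

(define (program ktail)
  (letrec ((k1 (lambda ()
                 (display/k k4 "The sum of 32 and 10 is: ")))
           ;; k4 and k7 accept and ignore any number of values, so
           ;; the calls that continue to them are in effect context.
           (k4 (lambda _ (display/k k7 42)))
           ;; The last call continues to ktail: tail context.
           (k7 (lambda _ (newline/k ktail))))
    (k1)))

;; Run the program with a tail continuation that ignores its values,
;; capturing what it prints.
(define output
  (with-output-to-string
    (lambda () (program (lambda _ #t)))))
```

   Here ‘output’ holds the text that the original direct-style
expression would have printed.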


File: guile.info,  Node: CPS in Guile,  Next: Building CPS,  Prev: An Introduction to CPS,  Up: Continuation-Passing Style

9.4.4.2 CPS in Guile
....................

Guile’s CPS language is composed of “continuations”.  A continuation is
a labelled program point.  If you are used to traditional compilers,
think of a continuation as a trivial basic block.  A program is a “soup”
of continuations, represented as a map from labels to continuations.

   Like basic blocks, each continuation belongs to only one function.
Some continuations are special, like the continuation corresponding to a
function’s entry point, or the continuation that represents the tail of
a function.  Others contain a “term”.  A term contains an “expression”,
which evaluates to zero or more values.  The term also records the
continuation to which it will pass its values.  Some terms, like
conditional branches, may continue to one of a number of continuations.

   Continuation labels are small integers.  This makes it easy to sort
them and to group them into sets.  Whenever a term refers to a
continuation, it does so by name, simply recording the label of the
continuation.  Continuation labels are unique among the set of labels in
a program.

   Variables are also named by small integers.  Variable names are
unique among the set of variables in a program.

   For example, a simple continuation that receives two values and adds
them together can be matched like this, using the ‘match’ form from
‘(ice-9 match)’:

     (match cont
       (($ $kargs (x-name y-name) (x-var y-var)
           ($ $continue k src ($ $primcall '+ #f (x-var y-var))))
        (format #t "Add ~a and ~a and pass the result to label ~a"
                x-var y-var k)))

   Here we see the most common kind of continuation, ‘$kargs’, which
binds some number of values to variables and then evaluates a term.

 -- CPS Continuation: $kargs names vars term
     Bind the incoming values to the variables VARS, with original names
     NAMES, and then evaluate TERM.

   The NAMES of a ‘$kargs’ are just for debugging, and will end up
residualized in the object file for use by the debugger.

   The TERM in a ‘$kargs’ is always a ‘$continue’, which evaluates an
expression and continues to a continuation.

 -- CPS Term: $continue k src exp
     Evaluate the expression EXP and pass the resulting values (if any)
     to the continuation labelled K.  The source information associated
     with the expression may be found in SRC, which is either an alist
     as in ‘source-properties’ or is ‘#f’ if there is no associated
     source.
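
   To try such a match at the REPL, one first needs a continuation to
match against.  The sketch below builds one with the ‘build-cont’ macro
from the ‘(language cps)’ module (*note Building CPS::) and then takes
it apart again.  The labels and variable names (1, 2, and 3) are
arbitrary integers chosen for the example.

```scheme
(use-modules (language cps) (ice-9 match))

;; A $kargs that binds variables 1 and 2 (source names x and y), adds
;; them, and continues to label 3.  There is no source information.
(define cont
  (build-cont
    ($kargs ('x 'y) (1 2)
      ($continue 3 #f ($primcall '+ #f (1 2))))))

;; Deconstruct it again with `match'.
(define description
  (match cont
    (($ $kargs names (a b)
        ($ $continue k src ($ $primcall '+ #f args)))
     (format #f "Add ~a and ~a and pass the result to label ~a"
             a b k))))
```

   With these inputs, ‘description’ is the string ‘"Add 1 and 2 and pass
the result to label 3"’.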

   There are a number of expression kinds.  Above you see an example of
‘$primcall’.

 -- CPS Expression: $primcall name param args
     Perform the primitive operation identified by NAME, a well-known
     symbol, passing it the arguments ARGS, and pass all resulting
     values to the continuation.

     PARAM is a constant parameter whose interpretation is up to the
     primcall in question.  Usually it’s ‘#f’ but for a primcall that
     might need some compile-time constant information – such as
     ‘add/immediate’, which adds a constant number to a value – the
     parameter holds this information.

     The set of available primitives includes many primitives known to
     Tree-IL and then some more; see the source code for details.  Note
     that some Tree-IL primcalls need to be converted to a sequence of
     lower-level CPS primcalls.  Again, see ‘(language tree-il
     compile-cps)’ for full details.

   The variables that are used by ‘$primcall’, or indeed by any
expression, must be defined before the expression is evaluated.  An
equivalent way of saying this is that predecessor ‘$kargs’
continuation(s) that bind the variable(s) used by the expression must
“dominate” the continuation that uses the expression: definitions
dominate uses.  This condition is trivially satisfied in our example
above, but in general to determine the set of variables that are in
“scope” for a given term, you need to do a flow analysis to see what
continuations dominate a term.  The variables that are in scope are
those variables defined by the continuations that dominate a term.

   Here is an inventory of the kinds of expressions in Guile’s CPS
language, besides ‘$primcall’ which has already been described.  Recall
that all expressions are wrapped in a ‘$continue’ term which specifies
their continuation.

 -- CPS Expression: $const val
     Continue with the constant value VAL.

 -- CPS Expression: $prim name
     Continue with the procedure that implements the primitive operation
     named by NAME.

 -- CPS Expression: $call proc args
     Call PROC with the arguments ARGS, and pass all values to the
     continuation.  PROC and the elements of the ARGS list should all be
     variable names.  The continuation identified by the term’s K should
     be a ‘$kreceive’ or a ‘$ktail’ instance.

 -- CPS Expression: $values args
     Pass the values named by the list ARGS to the continuation.

 -- CPS Expression: $prompt escape? tag handler

   There are two sub-languages of CPS, “higher-order CPS” and
“first-order CPS”. The difference is that in higher-order CPS, there are
‘$fun’ and ‘$rec’ expressions that bind functions or mutually-recursive
functions in the implicit scope of their use sites.  Guile transforms
higher-order CPS into first-order CPS by “closure conversion”, which
chooses representations for all closures and which arranges to access
free variables through the implicit closure parameter that is passed to
every function call.
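
   The idea behind closure conversion can be sketched by hand in plain
Scheme.  This is only an illustration of the transformation, not the
representation Guile actually chooses: the vector layout and the names
‘adder-code’, ‘make-adder/closure’, and ‘call-closure’ are assumptions
made for the example.

```scheme
;; Before conversion: `n' is free in the inner lambda.
(define (make-adder n)
  (lambda (x) (+ x n)))

;; After conversion, by hand: a closure is a vector holding the code
;; in slot 0, followed by its free variables.  The code takes the
;; closure itself as an explicit first parameter and fetches `n'
;; from it.
(define (adder-code self x)
  (+ x (vector-ref self 1)))

(define (make-adder/closure n)
  (vector adder-code n))

;; Every call passes the closure along as the implicit parameter.
(define (call-closure closure . args)
  (apply (vector-ref closure 0) closure args))
```

   Both ‘((make-adder 10) 5)’ and ‘(call-closure (make-adder/closure 10)
5)’ evaluate to 15.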

 -- CPS Expression: $fun body
     Continue with a procedure.  BODY names the entry point of the
     function, which should be a ‘$kfun’.  This expression kind is only
     valid in higher-order CPS, which is the CPS language before closure
     conversion.

 -- CPS Expression: $rec names vars funs
     Continue with a set of mutually recursive procedures denoted by
     NAMES, VARS, and FUNS.  NAMES is a list of symbols, VARS is a list
     of variable names (unique integers), and FUNS is a list of ‘$fun’
     values.  Note that the ‘$kargs’ continuation should also define
     NAMES/VARS bindings.

   The contification pass will attempt to transform the functions
declared in a ‘$rec’ into local continuations.  Any remaining ‘$fun’
instances are later removed by the closure conversion pass.  If the
function has no free variables, it gets allocated as a constant.

 -- CPS Expression: $const-fun label
     A constant which is a function whose entry point is LABEL.  As a
     constant, instances of ‘$const-fun’ with the same LABEL will not
     allocate; the space for the function is allocated as part of the
     compilation unit.

     In practice, ‘$const-fun’ expressions are reified by closure
     conversion for functions whose call sites are not all visible
     within the compilation unit and which have no free variables.  This
     expression kind is part of first-order CPS.

   Otherwise, if the closure has free variables, it will be allocated at
its definition site via an ‘allocate-words’ primcall and its free
variables initialized there.  The code pointer in the closure is
initialized from a ‘$code’ expression.

 -- CPS Expression: $code label
     Continue with the value of LABEL, which should denote some ‘$kfun’
     continuation in the program.  Used when initializing the code
     pointer of closure objects.

   However, if the closure can be proven to never escape its scope then
other lighter-weight representations can be chosen.  Additionally, if
all call sites are known, closure conversion will hard-wire the calls by
lowering ‘$call’ to ‘$callk’.

 -- CPS Expression: $callk label proc args
     Like ‘$call’, but for the case where the call target is known to be
     in the same compilation unit.  LABEL should denote some ‘$kfun’
     continuation in the program.  In this case the PROC is simply an
     additional argument, since it is not used to determine the call
     target at run-time.

   To summarize: a ‘$continue’ is a CPS term that continues to a single
label.  But there are other kinds of CPS terms that can continue to a
different number of labels: ‘$branch’, ‘$switch’, ‘$throw’, and
‘$prompt’.

 -- CPS Term: $branch kf kt src op param args
     Evaluate the branching primcall OP, with arguments ARGS and
     constant parameter PARAM, and continue to KT with zero values if
     the test is true.  Otherwise continue to KF.

     The ‘$branch’ term is like a ‘$continue’ term with a ‘$primcall’
     expression, except that instead of binding a value and continuing
     to a single label, the result of the test is not bound but instead
     used to choose the continuation label.

     The set of operations (corresponding to OP values) that are valid
     in a ‘$branch’ is limited.  In the general case, bind the result of a
     test expression to a variable, and then make a ‘$branch’ on a
     ‘true?’ op referencing that variable.  The optimizer should inline
     the branch if possible.

 -- CPS Term: $switch kf kt* src arg
     Continue to a label in the list KT* according to the index argument
     ARG, or to the default continuation KF if ARG is greater than or
     equal to the length of KT*.  The index variable ARG is an unboxed,
     unsigned 64-bit value.

     The ‘$switch’ term is like C’s ‘switch’ statement.  The compiler to
     CPS can generate a ‘$switch’ term directly, if the source language
     has such a concept, or it can rely on the CPS optimizer to turn
     appropriate chains of ‘$branch’ statements to ‘$switch’ instances,
     which is what the Scheme compiler does.
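
   The label selection performed by ‘$switch’ can be modelled in plain
Scheme as follows.  The procedure name ‘switch-target’ is hypothetical;
it only mirrors the rule described above and is not part of Guile.

```scheme
;; KF is the default label, KT* the list of target labels, and ARG
;; the already-evaluated, non-negative index.
(define (switch-target kf kt* arg)
  (if (< arg (length kt*))
      (list-ref kt* arg)
      kf))
```

   For instance, with ‘kf’ 9 and ‘kt*’ ‘(10 11 12)’, an index of 2
selects label 12, while an index of 3 falls through to label 9.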

 -- CPS Term: $throw src op param args
     Throw a non-resumable exception.  Throw terms do not continue at
     all.  The usual value of OP is ‘throw’, with two arguments KEY and
     ARGS.  There are also some specific primcalls that compile to the
     VM ‘throw/value’ and ‘throw/value+data’ instructions; see the code
     for full details.

     The advantage of having ‘$throw’ as a term is that, because it does
     not continue, this allows the optimizer to gather more information
     from type predicates.  For example, if the predicate is ‘char?’ and
     the KF continues to a throw, the set of labels dominated by KT is
     larger than if the throw notationally continued to some label that
     would never be reached by the throw.

 -- CPS Term: $prompt k kh src escape? tag
     Push a prompt on the stack identified by the variable name TAG,
     which may be escape-only if ESCAPE? is true, and continue to K with
     zero values.  If the body aborts to this prompt, control will
     proceed at the continuation labelled KH, which should be a
     ‘$kreceive’ continuation.  Prompts are later popped by ‘pop-prompt’
     primcalls.

   At this point we have described terms, expressions, and the most
common kind of continuation, ‘$kargs’.  ‘$kargs’ is used when the
predecessors of the continuation can be instructed to pass the values
where the continuation wants them.  For example, if a ‘$kargs’
continuation K binds a variable V, and the compiler decides to allocate
V to slot 6, all predecessors of K should put the value for V in slot 6
before jumping to K.  One situation in which this isn’t possible is
receiving values from function calls.  Guile has a calling convention
for functions which currently places return values on the stack.  A
continuation of a call must check that the number of values returned
from a function matches the expected number of values, and then must
shuffle or collect those values to named variables.  ‘$kreceive’ denotes
this kind of continuation.

 -- CPS Continuation: $kreceive arity k
     Receive values on the stack.  Parse them according to ARITY, and
     then proceed with the parsed values to the ‘$kargs’ continuation
     labelled K.  As a limitation specific to ‘$kreceive’, ARITY may
     only contain required and rest arguments.

   ‘$arity’ is a helper data structure used by ‘$kreceive’ and also by
‘$kclause’, described below.

 -- CPS Data: $arity req opt rest kw allow-other-keys?
     A data type declaring an arity.  REQ and OPT are lists of source
     names of required and optional arguments, respectively.  REST is
     either the source name of the rest variable, or ‘#f’ if this arity
     does not accept additional values.  KW is a list of the form
     ‘((KEYWORD NAME VAR) ...)’, describing the keyword arguments.
     ALLOW-OTHER-KEYS? is true if other keyword arguments are allowed
     and false otherwise.

     Note that all of these names with the exception of the VARs in the
     KW list are source names, not unique variable names.
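
   As a small sketch of how ‘$kreceive’ and ‘$arity’ fit together, the
example below builds a ‘$kreceive’ with the ‘build-cont’ macro (*note
Building CPS::) and inspects the arity it carries.  The label 5 and the
source names ‘a’ and ‘b’ are arbitrary values chosen for the example.

```scheme
(use-modules (language cps) (ice-9 match))

;; Receive two required values named a and b, with no rest argument,
;; then continue to the $kargs continuation labelled 5.
(define cont
  (build-cont ($kreceive '(a b) #f 5)))

;; Pull the arity apart field by field.
(define summary
  (match cont
    (($ $kreceive ($ $arity req opt rest kw allow-other-keys?) k)
     (list req opt rest kw allow-other-keys? k))))
```

   Because a ‘$kreceive’ arity holds only required and rest arguments,
the optional and keyword fields of ‘summary’ should come out empty.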

   Additionally, there are three specific kinds of continuations that
are only used in function entries.

 -- CPS Continuation: $kfun src meta self tail clause
     Declare a function entry.  SRC is the source information for the
     procedure declaration, and META is the metadata alist as described
     above in Tree-IL’s ‘<lambda>’.  SELF is a variable bound to the
     procedure being called, and which may be used for self-references.
     TAIL is the label of the ‘$ktail’ for this function, corresponding
     to the function’s tail continuation.  CLAUSE is the label of the
     first ‘$kclause’ for the first ‘case-lambda’ clause in the
     function, or otherwise ‘#f’.

 -- CPS Continuation: $ktail
     A tail continuation.

 -- CPS Continuation: $kclause arity cont alternate
     A clause of a function with a given arity.  Applications of a
     function with a compatible set of actual arguments will continue to
     the continuation labelled CONT, a ‘$kargs’ instance representing
     the clause body.  If the arguments are incompatible, control
     proceeds to ALTERNATE, which is a ‘$kclause’ for the next clause,
     or ‘#f’ if there is no next clause.


File: guile.info,  Node: Building CPS,  Next: CPS Soup,  Prev: CPS in Guile,  Up: Continuation-Passing Style

9.4.4.3 Building CPS
....................

Unlike Tree-IL, the CPS language is built to be constructed and
deconstructed with abstract macros instead of via procedural
constructors or accessors, or instead of S-expression matching.

   Deconstruction and matching is handled adequately by the ‘match’ form
from ‘(ice-9 match)’.  *Note Pattern Matching::.  Construction is
handled by a set of mutually recursive builder macros: ‘build-term’,
‘build-cont’, and ‘build-exp’.

   In the following interface definitions, consider ‘term’ and ‘exp’ to
be built by ‘build-term’ or ‘build-exp’, respectively.  Consider any
other name to be evaluated as a Scheme expression.  Many of these forms
recognize ‘unquote’ in some contexts, to splice in a previously-built
value; see the specifications below for full details.

 -- Scheme Syntax: build-term ,val
 -- Scheme Syntax: build-term ($continue k src exp)
 -- Scheme Syntax: build-exp ,val
 -- Scheme Syntax: build-exp ($const val)
 -- Scheme Syntax: build-exp ($prim name)
 -- Scheme Syntax: build-exp ($fun kentry)
 -- Scheme Syntax: build-exp ($const-fun kentry)
 -- Scheme Syntax: build-exp ($code kentry)
 -- Scheme Syntax: build-exp ($rec names syms funs)
 -- Scheme Syntax: build-exp ($call proc (arg ...))
 -- Scheme Syntax: build-exp ($call proc args)
 -- Scheme Syntax: build-exp ($callk k proc (arg ...))
 -- Scheme Syntax: build-exp ($callk k proc args)
 -- Scheme Syntax: build-exp ($primcall name param (arg ...))
 -- Scheme Syntax: build-exp ($primcall name param args)
 -- Scheme Syntax: build-exp ($values (arg ...))
 -- Scheme Syntax: build-exp ($values args)
 -- Scheme Syntax: build-exp ($prompt escape? tag handler)
 -- Scheme Syntax: build-term ($branch kf kt src op param (arg ...))
 -- Scheme Syntax: build-term ($branch kf kt src op param args)
 -- Scheme Syntax: build-term ($switch kf kt* src arg)
 -- Scheme Syntax: build-term ($throw src op param (arg ...))
 -- Scheme Syntax: build-term ($throw src op param args)
 -- Scheme Syntax: build-term ($prompt k kh src escape? tag)
 -- Scheme Syntax: build-cont ,val
 -- Scheme Syntax: build-cont ($kargs (name ...) (sym ...) term)
 -- Scheme Syntax: build-cont ($kargs names syms term)
 -- Scheme Syntax: build-cont ($kreceive req rest kargs)
 -- Scheme Syntax: build-cont ($kfun src meta self ktail kclause)
 -- Scheme Syntax: build-cont ($kclause ,arity kbody kalt)
 -- Scheme Syntax: build-cont ($kclause (req opt rest kw aok?) kbody)
     Construct a CPS term, expression, or continuation.

   There are a few more miscellaneous interfaces as well.

 -- Scheme Procedure: make-arity req opt rest kw allow-other-keywords?
     A procedural constructor for ‘$arity’ objects.

 -- Scheme Syntax: rewrite-term val (pat term) ...
 -- Scheme Syntax: rewrite-exp val (pat exp) ...
 -- Scheme Syntax: rewrite-cont val (pat cont) ...
     Match VAL against the series of patterns PAT..., using ‘match’.
     The body of the matching clause should be a template in the syntax
     of ‘build-term’, ‘build-exp’, or ‘build-cont’, respectively.
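
   A short sketch of ‘rewrite-term’ in use follows.  The helper name
‘retarget’ and the labels 3 and 7 are made up for the example.  Note how
‘unquote’ splices the already-built expression EXP into the template,
whereas ‘new-k’ and ‘src’ are evaluated as ordinary Scheme expressions.

```scheme
(use-modules (language cps) (ice-9 match))

;; Redirect a $continue term to a new label, keeping its source
;; information and expression unchanged.
(define (retarget term new-k)
  (rewrite-term term
    (($ $continue k src exp)
     ($continue new-k src ,exp))))

(define before (build-term ($continue 3 #f ($const 42))))
(define after (retarget before 7))

;; Inspect the rewritten term.
(define after-fields
  (match after
    (($ $continue k src ($ $const val))
     (list k src val))))
```

   The original term ‘before’ is left untouched; ‘after’ is a new term
that continues to label 7.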


File: guile.info,  Node: CPS Soup,  Next: Compiling CPS,  Prev: Building CPS,  Up: Continuation-Passing Style

9.4.4.4 CPS Soup
................

We describe programs in Guile’s CPS language as being a kind of “soup”
because all continuations in the program are mixed into the same “pot”,
so to speak, without explicit markers as to what function or scope a
continuation is in.  A program in CPS is a map from continuation labels
to continuation values.  As discussed in the introduction, a
continuation label is an integer.  No label may be negative.

   As a matter of convention, label 0 should map to the ‘$kfun’
continuation of the entry to the program, which should be a function of
no arguments.  The body of a function consists of the labelled
continuations that are reachable from the function entry.  A program can
refer to other functions, either via ‘$fun’ and ‘$rec’ in higher-order
CPS, or via ‘$const-fun’, ‘$callk’, and allocated closures in
first-order CPS. The program logically contains all continuations of all
functions reachable from the entry function.  A compiler pass may leave
unreachable continuations in a program; subsequent compiler passes
should ensure that their transformations and analyses only take
reachable continuations into account.  It is fine, though, for a
transformation to run over all continuations, as long as including the
unreachable ones has no effect on how the live continuations are
transformed.

   The “soup” itself is implemented as an “intmap”, a functional
array-mapped trie specialized for integer keys.  Intmaps associate
integers with values of any kind.  Currently intmaps are a private data
structure only used by the CPS phase of the compiler.  To work with
intmaps, load the ‘(language cps intmap)’ module:

     (use-modules (language cps intmap))

   Intmaps are functional data structures, so there is no constructor as
such: one can simply start with the empty intmap and add entries to it.

     (intmap? empty-intmap) ⇒ #t
     (define x (intmap-add empty-intmap 42 "hi"))
     (intmap? x) ⇒ #t
     (intmap-ref x 42) ⇒ "hi"
     (intmap-ref x 43) ⇒ error: 43 not present
     (intmap-ref x 43 (lambda (k) "yo!")) ⇒ "yo!"
     (intmap-add x 42 "hej") ⇒ error: 42 already present

   ‘intmap-ref’ and ‘intmap-add’ are the core of the intmap interface.
There is also ‘intmap-replace’, which replaces the value associated with
a given key, requiring that the key was present already, and
‘intmap-remove’, which removes a key from an intmap.
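
   A brief sketch of these two procedures, using only the interface
described above.  The variable names are arbitrary.

```scheme
(use-modules (language cps intmap))

(define x (intmap-add empty-intmap 42 "hi"))
(define y (intmap-replace x 42 "hej"))   ; key 42 must already exist
(define z (intmap-remove y 42))

;; Each operation returned a new intmap; X and Y are unchanged.
(define x-val (intmap-ref x 42))
(define y-val (intmap-ref y 42))
;; Key 42 is absent from Z, so the fallback procedure is called.
(define z-val (intmap-ref z 42 (lambda (k) #f)))
```

   Here ‘x-val’ is still ‘"hi"’, ‘y-val’ is ‘"hej"’, and ‘z-val’ is
‘#f’.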

   Intmaps have a tree-like structure that is well-suited to set
operations such as union and intersection, so there are also the binary
‘intmap-union’ and ‘intmap-intersect’ procedures.  If the result is
equivalent to either argument, that argument is returned as-is; in that
way, one can detect whether the set operation produced a new result
simply by checking with ‘eq?’.  This makes intmaps useful when computing
fixed points.

   If a key is present in both intmaps and the associated values are not
the same in the sense of ‘eq?’, the resulting value is determined by a
“meet” procedure, which is the optional last argument to ‘intmap-union’,
‘intmap-intersect’, and also to ‘intmap-add’, ‘intmap-replace’, and
similar functions.  The meet procedure will be called with the two
values and should return the intersected or unioned value in some
domain-specific way.  If no meet procedure is given, the default meet
procedure will raise an error.
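
   The following sketch shows a meet procedure at work.  Using ‘+’ as
the meet is purely an assumption for the example; a real analysis would
pick whatever combination makes sense in its domain.

```scheme
(use-modules (language cps intmap))

(define a (intmap-add (intmap-add empty-intmap 1 10) 2 20))
(define b (intmap-add (intmap-add empty-intmap 2 5) 3 30))

;; Key 2 is present in both maps with different values, so the meet
;; procedure is called to combine 20 and 5.
(define u (intmap-union a b +))
(define i (intmap-intersect a b +))

(define union-values
  (list (intmap-ref u 1) (intmap-ref u 2) (intmap-ref u 3)))
(define intersect-values
  (list (intmap-ref i 1 (lambda (k) #f)) (intmap-ref i 2)))
```

   The union holds 10, 25, and 30 for keys 1, 2, and 3, while the
intersection holds only key 2, mapped to 25.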

   To traverse the keys of an intmap in order, there are the
‘intmap-next’ and ‘intmap-prev’ procedures.  For example, if intmap X
has one entry mapping 42 to some value, we would have:

     (intmap-next x) ⇒ 42
     (intmap-next x 0) ⇒ 42
     (intmap-next x 42) ⇒ 42
     (intmap-next x 43) ⇒ #f
     (intmap-prev x) ⇒ 42
     (intmap-prev x 42) ⇒ 42
     (intmap-prev x 41) ⇒ #f
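
   Building on these semantics, a loop over all keys can be sketched as
below.  The name ‘intmap-key-list’ is hypothetical and chosen for the
example.

```scheme
(use-modules (language cps intmap))

;; Collect the keys of MAP in increasing order.  Each step asks for
;; the first key at or after the one following the previous key.
(define (intmap-key-list map)
  (let lp ((k (intmap-next map)) (keys '()))
    (if k
        (lp (intmap-next map (1+ k)) (cons k keys))
        (reverse keys))))

(define q (intmap-add (intmap-add empty-intmap 3 'c) 1 'a))
(define q-keys (intmap-key-list q))
```

   Regardless of insertion order, ‘q-keys’ is ‘(1 3)’.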

   There is also the ‘intmap-fold’ procedure, which folds over keys and
values in the intmap from lowest to highest value, and
‘intmap-fold-right’ which does so in the opposite direction.  These
procedures may take up to 3 seed values.  The number of values that the
fold procedure returns is the number of seed values.

     (define q (intmap-add (intmap-add empty-intmap 1 2) 3 4))
     (intmap-fold acons q '()) ⇒ ((3 . 4) (1 . 2))
     (intmap-fold-right acons q '()) ⇒ ((1 . 2) (3 . 4))
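
   With more than one seed, the fold procedure receives the key, the
value, and each seed, and returns one new value per seed.  A sketch
with two seeds, counting the entries and summing the values:

```scheme
(use-modules (language cps intmap))

(define q (intmap-add (intmap-add empty-intmap 1 2) 3 4))

;; Two seeds in, two values out: the entry count and the value sum.
(define count+sum
  (call-with-values
    (lambda ()
      (intmap-fold (lambda (k v count sum)
                     (values (1+ count) (+ sum v)))
                   q 0 0))
    list))
```

   For the intmap Q above, ‘count+sum’ is ‘(2 6)’.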

   When an entry in an intmap is updated (removed, added, or changed), a
new intmap is created that shares structure with the original intmap.
This operation ensures that the result of existing computations is not
affected by future computations: no mutation is ever visible to user
code.  This is a great property in a compiler data structure, as it lets
us hold a copy of a program before a transformation and use it while we
build a post-transformation program.  Updating an intmap is O(log N) in
the size of the intmap.

   However, the O(log N) allocation costs are sometimes too much,
especially in cases when we know that we can just update the intmap in
place.  As an example, say we have an intmap mapping the integers 1 to
100 to the integers 42 to 141.  Let’s say that we want to transform this
map by adding 1 to each value.  There is already an efficient
‘intmap-map’ procedure in the ‘(language cps utils)’ module, but if we
didn’t know about that we might do:

     (define (intmap-increment map)
       (let lp ((k 0) (map map))
         (let ((k (intmap-next map k)))
           (if k
               (let ((v (intmap-ref map k)))
                 (lp (1+ k) (intmap-replace map k (1+ v))))
               map))))

   Observe that the intermediate values created by ‘intmap-replace’ are
completely invisible to the program – only the last value returned by
‘intmap-replace’ is needed.  The rest might as well share state with the
last one, and we could update in place.  Guile allows this kind of
interface via “transient intmaps”, inspired by Clojure’s transient
interface (<http://clojure.org/transients>).

   The in-place ‘intmap-add!’ and ‘intmap-replace!’ procedures return
transient intmaps.  If one of these in-place procedures is called on a
normal persistent intmap, a new transient intmap is created.  This is an
O(1) operation.  In all other respects the interface is like their
persistent counterparts, ‘intmap-add’ and ‘intmap-replace’.  If an
in-place procedure is called on a transient intmap, the intmap is
mutated in-place and the same value is returned.

   If a persistent operation like ‘intmap-add’ is called on a transient
intmap, the transient’s mutable substructure is then marked as
persistent, and ‘intmap-add’ then runs on a new persistent intmap
sharing structure but not state with the original transient.  Mutating a
transient will cause enough copying to ensure that it can make its
change, but if part of its substructure is already “owned” by it, no
more copying is needed.

   We can use transients to make ‘intmap-increment’ more efficient.  The
two changed elements have been marked *like this*.

     (define (intmap-increment map)
       (let lp ((k 0) (map map))
         (let ((k (intmap-next map k)))
           (if k
               (let ((v (intmap-ref map k)))
                 (lp (1+ k) (*intmap-replace!* map k (1+ v))))
               (*persistent-intmap* map)))))

   Be sure to tag the result as persistent using the ‘persistent-intmap’
procedure to prevent the mutability from leaking to other parts of the
program.  For added paranoia, you could call ‘persistent-intmap’ on the
incoming map, to ensure that if it were already transient, the mutations
in the body of ‘intmap-increment’ wouldn’t affect the incoming value.
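
   The same pattern applies when building an intmap from scratch.  The
sketch below, with the hypothetical name ‘list->intmap’, numbers the
elements of a list from zero, mutating a transient along the way and
handing back a persistent result.

```scheme
(use-modules (language cps intmap))

(define (list->intmap vals)
  (let lp ((i 0) (vals vals) (map empty-intmap))
    (if (null? vals)
        ;; Seal the result so no mutability leaks to the caller.
        (persistent-intmap map)
        ;; The first intmap-add! creates a transient; later calls
        ;; mutate that transient in place.
        (lp (1+ i) (cdr vals) (intmap-add! map i (car vals))))))

(define m (list->intmap '(a b c)))
```

   The result M is an ordinary persistent intmap mapping 0, 1, and 2 to
‘a’, ‘b’, and ‘c’.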

   In summary, programs in CPS are intmaps whose values are
continuations.  See the source code of ‘(language cps utils)’ for a
number of useful facilities for working with CPS values.


File: guile.info,  Node: Compiling CPS,  Prev: CPS Soup,  Up: Continuation-Passing Style

9.4.4.5 Compiling CPS
.....................

Compiling CPS in Guile has three phases: conversion, optimization, and
code generation.

   CPS conversion is the process of taking a higher-level language and
compiling it to CPS. Source languages can do this directly, or they can
convert to Tree-IL (which is probably easier) and let Tree-IL convert to
CPS later.  Going through Tree-IL has the advantage of running Tree-IL
optimization passes, like partial evaluation.  Also, the compiler from
Tree-IL to CPS handles assignment conversion, in which assigned local
variables (in Tree-IL, locals that are ‘<lexical-set>’) are converted to
being boxed values on the heap.  *Note Variables and the VM::.
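
   The effect of assignment conversion can be imitated by hand in
Scheme, using Guile’s first-class variable objects as the boxes.  This
is only a sketch of the idea, not the compiler’s output, and the names
‘make-counter’ and ‘make-counter/boxed’ are invented for the example.

```scheme
;; Before: the local `n' is assigned with set!.
(define (make-counter)
  (let ((n 0))
    (lambda ()
      (set! n (1+ n))
      n)))

;; After, by hand: `n' itself is never assigned.  It names a box on
;; the heap, and all reads and writes go through the box.
(define (make-counter/boxed)
  (let ((n (make-variable 0)))
    (lambda ()
      (variable-set! n (1+ (variable-ref n)))
      (variable-ref n))))
```

   Both counters return 1 on the first call and 2 on the second.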

   After CPS conversion, Guile runs some optimization passes over the
CPS. Most optimization in Guile is done on the CPS language.  The one
major exception is partial evaluation, which for historic reasons is
done on Tree-IL.

   The major optimization performed on CPS is contification, in which
functions that are always called with the same continuation are
incorporated directly into a function’s body.  This opens up space for
more optimizations, and turns procedure calls into ‘goto’.  It can also
make loops out of recursive function nests.  Guile also does dead code
elimination, common subexpression elimination, loop peeling and
invariant code motion, and range and type inference.
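
   As an illustration of the kind of code contification targets,
consider the following function; the names are made up for the example.
Every call to ‘loop’, whether the initial one or a recursive one, is a
tail call that returns to the caller of ‘sum-to’, so ‘loop’ is always
called with the same continuation.  A function like this is a candidate
for being turned into a plain loop inside ‘sum-to’, though whether that
happens for a given program is up to the compiler.

```scheme
(define (sum-to n)
  ;; `loop' never escapes and is only ever called in tail position
  ;; with respect to `sum-to'.
  (define (loop i acc)
    (if (> i n)
        acc
        (loop (1+ i) (+ acc i))))
  (loop 0 0))
```

   For example, ‘(sum-to 10)’ evaluates to 55.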

   The rest of the optimization passes are really cleanups and
canonicalizations.  CPS spans the gap between high-level languages and
low-level bytecodes, which allows much of the compilation process to be
expressed as source-to-source transformations.  Such is the case for
closure conversion, in which references to variables that are free in a
function are converted to closure references, and in which functions are
converted to closures.  There are a few more passes to ensure that the
only primcalls left in the term are those that have a corresponding
instruction in the virtual machine, and that their continuations expect
the right number of values.

   Finally, the backend of the CPS compiler emits bytecode for each
function, one by one.  To do so, it determines the set of live variables
at all points in the function.  Using this liveness information, it
allocates stack slots to each variable, such that a variable can live in
one slot for the duration of its lifetime, without shuffling.  (Of
course, variables with disjoint lifetimes can share a slot.)  Finally
the backend emits code, typically just one VM instruction, for each
continuation in the function.
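
   The slot-sharing idea can be sketched with a simple greedy allocator
over live ranges.  This is not Guile’s allocator, which works on the
CPS graph; the procedure name ‘allocate-slots’ and the representation
of a live range as a ‘(VAR START END)’ list are assumptions made for
the example.

```scheme
(use-modules (srfi srfi-1))

;; RANGES is a list of (VAR START END) live ranges, END inclusive.
;; Returns an alist of (VAR . SLOT), ordered by START.
(define (allocate-slots ranges)
  (let lp ((ranges (sort ranges (lambda (a b) (< (cadr a) (cadr b)))))
           (active '())                 ; list of (SLOT . END)
           (result '()))
    (if (null? ranges)
        (reverse result)
        (let* ((var (car (car ranges)))
               (start (cadr (car ranges)))
               (end (caddr (car ranges)))
               ;; Forget slots whose variable died before START.
               (active (filter (lambda (s) (>= (cdr s) start)) active))
               (used (map car active))
               ;; Pick the lowest-numbered slot not in use.
               (slot (let find ((n 0))
                       (if (memv n used) (find (1+ n)) n))))
          (lp (cdr ranges)
              (cons (cons slot end) active)
              (cons (cons var slot) result))))))
```

   Given the ranges ‘((a 0 3) (b 1 2) (c 3 5) (d 4 6))’, the sketch
assigns ‘a’ and ‘d’ to slot 0 and ‘b’ and ‘c’ to slot 1: each variable
keeps one slot for its whole lifetime, and variables with disjoint
lifetimes share.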


File: guile.info,  Node: Bytecode,  Next: Writing New High-Level Languages,  Prev: Continuation-Passing Style,  Up: Compiling to the Virtual Machine

9.4.5 Bytecode
--------------

As mentioned before, Guile compiles all code to bytecode, and that
bytecode is contained in ELF images.  *Note Object File Format::, for
more on Guile’s use of ELF.

   To produce a bytecode image, Guile provides an assembler and a
linker.

   The assembler, defined in the ‘(system vm assembler)’ module, has a
relatively straightforward imperative interface.  It provides a
‘make-assembler’ function to instantiate an assembler and a set of
‘emit-INST’ procedures to emit instructions of each kind.

   The ‘emit-INST’ procedures are actually generated at compile-time
from a machine-readable description of the VM.  With a few exceptions for
certain operand types, each operand of an emit procedure corresponds to
an operand of the corresponding instruction.

   Consider ‘allocate-words’, from *note Memory Access Instructions::.
It is documented as:

 -- Instruction: allocate-words s12:DST s12:NWORDS

   Therefore the emit procedure has the form:

 -- Scheme Procedure: emit-allocate-words asm dst nwords

   All emit procedures take the assembler as their first argument, and
return no useful values.

   The argument types depend on the operand types.  *Note Instruction
Set::.  Most are integers within a restricted range, though labels are
generally expressed as opaque symbols.  Besides the emitters that
correspond to instructions, there are a few additional helpers defined
in the assembler module.

 -- Scheme Procedure: emit-label asm label
     Define a label at the current program point.

 -- Scheme Procedure: emit-source asm source
     Associate SOURCE with the current program point.

 -- Scheme Procedure: emit-cache-ref asm dst key
 -- Scheme Procedure: emit-cache-set! asm key val
     Macro-instructions to implement compilation-unit caches.  A single
     cache cell corresponding to KEY will be allocated for the
     compilation unit.

 -- Scheme Procedure: emit-load-constant asm dst constant
     Load the Scheme datum CONSTANT into DST.

 -- Scheme Procedure: emit-begin-program asm label properties
 -- Scheme Procedure: emit-end-program asm
     Delimit the bounds of a procedure, with the given LABEL and the
     metadata PROPERTIES.

 -- Scheme Procedure: emit-load-static-procedure asm dst label
     Load a procedure with the given LABEL into local DST.  This
     macro-instruction should only be used with procedures without free
     variables – procedures that are not closures.

 -- Scheme Procedure: emit-begin-standard-arity asm req nlocals
          alternate
 -- Scheme Procedure: emit-begin-opt-arity asm req opt rest nlocals
          alternate
 -- Scheme Procedure: emit-begin-kw-arity asm req opt rest kw-indices
          allow-other-keys? nlocals alternate
 -- Scheme Procedure: emit-end-arity asm
     Delimit a clause of a procedure.

   The linker is a complicated beast.  Hackers interested in how it
works would do well to read Ian Lance Taylor’s series of articles on
linkers.  Searching the internet should find them easily.  From the
user’s perspective, there is only one knob to control: whether the
resulting image will be written out to a file or not.  If the user
passes ‘#:to-file? #t’ as part of the compiler options (*note The Scheme
Compiler::), the linker will align the resulting segments on page
boundaries; otherwise it will not.

 -- Scheme Procedure: link-assembly asm #:page-aligned?=#t
     Link an ELF image, and return the bytevector.  If PAGE-ALIGNED? is
     true, Guile will align the segments with different permissions on
     page-sized boundaries, in order to maximize code sharing between
     different processes.  Otherwise, padding is minimized, to minimize
     address space consumption.

   To write an image to disk, just use ‘put-bytevector’ from ‘(ice-9
binary-ports)’.
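
   As a minimal sketch, assuming that ‘compile’ from ‘(system base
compile)’ is used to obtain the image and that the file name
‘example-image.go’ is merely a placeholder, writing an image out and
loading it back with ‘load-thunk-from-file’ (described below) might
look like this:

     (use-modules (system base compile)   ; compile
                  (system vm loader)      ; load-thunk-from-file
                  (ice-9 binary-ports))   ; put-bytevector

     ;; Ask for a page-aligned image, since it is destined for a file.
     (define image
       (compile '(+ 1 2) #:to 'bytecode #:opts '(#:to-file? #t)))

     ;; Write the raw ELF bytes to disk.
     (call-with-output-file "example-image.go"
       (lambda (port) (put-bytevector port image))
       #:binary #t)

     ;; Map the file back in and run the resulting thunk.
     (define thunk (load-thunk-from-file "example-image.go"))
     (thunk)  ; => 3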

   Compiling bytecode to the fake language, ‘value’, is performed by
loading the image as a thunk and then calling that thunk with respect
to the compilation environment.  Normally the environment propagates
through the compiler transparently, but users may specify the
compilation environment manually as well, as a module.  Procedures to
load images can be found in the ‘(system vm loader)’ module:

     (use-modules (system vm loader))

 -- Scheme Procedure: load-thunk-from-file file
 -- C Function: scm_load_thunk_from_file (file)
     Load object code from a file named FILE.  The file will be mapped
     into memory via ‘mmap’, so this is a very fast operation.

 -- Scheme Procedure: load-thunk-from-memory bv
 -- C Function: scm_load_thunk_from_memory (bv)
     Load object code from a bytevector.  The data will be copied out of
     the bytevector in order to ensure proper alignment of embedded
     Scheme values.
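
   For instance, the following sketch compiles an expression to an
in-memory image and runs it.  It assumes ‘compile’ from ‘(system base
compile)’, which is not part of the loader module.

     (use-modules (system base compile)   ; compile
                  (system vm loader)      ; load-thunk-from-memory
                  (rnrs bytevectors))     ; bytevector?

     ;; Without #:to-file?, the image is linked with minimal padding.
     (define image (compile '(* 6 7) #:to 'bytecode))
     (bytevector? image)  ; => #t

     ;; Loading yields a thunk; calling it evaluates the expression.
     (define thunk (load-thunk-from-memory image))
     (thunk)  ; => 42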

   Additionally there are procedures to find the ELF image for a given
pointer, or to list all mapped ELF images:

 -- Scheme Procedure: find-mapped-elf-image ptr
     Given the integer value PTR, find and return the ELF image that
     contains that pointer, as a bytevector.  If no image is found,
     return ‘#f’.  This routine is mostly used by debuggers and other
     introspective tools.

 -- Scheme Procedure: all-mapped-elf-images
     Return all mapped ELF images, as a list of bytevectors.
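
   A debugger-style lookup could be sketched as follows.  This assumes
‘compile’ from ‘(system base compile)’ to produce an image and
‘program-code’ from ‘(system vm program)’ to obtain a code address as
an integer; neither belongs to the loader module.

     (use-modules (system base compile)   ; compile
                  (system vm loader)      ; loader procedures
                  (system vm program)     ; program-code
                  (rnrs bytevectors))     ; bytevector?

     ;; Load a freshly compiled image so that at least one image
     ;; is known to be mapped.
     (define thunk
       (load-thunk-from-memory (compile '(+ 1 2) #:to 'bytecode)))

     ;; Look up the image that contains the thunk's code address.
     (define image (find-mapped-elf-image (program-code thunk)))
     (bytevector? image)

     ;; The list of all images should now be non-empty.
     (pair? (all-mapped-elf-images))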