1% -*- mode: ess-noweb; ess-noweb-code-mode: R-mode -*-
2
3\documentclass[11pt]{article}
4\usepackage{hyperref}
5\usepackage[headings]{fullpage}
6\usepackage{verbatim}
7\usepackage{noweb}
8
9% This is a minor modification to the new verbatim environment that uses
10% the same size as noweb's code size and indents the same amount at
11% noweb's code chunks. I redefine verbatim instead of defining my own
12% environment since the html converter only seems to understand
13% verbatim, not new definitions.
14\makeatletter
15\addto@hook{\every@verbatim}{\nowebsize\setlength{\leftmargin}{50mm}}
16\def\verbatim@processline{\hspace{\codemargin}\the\verbatim@line\par}
17\makeatother
18
19% The following try to prevent wasteful page breaks
20\def\nwendcode{\endtrivlist \endgroup}
21\let\nwdocspar=\par
22
23\pagestyle{noweb}
24\bibliographystyle{plain}
25
26\noweboptions{noidentxref,longchunks,smallcode}
27
28\title{A Byte Code Compiler for R}
29\author{Luke Tierney\\
30  Department of Statistics and Actuarial Science\\
31  University of Iowa}
32
33\begin{document}
34\maketitle
35This document presents the current implementation of the byte code
36compiler for R.  The compiler produces code for a virtual machine that
37is then executed by a virtual machine runtime system.  The virtual
38machine is a stack based machine.  Thus instructions for the virtual
39machine take arguments off a stack and may leave one or more results
40on the stack.  Byte code objects consists of an integer vector
41representing instruction opcodes and operands, and a generic vector
42representing a constant pool. The compiler is implemented almost
43entirely in R, with just a few support routines in C to manage
44compiled code objects.
45
46The virtual machine instruction set is designed to allow much of the
47interpreter internals to be re-used.  In particular, for now the
48mechanism for calling functions of all types from compiled code
49remains the same as the function calling mechanism for interpreted
50code.  There are opportunities for efficiency improvements through
51using a different mechanism for calls from compiled functions to
52compiled functions, or changing the mechanism for both interpreted and
53compiled code; this will be explored in future work.
54
55The style used by the compiler for building up code objects is
56imperative: A code buffer object is created that contains buffers for
57the instruction stream and the constant pool.  Instructions and
58constants are written to the buffer as the compiler processes an
59expression tree, and at the end a code object is constructed. A more
60functional design in which each compiler step returns a modified code
61object might be more elegant in principle, but it would be more
62difficult to make efficient.
63
64A multi-pass compiler in which a first pass produces an intermediate
65representation, subsequent passes optimize the intermediate
66representation, and a final pass produces actual code would also be
67useful and might be able to produce better code.  A future version of
68the compiler may use this approach.  But for now to keep things simple
69a single pass is used.
70
71%% **** Some peephole optimization is probably possible, and at least
72%% **** some constant folding could be done on the bytecode, but more
73%% **** sophisticated optimizations like inlining or R code would require
74%% **** a more suitable intermediate representation.
75
76%% **** I _think_ conversion from stack-based byte code to register-based
77%% **** code is reasonably straight forward but I haven't thought it
78%% **** through thoroughly yet.
79
80
81\section{The compiler interface}
82The compiler can be used either explicitly by calling certain
83functions to carry out compilations, or implicitly by enabling
84compilation to occur automatically at certain points.
85
86\subsection{Explicit compilation}
87The primary functions for explicit compilation are [[compile]],
88[[cmpfun]], and [[cmpfile]].
89
90The [[compile]] function compiles an expression and returns a byte code
91object, which can then be passed to [[eval]].  A simple example is
92\begin{verbatim}
93> library(compiler)
94> compile(quote(1+3))
95<bytecode: 0x25ba070>
96> eval(compile(quote(1+3)))
97[1] 4
98\end{verbatim}
99
100A closure can be compiled using [[cmpfun]].  If the function [[f]] is
101defined as
102\begin{verbatim}
103f <- function(x) {
104    s <- 0.0
105    for (y in x)
106        s <- s + y
107    s
108}
109\end{verbatim}
110then a compiled version is produced by
111\begin{verbatim}
112fc <- cmpfun(f)
113\end{verbatim}
114We can then compare the performance of the interpreted and compiled
115versions:
116\begin{verbatim}
117> x <- as.double(1 : 10000000)
118> system.time(f(x))
119   user  system elapsed
120  6.470   0.010   6.483
121> system.time(fc(x))
122   user  system elapsed
123  1.870   0.000   1.865
124\end{verbatim}
125
126A source file can be compiled with [[cmpfile]].  For now, the resulting
127file has to then be loaded with [[loadcmp]].  In the future it may
128make sense to allow [[source]] to either load a pre-compiled file or
129to optionally compile while sourcing.
130
131\subsection{Implicit compilation}
132Implicit compilation can be used to compile packages as they are
133installed or for just-in-time (JIT) compilation of functions or
134expressions.  The mechanism for enabling these is experimental and
135likely to change.
136
137For now, compilation of packages requires the use of lazy loading and can be
138enabled either by calling [[compilePKGS]] with argument [[TRUE]] or by
139starting R with the environment variable [[_R_COMPILE_PKGS_]] set to a
140positive integer value.  These settings are used internally during R build
141to compile the base package (and tools, utils, methods, etc) and by
142[[R CMD INSTALL]].
143Functions are compiled as they are written to the lazy loading database.
144Compilation of packages should only be enabled for that time, because it
145adds noticeable time and space overhead to any serialization.
146
147In a UNIX-like environment, for example, installing a package with
148\begin{verbatim}
149env R_COMPILE_PKGS=1 R CMD INSTALL foo.tar.gz
150\end{verbatim}
151will internally enable package compilation using [[compilePKGS]].
152
153If R is installed from source then the base and required packages can
154be compiled on installation using
155\begin{verbatim}
156make bytecode
157\end{verbatim}
158This does not require setting the [[_R_COMPILE_PKGS_]] environment variable.
159
160JIT compilation can be enabled from within R by calling [[enableJIT]]
161with a non-negative integer argument or by starting R with the
162environment variable [[R_ENABLE_JIT]] set to a non-negative integer.
163The possible values of the argument to [[enableJIT]] and their
164meanings are
165\begin{itemize}
166\item[0] turn off JIT
167\item[1] compile closures before they are called the first time
168\item[2] same as 1, plus compile closures before duplicating (useful
169  for packages that store closures in lists, like lattice)
170\item[3] same as 2, plus compile all [[for()]], [[while()]], and
171  [[repeat()]] loops before executing.
172\end{itemize}
173R may initially be somewhat sluggish if JIT is enabled and base and
174recommended packages have not been pre-compiled as almost everything
175will initially need some compilation.
176
177
178\section{The basic compiler}
179This section presents the basic compiler for compiling R expressions
180to byte code objects.
181
182
183\subsection{The compiler top level}
184R expressions consist of function calls, variable references, and
185literal constants.  To create a byte code object representing an R
186expression the compiler has to walk the expression tree and emit code
187for the different node types in encounters. The code emitted may
188depend on the environment in which the expression will be evaluated as
189well as various compiler option settings.
190
191The simplest function in the top level compiler interface is the
192function [[compile]].  This function requires an expression argument
193and takes three optional arguments: an environment, a list of
194options and source code reference.  The default environment is the global
195environment. By default, the source reference argument is [[NULL]] and the
196source reference is taken from the [[srcref]] attribute of the expression
197argument.
198<<[[compile]] function>>=
199compile <- function(e, env = .GlobalEnv, options = NULL, srcref = NULL) {
200    cenv <- makeCenv(env)
201    cntxt <- make.toplevelContext(cenv, options)
202    cntxt$env <- addCenvVars(cenv, findLocals(e, cntxt))
203    if (mayCallBrowser(e, cntxt))
204        ## NOTE: compilation will be attempted repeatedly
205        e
206    else if (is.null(srcref))
207        genCode(e, cntxt)
208    else
209        genCode(e, cntxt, loc = list(expr = e, srcref = srcref))
210}
211@ %def compile
212The supplied environment is converted into a compilation environment
213data structure.  This compilation environment and any options
214provided are then used to construct a compiler context.  The function
215[[genCode]] is then used to generate a byte code object for the
216expression and the constructed compilation context.
217
218Compilation environments are described in Section
219\ref{sec:environments} and compiler contexts in Section
220\ref{sec:contexts}. The [[genCode]] function is defined as
221<<[[genCode]] function>>=
222genCode <- function(e, cntxt, gen = NULL, loc = NULL) {
223    cb <- make.codeBuf(e, loc)
224    if (is.null(gen))
225        cmp(e, cb, cntxt, setloc = FALSE)
226    else
227        gen(cb, cntxt)
228    codeBufCode(cb, cntxt)
229}
230@ %def genCode
231[[genCode]] creates a code buffer, fills the code buffer, and then
232calls [[codeBufCode]] to extract and return the byte code object.  In
233the most common case [[genCode]] uses the low level recursive
234compilation function [[cmp]], described in Section \ref{subsec:cmp},
235to generate the code.  For added flexibility it can be given a
236generator function that emits code into the code buffer based on the
237provided context.  This is used in Section \ref{sec:loops} for
238compilation of loop bodies in loops that require an explicit loop context
239(and a long jump in the byte-code interpreter).
240
241
242\subsection{Basic code buffer interface}
243Code buffers are used to accumulate the compiled code and related
244constant values.  A code buffer [[cb]] is a list containing a number
245of closures used to manipulate the content of the code buffer.  In
246this section two closures are used, [[putconst]] and [[putcode]].
247
248The closure [[cb$putconst]] is used to enter constants into the
249constant pool.  It takes a single argument, an arbitrary R object to
250be entered into the constant pool, and returns an integer index into
251the pool.  The [[cb$putcode]] closure takes an instruction opcode and
252any operands the opcode requires and emits them into the code buffer.
253The operands are typically constant pool indices or labels, to be
254introduced in Section \ref{sec:codebuf}.
255
256As an example, the [[GETVAR]] instruction takes one operand, the index
257in the constant pool of a symbol. The opcode for this instruction is
258[[GETVAR.OP]].  The instruction retrieves the symbol from the constant
259pool, looks up its value in the current environment, and pushes the
260value on the stack.  If [[sym]] is a variable with value of a symbol,
261then code to enter the symbol in the constant pool and emit an
262instruction to get its value would be
263<<example of emitting a [[GETVAR]] instruction>>=
264ci <- cb$putconst(sym)
265cb$putcode(GETVAR.OP, ci)
266@ %def
267
268The complete code buffer implementation is given in Section
269\ref{sec:codebuf}.
270
271
272\subsection{The recursive code generator}
273\label{subsec:cmp}
274The function [[cmp]] is the basic code generation function. It
275recursively traverses the expression tree and emits code as it visits
276each node in the tree.
277
278Before generating code for an expression the function [[cmp]] attempts
279to determine the value of the expression by constant folding using the
280function [[constantFold]].  If constant folding is successful then
281[[contantFold]] returns a named list containing a [[value]] element.
282Otherwise it returns [[NULL]].  If constant folding is successful,
283then the result is compiled as a constant.  Otherwise, the standard
284code generation process is used.
285%% **** comment on alternative of doing constant folding as an
286%% **** optimization on the bytecode or an intermediate representation?
287
288In the interpreter there are four types of objects that are not
289treated as constants, i.e. as evaluating to themselves: function calls
290of type [["language"]], variable references of type [["symbol"]],
291promises, and byte code objects.  Neither promises nor byte code
292objects should appear as literals in code so an error is signaled for
293those.  The language, symbol, and constant cases are each handled by
294their own code generators.
295%% **** promises do appear in the expressions generated by the
296%% **** interpreter for evaluating complex assignment expressions
297<<generate code for expression [[e]]>>=
298if (typeof(e) == "language")
299    cmpCall(e, cb, cntxt)
300else if (typeof(e) == "symbol")
301    cmpSym(e, cb, cntxt, missingOK)
302else if (typeof(e) == "bytecode")
303    cntxt$stop(gettext("cannot compile byte code literals in code"),
304               cntxt, loc = cb$savecurloc())
305else if (typeof(e) == "promise")
306    cntxt$stop(gettext("cannot compile promise literals in code"),
307               cntxt, loc = cb$savecurloc())
308else
309    cmpConst(e, cb, cntxt)
310@
311The function [[cmp]] is then defined as
312<<[[cmp]] function>>=
313cmp <- function(e, cb, cntxt, missingOK = FALSE, setloc = TRUE) {
314    if (setloc) {
315        sloc <- cb$savecurloc()
316        cb$setcurexpr(e)
317    }
318    ce <- constantFold(e, cntxt, loc = cb$savecurloc())
319    if (is.null(ce)) {
320        <<generate code for expression [[e]]>>
321    }
322    else
323        cmpConst(ce$value, cb, cntxt)
324    if (setloc)
325        cb$restorecurloc(sloc)
326}
327@ %def cmp
328The call code generator [[cmpCall]] will recursively call [[cmp]].
329%% **** should promises/byte code produce compiler errors or runtime errors??
330
331
332\subsection{Compiling constant expressions}
333The constant code generator [[cmpConst]] is the simplest of the three
334generators. A simplified generator can be defined as
335<<simplified [[cmpConst]] function>>=
336cmpConst <- function(val, cb, cntxt) {
337    ci <- cb$putconst(val)
338    cb$putcode(LDCONST.OP, ci)
339    if (cntxt$tailcall) cb$putcode(RETURN.OP)
340}
341@ %def cmpConst
342This function enters the constant in the constant pool using the
343closure [[cb$putconst]].  The value returned by this closure is an
344index for the constant in the constant pool.  Then the code generator
345emits an instruction to load the constant at the specified constant
346pool index and push it onto the stack.  If the expression appears in
347tail position then a [[RETURN]] instruction is emitted as well.
348%% **** explain tail position here??
349
350Certain constant values, such as [[TRUE]], [[FALSE]], and [[NULL]]
351appear very often in code. It may be useful to provide and use special
352instructions for loading these. The resulting code will have slightly
353smaller constant pools and may be a little faster, though the
354difference is likely to be small.  A revised definition of
355[[cmpConst]] that makes use of instructions for loading these
356particular values is given by
357<<[[cmpConst]] function>>=
358cmpConst <- function(val, cb, cntxt) {
359    if (identical(val, NULL))
360        cb$putcode(LDNULL.OP)
361    else if (identical(val, TRUE))
362        cb$putcode(LDTRUE.OP)
363    else if (identical(val, FALSE))
364        cb$putcode(LDFALSE.OP)
365    else {
366        ci <- cb$putconst(val)
367        cb$putcode(LDCONST.OP, ci)
368    }
369    if (cntxt$tailcall) cb$putcode(RETURN.OP)
370}
371@ %def cmpConst
372It might be useful to handle other constants in a similar way, such as
373[[NA]] or small integer values; this may be done in the future.
374%% **** check out if small integers is worth doing.
375%% **** mention peephole optimization as alternative
376
377The implementation marks values in the constant pool as read-only
378after they are loaded. In the past, all values were duplicated as they
379were retrieved from the constant pool as a precaution against bad
380package code: several packages in the wild assumed that an expression
381[[TRUE]], for example, appearing in code would result in a freshly
382allocated value that could be freely modified in [[.C]] calls.
383
384
385\subsection{Compiling variable references}
386The function [[cmpSym]] handles compilation of variable
387references. For standard variables this involves entering the symbol
388in the constant pool, emitting code to look up the value of the
389variable at the specified constant pool location in the current
390environment, and, if necessary, emitting a [[RETURN]] instruction.
391
392In addition to standard variables there is the ellipsis variable
393[[...]]  and the accessors [[..1]], [[..2]], and so on that need to be
394considered. The ellipsis variable can only appear as an argument in
395function calls, so [[cmp]], like the interpreter [[eval]] itself,
396should not encounter it. The interpreter signals an error if it does
397encounter a [[...]] variable, and the compiler emits code that does
398the same at runtime.  The compiler also emits a warning at compile
399time.  Variables representing formal parameters may not have values
400provided in their calls, i.e. may have missing values. In some cases
401this should signal an error; in others the missing value can be passed
402on (for example in expressions of the form [[x[]]]). To support this,
403[[cmpSym]] takes an optional argument for allowing missing argument
404values.
405<<[[cmpSym]] function>>=
406cmpSym <- function(sym, cb, cntxt, missingOK = FALSE) {
407    if (sym == "...") {
408        notifyWrongDotsUse("...", cntxt, loc = cb$savecurloc())
409        cb$putcode(DOTSERR.OP)
410    }
411    else if (is.ddsym(sym)) {
412        <<emit code for [[..n]] variable references>>
413    }
414    else {
415        <<emit code for standard variable references>>
416    }
417}
418@ %def cmpSym
419
420References to [[..n]] variables are also only appropriate when a
421[[...]] variable is available, so a warning is given if that is not
422the case. The virtual machine provides instructions [[DDVAL]] and
423[[DDVAL_MISSOK]] for the case where missing arguments are not allowed
424and for the case where they are, and the appropriate instruction is
425used based on the [[missingOK]] argument to [[cmpSym]].
426<<emit code for [[..n]] variable references>>=
427if (! findLocVar("...", cntxt))
428    notifyWrongDotsUse(sym, cntxt, loc = cb$savecurloc())
429ci <- cb$putconst(sym)
430if (missingOK)
431    cb$putcode(DDVAL_MISSOK.OP, ci)
432else
433    cb$putcode(DDVAL.OP, ci)
434if (cntxt$tailcall) cb$putcode(RETURN.OP)
435@ %def
436
437There are also two instructions available for obtaining the value of a
438general variable from the current environment, one that allows missing
439values and one that does not.
440<<emit code for standard variable references>>=
441if (! findVar(sym, cntxt))
442    notifyUndefVar(sym, cntxt, loc = cb$savecurloc())
443ci <- cb$putconst(sym)
444if (missingOK)
445    cb$putcode(GETVAR_MISSOK.OP, ci)
446else
447    cb$putcode(GETVAR.OP, ci)
448if (cntxt$tailcall) cb$putcode(RETURN.OP)
449@ %def
450
451For now, these instructions only take an index in the constant pool
452for the symbol as operands, not any information about where the
453variable can be found within the environment.  This approach to
454obtaining the value of variables requires a search of the current
455environment for every variable reference.  In a less dynamic language
456it would be possible to compute locations of variable bindings within
457an environment at compile time and to choose environment
458representations that allow constant time access to any variable's
459value.  Since bindings in R can be added or removed at runtime this
460would require a semantic change that would need some form of
461declaration to make legitimate. Another approach that may be worth
462exploring is some sort of caching mechanism in which the location of
463each variable is stored when it is first found by a full search, and
464that cached location is used until an event occurs that forces
465flushing of the cache. If such events are rare, as they typically are,
466then this may be effective.
467%% **** need to look into caching strategies
468
469%% **** looks like a simple cache of the local frame speeds up sum and
470%% **** Neal's em by about 10% (just lookup, not assignment -- with
471%% **** assignment should be a bit better)
472
473%% **** Is it really necessary for bcEval to save/restore stack tops?
474%% **** Shouldn't that happen automatically?
475%% **** Is it possible to have closure calling stay in the same bc?
476%% **** maybe at least for promises?
477
478
479\subsection{Compiling function calls}
480Conceptually, the R function calling mechanism uses lazy evaluation of
481arguments.  Thus calling a function involves three steps:
482\begin{itemize}
483\item finding the function to call
484\item packaging up the argument expressions into deferred evaluation
485  objects, or promises
486\item executing the call
487\end{itemize}
488Code for this process is generated by the function [[cmpCall]]. A
489simplified version is defined as
490<<simplified [[cmpCall]] function>>=
491cmpCall <- function(call, cb, cntxt) {
492    cntxt <- make.callContext(cntxt, call)
493    fun <- call[[1]]
494    args <- call[-1]
495    if (typeof(fun) == "symbol")
496        cmpCallSymFun(fun, args, call, cb, cntxt)
497    else
498        cmpCallExprFun(fun, args, call, cb, cntxt)
499}
500@ %def cmpCall
501
502Call expressions in which the function is represented by a symbol are
503compiled by [[cmpCallSymFun]].  This function emits a [[GETFUN]]
504instruction and then compiles the arguments.
505<<[[cmpCallSymFun]] function>>=
506maybeNSESymbols <- c("bquote")
507cmpCallSymFun <- function(fun, args, call, cb, cntxt) {
508    ci <- cb$putconst(fun)
509    cb$putcode(GETFUN.OP, ci)
510    nse <- as.character(fun) %in% maybeNSESymbols
511    <<compile arguments and emit [[CALL]] instruction>>
512}
513@ %def cmpCallSymFun
514The [[GETFUN]] instruction takes a constant pool index of the symbol
515as an operand, looks for a function binding to the symbol in the
516current environment, places it on the stack, and prepares the stack
517for handling function call arguments.
518%% **** need a fun cache and a var cache???
519
520Argument compilation is carried out by the function [[cmpCallArgs]],
521presented in Section \ref{subsec:callargs}, and is followed by emitting code
522to execute the call and, if necessary, return a result.  Calls to functions
523listed in [[maybeNSESymbols]] get their arguments uncompiled.  Currently
524this is only the case of [[bquote]], which does not evaluate its argument
525[[expr]] normally, but modifies the expression first (non-standard
526evaluation).  Compiling such argument could result in warnings, because the
527argument may not be a valid R expression (e.g.  when it contains [[.()]]
528subexpressions in complex assignments), and the generated code would be
529irrelevant (yet not used).  Not compiling an argument that will in fact be
530evaluated normally is safe, hence the code is not differentiating between
531individual function arguments nor is it checking whether [[bquote]] is the
532one from the [[base]] package.
533
534<<compile arguments and emit [[CALL]] instruction>>=
535cmpCallArgs(args, cb, cntxt, nse)
536ci <- cb$putconst(call)
537cb$putcode(CALL.OP, ci)
538if (cntxt$tailcall) cb$putcode(RETURN.OP)
539@ %def
540The call expression itself is stored in the constant pool and is
541available to the [[CALL]] instruction.
542
543Calls in which the function is represented by an expression other than
544a symbol are handled by [[cmpCallExprFun]].  This emits code to
545evaluate the expression, leaving the value in the stack, and then
546emits a [[CHECKFUN]] instruction.  This instruction checks that the
547value on top of the stack is a function and prepares the stack for
548receiving call arguments.  Generation of argument code and the
549[[CALL]] instruction are handled as for symbol function calls.
550<<[[cmpCallExprFun]] function>>=
551cmpCallExprFun <- function(fun, args, call, cb, cntxt) {
552    ncntxt <- make.nonTailCallContext(cntxt)
553    cmp(fun, cb, ncntxt)
554    cb$putcode(CHECKFUN.OP)
555    nse <- FALSE
556    <<compile arguments and emit [[CALL]] instruction>>
557}
558@ %def cmpCallExprFun
559
560The actual definition of [[cmpCall]] is a bit more complex than the
561simplified one given above:
562<<[[cmpCall]] function>>=
563cmpCall <- function(call, cb, cntxt, inlineOK = TRUE) {
564    sloc <- cb$savecurloc()
565    cb$setcurexpr(call)
566    cntxt <- make.callContext(cntxt, call)
567    fun <- call[[1]]
568    args <- call[-1]
569    if (typeof(fun) == "symbol") {
570        if (! (inlineOK && tryInline(call, cb, cntxt))) {
571            <<check the call to a symbol function>>
572	    cmpCallSymFun(fun, args, call, cb, cntxt)
573        }
574    }
575    else {
576        <<hack for handling [[break()]] and [[next()]] expressions>>
577        cmpCallExprFun(fun, args, call, cb, cntxt)
578    }
579    cb$restorecurloc(sloc)
580}
581@ %def cmpCall
582The main addition is the use of a [[tryInline]] function which tries
583to generate more efficient code for particular functions.  The
584[[inlineOK]] argument can be used to disable inlining. This function
585returns [[TRUE]] if it has handled code generation and [[FALSE]] if it
586has not.  Code will be generated by the inline mechanism if inline
587handlers for the particular function are available and the
588optimization level permits their use.  Details of the inlining
589mechanism are given in Section \ref{sec:inlining}.
590
591In addition to the inlining mechanism, some checking of the call is
592carried out for symbol calls. The checking code is
593<<check the call to a symbol function>>=
594if (findLocVar(fun, cntxt))
595    notifyLocalFun(fun, cntxt, loc = cb$savecurloc())
596else {
597    def <- findFunDef(fun, cntxt)
598    if (is.null(def))
599        notifyUndefFun(fun, cntxt, loc = cb$savecurloc())
600    else
601        checkCall(def, call,
602                  function(w) notifyBadCall(w, cntxt, loc = cb$savecurloc()))
603}
604@
605and [[checkCall]] is defined as
606<<[[checkCall]] function>>=
607## **** figure out how to handle multi-line deparses
608## ****     e.g. checkCall(`{`, quote({}))
609## **** better design would capture error object, wrap it up, and pass it on
610## **** use approach from codetools to capture partial argument match
611## ****     warnings if enabled?
612checkCall <- function(def, call, signal = warning) {
613    if (typeof(def) %in% c("builtin", "special"))
614        def <- args(def)
615    if (typeof(def) != "closure" || any.dots(call))
616        NA
617    else {
618        msg <- tryCatch({match.call(def, call); NULL},
619                        error = function(e) conditionMessage(e))
620        if (! is.null(msg)) {
621            emsg <- gettextf("possible error in '%s': %s",
622                             deparse(call, 20)[1], msg)
623            if (! is.null(signal)) signal(emsg)
624            FALSE
625        }
626        else TRUE
627    }
628}
629@ %def checkCall
630
631Finally, for calls where the function is an expression a hack is
632currently needed for dealing with the way the parser currently parses
633expressions of the form [[break()]] and [[next()]].  To be able to
634compile as many [[break]] and [[next]] calls as possible as simple
635[[GOTO]] instructions these need to be handled specially to avoid
636placing things on the stack.  A better solution would probably be to
637modify the parser to make expressions of the form [[break()]] be
638syntax errors.
639<<hack for handling [[break()]] and [[next()]] expressions>>=
640## **** this hack is needed for now because of the way the
641## **** parser handles break() and next() calls
642if (typeof(fun) == "language" && typeof(fun[[1]]) == "symbol" &&
643    as.character(fun[[1]]) %in% c("break", "next"))
644    return(cmp(fun, cb, cntxt))
645@
646
647
648\subsection{Compiling call arguments}
649\label{subsec:callargs}
650Function calls can contain four kinds of arguments:
651\begin{itemize}
652\item missing arguments
653\item [[...]] arguments
654\item general expressions
655\end{itemize}
656In the first and third cases the arguments can also be named.  The argument
657compilation function [[cmpCallArgs]] loops over the argument lists and
658handles each of the three cases, in addition to signaling errors for
659arguments that are literal bytecode or promise objects.  When [[nse]] is
660[[TRUE]] (non-standard evaluation), promises will only get uncompiled
661expressions.
662<<[[cmpCallArgs]] function>>=
663cmpCallArgs <- function(args, cb, cntxt, nse = FALSE) {
664    names <- names(args)
665    pcntxt <- make.promiseContext(cntxt)
666    for (i in seq_along(args)) {
667        a <- args[[i]]
668        n <- names[[i]]
669        <<compile missing argument>>
670        <<compile [[...]] argument>>
671        <<signal an error for promise or bytecode argument>>
672        <<compile a general argument>>
673    }
674}
675@ %def cmpCallArgs
676
677The missing argument case is handled by
678<<compile missing argument>>=
679if (missing(a)) { ## better test for missing??
680    cb$putcode(DOMISSING.OP)
681    cmpTag(n, cb)
682}
683@ %def
684Computations on the language related to missing arguments are tricky.
685The use of [[missing]] is a little odd, but for now at least it does
686work.
687
688An ellipsis argument [[...]] is handled by the [[DODOTS]] instruction:
689<<compile [[...]] argument>>=
690else if (is.symbol(a) && a == "...") {
691    if (! findLocVar("...", cntxt))
692        notifyWrongDotsUse("...", cntxt, loc = cb$savecurloc())
693    cb$putcode(DODOTS.OP)
694}
695@ %def
696A warning is issued if no [[...]] argument is visible.
697
698As in [[cmp]], errors are signaled for literal bytecode or promise
699values as arguments.
700<<signal an error for promise or bytecode argument>>=
701else if (typeof(a) == "bytecode")
702    cntxt$stop(gettext("cannot compile byte code literals in code"),
703               cntxt, loc = cb$savecurloc())
704else if (typeof(a) == "promise")
705    cntxt$stop(gettext("cannot compile promise literals in code"),
706               cntxt, loc = cb$savecurloc())
707@ %def
708
709A general non-constant argument expression is compiled to a separate
710byte code object which is stored in the constant pool.  The compiler
711then emits a [[MAKEPROM]] instruction that uses the stored code
712object. Promises are not needed for literal constant arguments as
713these are self-evaluating.  Within the current implementation both the
714evaluation process and use of [[substitute]] will work properly if
715constants are placed directly in the argument list rather than being
716wrapped in promises. This could also be done in the interpreter,
717though the benefit is less clear as a runtime determination of whether
718an argument is a constant would be needed.  This may still be cheap
719enough compared to the cost of allocating a promise to be worth doing.
720Constant folding in [[cmp]] may also produce more constants, but
721promises are needed in this case in order for [[substitute]] to work
722properly.  These promises could be created as evaluated promises,
723though it is not clean how much this would gain.
724<<compile a general argument>>=
725else {
726    if (is.symbol(a) || typeof(a) == "language") {
727        if (nse)
728              ci <- cb$putconst(a)
729        else
730              ci <- cb$putconst(genCode(a, pcntxt, loc = cb$savecurloc()))
731        cb$putcode(MAKEPROM.OP, ci)
732    }
733    else
734        cmpConstArg(a, cb, cntxt)
735    cmpTag(n, cb)
736}
737@ %def
738%% **** look into using evaluated promises for constant folded arguments
739%% **** then we would use a variant of this:
740% else {
741%     ca <- constantFold(a, cntxt)
742%     if (is.null(ca)) {
743%         if (is.symbol(a) || typeof(a) == "language") {
744%             ci <- cb$putconst(genCode(a, pcntxt))
745%             cb$putcode(MAKEPROM.OP, ci)
746%         }
747%         else
748%             cmpConstArg(a, cb, cntxt)
749%     }
750%     else
751%         cmpConstArg(ca$value, cb, cntxt)
752%     cmpTag(n, cb)
753% }
754
755For calls to closures the [[MAKEPROM]] instruction retrieves the code
756object, creates a promise from the code object and the current
757environment, and pushes the promise on the argument stack. For calls
758to functions of type [[BULTIN]] the [[MAKEPROM]] instruction actually
759executes the code object in the current environment and pushes the
760resulting value on the stack.  For calls to functions of type
761[[SPECIAL]] the [[MAKEPROM]] instruction does nothing as these calls use
762only the call expression.
763
764Constant arguments are compiled by [[cmpConstArg]].  Again there are
765special instructions for the common special constants [[NULL]],
766[[TRUE]], and [[FALSE]].
767<<[[cmpConstArg]]>>=
768cmpConstArg <- function(a, cb, cntxt) {
769    if (identical(a, NULL))
770        cb$putcode(PUSHNULLARG.OP)
771    else if (identical(a, TRUE))
772        cb$putcode(PUSHTRUEARG.OP)
773    else if (identical(a, FALSE))
774        cb$putcode(PUSHFALSEARG.OP)
775    else {
776        ci <- cb$putconst(a)
777        cb$putcode(PUSHCONSTARG.OP, ci)
778    }
779}
780@ %def cmpConstArg
781
782Code to install names for named arguments is generated by [[cmpTag]]:
783<<[[cmpTag]] function>>=
784cmpTag <- function(n, cb) {
785    if (! is.null(n) && n != "") {
786        ci <- cb$putconst(as.name(n))
787        cb$putcode(SETTAG.OP, ci)
788    }
789}
790@ %def cmpTag
791
792The current implementation allocates a linked list of call arguments,
793stores tags in the list cells, and allocates promises. Alternative
794implementations that avoid some or all allocation are worth exploring.
795Also worth exploring is having an instruction specifically for calls that
796do not require matching of named arguments to formal arguments, since
797cases that use only order of arguments, not names, are quite common
798and are known at compile time. In the case of calls to functions with
799definitions known at compile time matching of named arguments to
800formal ones could also be done at compile time.
801
802
803\subsection{Discussion}
804The framework presented in this section, together with some support
805functions, is actually able to compile any legal R code.  But this is
806somewhat deceptive. The R implementation, and the [[CALL]] opcode,
807support three kinds of functions: closures (i.e. R-level functions),
808primitive functions of type [[BUILTIN]], and primitive functions of
809type [[SPECIAL]].  Primitives of type [[BUILTIN]] always evaluate
810their arguments in order, so creating promises is not necessary and in
811fact the [[MAKEPROM]] instruction does not do so --- if the function
812to be called is a [[BUILTIN]] then [[MAKEPROM]] runs the code for
813computing the argument in the current environment and pushes the value
814on the stack.  On the other hand, primitive functions of type
815[[SPECIAL]] use the call expression and evaluate bits of it as
816needed. As a result, they will be running interpreted code.  Since
817core functions like the sequencing function [[{]] and the conditional
818  evaluation function [[if]] are of type [[SPECIAL]] this means most
819  non-trivial code will be run by the standard interpreter.  This will
820  be addressed by defining inlining rules that allow functions like
821  [[{]] and [[if]] to be compiled properly.
822
823
824\section{The code buffer}
825\label{sec:codebuf}
826The code buffer is a collection of closures that accumulate code and
827constants in variables in their defining environment.  For a code
828buffer [[cb]] the closures [[cb$putcode]] and [[cb$putconst]] write an
829instruction sequence and a constant, respectively, into the code
830buffer. The closures [[cb$code]] and [[cb$consts]] extract the code
831vector and the constant pool.
832
833The function [[make.codeBuf]] creates a set of closures for managing
834the instruction stream buffer and the constant pool buffer and returns
835a list of these closures for use by the compilation functions.  In
836addition, the expression to be compiled into the code buffer is stored
837as the first constant in the constant pool; this can be used to
838retrieve the source code for a compiled expression.
839<<[[make.codeBuf]] function>>=
840make.codeBuf <- function(expr, loc = NULL) {
841    <<source location tracking implementation>>
842    <<instruction stream buffer implementation>>
843    <<constant pool buffer implementation>>
844    <<label management interface>>
845    cb <- list(code = getcode,
846               const = getconst,
847               putcode = putcode,
848               putconst = putconst,
849               makelabel = makelabel,
850               putlabel = putlabel,
851	       patchlabels = patchlabels,
852               setcurexpr = setcurexpr,
853               setcurloc = setcurloc,
854               commitlocs = commitlocs,
855               savecurloc = savecurloc,
856               restorecurloc = restorecurloc)
857    cb$putconst(expr) ## insert expression as first constant.
858      ## NOTE: this will also insert the srcref directly into the constant
859      ## pool
860    cb
861}
862@ %def make.codeBuf
863
864The instruction stream buffer uses a list structure and a count of
865elements in use, and doubles the size of the list to make room for new
866code when necessary.  By convention the first entry is a byte code
867version number; if the interpreter sees a byte code version number it
868cannot handle then it falls back to interpreting the uncompiled
869expression. The doubling strategy is needed to avoid quadratic
870compilation times for large instruction streams.
871<<instruction stream buffer implementation>>=
872codeBuf <- list(.Internal(bcVersion()))
873codeCount <- 1
874putcode <- function(...) {
875    new <- list(...)
876    newLen <- length(new)
877    while (codeCount + newLen > length(codeBuf)) {
878        codeBuf <<- c(codeBuf, vector("list", length(codeBuf)))
879        if (exprTrackingOn)
880            exprBuf <<- c(exprBuf, vector("integer", length(exprBuf)))
881        if (srcrefTrackingOn)
882            srcrefBuf <<- c(srcrefBuf, vector("integer", length(srcrefBuf)))
883    }
884    codeRange <- (codeCount + 1) : (codeCount + newLen)
885    codeBuf[codeRange] <<- new
886
887    if (exprTrackingOn) {   ## put current expression into the constant pool
888        ei <- putconst(curExpr)
889        exprBuf[codeRange] <<- ei
890    }
891    if (srcrefTrackingOn) { ## put current srcref into the constant pool
892        si <- putconst(curSrcref)
893        srcrefBuf[codeRange] <<- si
894    }
895
896    codeCount <<- codeCount + newLen
897}
898getcode <- function() as.integer(codeBuf[1 : codeCount])
899@ %def
900
901The constant pool is accumulated into a list buffer.  The zero-based
902index of the constant in the pool is returned by the insertion
903function.  Values are only entered once; if a value is already in the
904pool, as determined by [[identical]], its existing index is returned.
905Again a size-doubling strategy is used for the buffer.  [[.Internal]]
906functions are used both for performance reasons and to prevent
907duplication of the constants.
908<<constant pool buffer implementation>>=
909constBuf <- vector("list", 1)
910constCount <- 0
911putconst <- function(x) {
912    if (constCount == length(constBuf))
913        constBuf <<- .Internal(growconst(constBuf))
914    i <- .Internal(putconst(constBuf, constCount, x))
915    if (i == constCount)
916        constCount <<- constCount + 1
917    i
918}
919getconst <- function()
920    .Internal(getconst(constBuf, constCount))
921@ %def
922
923The compiler maintains a mapping from code to source locations. For each
924value in the code buffer (instruction and operand) there is a source code
925reference ([[srcref]]) and the corresponding expression (AST).  The code
926buffer implementation remembers the current location (source reference and
927expression), which can be set by [[setcurloc]], [[setcurexpr]] or
928[[restorecurloc]] and retrieved by [[savecurloc]].  In addition to emitting
929code, [[putconst]] also copies the current location information into the
930constant pool and records the resulting constant indices in a source
931reference buffer and expression buffer.  When the final code is extracted
932using [[codeBufCode]], the source reference and expression buffers are
933copied into the constant pool as vectors indexed by code offset (program
934counter).
935<<source location tracking functions>>=
936extractSrcref <- function(sref, idx) {
937    if (is.list(sref) && length(sref) >= idx)
938        sref[[idx]]
939    else if (is.integer(sref) && length(sref) >= 6)
940        sref
941    else
942        NULL
943}
944getExprSrcref <- function(expr) {
945    sattr <- attr(expr, "srcref")
946    extractSrcref(sattr, 1)
947}
948# if block is a block srcref, get its idx'th entry
949# if block is a single srcref, return this srcref
950getBlockSrcref <- function(block, idx) {
951  extractSrcref(block, idx)
952}
953addLocString <- function(msg, loc) {
954    if (is.null(loc$srcref))
955        msg
956    else
957        paste0(msg, " at ", utils::getSrcFilename(loc$srcref), ":",
958               utils::getSrcLocation(loc$srcref, "line"))
959}
960@ %def
961
962<<source location tracking implementation>>=
963exprTrackingOn <- TRUE
964srcrefTrackingOn <- TRUE
965
966if (is.null(loc)) {
967    curExpr <- expr
968    curSrcref <- getExprSrcref(expr)
969} else {
970    curExpr <- loc$expr
971    curSrcref <- loc$srcref
972}
973
974if (is.null(curSrcref))
975    ## when top-level srcref is null, we speculate there will be no
976    ##   source references within the compiled expressions either,
977    ##   disabling the tracking makes the resulting constant pool
978    ##   smaller
979    srcrefTrackingOn <- FALSE
980
981exprBuf <- NA   ## exprBuf will have the same length as codeBuf
982srcrefBuf <- NA ## srcrefBuf will have the same length as codeBuf
983
984if (!exprTrackingOn) {
985    curExpr <- NULL
986    exprBuf <- NULL
987}
988if (!srcrefTrackingOn) {
989    curSrcref <- NULL
990    srcrefBuf <- NULL
991}
992
993## set the current expression
994## also update the srcref according to expr, if expr has srcref attribute
995##   (note: never clears current srcref)
996setcurexpr <- function(expr) {
997    if (exprTrackingOn) {
998        curExpr <<- expr
999    }
1000    if (srcrefTrackingOn) {
1001        sref <- getExprSrcref(expr)
1002        if (!is.null(sref) && srcrefTrackingOn)
1003            curSrcref <<- sref
1004     }
1005}
1006## unconditionally sets the current expression and srcrefs
1007setcurloc <- function(expr, sref) {
1008    if (exprTrackingOn)
1009        curExpr <<- expr
1010    if (srcrefTrackingOn)
1011        curSrcref <<- sref
1012}
1013## add location information (current expressions, srcrefs) to the constant pool
1014commitlocs <- function() {
1015    if (exprTrackingOn) {
1016      exprs <- exprBuf[1:codeCount]
1017      class(exprs) <- "expressionsIndex"
1018      putconst(exprs)
1019    }
1020
1021    if (srcrefTrackingOn) {
1022      srefs <- srcrefBuf[1:codeCount]
1023      class(srefs) <- "srcrefsIndex"
1024      putconst(srefs)
1025    }
1026
1027    ## these entries will be at the end of the constant pool, assuming only the compiler
1028    ## uses these two classes
1029    NULL
1030}
1031savecurloc <- function() {
1032    list(expr = curExpr, srcref = curSrcref)
1033}
1034restorecurloc <- function(saved) {
1035    if (exprTrackingOn) curExpr <<- saved$expr
1036    if (srcrefTrackingOn) curSrcref <<- saved$srcref
1037}
1038@ %def
1039
1040
1041Labels are used for identifying targets for branching instruction.  The
1042label management interface creates new labels with [[makelabel]] as
1043character strings that are unique within the buffer.  These labels can
1044then be included as operands in branching instructions. The
1045[[putlabel]] function records the current code position as the value
1046of the label.
1047<<label management interface>>=
1048idx <- 0
1049labels <- vector("list")
1050makelabel <- function() { idx <<- idx + 1; paste0("L", idx) }
1051putlabel <- function(name) labels[[name]] <<- codeCount
1052@
1053
1054Once code generation is complete the symbolic labels in the code
1055stream need to be converted to numerical offset values.  This is done
1056by [[patchlabels]].  Labels can appear directly in the instruction
1057stream and in lists that have been placed in the instruction stream;
1058this is used for the [[SWITCH]] instruction.
1059<<label management interface>>=
1060patchlabels <- function(cntxt) {
1061    offset <- function(lbl) {
1062        if (is.null(labels[[lbl]]))
1063            cntxt$stop(gettextf("no offset recorded for label \"%s\"", lbl),
1064                       cntxt)
1065        labels[[lbl]]
1066    }
1067    for (i in 1 : codeCount) {
1068        v <- codeBuf[[i]]
1069        if (is.character(v))
1070            codeBuf[[i]] <<- offset(v)
1071        else if (typeof(v) == "list") {
1072            off <- as.integer(lapply(v, offset))
1073            ci <- putconst(off)
1074            codeBuf[[i]] <<- ci
1075        }
1076    }
1077}
1078@ %def
1079
1080The contents of the code buffer is extracted into a code object by
1081calling [[codeBufCode]]:
1082<<[[codeBufCode]] function>>=
1083codeBufCode <- function(cb, cntxt) {
1084    cb$patchlabels(cntxt)
1085    cb$commitlocs()
1086    .Internal(mkCode(cb$code(), cb$const()))
1087}
1088@ %def codeBufCode
1089
1090
1091\section{Compiler contexts}
1092\label{sec:contexts}
1093The compiler context object [[cntxt]] carries along information about
1094whether the expression appears in tail position and should be followed
1095by a return or, whether the result is ignored, or whether the
1096expression is contained in a loop.  The context object also contains
1097current compiler option settings as well as functions used to issue
1098warnings or signal errors.
1099
1100
1101\subsection{Top level contexts}
1102Top level compiler functions start by creating a top level context.
1103The constructor for top level contexts takes as arguments the current
1104compilation environment, described in Section \ref{sec:environments},
1105and a list of option values used to override default option settings.
1106The [[toplevel]] field will be set to [[FALSE]] for compiling
1107expressions, such as function arguments, that do not appear at top
1108level.  Top level expressions are assumed to be in tail position, so
1109the [[tailcall]] field is initialized as [[TRUE]]. The
1110[[needRETURNJMP]] specifies whether a call to the [[return]] function
1111can use the [[RETURN]] instruction or has to use a [[longjmp]] via the
1112[[RETURNJMP]] instruction.  Initially using a simple [[RETURN]] is
1113safe; this is set set to [[TRUE]] when compiling promises ad certain
1114loops where [[RETURNJMP]] is needed.
1115<<[[make.toplevelContext]] function>>=
1116make.toplevelContext <- function(cenv, options = NULL)
1117    structure(list(toplevel = TRUE,
1118                   tailcall = TRUE,
1119		   needRETURNJMP = FALSE,
1120                   env = cenv,
1121                   optimize = getCompilerOption("optimize", options),
1122                   suppressAll = getCompilerOption("suppressAll", options),
1123                   suppressNoSuperAssignVar =
1124                       getCompilerOption("suppressNoSuperAssignVar", options),
1125                   suppressUndefined = getCompilerOption("suppressUndefined",
1126                                                         options),
1127                   call = NULL,
1128                   stop = function(msg, cntxt, loc = NULL)
1129                       stop(simpleError(addLocString(msg, loc), cntxt$call)),
1130                   warn = function(x, cntxt, loc = NULL)
1131                       cat(paste("Note:", addLocString(x, loc), "\n"))
1132              ),
1133              class = "compiler_context")
1134@ %def make.toplevelContext
1135Errors are signaled using a version of [[stop]] that uses the current
1136call in the compilation context.  The default would be to use the call
1137in the compiler code where the error was raised, and that would not be
1138meaningful to the end user.  Ideally [[warn]] should do something
1139similar and also use the condition system, but for now it just prints
1140a simple message to standard output.
1141%% **** look into adding source info to errors/warnings
1142%% **** comment on class of context object
1143
1144
1145\subsection{Other compiler contexts}
1146The [[cmpCall]] function creates a new context for each call it
1147compiles.  The new context is the current context with the [[call]]
1148entry replaced by the current call --- this is be useful for issuing
1149meaningful warning and error messages.
1150<<[[make.callContext]] function>>=
1151make.callContext <- function(cntxt, call) {
1152    cntxt$call <- call
1153    cntxt
1154}
1155@ %def make.callContext
1156
1157Non-tail-call contexts are used when a value is being computed for use
1158in a subsequent computation. The constructor returns a new context
1159that is the current context with the tailcall field set to [[FALSE]].
1160<<[[make.nonTailCallContext]] function>>=
1161make.nonTailCallContext <- function(cntxt) {
1162    cntxt$tailcall <- FALSE
1163    cntxt
1164}
1165@ %def make.nonTailCallContext
1166A no value context is used in cases where the computed value will be
1167ignored.  For now this is identical to a non-tail-call context, but it
1168may eventually be useful to distinguish the two situations. This is
1169used mainly for expressions other than the final one in [[{]] calls
1170  and for compiling the bodies of loops.
1171%% **** can avoid generating push/pop pairs if novalue = TRUE is used
1172%% **** might simplify tailcall/RETURN stuff??
1173<<[[make.noValueContext]] function>>=
1174make.noValueContext <- function(cntxt) {
1175    cntxt$tailcall <- FALSE
1176    cntxt
1177}
1178@ %def make.noValueContext
1179
1180The compiler context for compiling a function is a new toplevel
1181context using the function environment and the current compiler
1182options settings.
1183%% **** copy other compiler options; maybe cntxt$options$optimize??
1184<<[[make.functionContext]] function>>=
1185make.functionContext <- function(cntxt, forms, body) {
1186    nenv <- funEnv(forms, body, cntxt)
1187    ncntxt <- make.toplevelContext(nenv)
1188    ncntxt$optimize <- cntxt$optimize
1189    ncntxt$suppressAll <- cntxt$suppressAll
1190    ncntxt$suppressNoSuperAssignVar <- cntxt$suppressNoSuperAssignVar
1191    ncntxt$suppressUndefined <- cntxt$suppressUndefined
1192    ncntxt
1193}
1194@ %def make.functionContext
1195
1196The context for compiling the body of a loop is a no value context
1197with the loop information available.
1198<<[[make.loopContext]] function>>=
1199make.loopContext <- function(cntxt, loop.label, end.label) {
1200    ncntxt <- make.noValueContext(cntxt)
1201    ncntxt$loop <- list(loop = loop.label, end = end.label, gotoOK = TRUE)
1202    ncntxt
1203}
1204@ %def make.loopContext
1205
1206The initial loop context allows [[break]] and [[next]] calls to be
1207implemented as [[GOTO]] instructions.  This is OK for calls that are
1208in top level position relative to the loop.  Calls that occur in
1209promises or in other contexts where the stack has changed from the
1210loop top level state need stack unwinding and cannot be implemented as
1211[[GOTO]] instructions. These should should be compiled with contexts
1212that have the [[loop$gotoOK]] field set to [[FALSE]].  The promise
1213context does this for promises and the argument context for other
1214settings.  The promise context also sets [[needRETURNJMP]] to [[TRUE]]
1215since a [[return]] call that is triggered by forcing a promise
1216requires a [[longjmp]] to return from the appropriate function.
1217<<[[make.argContext]] function>>=
1218make.argContext <- function(cntxt) {
1219    cntxt$toplevel <- FALSE
1220    cntxt$tailcall <- FALSE
1221    if (! is.null(cntxt$loop))
1222        cntxt$loop$gotoOK <- FALSE
1223    cntxt
1224}
1225@ %def make.argContext
1226<<[[make.promiseContext]] function>>=
1227make.promiseContext <- function(cntxt) {
1228    cntxt$toplevel <- FALSE
1229    cntxt$tailcall <- TRUE
1230    cntxt$needRETURNJMP <- TRUE
1231    if (! is.null(cntxt$loop))
1232        cntxt$loop$gotoOK <- FALSE
1233    cntxt
1234}
1235@ %def make.promiseContext
1236%% pull out gotoOK chunk
1237
1238
1239\subsection{Compiler options}
1240Default compiler options are maintained in an environment.  For now,
1241the supported options are [[optimize]], which is initialized to level
12422, and two options for controlling compiler messages.  The
1243[[suppressAll]] option, if [[TRUE]], suppresses all notifications.
1244The [[suppressNoSuperAssignVar]] option, if [[TRUE]], suppresses
1245notifications about missing binding for a super-assigned variable.
1246The [[suppressUndefined]] option can be [[TRUE]] to suppress all
1247notifications about undefined variables and functions, or it can be a
1248character vector of the names of variables for which warnings should
1249be suppressed.
1250<<compiler options data base>>=
1251compilerOptions <- new.env(hash = TRUE, parent = emptyenv())
1252compilerOptions$optimize <- 2
1253compilerOptions$suppressAll <- TRUE
1254compilerOptions$suppressNoSuperAssignVar <- FALSE
1255compilerOptions$suppressUndefined <-
1256    c(".Generic", ".Method", ".Random.seed", ".self")
1257@ %def compilerOptions
1258
1259Options are retrieved with the [[getCompilerOption]] function.
1260<<[[getCompilerOption]] function>>=
1261getCompilerOption <- function(name, options = NULL) {
1262    if (name %in% names(options))
1263        options[[name]]
1264    else
1265        get(name, compilerOptions)
1266}
1267@ %def getCompilerOption
1268
1269The [[suppressAll]] function determines whether a context has its
1270[[supressAll]] property set to [[TRUE]].
1271<<[[suppressAll]] function>>=
1272suppressAll <- function(cntxt)
1273    identical(cntxt$suppressAll, TRUE)
1274@ %def suppressAll
1275The [[suppressNoSuperAssignVar]] function determines whether a context has
1276its [[suppressNoSuperAssignVar]] property set to [[TRUE]].
1277<<[[suppressNoSuperAssignVar]] function>>=
1278suppressNoSuperAssignVar <- function(cntxt)
1279    isTRUE(cntxt$suppressNoSuperAssignVar)
1280@ %def suppressNoSuperAssignVar
1281The [[suppressUndef]] function determines whether undefined variable
1282or function definition notifications for a particular variable should
1283be suppressed in a particular compiler context.
1284<<[[suppressUndef]] function>>=
1285suppressUndef <- function(name, cntxt) {
1286    if (identical(cntxt$suppressAll, TRUE))
1287        TRUE
1288    else {
1289        suppress <- cntxt$suppressUndefined
1290        if (is.null(suppress))
1291            FALSE
1292        else if (identical(suppress, TRUE))
1293            TRUE
1294        else if (is.character(suppress) && as.character(name) %in% suppress)
1295            TRUE
1296        else FALSE
1297    }
1298}
1299@ %def suppressUndef
1300
1301At some point we will need mechanisms for setting default options from
1302the interpreter and in package meta-data. A declaration mechanism for
1303adjusting option settings locally will also be needed.
1304
1305
1306\subsection{Compiler notifications}
1307Compiler notifications are currently sent by calling the context's
1308[[warn]] function, which in turn prints a message to standard output.
1309It would be better to use an approach based on the condition system,
1310and this will be done eventually.  The use of separate notification
1311functions for each type of issue signaled is a step in this direction.
1312
1313Undefined function and undefined variable notifications are issued by
1314[[notifyUndefFun]] and [[notifyUndefVar]].  These both use
1315[[suppressUndef]] to determine whether the notification should be
1316suppressed in the current context.
1317<<[[notifyUndefFun]] function>>=
1318notifyUndefFun <- function(fun, cntxt, loc = NULL) {
1319    if (! suppressUndef(fun, cntxt)) {
1320        msg <- gettextf("no visible global function definition for '%s'",
1321                        as.character(fun))
1322        cntxt$warn(msg, cntxt, loc)
1323    }
1324}
1325@ %def notifyUndefFun
1326<<[[notifyUndefVar]] function>>=
1327notifyUndefVar <- function(var, cntxt, loc = NULL) {
1328    if (! suppressUndef(var, cntxt)) {
1329        msg <- gettextf("no visible binding for global variable '%s'",
1330                        as.character(var))
1331        cntxt$warn(msg, cntxt, loc)
1332    }
1333}
1334@ %def notifyUndefVar
1335
1336Codetools currently optionally notifies about use of local
1337functions. This is of course not an error but may sometimes be the
1338result of a mis-spelling.  For now the compiler does not notify about
1339these, but this could be changed by redefining [[notifyLocalFun]] .
1340<<[[notifyLocalFun]] function>>=
1341notifyLocalFun <- function(fun, cntxt, loc = NULL) {
1342    if (! suppressAll(cntxt))
1343        NULL
1344}
1345@ %def notifyLocalFun
1346
1347Warnings about possible improper use of [[...]] and [[..n]] variables
1348are sent by [[notifyWrongDotsUse]].
1349<<[[notifyWrongDotsUse]] function>>=
1350notifyWrongDotsUse <- function(var, cntxt, loc = NULL) {
1351    if (! suppressAll(cntxt)) {
1352        msg <- paste(var, "may be used in an incorrect context")
1353        cntxt$warn(msg, cntxt, loc)
1354    }
1355}
1356@ %def notifyWrongDotsUse
1357
1358Wrong argument count issues are signaled by [[notifyWrongArgCount]].
1359<<[[notifyWrongArgCount]] function>>=
1360notifyWrongArgCount <- function(fun, cntxt, loc = NULL) {
1361    if (! suppressAll(cntxt)) {
1362        msg <- gettextf("wrong number of arguments to '%s'",
1363                        as.character(fun))
1364        cntxt$warn(msg, cntxt, loc)
1365    }
1366}
1367@ %def notifyWrongArgCount
1368Other issues with calls that do not match their definitions are
1369signaled by [[notifyBadCall]].  Ideally these should be broken down
1370more finely, but that would require some rewriting of the error
1371signaling in [[match.call]].
1372<<[[notifyBadCall]] function>>=
1373notifyBadCall <- function(w, cntxt, loc = NULL) {
1374    if (! suppressAll(cntxt))
1375        cntxt$warn(w, cntxt, loc)
1376}
1377@ %def notifyBadCall
1378
1379[[break]] or [[next]] calls that occur in a context where no loop is
1380visible will most likely result in runtime errors, and
1381[[notifyWrongBreakNext]] is used to signal such cases.
1382<<[[notifyWrongBreakNext]] function>>=
1383notifyWrongBreakNext <- function(fun, cntxt, loc = NULL) {
1384    if (! suppressAll(cntxt)) {
1385        msg <- paste(fun, "used in wrong context: no loop is visible")
1386        cntxt$warn(msg, cntxt, loc)
1387    }
1388}
1389@ %def notifyWrongBreakNext
1390
1391Several issues can arise in assignments.  For super-assignments a
1392target variable should be defined; otherwise there will be a runtime
1393warning.
1394<<[[notifyNoSuperAssignVar]] function>>=
1395notifyNoSuperAssignVar <- function(symbol, cntxt, loc = NULL) {
1396    if (! suppressAll(cntxt) && ! suppressNoSuperAssignVar(cntxt)) {
1397        msg <- gettextf("no visible binding for '<<-' assignment to '%s'",
1398                        as.character(symbol))
1399        cntxt$warn(msg, cntxt, loc)
1400    }
1401}
1402@ %def notifyNoSuperAssignVar
1403If the compiler detects an invalid function in a complex assignment
1404then this is signaled at compile time; a corresponding error would
1405occur at runtime.
1406%% **** should put function/call into message
1407<<[[notifyBadAssignFun]] function>>=
1408notifyBadAssignFun <- function(fun, cntxt, loc = NULL) {
1409    if (! suppressAll(cntxt)) {
1410        msg <- gettext("invalid function in complex assignment")
1411        cntxt$warn(msg, cntxt, loc)
1412    }
1413}
1414@ %def notifyBadAssignFun
1415
1416In [[switch]] calls it is an error if a character selector argument is
1417used and there are multiple default alternatives.  The compiler
1418signals a possible problem with [[notifyMultipleSwitchDefaults]] if
1419there are some named cases but more than one unnamed ones.
1420<<[[notifyMultipleSwitchDefaults]] function>>=
1421notifyMultipleSwitchDefaults <- function(ndflt, cntxt, loc = NULL)
1422    if (! suppressAll(cntxt)) {
1423        msg <- gettext("more than one default provided in switch() call")
1424        cntxt$warn(msg, cntxt, loc)
1425    }
1426@ %def notifyMultipleSwitchDefaults
1427
1428<<[[notifyNoSwitchcases]] function>>=
1429notifyNoSwitchcases <- function(cntxt, loc = NULL)
1430    if (! suppressAll(cntxt)) {
1431        msg <- gettext("'switch' with no alternatives")
1432        cntxt$warn(msg, cntxt, loc)
1433    }
1434@ %def notifyNoSwitchcases
1435
1436The compiler signals when it encounters that a special syntactic function,
1437such as [[for]], has been assigned to.
1438<<[[notifyAssignSyntacticFun]] function>>=
1439notifyAssignSyntacticFun <- function(funs, cntxt, loc = NULL) {
1440    if (! suppressAll(cntxt)) {
1441        msg <- ngettext(length(funs),
1442            "local assignment to syntactic function: ",
1443            "local assignments to syntactic functions: ")
1444        cntxt$warn(paste(msg, paste(funs, collapse = ", ")),
1445                   cntxt, loc)
1446    }
1447}
1448@ %def notifyAssignSyntacticFun
1449
1450When the compiler encounters an error during JIT or package
1451compilation, it catches the error and returns the original uncompiled
1452code letting the AST interpreter handle it. This can happen due to a
1453compiler bug or when the code being compiled violates certain
1454assumptions made by the compiler (such as a certain discipline on
1455frame types in the evaluation environment, as checked in
1456[[frameTypes]]). The compiler will notify about catching such errors
1457via [[notifyCompilerError]].
1458
1459<<[[notifyCompilerError]] function>>=
1460notifyCompilerError <- function(msg)
1461    if (!compilerOptions$suppressAll)
1462        cat(paste(gettext("Error: compilation failed - "), msg, "\n"))
1463@ %def notifyCompilerError
1464
1465
1466\section{Compilation environments}
1467\label{sec:environments}
1468%% **** lambda lifting/eliminating variables not captured
1469%% **** defer SETVAR until needed; avoid GETVAR if still in register
1470%% **** Could preserve semantics by pre-test to check that fun in env
1471%% **** is inlined one.  Would need to make efficient somehow, e.g
1472%% **** increment counter each time one of inlined names is assigned
1473%% **** to and only check when count has changed.
1474At this point the compiler will essentially use the interpreter to
1475evaluate an expression of the form
1476\begin{verbatim}
1477if (x > 0) log(x) else 0
1478\end{verbatim}
1479since [[if]] is a [[SPECIAL]] function. To make further improvements
1480the compiler needs to be able to implement the [[if]] expression in
1481terms of conditional and unconditional branch instructions.  It might
1482then also be useful to implement [[>]] and [[log]] with special
1483virtual machine instructions.  To be able to do this, the compiler
1484needs to know that [[if]], [[>]], and [[log]] refer to the standard
1485versions of these functions in the base package.  While this is very
1486likely, it is not guaranteed.
1487
1488R is a very dynamic language.  Functions defined in the base and other
1489packages could be shadowed at runtime by definitions in loaded user
1490packages or by local definitions within a function.  It is even
1491possible for user code to redefine the functions in the base package,
1492though this is discouraged by binding locking and would be poor
1493programming practice.  Finally, it is possible for functions called
1494prior to evaluating an expression like the one above to reach into
1495their calling environment and add new definitions of [[log]] or [[if]]
1496that wound then be used in evaluating this expression.  Again this is
1497not common and generally not a good idea outside of a debugging
1498context.
1499
1500Ideally the compiler should completely preserve semantics of the
1501language implemented by the interpreter.  While this is possible it
1502would significantly complicate the compiler and the compiled code, and
1503carry at least some runtime penalty. The approach taken here is
1504therefore to permit the compiler to inline some functions when they
1505are not visibly shadowed in the compiled code.  What the compiler is
1506permitted to do is determined by the setting of an optimization level.
1507The details are desctibed in Section \ref{sec:inlining}.
1508
1509For the compiler to be able to decide whether is can inline a function
1510it needs to be able to determine whether there are any local variable
1511that might shadow a variable from a base package. This requires adding
1512environment information to the compilation process.
1513
1514
1515\subsection{Representing compilation environments}
1516When compiling an expression the compiler needs to take into account
1517an evaluation environment, which would typically be a toplevel
1518environment, along with local variable definitions discovered during
1519the compilation process. The evaluation environment should not be
1520modified, so the local variables need to be considered in addition to
1521ones defined in the evaluation environment.  If an expression
1522\begin{verbatim}
1523{ x <- 1; x + 2 }
1524\end{verbatim}
1525is compiled for evaluation in the global environment then existing
1526definitions in the global environment as well as the new definition
1527for [[x]] need to be taken into account. To address this the
1528compilation environment is a list of two components, an environment
1529and a list of character vectors.  The environment consists of one
1530frame for each level of local variables followed by the top level
1531evaluation environment.  The list of character vectors consist of one
1532element for each frame for which local variables have been discovered
1533by the compiler. For efficiency the compilation environment structure
1534also includes a character vector [[ftype]] classifying each frame as a
1535local, namespace, or global frame.
1536<<[[makeCenv]] function>>=
1537## Create a new compiler environment
1538## **** need to explain the structure
1539makeCenv <- function(env) {
1540    structure(list(extra = list(character(0)),
1541                   env = env,
1542                   ftypes = frameTypes(env)),
1543              class = "compiler_environment")
1544}
1545@ %def makeCenv
1546
1547When an expression is to be compiled in a particular environment a
1548first step is to identify any local variable definitions and add these
1549to the top level frame.
1550<<[[addCenvVars]] function>>=
1551## Add vars to the top compiler environment frame
1552addCenvVars <- function(cenv, vars) {
1553    cenv$extra[[1]] <- union(cenv$extra[[1]], vars)
1554    cenv
1555}
1556@ %def addCenvVars
1557
1558When compiling a function a new frame is added to the compilation
1559environment.  Typically a number of local variables are added
1560immediately, so an optional [[vars]] argument is provided so this
1561can be done without an additional call to [[addCenvVars]].
1562<<[[addCenvFrame]] function>>=
1563## Add a new frame to a compiler environment
1564addCenvFrame <- function(cenv, vars) {
1565    cenv$extra <- c(list(character(0)), cenv$extra)
1566    cenv$env <- new.env(parent = cenv$env)
1567    cenv$ftypes <- c("local", cenv$ftypes)
1568    if (missing(vars))
1569        cenv
1570    else
1571        addCenvVars(cenv, vars)
1572}
1573@ %def addCenvFrame
1574
1575The compilation environment is queried by calling [[findCenvVar]].
1576%% **** change name to findCenvVarInfo or some such??
1577If a binding for the specified variable is found then [[findCenvVar]]
1578returns a list containing information about the binding.  If no
1579binding is found then [[NULL]] is returned.
1580<<[[findCenvVar]] function>>=
1581## Find binding information for a variable (character or name).
1582## If a binding is found, return a list containing components
1583##   ftype -- one of "local", "namespace", "global"
1584##   value -- current value if available
1585##   frame -- frame containing the binding (not useful for "local" variables)
1586##   index -- index of the frame (1 for top, 2, for next one, etc.)
1587## Return NULL if no binding is found.
1588## **** drop the index, maybe value, to reduce cost? (query as needed?)
1589findCenvVar <- function(var, cenv) {
1590    if (typeof(var) == "symbol")
1591        var <- as.character(var)
1592    extra <- cenv$extra
1593    env <- cenv$env
1594    frame <- NULL
1595    <<search [[extra]] entries and environment frames>>
1596    <<search the remaining environment frames if necessary>>
1597    <<create the [[findCenvVar]] result>>
1598}
1599@ %def findCenvVar
1600
1601The initial search for a matching binding proceeds down each frame for
1602which there is also an entry in [[extra]], searching the [[extra]]
1603entry before the environment frame.
1604<<search [[extra]] entries and environment frames>>=
1605for (i in seq_along(cenv$extra)) {
1606    if (var %in% extra[[i]] || exists(var, env, inherits = FALSE)) {
1607        frame <- env
1608        break
1609    }
1610    else
1611        env <- parent.env(env)
1612}
1613@ %def
1614If [[frame]] is still [[NULL]] after the initial search then the
1615remaining environment frames from the evaluation environment for which
1616there are no corresponding entries in [[extra]] are searched.
1617<<search the remaining environment frames if necessary>>=
1618if (is.null(frame)) {
1619    empty <- emptyenv()
1620    while (! identical(env, empty)) {
1621        i <- i + 1
1622        if (exists(var, env, inherits = FALSE)) {
1623            frame <- env
1624            break
1625        }
1626        else
1627            env <- parent.env(env)
1628    }
1629}
1630@ %def
1631
1632If a binding frame is found then the result consists of a list
1633containing the frame, the frame type, the value if available, and the
1634frame index.  The value is not looked up for [[...]] variables.  A
1635promise to compute the value is stored in an environment in the
1636result.  This avoids computing the value in some cases where doing so
1637may fail or produce unwanted side effects.
1638<<create the [[findCenvVar]] result>>=
1639if (! is.null(frame)) {
1640    if (exists(var, frame, inherits = FALSE) && var != "...") {
1641        value <- new.env(parent = emptyenv())
1642        delayedAssign("value", get(var, frame, inherits = FALSE),
1643                      assign.env = value)
1644    }
1645    else
1646        value <- NULL
1647    list(frame = frame, ftype = cenv$ftypes[i], value = value, index = i)
1648}
1649else
1650    NULL
1651@ %def
1652
1653Useful functions for querying the environment associated with a
1654compilation context are [[findVar]], [[findLocVar]], and
1655[[findFunDef]]. The function [[findVar]] returns [[TRUE]] is a binding
1656for the specified variable is visible and [[FALSE]] otherwise.
1657<<[[findVar]] function>>=
1658findVar <- function(var, cntxt) {
1659    cenv <- cntxt$env
1660    info <- findCenvVar(var, cenv)
1661    ! is.null(info)
1662}
1663@ %def findVar
1664[[findLocVar]] returns [[TRUE]] only if a local binding is found.
1665<<[[findLocVar]] function>>=
1666## test whether a local version of a variable might exist
1667findLocVar <- function(var, cntxt) {
1668    cenv <- cntxt$env
1669    info <- findCenvVar(var, cenv)
1670    ! is.null(info) && info$ftype == "local"
1671}
1672@ %def findLocVar
1673[[findFunDef]] returns a function definition if one is available for
1674the specified name and [[NULL]] otherwise.
1675<<[[findFunDef]] function>>=
1676## **** should this check for local functions as well?
1677findFunDef <- function(fun, cntxt) {
1678    cenv <- cntxt$env
1679    info <- findCenvVar(fun, cenv)
1680    if (! is.null(info$value) && is.function(info$value$value))
1681        info$value$value
1682    else
1683        NULL
1684}
1685@ %def findFunDef
1686
1687
1688\subsection{Identifying possible local variables}
1689For the compiler to be able to know that it can optimize a reference
1690to a particular global function or variable it needs to be able to
1691determine that that variable will not be shadowed by a local
1692definition at runtime.  R semantics do not allow certain
1693identification of local variables.  If a function body consist of the
1694two lines
1695\begin{verbatim}
1696if (x) y <- 1
1697y
1698\end{verbatim}
1699then whether the variable [[y]] in the second line is local or global
1700depends on the value of [[x]].  Lazy evaluation of arguments also
1701means what whether and when an assignment in a function argument
1702occurs can be uncertain.
1703
1704The approach taken by the compiler is to conservatively identify all
1705variables that might be created within an expression, such as a
1706function body, and consider those to be potentially local variables
1707that inhibit optimizations.  This ignores runtime creation of new
1708variables, but as already mentioned that is generally not good
1709programming practice.
1710
1711Variables are created by the assignment operators [[<-]] and [[=]] and
1712by [[for]] loops.  In addition, calls to [[assign]] and
1713[[delayedAssign]] with a literal character name argument are
1714considered to create potential local variables if the environment
1715argument is missing, which means the assignment is in the current
1716environment.
1717
1718A simple approach for identifying all local variables created within
1719an expression is given by
1720<<findlocals0>>=
1721findLocals0 <- function(e, cntxt) {
1722    if (typeof(e) == "language") {
1723        if (typeof(e[[1]]) %in% c("symbol", "character"))
1724            switch(as.character(e[[1]]),
1725                   <<[[findLocals0]] switch clauses>>
1726                   findLocalsList0(e[-1], cntxt))
1727         else findLocalsList0(e, cntxt)
1728    }
1729    else character(0)
1730}
1731
1732findLocalsList0 <- function(elist, cntxt)
1733    unique(unlist(lapply(elist, findLocals0, cntxt)))
1734@ %def findLocals0 findLocalsList0
1735
1736For assignment expressions the assignment variable is added to any
1737variables found in the value expression.
1738<<[[findLocals0]] switch clauses>>=
1739"=" =,
1740"<-" = unique(c(getAssignedVar(e, cntxt),
1741                findLocalsList0(e[-1], cntxt))),
1742@ %def
1743The assigned variable is determined by [[getAssignedVar]]:
1744<<[[getAssignedVar]] function>>=
1745getAssignedVar <- function(e, cntxt) {
1746    v <- e[[2]]
1747    if (missing(v))
1748        cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
1749    else if (typeof(v) %in% c("symbol", "character"))
1750        as.character(v)
1751    else {
1752        while (typeof(v) == "language") {
1753            if (length(v) < 2)
1754                cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
1755            v <- v[[2]]
1756            if (missing(v))
1757                cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
1758        }
1759        if (typeof(v) != "symbol")
1760            cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt)
1761        as.character(v)
1762    }
1763}
1764@ %def getAssignedVar
1765
1766For [[for]] loops the loop variable is added to any variables found in
1767the sequence and body expressions.
1768<<[[findLocals0]] switch clauses>>=
1769"for" = unique(c(as.character(e[2]),
1770                 findLocalsList0(e[-2], cntxt))),
1771@ %def
1772
1773The variable in [[assign]] and [[delayedAssign]] expressions is
1774considered local if it is an explicit character string and there is no
1775environment argument.
1776<<[[findLocals0]] switch clauses>>=
1777"delayedAssign" =,
1778"assign" = if (length(e) == 3 &&
1779               is.character(e[[2]]) &&
1780               length(e[[2]]) == 1)
1781               c(e[[2]], findLocals0(e[[3]], cntxt))
1782           else findLocalsList0(e[1], cntxt),
1783@ %def
1784
1785Variables defined within local functions created by [[function]]
1786expressions do not shadow globals within the containing expression and
1787therefore [[function]] expressions do not contribute any new local
1788variables. Similarly, [[local]] calls without an environment argument
1789create a new environment for evaluating their expression and do not
1790add new local variables.  If an environment argument is present then
1791this might be the current environment and so assignments in the
1792expression are considered to create possible local variables.
1793Finally, [[~]], [[expression]], and [[quote]] do not
1794evaluate their arguments and so do not contribute new local variables.
1795<<[[findLocals0]] switch clauses>>=
1796"function" = character(0),
1797"~" = character(0),
1798"local" = if (length(e) == 2) character(0)
1799          else findLocalsList0(e[-1], cntxt),
1800"expression" =,
1801"quote" = character(0),
1802@ %def findLocals0
1803Other functions, for example [[Quote]] from the [[methods]] package,
1804are also known to not evaluate their arguments but these do not often
1805contain assignment expressions and so ignoring them only slightly
1806increases the degree of conservatism in this approach.
1807
1808A problem with this simple implementation is that it assumes that all
1809of the functions named in the [[switch]] correspond to the bindings in
1810the base package.  This is reasonable for the ones that are
1811syntactically special, but not for [[expression]], [[local]] and
1812[[quote]].  These might be shadowed by local definitions in a
1813surrounding function.  To allow for this we can add an optional
1814variable [[shadowed]] for providing a character vector of names of
1815variables with shadowing local definitions.
1816
1817The more sophisticated implementaiton is also slightly optimized to avoid
1818recursive calls. [[findLocals1]] now, instead of searching through the full
1819transitive closure of language objects, only searches from the first, but
1820returns what remains to be searched. The variables found are stored into an
1821environment, which avoids some extra calls and assures that each variable is
1822listed at most once.
1823<<[[findLocals1]] function>>=
1824addVar <- function(v, vars) assign(v, 1, envir = vars)
1825findLocals1 <- function(e, shadowed = character(0), cntxt, vars) {
1826    if (typeof(e) == "language") {
1827        if (typeof(e[[1]]) %in% c("symbol", "character")) {
1828            v <- as.character(e[[1]])
1829            switch(v,
1830                   <<[[findLocals1]] switch clauses>>
1831                   e[-1])
1832        }
1833        else e
1834    }
1835    else NULL
1836}
1837@ %def findLocals1
1838%% **** merge into single chunk??
1839<<[[findLocalsList1]] function>>=
1840findLocalsList1 <- function(elist, shadowed, cntxt) {
1841    todo <- elist
1842    vars <- new.env()
1843    while(length(todo) > 0) {
1844        newtodo <- list()
1845        lapply(todo, function(e)
1846            lapply(findLocals1(e, shadowed, cntxt, vars),
1847                   function(x)
1848		       if (typeof(x) == "language")
1849		           newtodo <<- append(newtodo, x))
1850        )
1851	todo <- newtodo
1852    }
1853    ls(vars, all.names=T)
1854}
1855
1856@ %def findLocalsList1
1857The handling of assignment operators, [[for]] loops, [[function]] and
1858[[~]] expressions is analogous to the approach in [[findLocals0]].
1859<<[[findLocals1]] switch clauses>>=
1860"=" =,
1861"<-" = { addVar(getAssignedVar(e, cntxt), vars); e[-1] },
1862
1863"for" = { addVar(as.character(e[2]), vars); e[-2] },
1864
1865"delayedAssign" =,
1866"assign" = if (length(e) == 3 &&
1867               is.character(e[[2]]) &&
1868               length(e[[2]]) == 1) {
1869
1870               addVar(e[[2]], vars); list(e[[3]])
1871           }
1872           else e[1],
1873"function" = character(0),
1874"~" = character(0),
1875@ %def
1876The rules for ignoring assignments in [[local]], [[expression]], and
1877[[quote]] calls are only applied if there are no shadowing
1878definitions.
1879<<[[findLocals1]] switch clauses>>=
1880"local" = if (! v %in% shadowed && length(e) == 2)
1881              NULL
1882          else e[-1],
1883"expression" =,
1884"quote" = if (! v %in% shadowed)
1885              NULL
1886          else e[-1],
1887@ %def
1888The assignment functions could also be shadowed, but this is not very
1889common, and assuming that they are not errs in the conservative
1890direction.
1891
1892This approach can handle the case where [[quote]] or one of the other
1893non-syntactic functions is shadowed by an outer definition but does not
1894handle assignments that occur in the expression itself.  For example,
1895in
1896\begin{verbatim}
1897function (f, x, y) {
1898    local <- f
1899    local(x <- y)
1900    x
1901}
1902\end{verbatim}
1903the reference to [[x]] in the third line has to be considered
1904potentially local.  To deal with this multiple passes are needed.  The
1905first pass assumes that [[expression]], [[local]] or [[quote]] might
1906be shadowed by local assignments.  If no assignments to some of them
1907are visible, then a second pass can be used in which they are assumed
1908not to be shadowed.  This can be iterated to convergence.  It is also
1909useful to check before returning whether any of the syntactically
1910special variables has been assigned to.  If so, so a warning is
1911issued.
1912%% **** look into speeding up findLocalsList
1913<<[[findLocalsList]] function>>=
1914findLocalsList <- function(elist, cntxt) {
1915    initialShadowedFuns <- c("expression", "local", "quote")
1916    shadowed <- Filter(function(n) ! isBaseVar(n, cntxt), initialShadowedFuns)
1917    specialSyntaxFuns <- c("~", "<-", "=", "for", "function")
1918    sf <- initialShadowedFuns
1919    nsf <- length(sf)
1920    repeat {
1921        vals <- findLocalsList1(elist, sf, cntxt)
1922        redefined <- sf %in% vals
1923        last.nsf <- nsf
1924        sf <- unique(c(shadowed, sf[redefined]))
1925        nsf <- length(sf)
1926        ## **** need to fix the termination condition used in codetools!!!
1927        if (last.nsf == nsf) {
1928            rdsf <- vals %in% specialSyntaxFuns
1929            if (any(rdsf))
1930                ## cannot get location info (source reference) here
1931                notifyAssignSyntacticFun(vals[rdsf], cntxt)
1932            return(vals)
1933        }
1934    }
1935}
1936@ %def findLocalsList
1937<<[[findLocals]] function>>=
1938findLocals <- function(e, cntxt)
1939    findLocalsList(list(e), cntxt)
1940@ %def findLocals
1941
1942Standard definitions for all functions in [[initialShadowedFuns]] are
1943in the base package and [[isBaseVar]] checks the compilation
1944environment to see whether the specified variable's definition comes
1945from that package either via a namespace or the global environment.
1946<<[[isBaseVar]] function>>=
1947isBaseVar <- function(var, cntxt) {
1948    info <- getInlineInfo(var, cntxt)
1949    (! is.null(info) &&
1950     (identical(info$frame, .BaseNamespaceEnv) ||
1951      identical(info$frame, baseenv())))
1952}
1953@ %def isBaseVar
1954The use of [[getInlineInfo]], defined in Section \ref{sec:inlining}, means
1955that the setting of the [[optimize]] compiler option will influence whether
1956a variable should be considered to be from the base package or not.
1957It might also be useful to warn about assignments to other functions.
1958
1959When a [[function]] expression is compiled, its body and default
1960arguments need to be compiled using a compilation environment that
1961contains a new frame for the function that contains variables for the
1962arguments and any assignments in the body and the default expressions.
1963[[funEnv]] creates such an environment.
1964<<[[funEnv]] function>>=
1965## augment compiler environment with function args and locals
1966funEnv <- function(forms, body, cntxt) {
1967    cntxt$env <- addCenvFrame(cntxt$env, names(forms))
1968    locals <- findLocalsList(c(forms, body), cntxt)
1969    addCenvVars(cntxt$env, locals)
1970}
1971@ %def funEnv
1972
1973
1974\section{The inlining mechanism}
1975\label{sec:inlining}
1976To allow for inline coding of calls to some functions the [[cmpCall]]
1977function calls the [[tryInline]] function.  The [[tryInline]] function
1978will either generate code for the call and return [[TRUE]], or it will
1979decline to do so and return [[FALSE]], in which case the standard code
1980generation process for a function call is used.
1981
1982The function [[tryInline]] calls [[getInlineInfo]] to determine
1983whether inlining is permissible given the current environment and
1984optimization settings.  There are four possible optimization levels:
1985\begin{description}
1986\item[Level 0:] No inlining.
1987\item[Level 1:] Functions in the base packages found through a
1988  namespace that are not shadowed by function arguments or visible
1989  local assignments may be inlined.
1990\item[Level 2:] In addition to the inlining permitted by Level 1,
1991  functions that are syntactically special or are considered core
1992  language functions and are found via the global environment at
1993  compile time may be inlined. Other functions in the base packages
1994  found via the global environment may be inlined with a guard that
1995  ensures at runtime that the inlined function has not been masked;
1996  if it has, then the call in handled by the AST interpreter.
1997\item[Level 3:] Any function in the base packages found via the global
1998  environment may be inlined.
1999  %% **** should there be an explicit list of functions where inlining
2000  %% **** is OK here??
2001\end{description}
2002The syntactically special and core language functions are
2003<<[[languageFuns]] definition>>=
2004languageFuns <- c("^", "~", "<", "<<-", "<=", "<-", "=", "==", ">", ">=",
2005                  "|", "||", "-", ":", "!", "!=", "/", "(", "[", "[<-", "[[",
2006                  "[[<-", "{", "@", "$", "$<-", "*", "&", "&&", "%/%", "%*%",
2007                  "%%", "+",
2008                  "::", ":::", "@<-",
2009                  "break", "for", "function", "if", "next", "repeat", "while",
2010                  "local", "return", "switch")
2011@ %def languageFuns
2012%% **** local, return, and switch are dubious here
2013%% **** if we allow them, should we also allow a few others, like .Internal?
2014The default optimization level is Level 2. Future versions of the
2015compiler may allow some functions to be explicitly excluded from
2016inlining and may provide a means for allowing user-defined functions
2017to be declared eligible for inlining.
2018
2019If inlining is permissible then the result returned by
2020[[getInlineInfo]] contains the packages associated with the specified
2021variable in the current environment. The variable name and package are
2022then looked up in a data base of handlers.  If a handler is found then
2023the handler is called.  The handler can either generate code and
2024return [[TRUE]] or decline to and return [[FALSE]].  If inlining is
2025not possible then [[getInlineInfo]] returns [[NULL]] and [[tryInline]]
2026returns [[FALSE]].
2027%% **** think about adding GETNSFUN to use when inlining is OK
2028<<[[tryInline]] function>>=
2029tryInline <- function(e, cb, cntxt) {
2030    name <- as.character(e[[1]])
2031    info <- getInlineInfo(name, cntxt, guardOK = TRUE)
2032    if (is.null(info))
2033        FALSE
2034    else {
2035        h <- getInlineHandler(name, info$package)
2036        if (! is.null(h)) {
2037            if (info$guard) {
2038	        <<inline with a guard instruction>>
2039            }
2040            else h(e, cb, cntxt)
2041	}
2042        else FALSE
2043    }
2044}
2045@ %def tryInline
2046If a guard instruction is needed then the instruction is emitted that
2047will check validity of the inlined function at runtime; if the inlined
2048code is not valid the guard instruction will evaluate the call in the
2049AST interpreter and jump over the inlined code. The inlined code is
2050handled as a non-tail-call; if the call is in tail position, then a
2051return instruction is emitted.
2052<<inline with a guard instruction>>=
2053tailcall <- cntxt$tailcall
2054if (tailcall) cntxt$tailcall <- FALSE
2055expridx <- cb$putconst(e)
2056endlabel <- cb$makelabel()
2057cb$putcode(BASEGUARD.OP, expridx, endlabel)
2058if (! h(e, cb, cntxt))
2059    cmpCall(e, cb, cntxt, inlineOK = FALSE)
2060cb$putlabel(endlabel)
2061if (tailcall) cb$putcode(RETURN.OP)
2062TRUE
2063@
2064
2065The function [[getInlineInfo]] implements the optimization rules
2066described at the beginning of this section.
2067<<[[getInlineInfo]] function>>=
2068noInlineSymbols <- c("standardGeneric")
2069
2070getInlineInfo <- function(name, cntxt, guardOK = FALSE) {
2071    optimize <- cntxt$optimize
2072    if (optimize > 0 && ! (name %in% noInlineSymbols)) {
2073        info <- findCenvVar(name, cntxt$env)
2074        if (is.null(info))
2075            NULL
2076        else {
2077            ftype <- info$ftype
2078            frame <- info$frame
2079            if (ftype == "namespace") {
2080                <<fixup for a namespace import frame>>
2081                info$package <- nsName(findHomeNS(name, frame, cntxt))
2082		info$guard <- FALSE
2083                info
2084            }
2085            else if (ftype == "global" &&
2086                     (optimize >= 3 ||
2087                      (optimize >= 2 && name %in% languageFuns))) {
2088                info$package <- packFrameName(frame)
2089		info$guard <- FALSE
2090                info
2091            }
2092            else if (guardOK && ftype == "global" &&
2093                     packFrameName(frame) == "base") {
2094                info$package <- packFrameName(frame)
2095                info$guard <- TRUE
2096                info
2097            }
2098            else NULL
2099        }
2100    }
2101    else NULL
2102}
2103@ %def getInlineInfo
2104The code for finding the home namespace from a namespace import frame
2105is needed here to deal with the fact that a namespace may not be
2106registered when this function is called, so the mechanism used in
2107[[findHomeNS]] to locate the namespace to which an import frame
2108belongs may not work.
2109<<fixup for a namespace import frame>>=
2110if (! isNamespace(frame)) {
2111    ## should be the import frame of the current topenv
2112    top <- topenv(cntxt$env$env)
2113    if (! isNamespace(top) ||
2114        ! identical(frame, parent.env(top)))
2115        cntxt$stop(gettext("bad namespace import frame"))
2116    frame <- top
2117}
2118@
2119
2120For this version of the compiler the inline handler data base is
2121managed as an environment in which handlers are entered and looked up
2122by name.  For now it is assumed that a name can only appear associated
2123with one package and an error is signaled if an attempt is made to
2124redefine a handler for a given name for a different package than an
2125existing definition.  This can easily be changed if it should prove too
2126restrictive.
2127%% **** note on haveInlineHandler
2128%% **** allow same name in multiple packages?
2129<<inline handler implementation>>=
2130inlineHandlers <- new.env(hash = TRUE, parent = emptyenv())
2131
2132setInlineHandler <- function(name, h, package = "base") {
2133    if (exists(name, inlineHandlers, inherits = FALSE)) {
2134        entry <- get(name, inlineHandlers)
2135        if (entry$package != package) {
2136            fmt <- "handler for '%s' is already defined for another package"
2137            stop(gettextf(fmt, name), domain = NA)
2138        }
2139    }
2140    entry <- list(handler = h, package = package)
2141    assign(name, entry, inlineHandlers)
2142}
2143
2144getInlineHandler <- function(name, package = "base") {
2145    if (exists(name, inlineHandlers, inherits = FALSE)) {
2146        hinfo <- get(name, inlineHandlers)
2147        if (hinfo$package == package)
2148            hinfo$handler
2149        else NULL
2150    }
2151    else NULL
2152}
2153
2154haveInlineHandler <- function(name, package = "base") {
2155    if (exists(name, inlineHandlers, inherits = FALSE)) {
2156        hinfo <- get(name, inlineHandlers)
2157        package == hinfo$package
2158    }
2159    else FALSE
2160}
2161@ %def inlineHandlers getInlineHandler setInlineHandler haveInlineHandler
2162
2163
2164\section{Default inlining rules for primitives}
2165This section defines generic handlers for [[BUILTIN]] and [[SPECIAL]]
2166functions.  These are installed programmatically for all [[BUILTIN]]
2167and [[SPECIAL]] functions.  The following sections present more
2168specialized handlers for a range of functions that are installed in
2169place of the default ones.
2170<<install default inlining handlers>>=
2171local({
2172    <<install default [[SPECIAL]] handlers>>
2173    <<install default [[BUILTIN]] handlers>>
2174})
2175@
2176The handler installations are wrapped in a [[local]] call to reduce
2177environment pollution.
2178
2179
2180\subsection{[[BUILTIN]] functions}
2181Calls to functions known at compile time to be of type [[BUILTIN]] can
2182be handled more efficiently. The interpreter evaluates all arguments
2183for [[BUILTIN]] functions before calling the function, so the compiler
2184can evaluate the arguments in line without the need for creating
2185promises.
2186
2187A generic handler for inlining a call to a [[BUILIN]] function is
2188provided by [[cmpBuiltin]].  For now, the handler returns [[FALSE]] if
2189the call contains missing arguments, which are currently not allowed
2190in [[BUILTIN]] functions, or [[...]]  arguments.  The handling of
2191[[...]] arguments should be improved.
2192%% **** look into improving handling ... arguments to BUILTINs?
2193For [[BUILTIN]] functions the function to call is pushed on the stack
2194with the [[GETBUILTIN]] instruction.  The [[internal]] argument allows
2195[[cmpBuiltin]] to be used with [[.Internal]] functions of type
2196[[BUILTIN]] as well; this is used in the handler for [[.Internal]]
2197defined in Section \ref{subsec:.Internal}.
2198<<[[cmpBuiltin]] function>>=
2199cmpBuiltin <- function(e, cb, cntxt, internal = FALSE) {
2200    fun <- e[[1]]
2201    args <- e[-1]
2202    names <- names(args)
2203    if (dots.or.missing(args))
2204        FALSE
2205    else {
2206        ci <- cb$putconst(fun)
2207        if (internal)
2208            cb$putcode(GETINTLBUILTIN.OP, ci)
2209        else
2210            cb$putcode(GETBUILTIN.OP, ci)
2211        cmpBuiltinArgs(args, names, cb, cntxt)
2212        ci <- cb$putconst(e)
2213        cb$putcode(CALLBUILTIN.OP, ci)
2214        if (cntxt$tailcall) cb$putcode(RETURN.OP)
2215        TRUE
2216    }
2217}
2218@ %def cmpBuiltin
2219
2220Argument evaluation code is generated by [[cmpBuiltinArgs]].  In the
2221context of [[BUILTIN]] functions missing arguments are currently not
2222allowed.  But to allow [[cmpBuiltinArgs]] to be used in other contexts
2223missing arguments are supported if the optional argument [[missingOK]]
2224is [[TRUE]].
2225%% **** should this warn/stop if there are missings and missingOK is FALSE??
2226%% **** can this be adjusted so error messages match the interpreter?
2227%% **** for f <- function(x, y) x + y compare errors for f(1,) and cmpfun(f)(1,)
2228%% **** test code for constant folding is needed (sym and non-sym)
2229<<[[cmpBuiltinArgs]] function>>=
2230cmpBuiltinArgs <- function(args, names, cb, cntxt, missingOK = FALSE) {
2231    ncntxt <- make.argContext(cntxt)
2232    for (i in seq_along(args)) {
2233        a <- args[[i]]
2234        n <- names[[i]]
2235        <<compile missing [[BUILTIN]] argument>>
2236        ## **** handle ... here ??
2237        <<signal an error for promise or bytecode argument>>
2238        <<compile a general [[BUILTIN]] argument>>
2239    }
2240}
2241@ %def cmpBuiltinArgs
2242
2243Missing argument code is generated by
2244<<compile missing [[BUILTIN]] argument>>=
2245if (missing(a)) {
2246    if (missingOK) {
2247        cb$putcode(DOMISSING.OP)
2248        cmpTag(n, cb)
2249    }
2250    else
2251        cntxt$stop(gettext("missing arguments are not allowed"), cntxt,
2252                   loc = cb$savecurloc())
2253}
2254@
2255The error case should not be reached as [[cmpBuiltinArgs]] should not
2256be called with missing arguments unless [[missingOK]] is [[TRUE]].
2257
2258The code for general arguments handles symbols separately to allow for
2259the case when missing values are acceptable.  Constant folding is
2260tried first since the constant folding code in [[cmp]] is not reached
2261in this case.  Constant folding is needed here since it doesn't go
2262through [[cmp]].
2263<<compile a general [[BUILTIN]] argument>>=
2264else {
2265    if (is.symbol(a)) {
2266        ca <- constantFold(a, cntxt, loc = cb$savecurloc())
2267        if (is.null(ca)) {
2268            cmpSym(a, cb, ncntxt, missingOK)
2269            cb$putcode(PUSHARG.OP)
2270        }
2271        else
2272            cmpConstArg(ca$value, cb, cntxt)
2273    }
2274    else if (typeof(a) == "language") {
2275        cmp(a, cb, ncntxt)
2276        cb$putcode(PUSHARG.OP)
2277    }
2278    else
2279        cmpConstArg(a, cb, cntxt)
2280    cmpTag(n, cb)
2281}
2282@ %def
2283Handling the constant case separately is not really necessary but
2284makes the code a bit cleaner.
2285
2286Default handlers for all [[BUILTIN]] functions in the [[base]] package
2287are installed programmatically by
2288<<install default [[BUILTIN]] handlers>>=
2289for (b in basevars[types == "builtin"])
2290    if (! haveInlineHandler(b, "base"))
2291        setInlineHandler(b, cmpBuiltin)
2292@ %def
2293
2294
2295\subsection{[[SPECIAL]] functions}
2296Calls to functions known to be of type [[SPECIAL]] can also be
2297compiled somewhat more efficiently by the [[cmpSpecial]] function:
2298<<[[cmpSpecial]] function>>=
2299cmpSpecial <- function(e, cb, cntxt) {
2300    fun <- e[[1]]
2301    if (typeof(fun) == "character")
2302        fun <- as.name(fun)
2303    ci <- cb$putconst(e)
2304    cb$putcode(CALLSPECIAL.OP, ci)
2305    if (cntxt$tailcall)
2306        cb$putcode(RETURN.OP)
2307    TRUE
2308}
2309@ %def cmpSpecial
2310
2311This handler is installed for all [[SPECIAL]] functions in the base
2312package with
2313<<install default [[SPECIAL]] handlers>>=
2314basevars <- ls('package:base', all.names = TRUE)
2315types <- sapply(basevars, function(n) typeof(get(n)))
2316for (s in basevars[types == "special"])
2317    if (! haveInlineHandler(s, "base"))
2318        setInlineHandler(s, cmpSpecial)
2319@ %def
2320
2321
2322\section{Some simple inlining handlers}
2323This section presents inlining handlers for a number of core primitive
2324functions.  With these additions the compiler will begin to show some
2325performance improvements.
2326
2327\subsection{The left brace sequencing function}
2328The inlining handler for [[{]] needs to consider that a pair of braces
2329[[{]] and [[}]] can surround zero, one, or more expressions.  A set
2330of empty braces is equivalent to the constant [[NULL]].  If there is
2331more than one expression, then all the values of all expressions other
2332than the last are ignored.  These expressions are compiled in a
2333no-value context (currently equivalent to a non-tail-call context),
2334and then code is generated to pop their values off the stack.  The
2335final expression is then compiled according to the context in which
2336the braces expression occurs.
2337<<inlining handler for left brace function>>=
2338setInlineHandler("{", function(e, cb, cntxt) {
2339    n <- length(e)
2340    if (n == 1)
2341        cmp(NULL, cb, cntxt)
2342    else {
2343        sloc <- cb$savecurloc()
2344        bsrefs <- attr(e, "srcref")
2345        if (n > 2) {
2346            ncntxt <- make.noValueContext(cntxt)
2347            for (i in 2 : (n - 1)) {
2348                subexp <- e[[i]]
2349                cb$setcurloc(subexp, getBlockSrcref(bsrefs, i))
2350                cmp(subexp, cb, ncntxt, setloc = FALSE)
2351                cb$putcode(POP.OP)
2352            }
2353        }
2354        subexp <- e[[n]]
2355        cb$setcurloc(subexp, getBlockSrcref(bsrefs, n))
2356        cmp(subexp, cb, cntxt, setloc = FALSE)
2357        cb$restorecurloc(sloc)
2358    }
2359    TRUE
2360})
2361@ %def
2362
2363
2364\subsection{The closure constructor function}
2365Compiling of [[function]] expressions is somewhat similar to compiling
2366promises for function arguments.  The body of a function is compiled
2367into a separate byte code object and stored in the constant pool
2368together with the formals.  Then code is emitted for creating a
2369closure from the formals, compiled body, and the current environment.
2370For now, only the body of functions is compiled, not the
2371default argument expressions.  This should be changed in future
2372versions of the compiler.
2373<<inlining handler for [[function]]>>=
2374setInlineHandler("function", function(e, cb, cntxt) {
2375    forms <- e[[2]]
2376    body <- e[[3]]
2377    sref <- if (length(e) > 3) e[[4]] else NULL
2378    ncntxt <- make.functionContext(cntxt, forms, body)
2379    if (mayCallBrowser(body, cntxt))
2380        return(FALSE)
2381    cbody <- genCode(body, ncntxt, loc = cb$savecurloc())
2382    ci <- cb$putconst(list(forms, cbody, sref))
2383    cb$putcode(MAKECLOSURE.OP, ci)
2384    if (cntxt$tailcall) cb$putcode(RETURN.OP)
2385    TRUE
2386})
2387@ %def
2388
2389
2390\subsection{The left parenthesis function}
2391In R an expression of the form [[(expr)]] is interpreted as a call to
2392the function [[(]] with the argument [[expr]].  Parentheses are used
2393to guide the parser, and for the most part [[(expr)]] is equivalent to
2394[[expr]]. There are two exceptions:
2395\begin{itemize}
2396\item Since [[(]] is a function an expression of the form [[(...)]] is
2397  legal whereas just [[...]] may not be, depending on the context.  A
2398  runtime error will occur unless the [[...]] argument expands to
2399  exactly one non-missing argument.
2400\item In tail position a call to [[(]] sets the visible flag to
2401  [[TRUE]].  So at top level for example the result of an assignment
2402  expression [[x <- 1]] would not be printed, but the result of [[(x
2403  <- 1]] would be printed.  It is not clear that this feature really
2404  needs to be preserved within functions --- it could be made a
2405  feature of the read-eval-print loop --- but for now it is a feature
2406  of the interpreter that the compiler should preserve.
2407\end{itemize}
2408
2409The inlining handler for [[(]] calls handles a [[...]]  argument case
2410or a case with fewer or more than one argument as a generic
2411[[BUILTIN]] call.  If the expression is in tail position then the
2412argument is compiled in a non-tail-call context, a [[VISIBLE]]
2413instruction is emitted to set the visible flag to [[TRUE]], and a
2414[[RETURN]] instruction is emitted.  If the expression is in non-tail
2415position, then code for the argument is generated in the current context.
2416<<inlining handler for [[(]]>>=
2417setInlineHandler("(", function(e, cb, cntxt) {
2418    if (any.dots(e))
2419        cmpBuiltin(e, cb, cntxt) ## punt
2420    else if (length(e) != 2) {
2421        notifyWrongArgCount("(", cntxt, loc = cb$savecurloc())
2422        cmpBuiltin(e, cb, cntxt) ## punt
2423    }
2424    else if (cntxt$tailcall) {
2425        ncntxt <- make.nonTailCallContext(cntxt)
2426        cmp(e[[2]], cb, ncntxt)
2427        cb$putcode(VISIBLE.OP)
2428        cb$putcode(RETURN.OP)
2429        TRUE
2430    }
2431    else {
2432        cmp(e[[2]], cb, cntxt)
2433        TRUE
2434    }
2435})
2436@ %def
2437
2438
2439\subsection{The [[.Internal]] function}
2440\label{subsec:.Internal}
2441One frequently used [[SPECIAL]] function is [[.Internal]]. When the
2442[[.Internal]] function called is of type [[BUILTIN]] it is useful to
2443compile the call as for a [[BUILTIN]] function.  For [[.Internal]]
2444functions of type [[SPECIAL]] there is less of an advantage, and so
2445the [[.Internal]] expression is compiled with [[cmpSpecial]].  It may
2446be useful to introduce a [[GETINTLSPECIAL]] instruction and handle
2447these analogously to [[.Internal]] functions of type [[BUILTIN]].  The
2448handler is assigned to the variable [[cmpDotInternalCall]] to allow
2449its use in inlining.
2450%% **** look into adding GETINTLSPECIAL??
2451<<inlining handler for [[.Internal]]>>=
2452cmpDotInternalCall <- function(e, cb, cntxt) {
2453    ee <- e[[2]]
2454    sym <- ee[[1]]
2455    if (.Internal(is.builtin.internal(sym)))
2456        cmpBuiltin(ee, cb, cntxt, internal = TRUE)
2457    else
2458        cmpSpecial(e, cb, cntxt)
2459}
2460
2461setInlineHandler(".Internal", cmpDotInternalCall)
2462@ %def cmpDotInternalCall
2463
2464
2465\subsection{The [[local]] function}
2466While [[local]] is currently implemented as a closure, because of its
2467importance relative to local variable determination it is a good idea
2468to inline it as well. The current semantics are such that the
2469interpreter treats
2470\begin{verbatim}
2471local(expr)
2472\end{verbatim}
2473essentially the same as
2474\begin{verbatim}
2475(function() expr)()
2476\end{verbatim}
2477There may be some minor differences related to what the [[sys.xyz]]
2478functions return. An instance of this was found in the [[RefManageR]]
2479package which used [[parent.frame(2)]] to access the environment from which
2480[[local]] was invoked. In this case, the use of [[parent.frame]] was
2481unnecessary (and [[local]] was not needed either); the maintainer
2482accepted a patch fixing this. The code pattern in the package was
2483\begin{verbatim}
2484MakeBibLaTeX <- function(docstyle = "text") local({
2485  docstyle <- get("docstyle", parent.frame(2))
2486  sortKeys <- function() 42
2487  environment()
2488})
2489\end{verbatim}
2490and the suggested fix was
2491\begin{verbatim}
2492MakeBibLaTeX <- function(docstyle = "text") {
2493  sortKeys <- function() 42
2494  environment()
2495}
2496\end{verbatim}
2497So the compiler handles one argument [[local]] calls by making this
2498conversion and compiling the result.
2499%% **** add to language manual?
2500<<inlining handler for [[local]] function>>=
2501setInlineHandler("local", function(e, cb, cntxt) {
2502    if (length(e) == 2) {
2503        ee <- as.call(list(as.call(list(
2504            as.name("function"), NULL, e[[2]], NULL))))
2505        cmp(ee, cb, cntxt)
2506        TRUE
2507    }
2508    else FALSE
2509})
2510@ %def
2511The interpreter couls, and probably should, be modified to handle this
2512case of a [[local]] call expression in the same way as the compiler.
2513
2514\subsection{The [[return]] function}
2515\label{subsec:return}
2516A call to [[return]] causes a return from the associated function
2517call, as determined by the lexical context in which the [[return]]
2518expression is defined.  If the [[return]] is captured in a closure and
2519is executed within a callee then this requires a [[longjmp]].  A
2520[[longjmp]] is also needed if the [[return]] call occurs within a loop
2521that is compiled to a separate code object to support a [[setjmp]] for
2522[[break]] or [[next]] calls.  The [[RETURNJMP]] instruction is
2523provided for this purpose.  In all other cases an ordinary [[RETURN]]
2524instruction can be used.
2525%% **** if function body code was tagged as such then changing from
2526%% **** RETURNJMP to RETURN could be done by post-processing the
2527%% **** bytecode or by putcode
2528[[return]] calls with [[...]], which may be legal if [[...]] contains
2529only one argument, or missing arguments or more than one argument,
2530which will produce runtime errors, are compiled as generic [[SPECIAL]]
2531calls.
2532<<inlining handler for [[return]] function>>=
2533setInlineHandler("return", function(e, cb, cntxt) {
2534    if (dots.or.missing(e) || length(e) > 2)
2535        cmpSpecial(e, cb, cntxt) ## **** punt for now
2536    else {
2537        if (length(e) == 1)
2538            val <- NULL
2539        else
2540            val <- e[[2]]
2541        ncntxt <- make.nonTailCallContext(cntxt)
2542        cmp(val, cb, ncntxt)
2543	if (cntxt$needRETURNJMP)
2544            cb$putcode(RETURNJMP.OP)
2545        else
2546            cb$putcode(RETURN.OP)
2547    }
2548    TRUE
2549})
2550@
2551
2552
2553\section{Branching and labels}
2554The code generated so far is straight line code without conditional or
2555unconditional branches.  To implement conditional evaluation
2556constructs and loops we need to add conditional and unconditional
2557branching instructions.  These make use of the labels mechanism
2558provided by the code buffer.
2559
2560
2561\subsection{Inlining handler for [[if]] expressions}
2562Using the labels mechanism we can implement an inlining handler for
2563[[if]] expressions.  The first step extracts the components of the
2564expression.  An [[if]] expression with no [[else]] clause will
2565invisibly return [[NULL]] if the test is [[FALSE]], but the visible
2566flag setting only matters if the [[if]] expression is in tail
2567position.  So the case of no [[else]] clause will be handled slightly
2568differently in tail and non-tail contexts.
2569%% **** In no value contexts it would be good to avoid pushing and
2570%% **** immediately popping constants. Alternatively a peephole optimizer
2571%% **** could clean these up.
2572%% **** Should there be error checking for either two or three arguments here??
2573<<[[if]] inline handler body>>=
2574test <- e[[2]]
2575then.expr <- e[[3]]
2576if (length(e) == 4) {
2577    have.else.expr <- TRUE
2578    else.expr <- e[[4]]
2579}
2580else have.else.expr <- FALSE
2581@ %def
2582
2583To deal with use of [[if (FALSE) ...]] for commenting out code and of
2584[[if (is.R()) ... else ...]] for handling both R and Splus code it is
2585useful to attempt to constant-fold the test.  If this succeeds and
2586produces either [[TRUE]] or [[FALSE]] then only the appropriate branch
2587is compiled and the handler returns [[TRUE]].
2588<<[[if]] inline handler body>>=
2589ct <- constantFold(test, cntxt, loc = cb$savecurloc())
2590if (! is.null(ct) && is.logical(ct$value) && length(ct$value) == 1
2591    && ! is.na(ct$value)) {
2592    if (ct$value)
2593        cmp(then.expr, cb, cntxt)
2594    else if (have.else.expr)
2595        cmp(else.expr, cb, cntxt)
2596    else if (cntxt$tailcall) {
2597        cb$putcode(LDNULL.OP)
2598        cb$putcode(INVISIBLE.OP)
2599        cb$putcode(RETURN.OP)
2600    }
2601    else cb$putcode(LDNULL.OP)
2602    return(TRUE)
2603}
2604@
2605
2606Next, the test code is compiled, a label for the start of code for the
2607[[else]] clause is generated, and a conditional branch instruction
2608that branches to the [[else]] label if the test fails is emitted.
2609This is followed by code for the consequent (test is [[TRUE]])
2610expression.  The [[BRIFNOT]] takes two operand, the constant pool
2611index for the call and the label to branch to if the value on the
2612stack is [[FALSE]].  The call is used if an error needs to be signaled
2613for an improper test result on the stack.
2614<<[[if]] inline handler body>>=
2615ncntxt <- make.nonTailCallContext(cntxt)
2616cmp(test, cb, ncntxt)
2617callidx <- cb$putconst(e)
2618else.label <- cb$makelabel()
2619cb$putcode(BRIFNOT.OP, callidx, else.label)
2620cmp(then.expr, cb, cntxt)
2621@
2622
2623The code for the alternative [[else]] expression will be placed after
2624the code for the consequent expression.  If the [[if]] expression
2625appears in tail position then the code for the consequent will end with
2626a [[RETURN]] instruction and there is no need to jump over the
2627following instructions for the [[else]] expression. All that is needed
2628is to record the value of the label for the [[else]] clause and to
2629emit the code for the [[else]] clause.  If no [[else]] clause was
2630provided then that code arranges for the value [[NULL]] to be returned
2631invisibly.
2632<<[[if]] inline handler body>>=
2633if (cntxt$tailcall) {
2634    cb$putlabel(else.label)
2635    if (have.else.expr)
2636        cmp(else.expr, cb, cntxt)
2637    else {
2638        cb$putcode(LDNULL.OP)
2639        cb$putcode(INVISIBLE.OP)
2640        cb$putcode(RETURN.OP)
2641    }
2642}
2643@ %def
2644On the other hand, if the [[if]] expression is not in tail position
2645then a label for the next instruction after the [[else]] expression
2646code is needed, and the consequent expression code needs to end with a
2647[[GOTO]] instruction to that label.  If the expression does not
2648include an [[else]] clause then the alternative code just places
2649[[NULL]] on the stack.
2650<<[[if]] inline handler body>>=
2651else {
2652    end.label <- cb$makelabel()
2653    cb$putcode(GOTO.OP, end.label)
2654    cb$putlabel(else.label)
2655    if (have.else.expr)
2656        cmp(else.expr, cb, cntxt)
2657    else
2658        cb$putcode(LDNULL.OP)
2659    cb$putlabel(end.label)
2660}
2661@ %def
2662
2663The resulting handler definition is
2664<<inlining handler for [[if]]>>=
2665setInlineHandler("if", function(e, cb, cntxt) {
2666    ## **** test for missing, ...
2667    <<[[if]] inline handler body>>
2668    TRUE
2669})
2670@ %def
2671%% **** need some assembly code examples??
2672
2673
2674\subsection{Inlining handlers for [[&&]] and [[||]] expressions}
2675In many languages it is possible to convert the expression [[a && b]]
2676to an equivalent [[if]] expression of the form
2677\begin{verbatim}
2678if (a) { if (b) TRUE else FALSE }
2679\end{verbatim}
2680Similarly, in these languages the expression [[a || b]] is equivalent
2681to
2682\begin{verbatim}
2683if (a) TRUE else if (b) TRUE else FALSE
2684\end{verbatim}
2685Compilation of these expressions is thus reduced to compiling [[if]]
2686expressions.
2687
2688Unfortunately, because of the possibility of [[NA]] values, these
2689equivalencies do not hold in R. In R, [[NA || TRUE]] should evaluate
2690to [[TRUE]] and [[NA && FALSE]] to [[FALSE]].  This is handled by
2691introducing special instructions [[AND1ST]] and [[AND2ND]] for [[&&]]
2692expressions and [[OR1ST]] and [[OR2ND]] for [[||]].
2693
2694The code generator for [[&&]] expressions generates code to evaluate
2695the first argument and then emits an [[AND1ST]] instruction. The
2696[[AND1ST]] instruction has one operand, the label for the instruction
2697following code for the second argument.  If the value on the stack
2698produced by the first argument is [[FALSE]] then [[AND1ST]] jumps to
2699the label and skips evaluation of the second argument; the value of
2700the expression is [[FALSE]].  The code for the second argument is
2701generated next, followed by an [[AND2ND]] instruction.  This removes
2702the values of the two arguments to [[&&]] from the stack and pushes
2703the value of the expression onto the stack.  A [[RETURN]] instruction
2704is generated if the [[&&]] expression was in tail position.
2705%% **** check over all uses of argContext vs nonTailCallContext
2706%% **** The first argument can use nonTailCallCOntext because nothing
2707%% **** is on the stack yet.  The second one has to use argContext.
2708%% **** This wouldn't be an issue if break/next could reset the stack
2709%% **** before the jump.
2710<<inlining handler for [[&&]]>>=
2711setInlineHandler("&&", function(e, cb, cntxt) {
2712    ## **** arity check??
2713    ncntxt <- make.argContext(cntxt)
2714    callidx <- cb$putconst(e)
2715    label <- cb$makelabel()
2716    cmp(e[[2]], cb, ncntxt)
2717    cb$putcode(AND1ST.OP, callidx, label)
2718    cmp(e[[3]], cb, ncntxt)
2719    cb$putcode(AND2ND.OP, callidx)
2720    cb$putlabel(label)
2721    if (cntxt$tailcall)
2722        cb$putcode(RETURN.OP)
2723    TRUE
2724})
2725@ %def
2726
2727The code generator for [[||]] expressions is analogous.
2728<<inlining handler for [[||]]>>=
2729setInlineHandler("||", function(e, cb, cntxt) {
2730    ## **** arity check??
2731    ncntxt <- make.argContext(cntxt)
2732    callidx <- cb$putconst(e)
2733    label <- cb$makelabel()
2734    cmp(e[[2]], cb, ncntxt)
2735    cb$putcode(OR1ST.OP, callidx, label)
2736    cmp(e[[3]], cb, ncntxt)
2737    cb$putcode(OR2ND.OP, callidx)
2738    cb$putlabel(label)
2739    if (cntxt$tailcall)
2740        cb$putcode(RETURN.OP)
2741    TRUE
2742})
2743@ %def
2744
2745
2746\section{Loops}
2747\label{sec:loops}
2748In principle code for [[repeat]] and [[while]] loops can be generated
2749using just [[GOTO]] and [[BRIFNOT]] instructions.  [[for]] loops
2750require a little more to manage the loop variable and termination.  A
2751complication arises due to the need to support [[break]] and [[next]]
2752calls in the context of lazy evaluation of arguments: if a [[break]]
2753or [[next]] expression appears in a function argument that is compiled
2754as a closure, then the expression may be evaluated deep inside a
2755series of nested function calls and require a non-local jump.  A
2756similar issue arises for calls to the [[return]] function as described
2757in Section \ref{subsec:return}.
2758
2759To support these non-local jumps the interpreter sets up a [[setjmp]]
2760context for each loop, and [[break]] and [[next]] use [[longjmp]] to
2761transfer control.  In general, compiled loops need to use a similar
2762approach.  For now, this is implemented by the [[STARTLOOPCNTXT]] and
2763[[ENDLOOPCNTXT]] instructions.  The [[STARTLOOPCNTXT]] instructions
2764takes two operands, a flag indicating whether the loop is a [[for]]
2765loop or not, and a label which points after the loop. The interpreter
2766jumps to this label in case of a non-local jump implementing
2767[[break]].  The loop body should end with a call to [[ENDLOOPCNTXT]],
2768which takes one operand indicating whether this is a [[for]] loop or
2769not.  [[ENDLOOPCNTXT]] terminates the context established by
2770[[STARTLOOPCNTXT]] and pops it off the context stack.  The context
2771data is stored on the byte code interpreter stack; in the case of a
2772[[for]] loop some loop state information is duplicated on the stack by
2773[[STARTLOOPCNTXT]] and removed again by [[ENDLOOPCNTXT]]. The byte
2774code intepreter stores the [[pc]] in a slot in the [[RCNTXT]]
2775structure so it is available after a [[longjmp]] triggered by a
2776[[break]] for retrieving the label on the [[ENDLOOPCNTXT]]
2777instruction.  An alternative would be to add separate
2778[[STARTFORLOOPCNTXT]] and [[ENDFORLOOPCNTXT]] instructions. Then the
2779[[pc]] or the label could be stored on the note stack.
2780
2781At least with some assumptions it is often possible to implement
2782[[break]] and [[next]] calls as simple [[GOTO]]s.  If all [[break]]
2783and [[next]] calls in a loop can be implemented using [[GOTO]]s then
2784the loop context is not necessary.  The mechanism to enable the
2785simpler code generation is presented in Section
2786\ref{subsec:skipcntxt}.
2787
2788The current engine implementation executes one [[setjmp]] per
2789[[STARTLOOPCNTXT]] and uses nested calls to [[bceval]] to run the
2790code.  Eventually we should be able to reduce the need for nested
2791[[bceval]] calls and to arrange that [[setjmp]] buffers be reused for
2792multiple purposes.
2793
2794
2795\subsection{[[repeat]] loops}
2796The simplest loop in R is the [[repeat]] loop.  The code generator is
2797defined as
2798<<inlining handler for [[repeat]] loops>>=
2799setInlineHandler("repeat", function(e, cb, cntxt) {
2800    body <- e[[2]]
2801    <<generate context and body for [[repeat]] loop>>
2802    <<generate [[repeat]] and [[while]] loop wrap-up code>>
2803    TRUE
2804})
2805@ %def
2806
2807If a loop context is not needed then the code for the loop body is
2808just written to the original code buffer.  The [[else]] clause in the
2809code chunk below generates the code for the general case. The need for
2810using [[RETURNJMP]] for [[return]] calls is indicated by setting the
2811[[needRETURNJMP]] flag in the compiler context to [[TRUE]].
2812<<generate context and body for [[repeat]] loop>>=
2813if (checkSkipLoopCntxt(body, cntxt))
2814    cmpRepeatBody(body, cb, cntxt)
2815else {
2816    cntxt$needRETURNJMP <- TRUE ## **** do this a better way
2817    ljmpend.label <- cb$makelabel()
2818    cb$putcode(STARTLOOPCNTXT.OP, 0, ljmpend.label)
2819    cmpRepeatBody(body, cb, cntxt)
2820    cb$putlabel(ljmpend.label)
2821    cb$putcode(ENDLOOPCNTXT.OP, 0)
2822}
2823@ %def
2824
2825The loop body uses two labels. [[loop.label]] marks the top of the
2826loop and is the target of the [[GOTO]] instruction at the end of the
2827body.  This label is also used by [[next]] expressions that do not
2828require [[longjmp]]s.  The [[end.loop]] label is placed after the
2829[[GOTO]] instruction and is used by [[break]] expressions that do not
2830require [[longjmp]]s.  The body is compiled in a context that makes
2831these labels available, and the value left on the stack is removed by
2832a [[POP]] instruction.  The [[POP]] instruction is followed by a
2833[[GOTO]] instruction that returns to the top of the loop.
2834<<[[cmpRepeatBody]] function>>=
2835cmpRepeatBody <- function(body, cb, cntxt) {
2836    loop.label <- cb$makelabel()
2837    end.label <- cb$makelabel()
2838    cb$putlabel(loop.label)
2839    lcntxt <- make.loopContext(cntxt, loop.label, end.label)
2840    cmp(body, cb, lcntxt)
2841    cb$putcode(POP.OP)
2842    cb$putcode(GOTO.OP, loop.label)
2843    cb$putlabel(end.label)
2844}
2845@ %def cmpRepeatBody
2846
2847
2848The wrap-up code for the loop places the [[NULL]] value of the loop
2849expression on the stack and emits [[INVISIBLE]] and [[RETURN]]
2850instructions to return the value if the loop appears in tail position.
2851<<generate [[repeat]] and [[while]] loop wrap-up code>>=
2852cb$putcode(LDNULL.OP)
2853if (cntxt$tailcall) {
2854    cb$putcode(INVISIBLE.OP)
2855    cb$putcode(RETURN.OP)
2856}
2857@ %def
2858
2859The [[break]] and [[next]] code generators emit [[GOTO]] instructions
2860if the loop information is available and the [[gotoOK]] compiler
2861context flag is [[TRUE]].  A warning is issued if no loop is visible
2862in the compilation context.
2863<<inlining handlers for [[next]] and [[break]]>>=
2864setInlineHandler("break", function(e, cb, cntxt) {
2865    if (is.null(cntxt$loop)) {
2866        notifyWrongBreakNext("break", cntxt, loc = cb$savecurloc())
2867        cmpSpecial(e, cb, cntxt)
2868    }
2869    else if (cntxt$loop$gotoOK) {
2870        cb$putcode(GOTO.OP, cntxt$loop$end)
2871        TRUE
2872    }
2873    else cmpSpecial(e, cb, cntxt)
2874})
2875
2876setInlineHandler("next", function(e, cb, cntxt) {
2877    if (is.null(cntxt$loop)) {
2878        notifyWrongBreakNext("next", cntxt, loc = cb$savecurloc())
2879        cmpSpecial(e, cb, cntxt)
2880    }
2881    else if (cntxt$loop$gotoOK) {
2882        cb$putcode(GOTO.OP, cntxt$loop$loop)
2883        TRUE
2884    }
2885    else cmpSpecial(e, cb, cntxt)
2886})
2887@ %def
2888
2889
2890\subsection{[[while]] loops}
2891%% could just compile repeat{ if (condition) body else break } ??
2892The structure for the [[while]] loop code generator is similar to the
2893structure of the [[repeat]] code generator:
2894<<inlining handler for [[while]] loops>>=
2895setInlineHandler("while", function(e, cb, cntxt) {
2896    cond <- e[[2]]
2897    body <- e[[3]]
2898    <<generate context and body for [[while]] loop>>
2899    <<generate [[repeat]] and [[while]] loop wrap-up code>>
2900    TRUE
2901})
2902@ %def
2903The context and body generation chunk is similar as well. The
2904expression stored in the code object isn't quite right as what is
2905compiled includes both the test and the body, but this code object
2906should not be externally visible.
2907<<generate context and body for [[while]] loop>>=
2908if (checkSkipLoopCntxt(cond, cntxt) && checkSkipLoopCntxt(body, cntxt))
2909    cmpWhileBody(e, cond, body, cb, cntxt)
2910else {
2911    cntxt$needRETURNJMP <- TRUE ## **** do this a better way
2912    ljmpend.label <- cb$makelabel()
2913    cb$putcode(STARTLOOPCNTXT.OP, 0, ljmpend.label)
2914    cmpWhileBody(e, cond, body, cb, cntxt)
2915    cb$putlabel(ljmpend.label)
2916    cb$putcode(ENDLOOPCNTXT.OP, 0)
2917}
2918@ %def
2919
2920Again two labels are used, one at the top of the loop and one at the
2921end.  The [[loop.label]] is followed by code for the test.  Next is a
2922[[BRIFNOT]] instruction that jumps to the end of the loop if the value
2923left on the stack by the test is [[FALSE]].  This is followed by the
2924code for the body, a [[POP]] instruction, and a [[GOTO]] instruction
2925that jumps to the top of the loop. Finally, the [[end.label]] is
2926recorded.
2927<<[[cmpWhileBody]] function>>=
2928cmpWhileBody <- function(call, cond, body, cb, cntxt) {
2929    loop.label <- cb$makelabel()
2930    end.label <- cb$makelabel()
2931    cb$putlabel(loop.label)
2932    lcntxt <- make.loopContext(cntxt, loop.label, end.label)
2933    cmp(cond, cb, lcntxt)
2934    callidx <- cb$putconst(call)
2935    cb$putcode(BRIFNOT.OP, callidx, end.label)
2936    cmp(body, cb, lcntxt)
2937    cb$putcode(POP.OP)
2938    cb$putcode(GOTO.OP, loop.label)
2939    cb$putlabel(end.label)
2940}
2941@ cmpWhileBody
2942
2943
2944\subsection{[[for]] loops}
2945%% could compile repeat { if (stepfor) body else break } and peephole a bit ??
2946Code generation for [[for]] loops is a little more complex because of
2947the need to manage the loop variable value and stepping through the
2948sequence.  Code for [[for]] loops uses three additional instructions.
2949[[STARTFOR]] takes the constant pool index of the call, the constant
2950pool index of the loop variable symbol, and the label of the start
2951instruction as operands. It finds the sequence to iterate over on the
2952stack and places information for accessing the loop variable binding
2953and stepping the sequence on the stack before jumping to the label.
2954The call is used if an error for an improper for loop sequence needs
2955to be signaled.  The [[STEPFOR]] instruction takes a label for the top
2956of the loop as its operand.  If there are more elements in the
2957sequence then [[STEPFOR]] advances the position within the sequence,
2958sets the loop variable, and jumps to the top of the loop.  Otherwise
2959it drops through to the next instruction.  Finally [[ENDFOR]] cleans
2960up the loop information stored on the stack by [[STARTFOR]] and leaves
2961the [[NULL]] loop value on the stack.
2962
2963The inlining handler for a [[for]] loop starts out by checking the
2964loop variable and issuing a warning if it is not a symbol.  The code
2965generator then declines to inline the loop expression.  This means it
2966is compiled as a generic function call and will signal an error at
2967runtime.  An alternative would be do generate code to signal the error
2968as is done with improper use of [[...]] arguments.  After checking the
2969symbol, code to compute the sequence to iterate over is generated.
2970From then on the structure is similar to the structure of the other
2971loop code generators.
2972%% **** do cmpSpecial instead of returning FALSE??
2973<<inlining handler for [[for]] loops>>=
2974setInlineHandler("for", function(e, cb, cntxt) {
2975    sym <- e[[2]]
2976    seq <- e[[3]]
2977    body <- e[[4]]
2978    if (! is.name(sym)) {
2979        ## not worth warning here since the parser should not allow this
2980        return(FALSE)
2981    }
2982    ncntxt <- make.nonTailCallContext(cntxt)
2983    cmp(seq, cb, ncntxt)
2984    ci <- cb$putconst(sym)
2985    callidx <- cb$putconst(e)
2986    <<generate context and body for [[for]] loop>>
2987    <<generate [[for]] loop wrap-up code>>
2988    TRUE
2989})
2990@ %def
2991
2992When a [[setjmp]] context is needed, the label given to [[STARTFOR]]
2993is just the following instruction, which is a [[STARTLOOPCNTXT]]
2994instruction.  If the context is not needed then the label for the
2995[[STARTFOR]] instruction will be the loop's [[STEPFOR]] instruction;
2996if the context is needed then the first instruction in the code object
2997for the body will be a [[GOTO]] instruction that jumps to the
2998[[STEPFOR]] instruction.  This design means the stepping and the jump
2999can be handled by one instruction instead of two, a step instruction
3000and a [[GOTO]].
3001<<generate context and body for [[for]] loop>>=
3002if (checkSkipLoopCntxt(body, cntxt))
3003    cmpForBody(callidx, body, ci, cb, cntxt)
3004else {
3005    cntxt$needRETURNJMP <- TRUE ## **** do this a better way
3006    ctxt.label <- cb$makelabel()
3007    cb$putcode(STARTFOR.OP, callidx, ci, ctxt.label)
3008    cb$putlabel(ctxt.label)
3009    ljmpend.label <- cb$makelabel()
3010    cb$putcode(STARTLOOPCNTXT.OP, 1, ljmpend.label)
3011    cmpForBody(NULL, body, NULL, cb, cntxt)
3012    cb$putlabel(ljmpend.label)
3013    cb$putcode(ENDLOOPCNTXT.OP, 1)
3014}
3015@ %def
3016
3017The body code generator takes an additional argument, the index of the
3018loop label.  For the case where a [[setjmp]] context is needed this
3019argument is [[NULL]], and the first instruction generated is a
3020[[GOTO]] targeting the [[STEPFOR]] instruction.  This is labeled by
3021the [[loop.label]] label, since this will also be the target used by a
3022[[next]] expression. An additional label, [[body.label]] is needed for
3023the top of the loop, which is used by [[STEPFOR]] if there are more
3024loop elements to process.  When the [[ci]] argument is not [[NULL]]
3025code is being generated for the case without a [[setjmp]] context, and
3026the first instruction is the [[STARTFOR]] instruction which
3027initializes the loop and jumps to [[loop.label]] at the [[STEPLOOP]]
3028instruction.
3029<<[[cmpForBody]] function>>=
3030cmpForBody <- function(callidx, body, ci, cb, cntxt) {
3031    body.label <- cb$makelabel()
3032    loop.label <- cb$makelabel()
3033    end.label <- cb$makelabel()
3034    if (is.null(ci))
3035        cb$putcode(GOTO.OP, loop.label)
3036    else
3037        cb$putcode(STARTFOR.OP, callidx, ci, loop.label)
3038    cb$putlabel(body.label)
3039    lcntxt <- make.loopContext(cntxt, loop.label, end.label)
3040    cmp(body, cb, lcntxt)
3041    cb$putcode(POP.OP)
3042    cb$putlabel(loop.label)
3043    cb$putcode(STEPFOR.OP, body.label)
3044    cb$putlabel(end.label)
3045}
3046@ %def cmpForBody
3047
3048The wrap-up code issues an [[ENDFOR]] instruction instead of the
3049[[LDNULL]] instruction used for [[repeat]] and [[while]] loops.
3050<<generate [[for]] loop wrap-up code>>=
3051cb$putcode(ENDFOR.OP)
3052if (cntxt$tailcall) {
3053    cb$putcode(INVISIBLE.OP)
3054    cb$putcode(RETURN.OP)
3055}
3056@ %def
3057
3058
3059\subsection{Avoiding runtime loop contexts}
3060\label{subsec:skipcntxt}
3061When all uses of [[break]] or [[next]] in a loop occur only in top
3062level contexts then all [[break]] and [[next]] calls can be
3063implemented with simple [[GOTO]] instructions and a [[setjmp]] context
3064for the loop is not needed. Top level contexts are the loop body
3065itself and argument expressions in top level calls to [[if]], [[{]],
3066and [[(]].  The [[switch]] functions will eventually be included as well.
3067%% **** need to add switch to the top level functions
3068%% **** may not be OK if switch uses vmpSpecial because of ... arg
3069The function [[checkSkipLoopContxt]] recursively traverses an
3070expression tree to determine whether all relevant uses of [[break]] or
3071[[next]] are safe to compile as [[GOTO]] instructions. The search
3072returns [[FALSE]] if a [[break]] or [[next]] call occurs in an unsafe
3073place.  The search stops and returns [[TRUE]] for any expression that
3074cannot contain relevant [[break]] or [[next]] calls.  These stop
3075expressions are calls to the three loop functions and to [[function]].
3076Calls to functions like [[quote]] that are known not to evaluate their
3077arguments could also be included among the stop functions but this
3078doesn't seem particularly worth while at this time. Loops that include a
3079call to [[eval]] (or [[evalq]], [[source]]) are compiled with context to support a
3080programming pattern present e.g. in package [[Rmpi]]: a server application
3081is implemented using an infinite loop, which evaluates de-serialized code
3082received from the client; the server shuts down when it receives a
3083serialized version of [[break]].
3084
3085The recursive checking function is defined as
3086<<[[checkSkipLoopCntxt]] function>>=
3087checkSkipLoopCntxt <- function(e, cntxt, breakOK = TRUE) {
3088    if (typeof(e) == "language") {
3089        fun <- e[[1]]
3090        if (typeof(fun) == "symbol") {
3091            fname <- as.character(fun)
3092            if (! breakOK && fname %in% c("break", "next"))
3093                FALSE
3094            else if (isLoopStopFun(fname, cntxt))
3095                TRUE
3096            else if (isLoopTopFun(fname, cntxt))
3097                checkSkipLoopCntxtList(e[-1], cntxt, breakOK)
3098            else if (fname %in% c("eval", "evalq", "source"))
3099                FALSE
3100            else
3101                checkSkipLoopCntxtList(e[-1], cntxt, FALSE)
3102        }
3103        else
3104            checkSkipLoopCntxtList(e, cntxt, FALSE)
3105    }
3106    else TRUE
3107}
3108@ %def checkSkipLoopCntxt
3109A version that operates on a list of expressions is given by
3110<<[[checkSkipLoopCntxtList]] function>>=
3111checkSkipLoopCntxtList <- function(elist, cntxt, breakOK) {
3112    for (a in as.list(elist))
3113        if (! missing(a) && ! checkSkipLoopCntxt(a, cntxt, breakOK))
3114            return(FALSE)
3115    TRUE
3116}
3117@ %def checkSkipLoopCntxtList
3118
3119The stop functions are identified by [[isLoopStopFun]].  This uses
3120[[isBaseVar]] to ensure that interpreting a reference to a stop
3121function name as referring to the corresponding function in the
3122[[base]] package is permitted by the current optimization settings.
3123%% **** could also stop for quote() and some others.
3124<<[[isLoopStopFun]] function>>=
3125isLoopStopFun <- function(fname, cntxt)
3126    (fname %in% c("function", "for", "while", "repeat") &&
3127     isBaseVar(fname, cntxt))
3128@ %def isLoopStopFun
3129
3130The top level functions are identified by [[isLoopTopFun]].  Again the
3131compilation context is consulted to ensure that candidate can be
3132assumed to be from the [[base]] package.
3133%% **** eventually add "switch"
3134<<[[isLoopTopFun]] function>>=
3135isLoopTopFun <- function(fname, cntxt)
3136    (fname %in% c("(", "{", "if") &&
3137     isBaseVar(fname, cntxt))
3138@ %def isLoopTopFun
3139
3140The [[checkSkipLoopCntxt]] function does not check whether calls to
3141[[break]] or [[next]] are indeed calls to the [[base]] functions.
3142Given the special syntactic nature of [[break]] and [[next]] this is
3143very unlikely to cause problems, but if it does it will result in some
3144safe loops being considered unsafe and so errs in the conservative
3145direction.
3146
3147
3148\section{More inlining}
3149
3150\subsection{Basic arithmetic expressions}
3151The addition and subtraction functions [[+]] and [[-]] are [[BUILTIN]]
3152functions that can both be called with one or two arguments.
3153Multiplication and division functions [[*]] and [[/]] require two
3154arguments.  Since code generation for all one arguments cases and all
3155two argument cases is very similar these are abstracted out into
3156functions [[cmpPrim1]] and [[cmpPrim2]].
3157
3158The code generators for addition and subtraction are given by
3159<<inline handlers for [[+]] and [[-]]>>=
3160setInlineHandler("+", function(e, cb, cntxt) {
3161    if (length(e) == 3)
3162        cmpPrim2(e, cb, ADD.OP, cntxt)
3163    else
3164        cmpPrim1(e, cb, UPLUS.OP, cntxt)
3165})
3166
3167setInlineHandler("-", function(e, cb, cntxt) {
3168    if (length(e) == 3)
3169        cmpPrim2(e, cb, SUB.OP, cntxt)
3170    else
3171        cmpPrim1(e, cb, UMINUS.OP, cntxt)
3172})
3173@ %def
3174The code generators for multiplication and division are
3175<<inline handlers for [[*]] and [[/]]>>=
3176setInlineHandler("*", function(e, cb, cntxt)
3177    cmpPrim2(e, cb, MUL.OP, cntxt))
3178
3179setInlineHandler("/", function(e, cb, cntxt)
3180    cmpPrim2(e, cb, DIV.OP, cntxt))
3181@ %def
3182
3183Code for instructions corresponding to calls to a [[BUILTIN]] function
3184with one argument are generated by [[cmpPrim1]]. The generator
3185produces code for a generic [[BUILTIN]] call using [[cmpBuiltin]] if
3186if there are any missing or [[...]] arguments or if the number of
3187arguments is not equal to one.  Otherwise code for the argument is
3188generated in a non-tail-call context, and the instruction provided as
3189the [[op]] argument is emitted followed by a [[RETURN]] instruction
3190for an expression in tail position. The [[op]] instructions take the
3191call as operand for use in error message and for internal dispatching.
3192<<[[cmpPrim1]] function>>=
3193cmpPrim1 <- function(e, cb, op, cntxt) {
3194    if (dots.or.missing(e[-1]))
3195        cmpBuiltin(e, cb, cntxt)
3196    else if (length(e) != 2) {
3197        notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc())
3198        cmpBuiltin(e, cb, cntxt)
3199    }
3200    else {
3201        ncntxt <- make.nonTailCallContext(cntxt)
3202        cmp(e[[2]], cb, ncntxt);
3203	ci <- cb$putconst(e)
3204        cb$putcode(op, ci)
3205        if (cntxt$tailcall)
3206            cb$putcode(RETURN.OP)
3207        TRUE
3208    }
3209}
3210@ %def cmpPrim1
3211
3212Code generation for the two argument case is similar, except that the
3213second argument has to be compiled with an argument context since the
3214stack already has the value of the first argument on it and that would
3215need to be popped before a jump.
3216<<[[cmpPrim2]] function>>=
3217cmpPrim2 <- function(e, cb, op, cntxt) {
3218    if (dots.or.missing(e[-1]))
3219        cmpBuiltin(e, cb, cntxt)
3220    else if (length(e) != 3) {
3221        notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc())
3222        cmpBuiltin(e, cb, cntxt)
3223    }
3224    else {
3225        needInc <- checkNeedsInc(e[[3]], cntxt)
3226        ncntxt <- make.nonTailCallContext(cntxt)
3227        cmp(e[[2]], cb, ncntxt);
3228        if (needInc) cb$putcode(INCLNK.OP)
3229        ncntxt <- make.argContext(cntxt)
3230        cmp(e[[3]], cb, ncntxt)
3231        if (needInc) cb$putcode(DECLNK.OP)
3232        ci <- cb$putconst(e)
3233        cb$putcode(op, ci)
3234        if (cntxt$tailcall)
3235            cb$putcode(RETURN.OP)
3236        TRUE
3237    }
3238}
3239@ %def cmpPrim2
3240
3241The [[INCLNK]] and [[DECLNK]] instructions are used to protect
3242evaluated arguents on the stack from modifications during evaluation
3243of subsequent arguments. These instructions can be omitted if the
3244subsequent argument evaluations cannot modify values on the stack.
3245With changes to stack protection this is mechanism is no longer
3246needed, so the check just returns [[FALSE]].
3247<<[[checkNeedsInc]] function>>=
3248checkNeedsInc <- function(e, cntxt)
3249    return(FALSE)
3250@ %def checkNeedsInc
3251<<old [[checkNeedsInc]] function>>=
3252checkNeedsInc <- function(e, cntxt) {
3253    type <- typeof(e)
3254    if (type %in% c("language", "bytecode", "promise"))
3255        TRUE
3256    else FALSE ## symbols and constants
3257}
3258@ %def checkNeedsInc
3259
3260Calls to the power function [[^]] and the functions [[exp]] and
3261[[sqrt]] can be compiled using [[cmpPrim1]] and [[cmpPrim2]] as well:
3262<<inline handlers for [[^]], [[exp]], and [[sqrt]]>>=
3263setInlineHandler("^", function(e, cb, cntxt)
3264    cmpPrim2(e, cb, EXPT.OP, cntxt))
3265
3266setInlineHandler("exp", function(e, cb, cntxt)
3267    cmpPrim1(e, cb, EXP.OP, cntxt))
3268
3269setInlineHandler("sqrt", function(e, cb, cntxt)
3270    cmpPrim1(e, cb, SQRT.OP, cntxt))
3271@
3272
3273The [[log]] function is currently defined as a [[SPECIAL]].  The
3274default inline handler action is therefore to use [[cmpSpecial]]. For
3275calls with one unnamed argument the [[LOG.OP]] instruction is
3276used. For two unnamed arguments [[LOGBASE.OP]] is used. It might be
3277useful to introduce instructions for [[log2]] and [[log10]] as well
3278but this has not been done yet.
3279<<inline handler for [[log]]>>=
3280setInlineHandler("log", function(e, cb, cntxt) {
3281    if (dots.or.missing(e) || ! is.null(names(e)) ||
3282        length(e) < 2 || length(e) > 3)
3283        cmpSpecial(e, cb, cntxt)
3284    else {
3285        ci <- cb$putconst(e)
3286        ncntxt <- make.nonTailCallContext(cntxt)
3287        cmp(e[[2]], cb, ncntxt);
3288        if (length(e) == 2)
3289            cb$putcode(LOG.OP, ci)
3290        else {
3291            needInc <- checkNeedsInc(e[[3]], cntxt)
3292            if (needInc) cb$putcode(INCLNK.OP)
3293            ncntxt <- make.argContext(cntxt)
3294            cmp(e[[3]], cb, ncntxt)
3295            if (needInc) cb$putcode(DECLNK.OP)
3296            cb$putcode(LOGBASE.OP, ci)
3297        }
3298        if (cntxt$tailcall)
3299            cb$putcode(RETURN.OP)
3300        TRUE
3301    }
3302})
3303@
3304
3305A number of one argument math functions are handled by the interpreter
3306using the function [[math1]] in [[arithmetic.c]]. The [[MATH1.OP]]
3307instruction handles these for compuled code. The instruction takes two
3308operands, an index for the call expression in the constant table, and
3309an index for the function to be called in a table of function
3310pointers. The table of names in the byte code compiler has to match
3311the function pointer array in the byte code interpreter.  It would
3312have been possible to use the same indices as the offset values used
3313in [[names.c]], but keeping this consistent seemed more challenging.
3314<<list of one argument math functions>>=
3315## Keep the order consistent with the order in the internal byte code
3316## interpreter!
3317math1funs <- c("floor", "ceiling", "sign",
3318               "expm1", "log1p",
3319               "cos", "sin", "tan", "acos", "asin", "atan",
3320               "cosh", "sinh", "tanh", "acosh", "asinh", "atanh",
3321               "lgamma", "gamma", "digamma", "trigamma",
3322               "cospi", "sinpi", "tanpi")
3323@ %def math1funs
3324
3325The code generation is done by [[cmpMath1]]:
3326<<[[cmpMath1]] function>>=
3327cmpMath1 <- function(e, cb, cntxt) {
3328    if (dots.or.missing(e[-1]))
3329        cmpBuiltin(e, cb, cntxt)
3330    else if (length(e) != 2) {
3331        notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc())
3332        cmpBuiltin(e, cb, cntxt)
3333    }
3334    else {
3335        name <- as.character(e[[1]])
3336        idx <- match(name, math1funs) - 1
3337        if (is.na(idx))
3338            cntxt$stop(
3339                paste(sQuote(name), "is not a registered math1 function"),
3340                cntxt, loc = cb$savecurloc())
3341        ncntxt <- make.nonTailCallContext(cntxt)
3342        cmp(e[[2]], cb, ncntxt);
3343        ci <- cb$putconst(e)
3344        cb$putcode(MATH1.OP, ci, idx)
3345        if (cntxt$tailcall)
3346            cb$putcode(RETURN.OP)
3347        TRUE
3348    }
3349}
3350@ %def cmpMath1
3351The generators are installed by
3352<<inline one argument math functions>>=
3353for (name in math1funs)
3354    setInlineHandler(name, cmpMath1)
3355@
3356
3357\subsection{Logical operators}
3358Two argument instructions are provided for the comparison operators
3359and code for them can be generated using [[cmpPrim2]]:
3360<<inline handlers for comparison operators>>=
3361setInlineHandler("==", function(e, cb, cntxt)
3362   cmpPrim2(e, cb, EQ.OP, cntxt))
3363
3364setInlineHandler("!=", function(e, cb, cntxt)
3365   cmpPrim2(e, cb, NE.OP, cntxt))
3366
3367setInlineHandler("<", function(e, cb, cntxt)
3368   cmpPrim2(e, cb, LT.OP, cntxt))
3369
3370setInlineHandler("<=", function(e, cb, cntxt)
3371   cmpPrim2(e, cb, LE.OP, cntxt))
3372
3373setInlineHandler(">=", function(e, cb, cntxt)
3374   cmpPrim2(e, cb, GE.OP, cntxt))
3375
3376setInlineHandler(">", function(e, cb, cntxt)
3377   cmpPrim2(e, cb, GT.OP, cntxt))
3378@ %def
3379
3380The vectorized [[&]] and [[|]] functions are handled similarly:
3381<<inline handlers for [[&]] and [[|]]>>=
3382setInlineHandler("&", function(e, cb, cntxt)
3383   cmpPrim2(e, cb, AND.OP, cntxt))
3384
3385setInlineHandler("|", function(e, cb, cntxt)
3386   cmpPrim2(e, cb, OR.OP, cntxt))
3387@ %def
3388
3389The negation operator [[!]] takes only one argument and code for calls
3390to it are generated using [[cmpPrim1]]:
3391<<inline handler for [[!]]>>=
3392setInlineHandler("!", function(e, cb, cntxt)
3393   cmpPrim1(e, cb, NOT.OP, cntxt))
3394@ %def
3395
3396%% **** do log() somewhere around here?
3397%% **** is log(x,) == log(x)???
3398%% **** is log(,y) allowed?
3399
3400
3401\subsection{Subsetting and related operations}
3402\label{subsec:subset}
3403Current R semantics are such that the subsetting operator [[[]] and a
3404number of others may not evaluate some of their arguments if S3 or S4
3405methods are available.  S-plus has different semantics---there the
3406subsetting operator is guaranteed to evaluate its arguments.
3407% In the case of the concatenation function [[c]] it is not clear
3408% whether these semantics are worth preserving; changing [[c]] to a
3409% [[BUILTIN]] seems to cause no problems on [[CRAN]] and [[BioC]]
3410% packages tested.
3411For subsetting there are [[CRAN]] packages that use non-standard
3412evaluation of their arguments ([[igraph]] is one example), so this
3413probably can no longer be changed.
3414
3415The compiler preserve these semantics.  To do so subsetting is
3416implemented in terms of two instructions, [[STARTSUBSET]] and
3417[[DFLTSUBSET]].  The object being subsetted is evaluated and placed on
3418the stack. [[STARTSUBSET]] takes a constant table index for the
3419expression and a label operand as operands and examines the object on
3420the stack.  If an internal S3 or S4 dispatch succeeds then the
3421receiver object is removed and the result is placed on the stack and a
3422jump to the label is carried out.  If the dispatch fails then code to
3423evaluate and execute the arguments is executed followed by a
3424[[DFLTSUBSET]] instruction.
3425This pattern is used for several other operations and is abstracted
3426into the code generation function [[cmpDispatch]]. Code for subsetting
3427and other operations is then generated by
3428<<inlining handlers for some dispatching SPECIAL functions>>=
3429# **** this is now handled differently; see "Improved subset ..."
3430# setInlineHandler("[", function(e, cb, cntxt)
3431#     cmpDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, e, cb, cntxt))
3432
3433# **** c() is now a BUILTIN
3434# setInlineHandler("c", function(e, cb, cntxt)
3435#     cmpDispatch(STARTC.OP, DFLTC.OP, e, cb, cntxt, FALSE))
3436
3437# **** this is now handled differently; see "Improved subset ..."
3438# setInlineHandler("[[", function(e, cb, cntxt)
3439#     cmpDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, e, cb, cntxt))
3440@
3441
3442The [[cmpDispatch]] function takes the two opcodes as arguments.  It
3443declines to handle cases with [[...]] arguments in the call or with a
3444missing first argument --- these will be handled as calls to a
3445[[SPECIAL]] primitive. For the case handled it generates code for the
3446first argument, followed by a call to the first [[start.op]]
3447instruction.  The operands for the [[start.op]] are a constant pool
3448index for the expression and a label for the instruction following the
3449[[dflt.op]] instruction that allows skipping over the default case
3450code. The default case code consists of code to compute and push the
3451arguments followed by the [[dflt.op]] instruction.
3452<<[[cmpDispatch]] function>>=
3453cmpDispatch <- function(start.op, dflt.op, e, cb, cntxt, missingOK = TRUE) {
3454    if ((missingOK && any.dots(e)) ||
3455        (! missingOK && dots.or.missing(e)) ||
3456        length(e) == 1)
3457        cmpSpecial(e, cb, cntxt) ## punt
3458    else {
3459        ne <- length(e)
3460        oe <- e[[2]]
3461        if (missing(oe))
3462            cmpSpecial(e, cb, cntxt) ## punt
3463        else {
3464            ncntxt <- make.argContext(cntxt)
3465            cmp(oe, cb, ncntxt)
3466            ci <- cb$putconst(e)
3467            end.label <- cb$makelabel()
3468            cb$putcode(start.op, ci, end.label)
3469            if (ne > 2)
3470                cmpBuiltinArgs(e[-(1:2)], names(e)[-(1:2)], cb, cntxt,
3471                               missingOK)
3472            cb$putcode(dflt.op)
3473            cb$putlabel(end.label)
3474            if (cntxt$tailcall) cb$putcode(RETURN.OP)
3475            TRUE
3476        }
3477    }
3478}
3479@ %def cmpDispatch
3480%% **** The implementation currently implies that arguments to things
3481%% **** with S4 methods may be evaluated more than once if dispatch
3482%% **** does not happen.  It would be better to rewrite this so if
3483%% **** arguments are evaluated we stay with the interpreted version
3484%% **** all the way.  This requires a bit of refactoring of
3485%% **** DispatchOrEval code to get it to work. But it should not
3486%% **** affect the compiler.
3487%% ****
3488%% **** There may be some merit to always go with the interpreted code
3489%% **** if the receiver has the object bit set -- that way the
3490%% **** sequence could be done as
3491%% ****
3492%% ****     if (object bit set)
3493%% ****         CALLSPECIAL
3494%% ****     else
3495%% ****         do default thing
3496%% ****
3497%% **** and in some cases the object bit test can be hoisted.
3498
3499The [[$]] function is simpler to implement since its selector argument
3500is never evaluated.  The [[DOLLAR]] instruction takes the object to
3501extract a component from off the stack and takes a constant index
3502argument specifying the selection symbol.
3503%% signal warning if selector is not a symbol or a string??
3504%% also decline if any missing args?
3505<<inlining handler for [[$]]>>=
3506setInlineHandler("$", function(e, cb, cntxt) {
3507    if (any.dots(e) || length(e) != 3)
3508        cmpSpecial(e, cb, cntxt)
3509    else {
3510        sym <- if (is.character(e[[3]]) && length(e[[3]]) == 1
3511                   && e[[3]] != "")
3512            as.name(e[[3]]) else e[[3]]
3513        if (is.name(sym)) {
3514            ncntxt <- make.argContext(cntxt)
3515            cmp(e[[2]], cb, ncntxt)
3516            ci <- cb$putconst(e)
3517            csi <- cb$putconst(sym)
3518            cb$putcode(DOLLAR.OP, ci, csi)
3519            if (cntxt$tailcall) cb$putcode(RETURN.OP)
3520            TRUE
3521        }
3522        else cmpSpecial(e, cb, cntxt)
3523    }
3524})
3525@ %def
3526
3527
3528\subsection{Inlining simple [[.Internal]] functions}
3529A number of functions are defined as simple wrappers around
3530[[.Internal]] calls. One example is [[dnorm]], which is currently
3531defined as
3532\begin{verbatim}
3533dnorm <- function (x, mean = 0, sd = 1, log = FALSE)
3534    .Internal(dnorm(x, mean, sd, log))
3535\end{verbatim}
3536The implementation of [[.Internal]] functions can be of type
3537[[BUILTIN]] or [[SPECIAL]].  The [[dnorm]] implementation is of type
3538[[BUILTIN]], so its arguments are guaranteed to be evaluated in order,
3539and this particular function doe not depend on the position of its
3540calls in the evaluation stack. As a result, a call of the form
3541\begin{verbatim}
3542dnorm(2, 1)
3543\end{verbatim}
3544can be replaced by the call
3545\begin{verbatim}
3546.Internal(dnorm(2, 1, 1, FALSE))
3547\end{verbatim}
3548%% **** except for error messages maybe??
3549This can result in considerable speed-up since it avoids the overhead
3550of the call to the wrapper function.
3551
3552The substitution of a call to the wrapper with a [[.Internal]] call
3553can be done by a function [[inlineSimpleInternalCall]] defined as
3554<<[[inlineSimpleInternalCall]] function>>=
3555inlineSimpleInternalCall <- function(e, def) {
3556    if (! dots.or.missing(e) && is.simpleInternal(def)) {
3557        forms <- formals(def)
3558        b <- body(def)
3559        if (typeof(b) == "language" && length(b) == 2 && b[[1]] == "{")
3560            b <- b[[2]]
3561        icall <- b[[2]]
3562        defaults <- forms ## **** could strip missings but OK not to?
3563        cenv <- c(as.list(match.call(def, e, F))[-1], defaults)
3564        subst <- function(n)
3565            if (typeof(n) == "symbol") cenv[[as.character(n)]] else n
3566        args <- lapply(as.list(icall[-1]), subst)
3567        as.call(list(quote(.Internal), as.call(c(icall[[1]], args))))
3568    }
3569    else NULL
3570}
3571@ %def inlineSimpleInternalCall
3572
3573Code for an inlined simple internal function can then be generated by
3574[[cmpSimpleInternal]]:
3575<<[[cmpSimpleInternal]] function>>=
3576cmpSimpleInternal <- function(e, cb, cntxt) {
3577    if (any.dots(e))
3578        FALSE
3579    else {
3580        name <- as.character(e[[1]])
3581        def <- findFunDef(name, cntxt)
3582        if (! checkCall(def, e, NULL)) return(FALSE)
3583        call <- inlineSimpleInternalCall(e, def)
3584        if (is.null(call))
3585            FALSE
3586	else
3587            cmpDotInternalCall(call, cb, cntxt)
3588    }
3589}
3590@ %def cmpSimpleInternal
3591
3592<<inline safe simple [[.Internal]] functions from [[base]]>>=
3593safeBaseInternals <- c("atan2", "besselY", "beta", "choose",
3594                       "drop", "inherits", "is.vector", "lbeta", "lchoose",
3595                       "nchar", "polyroot", "typeof", "vector", "which.max",
3596                       "which.min", "is.loaded", "identical",
3597                       "match", "rep.int", "rep_len")
3598
3599for (i in safeBaseInternals) setInlineHandler(i,  cmpSimpleInternal)
3600@ %def safeBaseInternals
3601
3602%% **** nextn would also be OK with a broader definition of 'safe'
3603<<inline safe simple [[.Internal]] functions from [[stats]]>>=
3604safeStatsInternals <- c("dbinom", "dcauchy", "dgeom", "dhyper", "dlnorm",
3605                        "dlogis", "dnorm", "dpois", "dunif", "dweibull",
3606                        "fft", "mvfft", "pbinom", "pcauchy",
3607                        "pgeom", "phyper", "plnorm", "plogis", "pnorm",
3608                        "ppois", "punif", "pweibull", "qbinom", "qcauchy",
3609                        "qgeom", "qhyper", "qlnorm", "qlogis", "qnorm",
3610                        "qpois", "qunif", "qweibull", "rbinom", "rcauchy",
3611                        "rgeom", "rhyper", "rlnorm", "rlogis", "rnorm",
3612                        "rpois", "rsignrank",  "runif", "rweibull",
3613                        "rwilcox", "ptukey", "qtukey")
3614
3615for (i in safeStatsInternals) setInlineHandler(i,  cmpSimpleInternal, "stats")
3616@ %def
3617
3618It is possible to automate the process of identifying functions with
3619the simple wrapper form and with [[.Internal]] implementations of type
3620[[BUILTIN]], and the function [[simpleInternals]] produces a list of
3621such candidates for a given package on the search path.  But
3622determining whether such a candidate can be safely inlined needs to be
3623done manually.  Most can, but some, such as [[sys.call]], cannot since
3624they depend on their position on the call stack (removing the wrapper
3625call that the implementation expects would change the result).
3626Nevertheless, [[simpleInternals]] is useful for providing a list of
3627candidates to screen. The [[is.simpleInternal]] function can be used
3628in test code to check that the assumption made in the compiler is
3629valid.  The implementation is
3630<<[[simpleInternals]] function>>=
3631simpleInternals <- function(pos = "package:base") {
3632    names <- ls(pos = pos, all.names = TRUE)
3633    if (length(names) == 0)
3634        character(0)
3635    else {
3636        fn <-  function(n)
3637            is.simpleInternal(get(n, pos = pos))
3638        names[sapply(names, fn)]
3639    }
3640}
3641@ %def simpleInternals
3642
3643<<[[is.simpleInternal]] function>>=
3644is.simpleInternal <- function(def) {
3645    if (typeof(def) == "closure" && simpleFormals(def)) {
3646        b <- body(def)
3647        if (typeof(b) == "language" && length(b) == 2 && b[[1]] == "{")
3648            b <- b[[2]]
3649        if (typeof(b) == "language" &&
3650            typeof(b[[1]]) == "symbol" &&
3651            b[[1]] == ".Internal") {
3652            icall <- b[[2]]
3653            ifun <- icall[[1]]
3654            typeof(ifun) == "symbol" &&
3655            .Internal(is.builtin.internal(as.name(ifun))) &&
3656            simpleArgs(icall, names(formals(def)))
3657        }
3658        else FALSE
3659    }
3660    else FALSE
3661}
3662@ %def is.simpleInternal
3663
3664<<[[simpleFormals]] function>>=
3665simpleFormals <- function(def) {
3666    forms <- formals(def)
3667    if ("..." %in% names(forms))
3668        return(FALSE)
3669    for (d in as.list(forms)) {
3670        if (! missing(d)) {
3671            ## **** check constant folding
3672            if (typeof(d) %in% c("symbol", "language", "promise", "bytecode"))
3673                return(FALSE)
3674        }
3675    }
3676    TRUE
3677}
3678@ %def simpleFormals
3679
3680<<[[simpleArgs]] function>>=
3681simpleArgs <- function(icall, fnames) {
3682    for (a in as.list(icall[-1])) {
3683        if (missing(a))
3684            return(FALSE)
3685        else if (typeof(a) == "symbol") {
3686            if (! (as.character(a) %in% fnames))
3687                return(FALSE)
3688        }
3689        else if (typeof(a) %in% c("language", "promise", "bytecode"))
3690            return(FALSE)
3691    }
3692    TRUE
3693}
3694@ %def simpleArgs
3695
3696
3697\subsection{Inlining [[is.xyz]] functions}
3698Most of the [[is.xyz]] functions in [[base]] are simple [[BUILTIN]]s
3699that do not do internal dispatch.  They have simple instructions
3700defined for them and are compiled in a common way.  [[cmpIs]] abstract
3701out the common compilation process.
3702<<[[cmpIs]] function>>=
3703cmpIs <- function(op, e, cb, cntxt) {
3704    if (any.dots(e) || length(e) != 2)
3705        cmpBuiltin(e, cb, cntxt)
3706    else {
3707        ## **** check that the function is a builtin somewhere??
3708        s<-make.argContext(cntxt)
3709        cmp(e[[2]], cb, s)
3710        cb$putcode(op)
3711        if (cntxt$tailcall) cb$putcode(RETURN.OP)
3712        TRUE
3713    }
3714}
3715@ %def cmpIs
3716
3717Inlining handlers are then defined by
3718<<inlining handlers for [[is.xyz]] functions>>=
3719setInlineHandler("is.character", function(e, cb, cntxt)
3720    cmpIs(ISCHARACTER.OP, e, cb, cntxt))
3721setInlineHandler("is.complex", function(e, cb, cntxt)
3722    cmpIs(ISCOMPLEX.OP, e, cb, cntxt))
3723setInlineHandler("is.double", function(e, cb, cntxt)
3724    cmpIs(ISDOUBLE.OP, e, cb, cntxt))
3725setInlineHandler("is.integer", function(e, cb, cntxt)
3726    cmpIs(ISINTEGER.OP, e, cb, cntxt))
3727setInlineHandler("is.logical", function(e, cb, cntxt)
3728    cmpIs(ISLOGICAL.OP, e, cb, cntxt))
3729setInlineHandler("is.name", function(e, cb, cntxt)
3730     cmpIs(ISSYMBOL.OP, e, cb, cntxt))
3731setInlineHandler("is.null", function(e, cb, cntxt)
3732    cmpIs(ISNULL.OP, e, cb, cntxt))
3733setInlineHandler("is.object", function(e, cb, cntxt)
3734    cmpIs(ISOBJECT.OP, e, cb, cntxt))
3735setInlineHandler("is.symbol", function(e, cb, cntxt)
3736    cmpIs(ISSYMBOL.OP, e, cb, cntxt))
3737@ %def
3738
3739At present [[is.numeric]], [[is.matrix]], and [[is.array]] do internal
3740dispatching so we just handle them as ordinary [[BUILTIN]]s.  It might
3741be worth defining virtual machine instructions for them as well.
3742
3743
3744\subsection{Inline handler for calling C functions}
3745The [[.Call]] interface is now the preferred interface for calling C
3746functions and is also used in base packages like [[stat]]. The
3747[[DOTCALL.OP]] instruction allows these calls to be made without
3748allocating a list of arguments---the arguments are accumulated on the
3749stack. For now only 16 or fewer arguments are handled; more arguments,
3750and cases with named arguments, are handled by the standard [[.Call]]
3751[[BUILTIN]].
3752<<inline handler for [[.Call]]>>=
3753setInlineHandler(".Call", function(e, cb, cntxt) {
3754    nargsmax <- 16 ## should match DOTCALL_MAX in eval.c
3755    if (dots.or.missing(e[-1]) || ! is.null(names(e)) ||
3756        length(e) < 2 || length(e) > nargsmax + 2)
3757        cmpBuiltin(e, cb, cntxt) ## punt
3758    else {
3759        ncntxt <- make.nonTailCallContext(cntxt)
3760        cmp(e[[2]], cb, ncntxt);
3761        nargs <- length(e) - 2
3762        if (nargs > 0) {
3763            ncntxt <- make.argContext(cntxt)
3764            for (a in as.list(e[-(1:2)]))
3765                cmp(a, cb, ncntxt);
3766        }
3767        ci <- cb$putconst(e)
3768        cb$putcode(DOTCALL.OP, ci, nargs)
3769        if (cntxt$tailcall)
3770            cb$putcode(RETURN.OP)
3771        TRUE
3772    }
3773})
3774@
3775
3776
3777\subsection{Inline handlers for generating integer sequences}
3778The colon operator and the [[BUILTIN]] functions [[seq_along]] and
3779[[seq_len]] generate sequences (the sequence might not be integers if
3780long vectors are involved or the colon operator is given no-integer
3781arguments). The [[COLON.OP]], [[SEQALONG.OP]], and [[SEQLEN.OP]]
3782instructions implement these operations in byte code. This allows an
3783implementation in which the result stored on the stack is not a fully
3784realized sequence but only a recipe that the [[for]] loop, for
3785example, can use to run the loop without generating the sequence.
3786This is optionally implemented in the byte code interpreter. It would
3787also be possible to allow the compact sequence representation to be
3788stored in variables, etc., but this would require more extensive
3789changes.
3790<<inline handlers for integer sequences>>=
3791setInlineHandler(":", function(e, cb, cntxt)
3792    cmpPrim2(e, cb, COLON.OP, cntxt))
3793
3794setInlineHandler("seq_along", function(e, cb, cntxt)
3795    cmpPrim1(e, cb, SEQALONG.OP, cntxt))
3796
3797setInlineHandler("seq_len", function(e, cb, cntxt)
3798    cmpPrim1(e, cb, SEQLEN.OP, cntxt))
3799@
3800
3801\subsection{Inlining handlers for controlling warnings}
3802The inlining handlers in this section do not actually affect code
3803generation. Their purpose is to suppress warnings.
3804
3805Compiling calls to the [[::]] and [[:::]] functions without special
3806handling would generate undefined variable warnings for the arguments.
3807This is avoided by converting the arguments from symbols to strings,
3808which these functions would do anyway at runtime, and then compiling
3809the modified calls. The common process is handled by [[cmpMultiColon]].
3810<<[[cmpMultiColon]] function>>=
3811cmpMultiColon <- function(e, cb, cntxt) {
3812    if (! dots.or.missing(e) && length(e) == 3) {
3813	goodType <- function(a)
3814	    typeof(a) == "symbol" ||
3815	    (typeof(a) == "character" && length(a) == 1)
3816        fun <- e[[1]]
3817        x <- e[[2]]
3818	y <- e[[3]]
3819	if (goodType(x) && goodType(y)) {
3820	    args <- list(as.character(x), as.character(y))
3821            cmpCallSymFun(fun, args, e, cb, cntxt)
3822	    TRUE
3823	}
3824	else FALSE
3825    }
3826    else FALSE
3827}
3828@ %def cmpMultiColon
3829Code generators are then registered by
3830<<inlining handlers for [[::]] and [[:::]]>>=
3831setInlineHandler("::", cmpMultiColon)
3832setInlineHandler(":::", cmpMultiColon)
3833@
3834
3835Calls to with will often generate spurious undefined variable warning
3836for variables appearing in the expression argument.  A crude approach
3837is to compile the entire call with undefined variable warnings
3838suppressed.
3839<<inlining handler for [[with]]>>=
3840setInlineHandler("with", function(e, cb, cntxt) {
3841    cntxt$suppressUndefined <- TRUE
3842    cmpCallSymFun(e[[1]], e[-1], e, cb, cntxt)
3843    TRUE
3844})
3845@
3846
3847A similar issue arises for [[require]], where an unquoted argument is
3848often used.
3849<<inlining handler for [[require]]>>=
3850setInlineHandler("require", function(e, cb, cntxt) {
3851    cntxt$suppressUndefined <- TRUE
3852    cmpCallSymFun(e[[1]], e[-1], e, cb, cntxt)
3853    TRUE
3854})
3855@
3856
3857
3858\section{The [[switch]] function}
3859The [[switch]] function has somewhat awkward semantics that vary
3860depending on whether the value of the first argument is a character
3861string or is numeric.  For a string all or all but one of the
3862alternatives must be named, and empty case arguments are allowed and
3863result in falling through to the next non-empty case.  In the numeric
3864case selecting an empty case produces an error.  If there is more than
3865one alternative case and no cases are named then a character selector
3866argument will produce an error, so one can assume that a numeric
3867switch is intended.  But a [[switch]] with named arguments can be used
3868with a numeric selector, so it is not in general possible to determine
3869the intended type of the [[switch]] call from the structure of the
3870call alone.  The compiled code therefore has to allow for both
3871possibilities.
3872
3873The inlining handler goes through a number of steps collecting and
3874processing information computed from the call and finally emits code
3875for the non-empty alternatives.  If the [[switch]] expression appears
3876in tail position then each alternative will end in a [[RETURN]]
3877instruction.  If the call is not in tail position then each
3878alternative will end with a [[GOTO]] than jumps to a label placed
3879after the code for the final alternative.
3880<<inline handler for [[switch]]>>=
3881setInlineHandler("switch", function(e, cb, cntxt) {
3882    if (length(e) < 2 || any.dots(e))
3883        cmpSpecial(e, cb, cntxt)
3884    else {
3885        ## **** check name on EXPR, if any, partially matches EXPR?
3886        <<extract the [[switch]] expression components>>
3887
3888        <<collect information on named alternatives>>
3889
3890        <<create the labels>>
3891
3892        <<create the map from names to labels for a character switch>>
3893
3894        <<emit code for the [[EXPR]] argument>>
3895
3896        <<emit the switch instruction>>
3897
3898        <<emit error code for empty alternative in numerical switch>>
3899
3900        <<emit code for the default case>>
3901
3902        <<emit code for non-empty alternatives>>
3903
3904        if (! cntxt$tailcall)
3905            cb$putlabel(endLabel)
3906    }
3907    TRUE
3908})
3909@
3910
3911The first step in processing the [[switch]] expression is to extract
3912the selector expression [[expr]] and the case expressions, to identify
3913which, if any, of the cases are empty, and to extract the names of the
3914cases as [[nm]].  A warning is issued if there are no cases.  If there
3915is only one case and that case is not named then setting [[nm = ""]]
3916allows this situation to be processed by code used when names are
3917present.
3918<<extract the [[switch]] expression components>>=
3919expr <- e[[2]]
3920cases <-e[-c(1, 2)]
3921
3922if (is.null(cases))
3923    notifyNoSwitchcases(cntxt, loc = cb$savecurloc())
3924
3925miss <- missingArgs(cases)
3926nm <- names(cases)
3927
3928## allow for corner cases like switch(x, 1) which always
3929## returns 1 if x is a character scalar.
3930if (is.null(nm) && length(cases) == 1)
3931    nm <- ""
3932@ %def
3933
3934The next step in the case where some cases are named is to check for a
3935default expression.  If there is more than one expression then the
3936[[switch]] is compiled by [[cmpSpecial]].  This avoids having to
3937reproduce the runtime error that would be generated if the [[switch]]
3938is called with a character selector.
3939%% **** would probably be better to not punt though -- then we could
3940%% **** allow break/next to use GOTO
3941<<collect information on named alternatives>>=
3942## collect information on named alternatives and check for
3943## multiple default cases.
3944if (! is.null(nm)) {
3945    haveNames <- TRUE
3946    ndflt <- sum(nm == "")
3947    if (ndflt > 1) {
3948        notifyMultipleSwitchDefaults(ndflt, cntxt, loc = cb$savecurloc())
3949        ## **** punt back to interpreted version for now to get
3950        ## **** runtime error message for multiple defaults
3951        cmpSpecial(e, cb, cntxt)
3952        return(TRUE)
3953    }
3954    if (ndflt > 0)
3955        haveCharDflt <- TRUE
3956    else
3957        haveCharDflt <- FALSE
3958}
3959else {
3960    haveNames <- FALSE
3961    haveCharDflt <- FALSE
3962}
3963@ %def
3964
3965Next the labels are generated.  [[missLabel]] will be the label for
3966code that signals an error if a numerical selector expression chooses
3967a case with an empty argument.  The label [[dfltLabel]] will be for
3968code that invisibly procures the value [[NULL]], which is the default
3969case for a numerical selector argument and also for a character
3970selector when no unnamed default case is provided. All non-empty cases
3971are given their own labels, and [[endLabel]] is generated if it will
3972be needed as the [[GOTO]] target for a [[switch]] expression that is
3973not in tail position.
3974<<create the labels>>=
3975## create the labels
3976if (any(miss))
3977    missLabel <- cb$makelabel()
3978dfltLabel <- cb$makelabel()
3979
3980lab <- function(m)
3981    if (m) missLabel
3982    else cb$makelabel()
3983labels <- c(lapply(miss, lab), list(dfltLabel))
3984
3985if (! cntxt$tailcall)
3986    endLabel <- cb$makelabel()
3987@ %def
3988
3989When there are named cases a map from the case names to the
3990corresponding code labels is constructed next.  If no unnamed default
3991was provided one is added that uses the [[dfltLabel]].
3992<<create the map from names to labels for a character switch>>=
3993## create the map from names to labels for a character switch
3994if (haveNames) {
3995    unm <- unique(nm[nm != ""])
3996    if (haveCharDflt)
3997        unm <- c(unm, "")
3998    nlabels <- labels[unlist(lapply(unm, findActionIndex, nm, miss))]
3999    ## if there is no unnamed case to act as a default for a
4000    ## character switch then the numeric default becomes the
4001    ## character default as well.
4002    if (! haveCharDflt) {
4003        unm <- c(unm, "")
4004        nlabels <- c(nlabels, list(dfltLabel))
4005    }
4006}
4007else {
4008    unm <- NULL
4009    nlabels <- NULL
4010}
4011@ %def
4012The computation of the index of the appropriate label for a given name
4013is carried out by [[findActionIndex]].
4014%% **** rewrite this to directly return the label?
4015<<[[findActionIndex]] function>>=
4016findActionIndex <- function(name, nm, miss) {
4017    start <- match(name, nm)
4018    aidx <- c(which(! miss), length(nm) + 1)
4019    min(aidx[aidx >= start])
4020}
4021@ %def findActionIndex
4022
4023At this point we are ready to start emitting code into the code
4024buffer.  First code to compute the selector is emitted.  As with the
4025condition for an [[if]] expression a non-tail-call context is used.
4026<<emit code for the [[EXPR]] argument>>=
4027## compile the EXPR argument
4028ncntxt <- make.nonTailCallContext(cntxt)
4029cmp(expr, cb, ncntxt)
4030@ %def
4031
4032The switch instruction takes the selector off the stack and four
4033operands from the instruction stream: the call index, an index for the
4034names, or [[NULL]] if there are none, and indices for the labels for a
4035character selector and for a numeric selector.  At this point lists of
4036labels are placed in the instruction buffer.  At code extraction time
4037these will be replaced by indices for numeric offset vectors by the
4038[[patchlables]] function of the code buffer.
4039<<emit the switch instruction>>=
4040## emit the SWITCH instruction
4041cei <- cb$putconst(e)
4042if (haveNames) {
4043    cni <- cb$putconst(unm)
4044    cb$putcode(SWITCH.OP, cei, cni, nlabels, labels)
4045}
4046else {
4047    cni <- cb$putconst(NULL)
4048    cb$putcode(SWITCH.OP, cei, cni, cni, labels)
4049}
4050@ %def
4051
4052If there are empty alternatives then code to signal an error for a
4053numeric selector that chooses one of these is needed and is
4054identified by the label [[missLabel]].
4055<<emit error code for empty alternative in numerical switch>>=
4056## emit code to signal an error if a numeric switch hist an
4057## empty alternative (fall through, as for character, might
4058## make more sense but that isn't the way switch() works)
4059if (any(miss)) {
4060    cb$putlabel(missLabel)
4061    cmp(quote(stop("empty alternative in numeric switch")), cb, cntxt)
4062}
4063@ %def
4064
4065Code for the numeric default case, corresponding to [[dfltLabel]],
4066places [[NULL]] on the stack, and for a [[switch]] in tail position
4067this is followed by an [[INVISIBLE]] and a [[RETURN]] instruction.
4068<<emit code for the default case>>=
4069## emit code for the default case
4070cb$putlabel(dfltLabel)
4071cb$putcode(LDNULL.OP)
4072if (cntxt$tailcall) {
4073    cb$putcode(INVISIBLE.OP)
4074    cb$putcode(RETURN.OP)
4075}
4076else
4077    cb$putcode(GOTO.OP, endLabel)
4078@ %def
4079
4080Finally the labels and code for the non-empty alternatives are written
4081to the code buffer.  In non-tail position the code is followed by a
4082[[GOTO]] instruction that jumps to [[endLabel]].  The final case does
4083not need this [[GOTO]].
4084%% **** maybe try to drop the final GOTO
4085<<emit code for non-empty alternatives>>=
4086## emit code for the non-empty alternatives
4087for (i in seq_along(cases)) {
4088    if (! miss[i]) {
4089        cb$putlabel(labels[[i]])
4090        cmp(cases[[i]], cb, cntxt)
4091        if (! cntxt$tailcall)
4092            cb$putcode(GOTO.OP, endLabel)
4093    }
4094}
4095@ %def
4096
4097
4098\section{Assignments expressions}
4099R supports simple assignments in which the left-hand side of the
4100assignment expression is a symbol and complex assignments of the form
4101\begin{verbatim}
4102f(x) <- v
4103\end{verbatim}
4104or
4105\begin{verbatim}
4106g(f(x)) <- v
4107\end{verbatim}
4108The second form is sometimes called a nested complex assignment.
4109Ordinary assignment creates or modifies a binding in the current
4110environment.  Superassignment via the [[<<-]] operator modifies a
4111binding in a containing environment.
4112
4113Assignment expressions are compiled by [[cmpAssign]].  This function
4114checks the form of the assignment expression and, for well formed
4115expressions then uses [[cmpSymbolAssign]] for simple assignments and
4116[[cmpComplexAssign]] for complex assignments.
4117
4118For now, a temporary hack is needed to address a discrepancy between
4119byte code and AST code that can be caused by assignments in arguments
4120to primitives. The root issue is that we are not recording referenced
4121to arguments that have been evaluated. Once that is addressed we can
4122remove this hack.
4123<<temporary hack to deal with assignments in arguments issue>>=
4124## if (! cntxt$toplevel)
4125##    return(cmpSpecial(e, cb, cntxt))
4126@
4127<<[[cmpAssign]] function>>=
4128cmpAssign <- function(e, cb, cntxt) {
4129    <<temporary hack to deal with assignments in arguments issue>>
4130    if (! checkAssign(e, cntxt, loc = cb$savecurloc()))
4131        return(cmpSpecial(e, cb, cntxt))
4132    superAssign <- as.character(e[[1]]) == "<<-"
4133    lhs <- e[[2]]
4134    value <- e[[3]]
4135    symbol <- as.name(getAssignedVar(e, cntxt))
4136    if (superAssign && ! findVar(symbol, cntxt))
4137        notifyNoSuperAssignVar(symbol, cntxt, loc = cb$savecurloc())
4138    if (is.name(lhs) || is.character(lhs))
4139        cmpSymbolAssign(symbol, value, superAssign, cb, cntxt)
4140    else if (typeof(lhs) == "language")
4141        cmpComplexAssign(symbol, lhs, value, superAssign, cb, cntxt)
4142    else cmpSpecial(e, cb, cntxt) # punt for now
4143}
4144@ %def cmpAssign
4145
4146The code generators for the assignment operators [[<-]] and [[=]] and
4147the superassignment operator [[<<-]] are registered by
4148<<inlining handlers for [[<-]], [[=]], and [[<<-]]>>=
4149setInlineHandler("<-", cmpAssign)
4150setInlineHandler("=", cmpAssign)
4151setInlineHandler("<<-", cmpAssign)
4152@ %def
4153
4154The function [[checkAssign]] is used to check that an assignment
4155expression is well-formed.
4156<<[[checkAssign]] function>>=
4157checkAssign <- function(e, cntxt, loc = NULL) {
4158    if (length(e) != 3)
4159        FALSE
4160    else {
4161        place <- e[[2]]
4162        if (typeof(place) == "symbol" ||
4163            (typeof(place) == "character" && length(place) == 1))
4164            TRUE
4165        else {
4166            <<check left hand side call>>
4167        }
4168    }
4169}
4170@ %def checkAssign
4171A valid left hand side call must have a function that is either a
4172symbol or is of the form [[foo::bar]] or [[foo:::bar]], and the first
4173argument must be a symbol or another valid left hand side call.  A
4174[[while]] loop is used to unravel nested calls.
4175<<check left hand side call>>=
4176while (typeof(place) == "language") {
4177    fun <- place[[1]]
4178    if (typeof(fun) != "symbol" &&
4179        ! (typeof(fun) == "language" && length(fun) == 3 &&
4180           typeof(fun[[1]]) == "symbol" &&
4181           as.character(fun[[1]]) %in% c("::", ":::"))) {
4182        notifyBadAssignFun(fun, cntxt, loc)
4183        return(FALSE)
4184    }
4185    place = place[[2]]
4186}
4187if (typeof(place) == "symbol")
4188    TRUE
4189else FALSE
4190@ %def
4191
4192\subsection{Simple assignment expressions}
4193%% **** handle fun defs specially for message purposes??
4194Code for assignment to a symbol is generated by [[cmpSymbolAssign]].
4195<<[[cmpSymbolAssign]] function>>=
4196cmpSymbolAssign <- function(symbol, value, superAssign, cb, cntxt) {
4197    <<compile the right hand side value expression>>
4198    <<emit code for the symbol assignment instruction>>
4199    <<for tail calls return the value invisibly>>
4200    TRUE
4201}
4202@ %def cmpSymbolAssign
4203
4204A non-tail-call context is used to generate code for the right hand
4205side value expression.
4206<<compile the right hand side value expression>>=
4207ncntxt <- make.nonTailCallContext(cntxt)
4208cmp(value, cb, ncntxt)
4209@ %def
4210
4211The [[SETVAR]] and [[SETVAR2]] instructions assign the value on the
4212stack to the symbol specified by its constant pool index operand.  The
4213[[SETVAR]] instruction is used by ordinary assignment to assign in the
4214local frame, and [[SETVAR2]] for superassignments.
4215<<emit code for the symbol assignment instruction>>=
4216ci <- cb$putconst(symbol)
4217if (superAssign)
4218    cb$putcode(SETVAR2.OP, ci)
4219else
4220    cb$putcode(SETVAR.OP, ci)
4221@ %def
4222The super-assignment case does not need to check for and warn about a
4223missing binding since this is done in [[cmpAssign]].
4224
4225The [[SETVAR]] and [[SETVAR2]] instructions leave the value on the
4226stack as the value of the assignment expression; if the expression
4227appears in tail position then this value is returned with the visible
4228flag set to [[FALSE]].
4229<<for tail calls return the value invisibly>>=
4230if (cntxt$tailcall) {
4231    cb$putcode(INVISIBLE.OP)
4232    cb$putcode(RETURN.OP)
4233}
4234@ %def
4235
4236
4237\subsection{Complex assignment expressions}
4238\label{subsec:complexassign}
4239It seems somehow appropriate at this point to mention that the code in
4240[[eval.c]] implementing the interpreter semantics starts with the
4241following comment:
4242\begin{verbatim}
4243    /*
4244     *  Assignments for complex LVAL specifications. This is the stuff that
4245     *  nightmares are made of ...
4246\end{verbatim}
4247
4248There are some issues with the semantics for complex assignment as
4249implemented by the interpreter:
4250\begin{itemize}
4251\item With the current approach the following legal, though strange,
4252  code fails:
4253<<inner assignment trashes temporary>>=
4254f <-function(x, y) x
4255`f<-` <- function(x, y, value) { y; x}
4256x <- 1
4257y <- 2
4258f(x, y[] <- 1) <- 3
4259@ %def
4260  The reason is that the current left hand side object is maintained in
4261  a variable [[*tmp*]], and processing the assignment in the second
4262  argument first overwrites the value of [[*tmp*]] and then removes
4263  [[*tmp*]] before the first argument is evaluated. Using evaluated
4264  promises as arguments, as is done for the right hand side value,
4265  solves this.
4266
4267\item The current approach of using a temporary variable [[*tmp*]] to
4268  hold the evaluated LHS object requires an internal cleanup context
4269  to ensure that the variable is removed in the event of a non-local
4270  exit. Implementing this in the compiler would introduce significant
4271  overhead.
4272
4273\item The asymmetry of handling the pre-evaluated right hand side
4274  value via an evaluated promise and the pre-evaluated left hand side
4275  via a temporary variable makes the code harder to understand and the
4276  semantics harder to explain.
4277
4278\item Using promises in an expression passed to eval means promises
4279  can leak out into R via sys.call. This is something we have tried to
4280  avoid and should try to avoid so we can have the freedom to
4281  implement lazy evaluation differently if that seems useful. [It may
4282  be possible at some point to avoid allocation of promise objects in
4283  compiled code.] The compiler can avoid this by using promises only
4284  in the argument lists passed to function calls, not in the call
4285  expressions.  A similar change could be made in the interpreter but
4286  it would have a small runtime penalty for constructing an expression
4287  in addition to an argument list I would prefer to avoid that for now
4288  until the compiler has been turned on by default.
4289
4290\item The current approach of installing the intermediate RHS value as
4291  the expression for the RHS promise in nested complex assignments has
4292  several drawbacks:
4293  \begin{itemize}
4294  \item it can produce huge expressions.
4295
4296  \item the result is misleading if the intermediate RHS value is a
4297    symbol or a language object.
4298
4299  \item to maintain this in compiled code it would be necessary to
4300    construct the assignment function call expression at runtime even
4301    though it is usually not needed (or it would require significant
4302    rewriting to allow on-demand computation of the call). If *vtmp*
4303    is used as a marker for the expression and documented as not a
4304    real variable then the call can be constructed at compile time.
4305  \end{itemize}
4306
4307\item In nested complex assignments the additional arguments of the
4308  inner functions are evaluated twice. This is illustrated by running
4309  this code:
4310<<multiple evaluation of arguments in assignments>>=
4311f <- function(x, y) {y ; x }
4312`f<-` <- function(x, y, value) { y; x }
4313g <- function(x, y) {y ; x }
4314`g<-` <- function(x, y, value) { y; x }
4315x <- 1
4316y <- 2
4317f(g(x, print(y)), y) <- 3
4318@ %def
4319  This is something we have lived with, and I don't propose to change
4320  it at this time. But it would be good to be able to change it in the
4321  future.
4322\end{itemize}
4323
4324Because of these issues the compiler implements slightly different
4325semantics for complex assignment than the current intepreter.
4326\emph{Evaluation} semantics should be identical; the difference arises
4327in how intermediate values are managed and has some effect on results
4328produced by [[substitute]].  In particular, no intermediate [[*tmp*]]
4329value is used and therefore no cleanup frame is needed.  This does
4330mean that uses of the form
4331\begin{verbatim}
4332    eval(substitute(<first arg>), parent.frame())
4333\end{verbatim}
4334will no longer work.  In tests of most of CRAN and BioC this directly
4335affected only one function, [[$.proto]] in the [[proto]] package, and
4336indirectly about 30 packages using proto failed.  I looked at the
4337[[$.proto]] implementation, and it turned out that the
4338[[eval(substitute())]] approach used there could be replaced by
4339standard evaluation using lexical scope. This produces better code,
4340and the result works with both the current R interpreter and compiled
4341code (proto and all the dependent packages pass check with this
4342change).  The [[proto]] maintainer has changed [[proto]] along these
4343lines.  It would be good to soon change the interpreter to also use
4344evaluated promises in place of the [[*tmp*]] variable to bring the
4345compiled and interpreted semantics closer together.
4346
4347Complex assignment expressions are compiled by [[cmpComplexAssign]].
4348<<[[cmpComplexAssign]] function>>=
4349cmpComplexAssign <- function(symbol, lhs, value, superAssign, cb, cntxt) {
4350    <<select complex assignment instructions>>
4351    <<protect the stack during a non-top-level complex assignment>>
4352    <<compile the right hand side value expression>>
4353    <<compile the left hand side call>>
4354    <<unprotect the stack after a not-top-level complex assignment>>
4355    <<for tail calls return the value invisibly>>
4356    TRUE;
4357}
4358@ %def cmpComplexAssign
4359
4360Assignment code is bracketed by a start and an end instruction.
4361<<compile the left hand side call>>=
4362csi <- cb$putconst(symbol)
4363cb$putcode(startOP, csi)
4364
4365<<compile code to compute left hand side values>>
4366<<compile code to compute right hand side values>>
4367
4368cb$putcode(endOP, csi)
4369@ %def
4370The appropriate instructions [[startOP]] and [[endOP]] depend on
4371whether the assignment is an ordinary assignment or a superassignment.
4372<<select complex assignment instructions>>=
4373if (superAssign) {
4374    startOP <- STARTASSIGN2.OP
4375    endOP <- ENDASSIGN2.OP
4376}
4377else {
4378    if (! findVar(symbol, cntxt))
4379        notifyUndefVar(symbol, cntxt, loc = cb$savecurloc())
4380    startOP <- STARTASSIGN.OP
4381    endOP <- ENDASSIGN.OP
4382}
4383@ %def
4384An undefined variable notification is issued for ordinary assignment,
4385since this will produce a runtime error. For superassignment
4386[[cmpAssign]] has already checked for an undefined left-hand-side
4387variable and issued a notification if none was found.
4388
4389The start instructions obtain the initial value of the left-hand-side
4390variable and in the case of standard assignment assign it in the local
4391frame if it is not assigned there already. They also prepare the stack
4392for the assignment process.  The stack invariant maintained by the
4393assignment process is that the current right hand side value is on the
4394top, followed by the evaluated left hand side values, the binding
4395cell, and the original right hand side value. Thus the start
4396instruction leaves the right hand side value on the top, then the
4397value of the left hand side variable, the binding cell, and again the
4398right hand side value on the stack.
4399
4400The end instruction finds the final right hand side value followed by
4401the original right hand side value on the top of the stack.  The final
4402value is removed and assigned to the appropriate variable binding.
4403The original right hand side value is left on the top of the stack as
4404the value of the assignment expression.
4405
4406Evaluating a nested complex assignment involves evaluating a sequence
4407of expressions to obtain the left hand sides to modify, and then
4408evaluating a sequence of corresponding calls to replacement functions
4409in the opposite order. The function [[flattenPlace]] returns a list
4410of the expressions that need to be considered, with [[*tmp*]] in place
4411of the current left hand side argument. For example, for an assignment
4412of the form [[f(g(h(x, k), j), i) <- v]] this produces
4413\begin{verbatim}
4414> flattenPlace(quote(f(g(h(x, k), j), i)))$places
4415[[1]]
4416f(`*tmp*`, i)
4417
4418[[2]]
4419g(`*tmp*`, j)
4420
4421[[3]]
4422h(`*tmp*`, k)
4423\end{verbatim}
4424The sequence of left hand side values needed consists of the original
4425variable value, which is already on the stack, and the values of
4426[[h(`*tmp*`, k)]] and [[g(`*tmp*`, j)]].
4427
4428In general the additional evaluations needed are of all but the first
4429expression produced by [[flattenPlace]], evaluated in reverse
4430order. An argument context is used since there are already values on
4431the stack.
4432<<compile code to compute left hand side values>>=
4433ncntxt <- make.argContext(cntxt)
4434flat <- flattenPlace(lhs, cntxt, loc = cb$savecurloc())
4435flatOrigPlace <- flat$origplaces
4436flatPlace <- flat$places
4437flatPlaceIdxs <- seq_along(flatPlace)[-1]
4438for (i in rev(flatPlaceIdxs))
4439    cmpGetterCall(flatPlace[[i]], flatOrigPlace[[i]], cb, ncntxt)
4440@ %def
4441The compilation of the individual calls carried out by
4442[[cmpGetterCall]], which is presented in Section \ref{subsec:getter}.
4443Each compilation places the new left hand side value on the top of the
4444stack and then switches it with the value below, which is the original
4445right hand side value, to preserve the stack invariant.
4446
4447The function [[flattenPlace]] is defined as
4448<<[[flattenPlace]] function>>=
4449flattenPlace <- function(place, cntxt, loc = NULL) {
4450    places <- NULL
4451    origplaces <- NULL
4452    while (typeof(place) == "language") {
4453        if (length(place) < 2)
4454            cntxt$stop(gettext("bad assignment 1"), cntxt, loc = loc)
4455        origplaces <- c(origplaces, list(place))
4456        tplace <- place
4457        tplace[[2]] <- as.name("*tmp*")
4458        places <- c(places, list(tplace))
4459        place <- place[[2]]
4460    }
4461    if (typeof(place) != "symbol")
4462        cntxt$stop(gettext("bad assignment 2"), cntxt, loc = loc)
4463    list(places = places, origplaces = origplaces)
4464}
4465@ %def flattenPlace
4466
4467After the right hand side values have been computed the stack contains
4468the original right hand side value followed by the left hand side
4469values in the order in which they need to be modified. Code to call
4470the sequence of replacement functions is generated by
4471<<compile code to compute right hand side values>>=
4472cmpSetterCall(flatPlace[[1]], flatOrigPlace[[1]], value, cb, ncntxt)
4473for (i in flatPlaceIdxs)
4474    cmpSetterCall(flatPlace[[i]], flatOrigPlace[[i]], as.name("*vtmp*"), cb, ncntxt)
4475@ %def
4476The first call uses the expression for the original right hand side in
4477its call; all others will use [[*vtmp*]].  Each replacement function
4478call compiled by [[cmpSetterCall]] will remove the top two elements
4479from the stack and then push the new right hand side value on the
4480stack.  [[cmpSetterCall]] is described in Section \ref{subsec:setter}.
4481
4482For non-top-level complex assignments values on the stack need to be
4483protected from mutation during the assignment.
4484<<protect the stack during a non-top-level complex assignment>>=
4485if (! cntxt$toplevel) cb$putcode(INCLNKSTK.OP)
4486@
4487<<unprotect the stack after a not-top-level complex assignment>>=
4488if (! cntxt$toplevel) cb$putcode(DECLNKSTK.OP)
4489@
4490
4491
4492\subsection{Compiling setter calls}
4493\label{subsec:setter}
4494Setter calls, or calls to replacement functions, in compiled
4495assignment expressions find stack that contains the current right hand
4496side value on the top followed by the current left hand side value.
4497Some replacement function calls, such as calls to [[$<-]], are handled
4498by an inlining mechanism described below.  The general case when the
4499function is specified by a symbol is handled a [[GETFUN]] instruction
4500to push the function on the stack, pushing any additional arguments on
4501the stack, and using the [[SETTER_CALL]] instruction to execute the
4502call.  This instruction adjusts the argument list by inserting as the
4503first argument an evaluated promise for the left hand side value and
4504as the last argument an evaluated promise for the right hand side
4505value; the final argument also has the [[value]] tag. The case where
4506the function is specified in the form [[foo::bar]] or [[foo:::bar]]
4507differs only compiling the function expression and using [[CHECKFUN]]
4508to verify the result and prepare the stack.
4509<<[[cmpSetterCall]] function>>=
4510cmpSetterCall <- function(place, origplace, vexpr, cb, cntxt) {
4511    afun <- getAssignFun(place[[1]])
4512    acall <- as.call(c(afun, as.list(place[-1]), list(value = vexpr)))
4513    acall[[2]] <- as.name("*tmp*")
4514    ncntxt <- make.callContext(cntxt, acall)
4515    sloc <- cb$savecurloc()
4516    cexpr <- as.call(c(afun, as.list(origplace[-1]), list(value = vexpr)))
4517    cb$setcurexpr(cexpr)
4518    if (is.null(afun))
4519        ## **** warn instead and arrange for cmpSpecial?
4520        ## **** or generate code to signal runtime error?
4521        cntxt$stop(gettext("invalid function in complex assignment"),
4522                   loc = cb$savecurloc())
4523    else if (typeof(afun) == "symbol") {
4524        if (! trySetterInline(afun, place, origplace, acall, cb, ncntxt)) {
4525            ci <- cb$putconst(afun)
4526            cb$putcode(GETFUN.OP, ci)
4527            <<compile additional arguments and call to setter function>>
4528        }
4529    }
4530    else {
4531        cmp(afun, cb, ncntxt)
4532        cb$putcode(CHECKFUN.OP)
4533        <<compile additional arguments and call to setter function>>
4534    }
4535    cb$restorecurloc(sloc)
4536}
4537@ %def cmpSetterCall
4538The common code for compiling additional arguments and issuing the
4539[[SETTER_CALL]] instruction is given by
4540<<compile additional arguments and call to setter function>>=
4541cb$putcode(PUSHNULLARG.OP)
4542cmpCallArgs(place[-c(1, 2)], cb, ncntxt)
4543cci <- cb$putconst(acall)
4544cvi <- cb$putconst(vexpr)
4545cb$putcode(SETTER_CALL.OP, cci, cvi)
4546@ %def
4547The [[PUSHNULL]] instruction places [[NULL]] in the argument list as a
4548first argument to serve as a place holder; [[SETTER_CALL]] replaces
4549this with the evaluated promise for the current left hand side value.
4550
4551The replacement function corresponding to [[fun]] is computed by
4552[[getAssignFun]].  If [[fun]] is a symbol then the assignment function
4553is the symbol followed by [[<-]].  The function [[fun]] can also be an
4554expression of the form [[foo::bar]], in which case the replacement
4555function is the expression [[foo::`bar<-`]].  [[NULL]] is returned if
4556[[fun]] does not fit into one of these two cases.
4557<<[[getAssignFun]] function>>=
4558getAssignFun <- function(fun) {
4559    if (typeof(fun) == "symbol")
4560        as.name(paste0(fun, "<-"))
4561    else {
4562        ## check for and handle foo::bar(x) <- y assignments here
4563        if (typeof(fun) == "language" && length(fun) == 3 &&
4564            (as.character(fun[[1]]) %in% c("::", ":::")) &&
4565            typeof(fun[[2]]) == "symbol" && typeof(fun[[3]]) == "symbol") {
4566            afun <- fun
4567            afun[[3]] <- as.name(paste0(fun[[3]],"<-"))
4568            afun
4569        }
4570        else NULL
4571    }
4572}
4573@ %def getAssignFun
4574
4575To produce more efficient code some replacement function calls can be
4576inlined and use specialized instructions.  The most important of these
4577are [[$<-]], [[[<-]], and [[[[<-]].  An inlining mechanism similar to
4578the one described in Section \ref{sec:inlining} is used for this
4579purpose.  A separate mechanism is needed because of the fact that in
4580the present context two arguments, the left hand side and right hand
4581side values, are already on the stack.
4582<<setter inlining mechanism>>=
4583setterInlineHandlers <- new.env(hash = TRUE, parent = emptyenv())
4584
4585setSetterInlineHandler <- function(name, h, package = "base") {
4586    if (exists(name, setterInlineHandlers, inherits = FALSE)) {
4587        entry <- get(name, setterInlineHandlers)
4588        if (entry$package != package) {
4589            fmt <- "handler for '%s' is already defined for another package"
4590            stop(gettextf(fmt, name), domain = NA)
4591        }
4592    }
4593    entry <- list(handler = h, package = package)
4594    assign(name, entry, setterInlineHandlers)
4595}
4596
4597getSetterInlineHandler <- function(name, package = "base") {
4598    if (exists(name, setterInlineHandlers, inherits = FALSE)) {
4599        hinfo <- get(name, setterInlineHandlers)
4600        if (hinfo$package == package)
4601            hinfo$handler
4602        else NULL
4603    }
4604    else NULL
4605}
4606
4607trySetterInline <- function(afun, place, origplace, call, cb, cntxt) {
4608    name <- as.character(afun)
4609    info <- getInlineInfo(name, cntxt)
4610    if (is.null(info))
4611        FALSE
4612    else {
4613        h <- getSetterInlineHandler(name, info$package)
4614        if (! is.null(h))
4615            h(afun, place, origplace, call, cb, cntxt)
4616        else FALSE
4617    }
4618}
4619@ %def
4620
4621The inline handler for [[$<-]] replacement calls uses the
4622[[DOLLARGETS]] instruction.  The handler declines to handle cases that
4623would produce runtime errors; these are compiled by the generic
4624mechanism.
4625%% **** might be useful to signal a warning at compile time
4626<<setter inline handler for [[$<-]]>>=
4627setSetterInlineHandler("$<-", function(afun, place, origplace, call, cb, cntxt) {
4628    if (any.dots(place) || length(place) != 3)
4629        FALSE
4630    else {
4631        sym <- place[[3]]
4632        if (is.character(sym))
4633            sym <- as.name(sym)
4634        if (is.name(sym)) {
4635            ci <- cb$putconst(call)
4636            csi <- cb$putconst(sym)
4637            cb$putcode(DOLLARGETS.OP, ci, csi)
4638            TRUE
4639        }
4640        else FALSE
4641    }
4642})
4643@ %def
4644
4645The replacement functions [[[<-]] and [[[[<-]]] are implemented as
4646[[SPECIAL]] functions that do internal dispatching.  They are
4647therefore compiled along the same lines as their corresponding
4648accessor functions as described in Section \ref{subsec:subset}.  The
4649common pattern is implemented by [[cmpSetterDispatch]].
4650<<[[cmpSetterDispatch]] function>>=
4651cmpSetterDispatch <- function(start.op, dflt.op, afun, place, call, cb, cntxt) {
4652    if (any.dots(place))
4653        FALSE ## punt
4654    else {
4655        ci <- cb$putconst(call)
4656        end.label <- cb$makelabel()
4657        cb$putcode(start.op, ci, end.label)
4658        if (length(place) > 2) {
4659            args <- place[-(1:2)]
4660            cmpBuiltinArgs(args, names(args), cb, cntxt, TRUE)
4661        }
4662        cb$putcode(dflt.op)
4663        cb$putlabel(end.label)
4664        TRUE
4665    }
4666}
4667@ %def cmpSetterDispatch
4668The two inlining handlers are then defined as
4669<<setter inline handlers for [[ [<- ]] and [[ [[<- ]]>>=
4670# **** this is now handled differently; see "Improved subset ..."
4671# setSetterInlineHandler("[<-", function(afun, place, origplace, call, cb, cntxt)
4672#     cmpSetterDispatch(STARTSUBASSIGN.OP, DFLTSUBASSIGN.OP,
4673#                       afun, place, call, cb, cntxt))
4674
4675# setSetterInlineHandler("[[<-", function(afun, place, origplace, call, cb, cntxt)
4676#     cmpSetterDispatch(STARTSUBASSIGN2.OP, DFLTSUBASSIGN2.OP,
4677#                       afun, place, call, cb, cntxt))
4678@ %def
4679
4680An inline handler is defined for [[@<-]] in order to suppress spurious
4681warnings about the slot name symbol. A call in which the slot is
4682specified by a symbol is converted to one using a string instead, and
4683is then compiled by a recursive call to [[cmpSetterCall]]; the handler
4684will decline in this second call and the default compilation strategy
4685will be used.
4686<<setter inlining handler for [[@<-]]>>=
4687setSetterInlineHandler("@<-", function(afun, place, origplace, acall, cb, cntxt) {
4688    if (! dots.or.missing(place) && length(place) == 3 &&
4689        typeof(place[[3]]) == "symbol") {
4690        place[[3]] <- as.character(place[[3]])
4691        vexpr <- acall[[length(acall)]]
4692	cmpSetterCall(place, origplace, vexpr, cb, cntxt)
4693        TRUE
4694    }
4695    else FALSE
4696})
4697@
4698
4699
4700\subsection{Compiling getter calls}
4701\label{subsec:getter}
4702Getter calls within an assignment also need special handling because
4703of the left hand side argument being on the stack already and because
4704of the need to restore the stack invariant. There are again two cases
4705for installing the getter function on the stack.  These are then
4706followed by common code for handling the additional arguments and the
4707call.
4708<<[[cmpGetterCall]] function>>=
4709cmpGetterCall <- function(place, origplace, cb, cntxt) {
4710    ncntxt <- make.callContext(cntxt, place)
4711    sloc <- cb$savecurloc()
4712    cb$setcurexpr(origplace)
4713    fun <- place[[1]]
4714    if (typeof(fun) == "symbol") {
4715        if (! tryGetterInline(place, cb, ncntxt)) {
4716            ci <- cb$putconst(fun)
4717            cb$putcode(GETFUN.OP, ci)
4718	    <<compile additional arguments and call to getter function>>
4719        }
4720    }
4721    else {
4722        cmp(fun, cb, ncntxt)
4723        cb$putcode(CHECKFUN.OP)
4724	<<compile additional arguments and call to getter function>>
4725    }
4726    cb$restorecurloc(sloc)
4727}
4728@ %def cmpGetterCall
4729In the common code, as in setter calls a [[NULL]] is placed on the
4730argument stack as a place holder for the left hand side promise.  Then
4731the additional arguments are placed on the stack and the
4732[[GETTER-CALL]] instruction is issued.  This instruction installs the
4733evaluated promise with the left hand side value as the first argument
4734and executes the call.  The call will leave the next right left hand
4735side on the top of the stack.  A [[SWAP]] instruction then switches
4736the top two stack entries.  This leaves the original right hand side
4737value on top followed by the new left hand side value returned by the
4738getter call and any other left hand side values produced by earlier
4739getter call.
4740<<compile additional arguments and call to getter function>>=
4741cb$putcode(PUSHNULLARG.OP)
4742cmpCallArgs(place[-c(1, 2)], cb, ncntxt)
4743cci <- cb$putconst(place)
4744cb$putcode(GETTER_CALL.OP, cci)
4745cb$putcode(SWAP.OP)
4746@ %def
4747
4748Again an inlining mechanism is needed to handle calls to functions like
4749[[$]] and [[[]].  These are able to use the same instructions as the
4750inline handlers in Section \ref{subsec:subset} for ordinary calls to
4751[[$]] and [[[]] but require some additional work to deal with
4752maintaining the stack invariant.
4753
4754The inlining mechanism itself is analogous to the general one and the
4755one for inlining setter calls.
4756<<getter inlining mechanism>>=
4757getterInlineHandlers <- new.env(hash = TRUE, parent = emptyenv())
4758
4759setGetterInlineHandler <- function(name, h, package = "base") {
4760    if (exists(name, getterInlineHandlers, inherits = FALSE)) {
4761        entry <- get(name, getterInlineHandlers)
4762        if (entry$package != package) {
4763            fmt <- "handler for '%s' is already defined for another package"
4764            stop(gettextf(fmt, name), domain = NA)
4765        }
4766    }
4767    entry <- list(handler = h, package = package)
4768    assign(name, entry, getterInlineHandlers)
4769}
4770
4771getGetterInlineHandler <- function(name, package = "base") {
4772    if (exists(name, getterInlineHandlers, inherits = FALSE)) {
4773        hinfo <- get(name, getterInlineHandlers)
4774        if (hinfo$package == package)
4775            hinfo$handler
4776        else NULL
4777    }
4778    else NULL
4779}
4780
4781tryGetterInline <- function(call, cb, cntxt) {
4782    name <- as.character(call[[1]])
4783    info <- getInlineInfo(name, cntxt)
4784    if (is.null(info))
4785        FALSE
4786    else {
4787        h <- getGetterInlineHandler(name, info$package)
4788        if (! is.null(h))
4789            h(call, cb, cntxt)
4790        else FALSE
4791    }
4792}
4793@ %def
4794
4795The inline handler for [[$]] in a getter context uses the [[DUP2ND]]
4796instruction to push the second value on the stack, the previous left
4797hand side value, onto the stack.  The [[DOLLAR]] instruction pops this
4798value, computes the component for this value and the symbol in the
4799constant pool, and pushes the result on the stack.  A [[SWAP]]
4800instruction then interchanges this value with the next value, which is
4801the original right hand side value, thus restoring the stack
4802invariant.
4803<<getter inline handler for [[$]]>>=
4804setGetterInlineHandler("$", function(call, cb, cntxt) {
4805    if (any.dots(call) || length(call) != 3)
4806        FALSE
4807    else {
4808        sym <- call[[3]]
4809        if (is.character(sym))
4810            sym <- as.name(sym)
4811        if (is.name(sym)) {
4812            ci <- cb$putconst(call)
4813            csi <- cb$putconst(sym)
4814            cb$putcode(DUP2ND.OP)
4815            cb$putcode(DOLLAR.OP, ci, csi)
4816            cb$putcode(SWAP.OP)
4817            TRUE
4818        }
4819        else FALSE
4820    }
4821})
4822@ %def
4823
4824Calls to [[[]] and [[[[]] again need two instructions to support the
4825internal dispatch.  The general pattern is implemented in
4826[[cmpGetterDispatch]].  A [[DUP2ND]] instruction is used to place the
4827first argument for the call on top of the stack, code analogous to the
4828code for ordinary calls to [[[]] and [[[[]] is used to make the call,
4829and this is followed by a [[SWAP]] instruction to rearrange the stack.
4830<<[[cmpGetterDispatch]] function>>=
4831cmpGetterDispatch <- function(start.op, dflt.op, call, cb, cntxt) {
4832    if (any.dots(call))
4833        FALSE ## punt
4834    else {
4835        ci <- cb$putconst(call)
4836        end.label <- cb$makelabel()
4837        cb$putcode(DUP2ND.OP)
4838        cb$putcode(start.op, ci, end.label)
4839        if (length(call) > 2) {
4840            args <- call[-(1:2)]
4841            cmpBuiltinArgs(args, names(args), cb, cntxt, TRUE)
4842        }
4843        cb$putcode(dflt.op)
4844        cb$putlabel(end.label)
4845        cb$putcode(SWAP.OP)
4846        TRUE
4847    }
4848}
4849@ %def cmpGetterDispatch
4850The two inline handlers are then defined as
4851<<getter inline handlers for [[[]] and [[[[]]>>=
4852# **** this is now handled differently; see "Improved subset ..."
4853# setGetterInlineHandler("[", function(call, cb, cntxt)
4854#     cmpGetterDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, call, cb, cntxt))
4855
4856# setGetterInlineHandler("[[", function(call, cb, cntxt)
4857#     cmpGetterDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, call, cb, cntxt))
4858@ %def
4859
4860
4861\section{Constant folding}
4862A very valuable compiler optimization is constant folding. For
4863example, an expression for computing a normal density function may
4864include the code
4865\begin{verbatim}
48661 / sqrt(2 * pi)
4867\end{verbatim}
4868The interpreter would have to evaluate this expression each time it is
4869needed, but a compiler can often compute the value once at compile
4870time.
4871
4872The constant folding optimization can be applied at various points in
4873the compilation process: It can be applied to the source code before
4874code generation or to the generated code in a separate optimization
4875phase.  For now, constant folding is applied during the code
4876generation phase.
4877
4878The [[constantFold]] function examines its expression argument and
4879handles each expression type by calling an appropriate function.
4880<<[[constantFold]] function>>=
4881## **** rewrite using switch??
4882constantFold <- function(e, cntxt, loc = NULL) {
4883    type = typeof(e)
4884    if (type == "language")
4885        constantFoldCall(e, cntxt)
4886    else if (type == "symbol")
4887        constantFoldSym(e, cntxt)
4888    else if (type == "promise")
4889        cntxt$stop(gettext("cannot constant fold literal promises"),
4890                   cntxt, loc = loc)
4891    else if (type == "bytecode")
4892        cntxt$stop(gettext("cannot constant fold literal bytecode objects"),
4893                   cntxt, loc = loc)
4894    else checkConst(e)
4895}
4896@ %def constantFold
4897%% **** warn and return NULL instead of calling stop??
4898
4899The [[checkConst]] function decides whether a value is a constant that
4900is small enough and simple enough to enter into the constant pool.  If
4901so, then [[checkConst]] wraps the value in a list as the [[value]]
4902component.  If not, then [[NULL]] is returned.
4903<<[[checkConst]] function>>=
4904checkConst <- function(e) {
4905    if (mode(e) %in% constModes && length(e) <= maxConstSize)
4906        list(value = e)
4907    else
4908        NULL
4909}
4910@ %def checkConst
4911The maximal size and acceptable modes are defined by
4912<<[[maxConstSize]] and [[constModes]] definitions>>=
4913maxConstSize <- 10
4914
4915constModes <- c("numeric", "logical", "NULL", "complex", "character")
4916@ %def maxConstSize constModes
4917
4918For now, constant folding is only applied for a particular set of
4919variables and functions defined in the base package.  The constant
4920folding code uses [[isBaseVar]] to determine whether a variable can be
4921assumed to reference the corresponding base variable given the
4922current compilation environment and optimization setting.
4923[[constantFoldSym]] is applied to base variables in the [[constNames]]
4924list.
4925<<[[constantFoldSym]] function>>=
4926## Assumes all constants will be defined in base.
4927## Eventually allow other packages to define constants.
4928## Any variable with locked binding could be used if type is right.
4929## Allow local declaration of optimize, notinline declaration.
4930constantFoldSym <- function(var, cntxt) {
4931    var <- as.character(var)
4932    if (var %in% constNames && isBaseVar(var, cntxt))
4933        checkConst(get(var, .BaseNamespaceEnv))
4934    else NULL
4935}
4936@ %def constantFoldSym
4937<<[[constNames]] definition>>=
4938constNames <- c("pi", "T", "F")
4939@ %def constNames
4940
4941Call expressions are handled by determining whether the function
4942called is eligible for constant folding, attempting to constant fold
4943the arguments, and calling the folding function.  The result is the
4944passed to [[checkConst]].  If an error or a warning occurs in the call to the
4945folding function then [[constantFoldCall]] returns [[NULL]].
4946<<[[constantFoldCall]] function>>=
4947constantFoldCall <- function(e, cntxt) {
4948    fun <- e[[1]]
4949    if (typeof(fun) == "symbol") {
4950        ffun <- getFoldFun(fun, cntxt)
4951        if (! is.null(ffun)) {
4952            args <- as.list(e[-1])
4953            for (i in seq_along(args)) {
4954                a <- args[[i]]
4955                if (missing(a))
4956                    return(NULL)
4957                val <- constantFold(a, cntxt)
4958                if (! is.null(val))
4959                    args[i] <- list(val$value) ## **** in case value is NULL
4960                else return(NULL)
4961            }
4962            modes <- unlist(lapply(args, mode))
4963            if (all(modes %in% constModes)) {
4964                tryCatch(checkConst(do.call(ffun, args)),
4965                         error = function(e) NULL, warning = function(w) NULL)
4966                ## **** issue warning??
4967            }
4968            else NULL
4969        }
4970        else NULL
4971    }
4972    else NULL
4973}
4974@ %def constantFoldCall
4975%% **** separate out and explain argument processing chunk (maybe also the call)
4976
4977The functions in the base package eligible for constant folding are
4978<<[[foldFuns]] definition>>=
4979foldFuns <- c("+", "-", "*", "/", "^", "(",
4980              ">", ">=", "==", "!=", "<", "<=", "||", "&&", "!",
4981              "|", "&", "%%",
4982              "c", "rep", ":",
4983              "abs", "acos", "acosh", "asin", "asinh", "atan", "atan2",
4984              "atanh", "ceiling", "choose", "cos", "cosh", "exp", "expm1",
4985              "floor", "gamma", "lbeta", "lchoose", "lgamma", "log", "log10",
4986              "log1p", "log2", "max", "min", "prod", "range", "round",
4987              "seq_along", "seq.int", "seq_len", "sign", "signif",
4988              "sin", "sinh", "sqrt", "sum", "tan", "tanh", "trunc",
4989              "baseenv", "emptyenv", "globalenv",
4990              "Arg", "Conj", "Im", "Mod", "Re",
4991              "is.R")
4992@ %def foldFuns
4993[[getFoldFun]] checks the called function against this list and
4994whether the binding for the variable can be assumed to be from the
4995base package. If then returns the appropriate function from the base
4996package or [[NULL]].
4997<<[[getFoldFun]] function>>=
4998## For now assume all foldable functions are in base
4999getFoldFun <- function(var, cntxt) {
5000    var <- as.character(var)
5001    if (var %in% foldFuns && isBaseVar(var, cntxt)) {
5002        val <- get(var, .BaseNamespaceEnv)
5003        if (is.function(val))
5004            val
5005        else
5006            NULL
5007    }
5008    else NULL
5009}
5010@ %def getFoldFun
5011
5012
5013\section{More top level functions}
5014\subsection{Compiling closures}
5015The function [[cmpfun]] is for compiling a closure.  The body is
5016compiled with [[genCode]] and combined with the closure's formals and
5017environment to form a compiled closure.  The [[.Internal]] function
5018[[bcClose]] does this. Some additional fiddling is needed if the
5019closure is an S4 generic.  The need for the [[asS4]] bit seems a bit
5020odd but it is apparently needed at this point.
5021<<[[cmpfun]] function>>=
5022cmpfun <- function(f, options = NULL) {
5023    type <- typeof(f)
5024    if (type == "closure") {
5025        cntxt <- make.toplevelContext(makeCenv(environment(f)), options)
5026        ncntxt <- make.functionContext(cntxt, formals(f), body(f))
5027        if (mayCallBrowser(body(f), ncntxt))
5028            return(f)
5029        if (typeof(body(f)) != "language" || body(f)[1] != "{")
5030            loc <- list(expr = body(f), srcref = getExprSrcref(f))
5031        else
5032            loc <- NULL
5033        b <- genCode(body(f), ncntxt, loc = loc)
5034        val <- .Internal(bcClose(formals(f), b, environment(f)))
5035        attrs <- attributes(f)
5036        if (! is.null(attrs))
5037            attributes(val) <- attrs
5038        if (isS4(f)) ## **** should this really be needed??
5039            val <- asS4(val)
5040        val
5041    }
5042    else if (type == "builtin" || type == "special")
5043        f
5044    else stop("cannot compile a non-function")
5045}
5046@ %def cmpfun
5047
5048For use in compiling packages and in JIT compilation it is useful to
5049have a variant that returns the uncompiled function if there is an
5050error during compilation.
5051<<[[tryCmpfun]] function>>=
5052tryCmpfun <- function(f)
5053    tryCatch(cmpfun(f), error = function(e) {
5054        notifyCompilerError(paste(e$message, "at", deparse(e$call)))
5055        f
5056    })
5057@ %def tryCmpfun
5058
5059A similar utility for expressions for use in JIT compilation of loops:
5060<<[[tryCompile]] function>>=
5061tryCompile <- function(e, ...)
5062    tryCatch(compile(e, ...), error = function(err) {
5063        notifyCompilerError(paste(err$message, "at", deparse(err$call)))
5064        e
5065    })
5066@ %def tryCompile
5067
5068If a function contains a call to [[browser]], it should not be compiled,
5069because the byte-code interpreter does not support command-by-command
5070execution ("n"). This function explores the AST of a closure to find out if
5071it may contain a call to [[browser]]:
5072
5073<<[[mayCallBrowser]] function>>=
5074mayCallBrowser <- function(e, cntxt) {
5075    if (typeof(e) == "language") {
5076        fun <- e[[1]]
5077        if (typeof(fun) == "symbol") {
5078            fname <- as.character(fun)
5079            if (fname == "browser") ## not checking isBaseVar to err on the
5080                                    ## positive
5081                TRUE
5082            else if (fname == "function" && isBaseVar(fname, cntxt))
5083                FALSE
5084            else
5085                mayCallBrowserList(e[-1], cntxt)
5086        }
5087        else
5088            mayCallBrowserList(e, cntxt)
5089    }
5090    else FALSE
5091}
5092@ %def mayCallBrowser
5093
5094A version that operates on a list of expressions is
5095<<[[mayCallBrowserList]] function>>=
5096mayCallBrowserList <- function(elist, cntxt) {
5097    for (a in as.list(elist))
5098        if (! missing(a) && mayCallBrowser(a, cntxt))
5099            return(TRUE)
5100    FALSE
5101}
5102@ %def mayCallBrowserList
5103
5104
5105\subsection{Compiling and loading files}
5106A file can be compiled with [[cmpfile]] and loaded with [[loadcmp]].
5107[[cmpfile]] reads in the expressions, compiles them, and serializes
5108the list of compiled expressions by calling the [[.Internal]] function
5109[[save.to.file]].
5110<<[[cmpfile]] function>>=
5111cmpfile <- function(infile, outfile, ascii = FALSE, env = .GlobalEnv,
5112                    verbose = FALSE, options = NULL, version = NULL) {
5113    if (! is.environment(env) || ! identical(env, topenv(env)))
5114        stop("'env' must be a top level environment")
5115    <<create [[outfile]] if argument is missing>>
5116    <<check that [[infile]] and [[outfile]] are not the same>>
5117    forms <- parse(infile)
5118    nforms <- length(forms)
5119    srefs <- attr(forms, "srcref")
5120    if (nforms > 0) {
5121        expr.needed <- 1000
5122        expr.old <- getOption("expressions")
5123        if (expr.old < expr.needed) {
5124            options(expressions = expr.needed)
5125            on.exit(options(expressions = expr.old))
5126        }
5127        cforms <- vector("list", nforms)
5128        cenv <- makeCenv(env)
5129        cntxt <- make.toplevelContext(cenv, options)
5130        cntxt$env <- addCenvVars(cenv, findLocalsList(forms, cntxt))
5131        for (i in 1:nforms) {
5132            e <- forms[[i]]
5133            sref <- srefs[[i]]
5134            if (verbose) {
5135                if (typeof(e) == "language" && e[[1]] == "<-" &&
5136                    typeof(e[[3]]) == "language" && e[[3]][[1]] == "function")
5137                    cat(paste0("compiling function \"", e[[2]], "\"\n"))
5138                else
5139                    cat(paste("compiling expression", deparse(e, 20)[1],
5140                              "...\n"))
5141            }
5142            if (!mayCallBrowser(e, cntxt))
5143                cforms[[i]] <- genCode(e, cntxt,
5144                                       loc = list(expr = e, srcref = sref))
5145        }
5146        cat(gettextf("saving to file \"%s\" ... ", outfile))
5147        .Internal(save.to.file(cforms, outfile, ascii, version))
5148        cat(gettext("done"), "\n", sep = "")
5149    }
5150    else warning("empty input file; no output written");
5151    invisible(NULL)
5152}
5153@ %def cmpfile
5154
5155The default output file name is the base name of the input file with a
5156[[.Rc]] extension.
5157<<create [[outfile]] if argument is missing>>=
5158if (missing(outfile)) {
5159    basename <- sub("\\.[a-zA-Z0-9]$", "", infile)
5160    outfile <- paste0(basename, ".Rc")
5161}
5162@ %def
5163As a precaution it is useful to check that [[infile]] and [[outfile]]
5164are not the same and signal an error if they are.
5165<<check that [[infile]] and [[outfile]] are not the same>>=
5166if (infile == outfile)
5167    stop("input and output file names are the same")
5168@ %def
5169
5170The [[loadcmp]] reads in the serialized list of expressions
5171using the [[.Internal]] function [[load.from.file]].  The compiled
5172expressions are then evaluated in the global environment.
5173<<[[loadcmp]] function>>=
5174loadcmp <- function (file, envir = .GlobalEnv, chdir = FALSE) {
5175    if (!(is.character(file) && file.exists(file)))
5176        stop(gettextf("file '%s' does not exist", file), domain = NA)
5177    exprs <- .Internal(load.from.file(file))
5178    if (length(exprs) == 0)
5179        return(invisible())
5180    if (chdir && (path <- dirname(file)) != ".") {
5181        owd <- getwd()
5182        on.exit(setwd(owd), add = TRUE)
5183        setwd(path)
5184    }
5185    for (i in exprs) {
5186        eval(i, envir)
5187    }
5188    invisible()
5189}
5190@ %def loadcmp
5191[[loadcmp]] is the analog to [[source]] for compiled files.
5192
5193Two additional functions that are currently not exported or used are
5194[[cmpframe]] and [[cmplib]].  They should probably be removed.
5195<<[[cmpframe]] function>>=
5196cmpframe <- function(inpos, file) {
5197    expr.needed <- 1000
5198    expr.old <- getOption("expressions")
5199    if (expr.old < expr.needed)
5200       options(expressions = expr.needed)
5201    on.exit(options(expressions = expr.old))
5202
5203    attach(NULL, name="<compiled>")
5204    inpos <- inpos + 1
5205    outpos <- 2
5206    on.exit(detach(pos=outpos), add=TRUE)
5207
5208    for (f in ls(pos = inpos, all.names = TRUE)) {
5209        def <- get(f, pos = inpos)
5210        if (typeof(def) == "closure") {
5211                cat(gettextf("compiling '%s'", f), "\n", sep = "")
5212                fc <- cmpfun(def)
5213                assign(f, fc, pos=outpos)
5214        }
5215    }
5216    cat(gettextf("saving to file \"%s\" ... ", file))
5217    save(list = ls(pos = outpos, all.names = TRUE), file = file)
5218    cat(gettext("done"), "\n", sep = "")
5219}
5220@ %def cmpframe
5221
5222<<[[cmplib]] function>>=
5223cmplib <- function(package, file) {
5224    package <- as.character(substitute(package))
5225    pkgname <- paste("package", package, sep = ":")
5226    pos <- match(pkgname, search());
5227    if (missing(file))
5228        file <- paste0(package,".Rc")
5229    if (is.na(pos)) {
5230        library(package, character.only = TRUE)
5231        pos <- match(pkgname, search());
5232        on.exit(detach(pos=match(pkgname, search())))
5233    }
5234    cmpframe(pos, file)
5235}
5236@ %def cmplib
5237
5238
5239\subsection{Enabling implicit compilation}
5240<<[[enableJIT]] function>>=
5241enableJIT <- function(level)
5242    .Internal(enableJIT(level))
5243@ %def enableJIT
5244<<[[compilePKGS]] function>>=
5245compilePKGS <- function(enable)
5246    .Internal(compilePKGS(enable))
5247@ %def compilePKGS
5248
5249
5250\subsection{Setting compiler options}
5251The [[setCompilerOptions]] function provides a means for users to
5252adjust the default compiler option values.  This interface is
5253experimental and may change.
5254<<[[setCompilerOptions]] function>>=
5255setCompilerOptions <- function(...) {
5256    options <- list(...)
5257    nm <- names(options)
5258    for (n in nm)
5259        if (! exists(n, compilerOptions))
5260            stop(gettextf("'%s' is not a valid compiler option", n),
5261                 domain = NA)
5262    old <- list()
5263    newOptions <- as.list(compilerOptions) # copy options
5264    for (n in nm) {
5265        op <- options[[n]]
5266        switch(n,
5267               optimize = {
5268                   op <- as.integer(op)
5269                   if (length(op) == 1 && 0 <= op && op <= 3) {
5270                       old <- c(old, list(optimize =
5271                                          compilerOptions$optimize))
5272                       newOptions$optimize <- op
5273                   }
5274               },
5275               suppressAll = {
5276                   if (identical(op, TRUE) || identical(op, FALSE)) {
5277                       old <- c(old, list(suppressAll =
5278                                          compilerOptions$suppressAll))
5279                       newOptions$suppressAll <- op
5280                   }
5281               },
5282               suppressNoSuperAssignVar = {
5283                   if (isTRUE(op) || isFALSE(op)) {
5284                       old <- c(old, list(
5285                           suppressNoSuperAssignVar =
5286                               compilerOptions$suppressNoSuperAssignVar))
5287                       newOptions$suppressNoSuperAssignVar <- op
5288                   }
5289               },
5290               suppressUndefined = {
5291                   if (identical(op, TRUE) || identical(op, FALSE) ||
5292                       is.character(op)) {
5293                       old <- c(old, list(suppressUndefined =
5294                                          compilerOptions$suppressUndefined))
5295                       newOptions$suppressUndefined <- op
5296                   }
5297               })
5298    }
5299    jitEnabled <- enableJIT(-1)
5300    if (checkCompilerOptions(jitEnabled, newOptions))
5301        for(n in names(newOptions)) # commit the new options
5302            assign(n, newOptions[[n]], compilerOptions)
5303    invisible(old)
5304}
5305@ %def
5306
5307For now, a [[.onLoad]] function is used to allow all warning to be
5308suppressed.  This is probably useful for building packages, since the
5309way lazy loading is done means variables defined in shared libraries
5310are not available and produce a raft of warnings.  The [[.onLoad]]
5311function also allows undefined variables to be suppressed and the
5312optimization level to be specified using environment variables.
5313<<[[.onLoad]] function>>=
5314.onLoad <- function(libname, pkgname) {
5315    envAsLogical <- function(varName) {
5316        value = Sys.getenv(varName)
5317        if (value == "")
5318            NA
5319        else
5320            switch(value,
5321                "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE,
5322                "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE,
5323                stop(gettextf("invalid environment variable value: %s==%s",
5324                    varName, value)))
5325    }
5326    val <- envAsLogical("R_COMPILER_SUPPRESS_ALL")
5327    if (!is.na(val))
5328        setCompilerOptions(suppressAll = val)
5329    val <- envAsLogical("R_COMPILER_SUPPRESS_UNDEFINED")
5330    if (!is.na(val))
5331        setCompilerOptions(suppressUndefined = val)
5332    val <- envAsLogical("R_COMPILER_SUPPRESS_NO_SUPER_ASSIGN_VAR")
5333    if (!is.na(val))
5334        setCompilerOptions(suppressNoSuperAssignVar = val)
5335    if (Sys.getenv("R_COMPILER_OPTIMIZE") != "")
5336        tryCatch({
5337            lev <- as.integer(Sys.getenv("R_COMPILER_OPTIMIZE"))
5338            if (0 <= lev && lev <= 3)
5339                setCompilerOptions(optimize = lev)
5340        }, error = function(e) e, warning = function(w) w)
5341}
5342@ %def .onLoad
5343
5344When [[enableJIT]] is set to 3, loops should be compiled before executing.
5345However, if the [[optimize]] option is set to 0 or 1, a compiled loop will
5346call to the same primitive function as is used by the AST interpretter (e.g.
5347[[do_for]]), and the compilation would run into infinite recursion.
5348[[checkCompilerOptions]] will detect invalid combinations of [[enableJIT]]
5349and [[optimize]] and report a warning.
5350%% **** could also change the interface and atomically set enableJIT and optimize
5351<<[[checkCompilerOptions]] function>>=
5352checkCompilerOptions <- function(jitEnabled, options = NULL) {
5353    optimize <- getCompilerOption("optimize", options)
5354    if (jitEnabled <= 2 || optimize >= 2)
5355        TRUE
5356    else {
5357        stop(gettextf(
5358            "invalid compiler options: optimize(==%d)<2 and jitEnabled(==%d)>2",
5359            optimize, jitEnabled))
5360        FALSE
5361    }
5362}
5363@ %def checkCompilerOptions
5364
5365
5366\subsection{Disassembler}
5367A minimal disassembler is provided by [[disassemble]]. This is
5368primarily useful for debugging the compiler.  A more readable output
5369representation might be nice to have. It would also probably make
5370sense to give the result a class and write a print method.
5371<<[[disassemble]] function>>=
5372disassemble <- function(code) {
5373    .CodeSym <- as.name(".Code")
5374    disasm.const<-function(x)
5375        if (typeof(x)=="list" && length(x) > 0 && identical(x[[1]], .CodeSym))
5376            disasm(x) else x
5377    disasm <-function(code) {
5378        code[[2]]<-bcDecode(code[[2]])
5379        code[[3]]<-lapply(code[[3]], disasm.const)
5380        code
5381    }
5382    if (typeof(code)=="closure") {
5383        code <- .Internal(bodyCode(code))
5384        if (typeof(code) != "bytecode")
5385            stop("function is not compiled")
5386    }
5387    dput(disasm(.Internal(disassemble(code))))
5388}
5389@ %def disassemble
5390
5391The [[.Internal]] function [[disassemble]] extracts the numeric code
5392vector and constant pool.  The function [[bcDecode]] uses the
5393[[Opcodes.names]] array to translate the numeric opcodes into symbolic
5394ones.  At this point not enough information is available in a
5395reasonable place to also convert labels back to symbolic form.
5396<<[[bcDecode]] function>>=
5397bcDecode <- function(code) {
5398    n <- length(code)
5399    ncode <- vector("list", n)
5400    ncode[[1]] <- code[1] # version number
5401    i <- 2
5402    while (i <= n) {
5403        name<-Opcodes.names[code[i]+1]
5404        argc<-Opcodes.argc[[code[i]+1]]
5405        ncode[[i]] <- as.name(name)
5406        i<-i+1
5407        if (argc > 0)
5408            for (j in 1:argc) {
5409                ncode[[i]]<-code[i]
5410                i<-i+1
5411            }
5412    }
5413    ncode
5414}
5415@ %def bcDecode
5416
5417
5418\section{Improved subset and sub-assignment handling}
5419This section describes changes that allow subset and subassign
5420operations to inmost case be handled without allocating list of the
5421index arguments --- the arguments are passed on the stack instead.
5422The function [[cmpSubsetDispatch]] is analogous to [[cmpDispatch]]
5423described above. the [[dflt.op]] argument passed information about the
5424instruction to be emitted. For instructions designed for a particular
5425number of arguments the [[rank]] component is [[FALSE]] and no index
5426count is emitted; this is used for [[VECSUBSET.OP]] and
5427[[MATSUBSET.OP]] instructions. If [[rank]] is [[TRUE]], then the
5428number of indices is emitted as an operand; this is used by the
5429[[SUBSET_N.OP]] instruction.
5430<<[[cmpSubsetDispatch]] function>>=
5431cmpSubsetDispatch <- function(start.op, dflt.op, e, cb, cntxt) {
5432    if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3)
5433        cntxt$stop(gettext("cannot compile this expression"), cntxt,
5434                   loc = cb$savecurloc())
5435    else {
5436        oe <- e[[2]]
5437        if (missing(oe))
5438            cntxt$stop(gettext("cannot compile this expression"), cntxt,
5439                       loc = cb$savecurloc())
5440        ncntxt <- make.argContext(cntxt)
5441        ci <- cb$putconst(e)
5442        label <- cb$makelabel()
5443        cmp(oe, cb, ncntxt)
5444        cb$putcode(start.op, ci, label)
5445        indices <- e[-c(1, 2)]
5446        cmpIndices(indices, cb, ncntxt)
5447        if (dflt.op$rank) cb$putcode(dflt.op$code, ci, length(indices))
5448        else cb$putcode(dflt.op$code, ci)
5449        cb$putlabel(label)
5450        if (cntxt$tailcall) cb$putcode(RETURN.OP)
5451        TRUE
5452    }
5453}
5454@ %def cmpSubsetDispatch
5455
5456Index expressions are compiled by
5457<<[[cmpIndices]] function>>=
5458cmpIndices <- function(indices, cb, cntxt) {
5459    n <- length(indices)
5460    needInc <- FALSE
5461    for (i in seq_along(indices))
5462        if (i > 1 && checkNeedsInc(indices[[i]], cntxt)) {
5463            needInc <- TRUE
5464            break
5465        }
5466    for (i in seq_along(indices)) {
5467        cmp(indices[[i]], cb, cntxt, TRUE)
5468        if (needInc && i < n) cb$putcode(INCLNK.OP)
5469    }
5470    if (needInc) {
5471        if (n == 2) cb$putcode(DECLNK.OP)
5472        else if (n > 2) cb$putcode(DECLNK_N.OP, n - 1)
5473    }
5474}
5475@ %def cmpIndices
5476This adds instructions to increment and later decrement link counts on
5477previously computed values to prevent later computations from
5478modifying earlier ones. Eventually it should be possible to eliminate
5479some of these increment/decrement instructions in an optimization
5480phase.
5481
5482The subsetting handlers fall back to using [[cmpDispatch]] if there
5483are any named arguments or if an error would need to be signaled (we
5484could issue a compiler warning at this point as well). If all
5485arguments are unnamed and there are no dots then [[cmpSubsetDispatch]]
5486is used; the instruction emitted depends on the argument count.
5487<<inline handlers for subsetting>>=
5488setInlineHandler("[", function(e, cb, cntxt) {
5489    if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3)
5490        cmpDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, e, cb, cntxt) ## punt
5491    else {
5492        nidx <- length(e) - 2;
5493        if (nidx == 1)
5494            dflt.op <- list(code = VECSUBSET.OP, rank = FALSE)
5495        else if (nidx == 2)
5496            dflt.op <- list(code = MATSUBSET.OP, rank = FALSE)
5497        else
5498            dflt.op <- list(code = SUBSET_N.OP, rank = TRUE)
5499        cmpSubsetDispatch(STARTSUBSET_N.OP, dflt.op, e, cb, cntxt)
5500    }
5501})
5502
5503setInlineHandler("[[", function(e, cb, cntxt) {
5504    if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3)
5505        cmpDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, e, cb, cntxt) ## punt
5506    else {
5507        nidx <- length(e) - 2;
5508        if (nidx == 1)
5509            dflt.op <- list(code = VECSUBSET2.OP, rank = FALSE)
5510        else if (nidx == 2)
5511            dflt.op <- list(code = MATSUBSET2.OP, rank = FALSE)
5512        else
5513            dflt.op <- list(code = SUBSET2_N.OP, rank = TRUE)
5514        cmpSubsetDispatch(STARTSUBSET2_N.OP, dflt.op, e, cb, cntxt)
5515    }
5516})
5517@
5518
5519Similarly, [[cmpSubassignDispatch]] is a variant of
5520[[cmpSetterDispatch]] that passes index arguments on the stack and
5521emits an index count if necessary.
5522<<[[cmpSubassignDispatch]] function>>=
5523cmpSubassignDispatch <- function(start.op, dflt.op, afun, place, call, cb,
5524                                 cntxt) {
5525    if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3)
5526        cntxt$stop(gettext("cannot compile this expression"), cntxt,
5527                   loc = cb$savecurloc())
5528    else {
5529        ci <- cb$putconst(call)
5530        label <- cb$makelabel()
5531        cb$putcode(start.op, ci, label)
5532        indices <- place[-c(1, 2)]
5533        cmpIndices(indices, cb, cntxt)
5534        if (dflt.op$rank) cb$putcode(dflt.op$code, ci, length(indices))
5535        else cb$putcode(dflt.op$code, ci)
5536        cb$putlabel(label)
5537        TRUE
5538    }
5539}
5540@  %def cmpSubassignDispatch
5541
5542Again the handlers fall back to [[cmpSetterDispatch]] if there are
5543named arguments or other complication.
5544<<inline handlers for subassignment>>=
5545setSetterInlineHandler("[<-", function(afun, place, origplace, call, cb, cntxt) {
5546    if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3)
5547        cmpSetterDispatch(STARTSUBASSIGN.OP, DFLTSUBASSIGN.OP,
5548                          afun, place, call, cb, cntxt) ## punt
5549    else {
5550        nidx <- length(place) - 2
5551        if (nidx == 1)
5552            dflt.op <- list(code = VECSUBASSIGN.OP, rank = FALSE)
5553        else if (nidx == 2)
5554            dflt.op <- list(code = MATSUBASSIGN.OP, rank = FALSE)
5555        else
5556            dflt.op <- list(code = SUBASSIGN_N.OP, rank = TRUE)
5557        cmpSubassignDispatch(STARTSUBASSIGN_N.OP, dflt.op, afun, place, call,
5558                             cb, cntxt)
5559    }
5560})
5561
5562setSetterInlineHandler("[[<-", function(afun, place, origplace, call, cb, cntxt) {
5563    if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3)
5564        cmpSetterDispatch(STARTSUBASSIGN2.OP, DFLTSUBASSIGN2.OP,
5565                          afun, place, call, cb, cntxt) ## punt
5566    else {
5567        nidx <- length(place) - 2
5568        if (nidx == 1)
5569            dflt.op <- list(code = VECSUBASSIGN2.OP, rank = FALSE)
5570        else if (nidx == 2)
5571            dflt.op <- list(code = MATSUBASSIGN2.OP, rank = FALSE)
5572        else
5573            dflt.op <- list(code = SUBASSIGN2_N.OP, rank = TRUE)
5574        cmpSubassignDispatch(STARTSUBASSIGN2_N.OP, dflt.op, afun, place, call,
5575                             cb, cntxt)
5576    }
5577})
5578@
5579
5580Similarly, again, [[cmpSubsetGetterDispatch]] is a variant of
5581[[cmpGetterDispatch]] that passes index arguments on the stack.
5582<<[[cmpSubsetGetterDispatch]] function>>=
5583cmpSubsetGetterDispatch <- function(start.op, dflt.op, call, cb, cntxt) {
5584    if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3)
5585        cntxt$stop(gettext("cannot compile this expression"), cntxt,
5586                   loc = cb$savecurloc())
5587    else {
5588        ci <- cb$putconst(call)
5589        end.label <- cb$makelabel()
5590        cb$putcode(DUP2ND.OP)
5591        cb$putcode(start.op, ci, end.label)
5592        indices <- call[-c(1, 2)]
5593        cmpIndices(indices, cb, cntxt)
5594        if (dflt.op$rank)
5595            cb$putcode(dflt.op$code, ci, length(indices))
5596        else
5597            cb$putcode(dflt.op$code, ci)
5598        cb$putlabel(end.label)
5599        cb$putcode(SWAP.OP)
5600        TRUE
5601    }
5602}
5603@  %def cmpSubsetGetterDispatch
5604
5605And again the handlers fall back to [[cmpGetterDispatch]] if necessary.
5606<<inline handlers for subset getters>>=
5607setGetterInlineHandler("[", function(call, cb, cntxt) {
5608    if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3)
5609        cmpGetterDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, call, cb, cntxt)
5610    else {
5611        nidx <- length(call) - 2;
5612        if (nidx == 1)
5613            dflt.op <- list(code = VECSUBSET.OP, rank = FALSE)
5614        else if (nidx == 2)
5615            dflt.op <- list(code = MATSUBSET.OP, rank = FALSE)
5616        else
5617            dflt.op <- list(code = SUBSET_N.OP, rank = TRUE)
5618        cmpSubsetGetterDispatch(STARTSUBSET_N.OP, dflt.op, call, cb, cntxt)
5619    }
5620})
5621
5622setGetterInlineHandler("[[", function(call, cb, cntxt) {
5623    if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3)
5624        cmpGetterDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, call, cb, cntxt)
5625    else {
5626        nidx <- length(call) - 2;
5627        if (nidx == 1)
5628            dflt.op <- list(code = VECSUBSET2.OP, rank = FALSE)
5629        else if (nidx == 2)
5630            dflt.op <- list(code = MATSUBSET2.OP, rank = FALSE)
5631        else
5632            dflt.op <- list(code = SUBSET2_N.OP, rank = TRUE)
5633        cmpSubsetGetterDispatch(STARTSUBSET2_N.OP, dflt.op, call, cb, cntxt)
5634    }
5635})
5636@
5637
5638\section{Discussion and future directions}
5639Despite its long gestation period this compiler should be viewed as a
5640first pass at creating a byte code compiler for R.  The compiler
5641itself is very simple in design as a single pass compiler with no
5642separate optimization phases.  Similarly the virtual machine uses a
5643very simple stack design.  While the compiler already achieves some
5644useful performance improvements on loop-intensive code, more can be
5645achieved with more sophisticated approaches.  This will be explored in
5646future work.
5647
5648A major objective of this first version was to reproduce R's
5649interpreted semantics with as few departures as possible while at the
5650same time optimizing a number of aspect of the execution process. The
5651inlining rules controlled by an optimization level setting seem to
5652provide a good way of doing this, and the default optimization setting
5653seems to be reasonably effective.  Mechanisms for adjusting the
5654default settings via declarations will be explored and added in the
5655near future.
5656
5657Future versions of the compiler and the engine will explore a number
5658of alternative designs.  Switching to a register-based virtual machine
5659will be explored fairly soon. Preliminary experiments suggest that
5660this can provide significant improvements in the case of tight loops
5661by allowing allocation of intermediate results to be avoided in many
5662cases.  It may be possible at least initially to keep the current
5663compiler ant just translate the stack-based machine code to a
5664register-based code.
5665
5666Another direction that will be explored is whether sequences of
5667arithmetic and other numerical operations can be fused and possibly
5668vectorized. Again preliminary experiments are promising, but more
5669exploration is needed.
5670
5671Other improvements to be examined may affect interpreted code as much
5672as compiled code.  These include more efficient environment
5673representations and more efficient calling conventions.
5674
5675%% **** add some benchmarks
5676%% **** comment on engine
5677
5678%% **** lots of other builtins, specials, and .Internals
5679%% **** controlling compiler warnings
5680%% **** merging in codetools features
5681%% **** put in install-time tests of assumptions about BUILTINs, etc.
5682
5683%% **** switch to register-based engine
5684%% **** make function calls more efficient
5685%% **** try to stay within a single bceval call
5686
5687%% **** think about optimizing things like mean?
5688
5689%% **** jit that compiles all expressions??
5690%% **** can it record code for expr/env pairs or some such?
5691%% **** can it inline primitives at that point?
5692
5693%% **** Stuff to think about:
5694%% ****   alternate environment representation for compiler
5695%% ****   optimizing function calls in general
5696%% ****   avoiding matching for BOA calls
5697
5698%% **** Think about different ways of handling environments.  Should every op
5699%% **** return a new env object that includes (possible) new local vars?
5700
5701%% **** Useful to be able to distinguish ... in args from assigned-to ...
5702
5703%% **** matrix subsetting has to be slow because of the way dim is stored.
5704%% **** might make sense to explicitly compile as multiple operations and
5705%% **** invariant hoisting  out of loops for loops?
5706
5707%% **** look at tail call optimization -- pass call and parent.frame??
5708%% **** eliminating variables not needed?
5709%% **** alternate builtin call implementations?
5710
5711%% **** install compiled promises in code body??
5712
5713%% **** catch errors at dispatching of inliners; fall back to runtime error
5714
5715
5716\appendix
5717\section{General utility functions}
5718This appendix provides a few general utility functions.
5719
5720The utility function [[pasteExpr]] is used in the error messages.
5721%% **** use elipsis instead of collapse??
5722%% **** use error context or catch errors?
5723%% **** maybe don't need expression if we catch errors?
5724<<[[pasteExpr]] function>>=
5725pasteExpr <- function(e, prefix = "\n    ") {
5726    de <- deparse(e)
5727    if (length(de) == 1) sQuote(de)
5728    else paste(prefix, deparse(e), collapse="")
5729}
5730@ %def pasteExpr
5731
5732The function [[dots.or.missing]] checks the argument list for any
5733missing or [[...]] arguments:
5734<<[[dots.or.missing]] function>>=
5735dots.or.missing <- function(args) {
5736    for (i in 1:length(args)) {
5737        a <-args[[i]]
5738        if (missing(a)) return(TRUE) #**** better test?
5739        if (typeof(a) == "symbol" && a == "...") return(TRUE)
5740    }
5741    return(FALSE)
5742}
5743@ %def dots.or.missing
5744
5745The function [[any.dots]] is defined as
5746<<[[any.dots]] function>>=
5747any.dots <- function(args) {
5748    for (i in 1:length(args)) {
5749        a <-args[[i]]
5750        if (! missing(a) && typeof(a) == "symbol" && a == "...")
5751            return(TRUE)
5752    }
5753    return(FALSE)
5754}
5755@ %def any.dots
5756
5757The utility function [[is.ddsym]] is used to recognize symbols of the
5758form [[..1]], [[..2]], and so on.
5759<<[[is.ddsym]] function>>=
5760is.ddsym <- function(name) {
5761    (is.symbol(name) || is.character(name)) &&
5762    length(grep("^\\.\\.[0-9]+$", as.character(name))) != 0
5763}
5764@ %def is.ddsym
5765
5766[[missingArgs]] takes an argument list for a call a logical vector
5767indicating for each argument whether it is empty (missing) or not.
5768<<[[missingArgs]] function>>=
5769missingArgs <- function(args) {
5770    val <- logical(length(args))
5771    for (i in seq_along(args)) {
5772        a <- args[[i]]
5773        if (missing(a))
5774            val[i] <- TRUE
5775        else
5776            val[i] <- FALSE
5777    }
5778    val
5779}
5780@ %def missingArgs
5781
5782
5783
5784\section{Environment utilities}
5785This appendix presents some utilities for computations on environments.
5786
5787The function [[frameTypes]] takes an environment argument and returns
5788a character vector with elements for each frame in the environment
5789classifying the frame as local, namespace, or global. The environment
5790is assumed to be a standard evaluation environment that contains
5791[[.GlobalEnv]] as one of its parents. It does this by computing the
5792number of local, namespace, and global frames and then generating the
5793result using [[rep]].
5794<<[[frameTypes]] function>>=
5795frameTypes <- function(env) {
5796    top <- topenv(env)
5797    empty <- emptyenv()
5798    <<find the number [[nl]] of local frames>>
5799    <<find the number [[nn]] of namespace frames>>
5800    <<find the number [[ng]] of global frames>>
5801    rep(c("local", "namespace", "global"), c(nl, nn, ng))
5802}
5803@ %def frameTypes
5804The number of local frames is computes by marching down the parent
5805frames with [[parent.env]] until the top level environment is reached.
5806<<find the number [[nl]] of local frames>>=
5807nl <- 0
5808while (! identical(env, top)) {
5809    if (isNamespace(env))
5810        stop("namespace found within local environments")
5811    env <- parent.env(env)
5812    nl <- nl + 1
5813    if (identical(env, empty))
5814        stop("not a proper evaluation environment")
5815}
5816@ %def
5817The number of namespace frames is computed by continuing down the
5818parent frames until [[.GlobalEnv]] is reached.
5819<<find the number [[nn]] of namespace frames>>=
5820nn <- 0
5821if (isNamespace(env)) {
5822    while (! identical(env, .GlobalEnv)) {
5823        if (!isNamespace(env)) {
5824            name <- attr(env, "name")
5825            if (!is.character(name) || !startsWith(name, "imports:"))
5826		stop("non-namespace found within namespace environments")
5827        }
5828        env <- parent.env(env)
5829        nn <- nn + 1
5830        if (identical(env, empty))
5831            stop("not a proper evaluation environment")
5832    }
5833}
5834@ %def
5835Finally the number of global frames is computed by continuing until
5836the empty environment is reached.  An alternative would be to compute
5837the length of the result returned by [[search]]
5838<<find the number [[ng]] of global frames>>=
5839ng <- 0
5840while (! identical(env, empty)) {
5841    if (isNamespace(env))
5842	stop("namespace found within global environments")
5843    env <- parent.env(env)
5844    ng <- ng + 1
5845}
5846@
5847
5848The function [[findHomeNS]] takes a variable name and a namespace
5849frame, or a namespace imports frame, and returns the namespace frame
5850in which the variable was originally defined, if any. The code assumes
5851that renaming has not been used (it may no longer be supported in the
5852namespace implementation in any case). Just in case, an attempt is
5853made to check for renaming.  The result returned is the namaspace
5854frame for the namespace in which the variable was defined or [[NULL]]
5855if the variable was not defined in the specified namespace or one of
5856its imports, or if the home namespace cannot be determined.
5857<<[[findHomeNS]] function>>=
5858## Given a symbol name and a namespace environment (or a namespace
5859## imports environment) find the namespace in which the symbol's value
5860## was originally defined. Returns NULL if the symbol is not found via
5861## the namespace.
5862findHomeNS <- function(sym, ns, cntxt) {
5863    <<if [[ns]] is an imports frame find the corresponding namespace>>
5864    if (exists(sym, ns, inherits = FALSE))
5865        ns
5866    else if (exists(".__NAMESPACE__.", ns, inherits = FALSE)) {
5867        <<search the imports for [[sym]]>>
5868        NULL
5869    }
5870    else NULL
5871}
5872@ %def findHomeNS
5873
5874If the [[ns]] argument is not a namespace frame it should be the
5875imports frame of a namespace.  Such an imports frame should have a
5876[[name]] attribute or the form [["imports:foo"]] it it is associated
5877with namespace [["foo"]]. This is used to find the namespace frame
5878that owns the imports frame in this case, and this frames is then
5879assigned to [[ns]].
5880<<if [[ns]] is an imports frame find the corresponding namespace>>=
5881if (! isNamespace(ns)) {
5882    ## As a convenience this allows for 'ns' to be the imports fame
5883    ## of a namespace. It appears that these now have a 'name'
5884    ## attribute of the form 'imports:foo' if 'foo' is the
5885    ## namespace.
5886    name <- attr(ns, "name")
5887    if (is.null(name))
5888        cntxt$stop("'ns' must be a namespace or a namespace imports environment",
5889            cntxt)
5890    ns <- getNamespace(sub("imports:", "", attr(ns, "name")))
5891}
5892@ %def
5893
5894The imports are searched in reverse order since in the case of name
5895conflicts the last one imported will take precedence.  Full imports
5896via an [[import]] directive have to be handled differently than
5897selective imports created with [[importFrom]] directives.
5898<<search the imports for [[sym]]>>=
5899imports <- get(".__NAMESPACE__.", ns)$imports
5900for (i in rev(seq_along(imports))) {
5901    iname <- names(imports)[i]
5902    ins <- getNamespace(iname)
5903    if (identical(imports[[i]], TRUE)) {
5904        <<search in a full import>>
5905    }
5906    else {
5907        <<search in a selective import>>
5908    }
5909}
5910@ %def
5911
5912If an entry in the [[imports]] specification for the import source
5913namespace [[ins]] has value [[TRUE]], then all exports of the [[ins]]
5914have been imported.  If [[sym]] is in the exports then the result of a
5915recursive call to [[findHomeNS]] is returned.
5916<<search in a full import>>=
5917if (identical(ins, .BaseNamespaceEnv))
5918    exports <- .BaseNamespaceEnv
5919else
5920    exports <- get(".__NAMESPACE__.", ins)$exports
5921if (exists(sym, exports, inherits = FALSE))
5922    return(findHomeNS(sym, ins, cntxt))
5923@ %def
5924
5925For selective imports the [[imports]] entry is a named character
5926vector mapping export name to import name.  In the absence of renaming
5927the names should match the values; if this is not the case [[NULL]] is
5928returned. Otherwise, a match results again in returning a recursive
5929call to [[findHomeNS]].
5930<<search in a selective import>>=
5931exports <- imports[[i]]
5932pos <- match(sym, names(exports), 0)
5933if (pos) {
5934    ## If renaming has been used things get too
5935    ## confusing so return NULL. (It is not clear if
5936    ## renaming this is still supported by the
5937    ## namespace code.)
5938    if (sym == exports[pos])
5939        return(findHomeNS(sym, ins, cntxt))
5940    else
5941        return(NULL)
5942}
5943@
5944
5945Given a package package frame from the global environment the function
5946[[packFrameName]] returns the associated package name, which is
5947computed from the [[name]] attribute.
5948%% **** might be good the check the name is of the form package:foo
5949<<[[packFrameName]] function>>=
5950packFrameName <- function(frame) {
5951    fname <- attr(frame, "name")
5952    if (is.character(fname))
5953        sub("package:", "", fname)
5954    else if (identical(frame , baseenv()))
5955        "base"
5956    else ""
5957}
5958@ %def packFrameName
5959
5960For a namespace frame the function [[nsName]] retrieves the namespace
5961name from the namespace information structure.
5962<<[[nsName]] function>>=
5963nsName <- function(ns) {
5964    if (identical(ns, .BaseNamespaceEnv))
5965        "base"
5966    else {
5967        name <- ns$.__NAMESPACE__.$spec["name"]
5968        if (is.character(name))
5969            as.character(name) ## strip off names
5970        else ""
5971    }
5972}
5973@ %def nsName
5974
5975
5976\section{Experimental utilities}
5977
5978This section presents two experimental utililities that, for now, are
5979not exported. The first is a simple byte code profiler. This requires
5980that the file [[eval.c]] be compiled with [[BC_PROFILING]] enabled,
5981which on [[gcc]]-compatible compilers will disable threaded code. The
5982byte code profiler uses the profile timer to record the active byte
5983code instruction at interrupt time. The function [[bcprof]] runs the
5984profiler while evaluating its argument expression and returns a
5985summary of the counts.
5986
5987<<[[bcprof]] function>>=
5988bcprof <- function(expr) {
5989    .Internal(bcprofstart())
5990    expr
5991    .Internal(bcprofstop())
5992    val <- structure(.Internal(bcprofcounts()),
5993                     names = Opcodes.names)
5994    hits <- sort(val[val > 0], decreasing = TRUE)
5995    pct <- round(100 * hits / sum(hits), 1)
5996    data.frame(hits = hits, pct = pct)
5997}
5998@ %def bcprof
5999
6000The second utility is a simple interface to the code building
6001mechanism that may help with experimenting with code optimizations.
6002<<[[asm]] function>>=
6003asm <- function(e, gen, env = .GlobalEnv, options = NULL) {
6004    cenv <- makeCenv(env)
6005    cntxt <- make.toplevelContext(cenv, options)
6006    cntxt$env <- addCenvVars(cenv, findLocals(e, cntxt))
6007    genCode(e, cntxt, gen = gen)
6008}
6009@ %def asm
6010
6011\section{Opcode constants}
6012\subsection{Symbolic opcode names}
6013<<opcode definitions>>=
6014BCMISMATCH.OP <- 0
6015RETURN.OP <- 1
6016GOTO.OP <- 2
6017BRIFNOT.OP <- 3
6018POP.OP <- 4
6019DUP.OP <- 5
6020PRINTVALUE.OP <- 6
6021STARTLOOPCNTXT.OP <- 7
6022ENDLOOPCNTXT.OP <- 8
6023DOLOOPNEXT.OP <- 9
6024DOLOOPBREAK.OP <- 10
6025STARTFOR.OP <- 11
6026STEPFOR.OP <- 12
6027ENDFOR.OP <- 13
6028SETLOOPVAL.OP <- 14
6029INVISIBLE.OP <- 15
6030LDCONST.OP <- 16
6031LDNULL.OP <- 17
6032LDTRUE.OP <- 18
6033LDFALSE.OP <- 19
6034GETVAR.OP <- 20
6035DDVAL.OP <- 21
6036SETVAR.OP <- 22
6037GETFUN.OP <- 23
6038GETGLOBFUN.OP <- 24
6039GETSYMFUN.OP <- 25
6040GETBUILTIN.OP <- 26
6041GETINTLBUILTIN.OP <- 27
6042CHECKFUN.OP <- 28
6043MAKEPROM.OP <- 29
6044DOMISSING.OP <- 30
6045SETTAG.OP <- 31
6046DODOTS.OP <- 32
6047PUSHARG.OP <- 33
6048PUSHCONSTARG.OP <- 34
6049PUSHNULLARG.OP <- 35
6050PUSHTRUEARG.OP <- 36
6051PUSHFALSEARG.OP <- 37
6052CALL.OP <- 38
6053CALLBUILTIN.OP <- 39
6054CALLSPECIAL.OP <- 40
6055MAKECLOSURE.OP <- 41
6056UMINUS.OP <- 42
6057UPLUS.OP <- 43
6058ADD.OP <- 44
6059SUB.OP <- 45
6060MUL.OP <- 46
6061DIV.OP <- 47
6062EXPT.OP <- 48
6063SQRT.OP <- 49
6064EXP.OP <- 50
6065EQ.OP <- 51
6066NE.OP <- 52
6067LT.OP <- 53
6068LE.OP <- 54
6069GE.OP <- 55
6070GT.OP <- 56
6071AND.OP <- 57
6072OR.OP <- 58
6073NOT.OP <- 59
6074DOTSERR.OP <- 60
6075STARTASSIGN.OP <- 61
6076ENDASSIGN.OP <- 62
6077STARTSUBSET.OP <- 63
6078DFLTSUBSET.OP <- 64
6079STARTSUBASSIGN.OP <- 65
6080DFLTSUBASSIGN.OP <- 66
6081STARTC.OP <- 67
6082DFLTC.OP <- 68
6083STARTSUBSET2.OP <- 69
6084DFLTSUBSET2.OP <- 70
6085STARTSUBASSIGN2.OP <- 71
6086DFLTSUBASSIGN2.OP <- 72
6087DOLLAR.OP <- 73
6088DOLLARGETS.OP <- 74
6089ISNULL.OP <- 75
6090ISLOGICAL.OP <- 76
6091ISINTEGER.OP <- 77
6092ISDOUBLE.OP <- 78
6093ISCOMPLEX.OP <- 79
6094ISCHARACTER.OP <- 80
6095ISSYMBOL.OP <- 81
6096ISOBJECT.OP <- 82
6097ISNUMERIC.OP <- 83
6098VECSUBSET.OP <- 84
6099MATSUBSET.OP <- 85
6100VECSUBASSIGN.OP <- 86
6101MATSUBASSIGN.OP <- 87
6102AND1ST.OP <- 88
6103AND2ND.OP <- 89
6104OR1ST.OP <- 90
6105OR2ND.OP <- 91
6106GETVAR_MISSOK.OP <- 92
6107DDVAL_MISSOK.OP <- 93
6108VISIBLE.OP <- 94
6109SETVAR2.OP <- 95
6110STARTASSIGN2.OP <- 96
6111ENDASSIGN2.OP <- 97
6112SETTER_CALL.OP <- 98
6113GETTER_CALL.OP <- 99
6114SWAP.OP <- 100
6115DUP2ND.OP <- 101
6116SWITCH.OP <- 102
6117RETURNJMP.OP <- 103
6118STARTSUBSET_N.OP <- 104
6119STARTSUBASSIGN_N.OP <- 105
6120VECSUBSET2.OP <- 106
6121MATSUBSET2.OP <- 107
6122VECSUBASSIGN2.OP <- 108
6123MATSUBASSIGN2.OP <- 109
6124STARTSUBSET2_N.OP <- 110
6125STARTSUBASSIGN2_N.OP <- 111
6126SUBSET_N.OP <- 112
6127SUBSET2_N.OP <- 113
6128SUBASSIGN_N.OP <- 114
6129SUBASSIGN2_N.OP <-115
6130LOG.OP <- 116
6131LOGBASE.OP <- 117
6132MATH1.OP <- 118
6133DOTCALL.OP <- 119
6134COLON.OP <- 120
6135SEQALONG.OP <- 121
6136SEQLEN.OP <- 122
6137BASEGUARD.OP <- 123
6138INCLNK.OP <- 124
6139DECLNK.OP <- 125
6140DECLNK_N.OP <- 126
6141INCLNKSTK.OP <- 127
6142DECLNKSTK.OP <- 128
6143@
6144
6145\subsection{Instruction argument counts and names}
6146<<opcode argument counts>>=
6147Opcodes.argc <- list(
6148BCMISMATCH.OP = 0,
6149RETURN.OP = 0,
6150GOTO.OP = 1,
6151BRIFNOT.OP = 2,
6152POP.OP = 0,
6153DUP.OP = 0,
6154PRINTVALUE.OP = 0,
6155STARTLOOPCNTXT.OP = 2,
6156ENDLOOPCNTXT.OP = 1,
6157DOLOOPNEXT.OP = 0,
6158DOLOOPBREAK.OP = 0,
6159STARTFOR.OP = 3,
6160STEPFOR.OP = 1,
6161ENDFOR.OP = 0,
6162SETLOOPVAL.OP = 0,
6163INVISIBLE.OP = 0,
6164LDCONST.OP = 1,
6165LDNULL.OP = 0,
6166LDTRUE.OP = 0,
6167LDFALSE.OP = 0,
6168GETVAR.OP = 1,
6169DDVAL.OP = 1,
6170SETVAR.OP = 1,
6171GETFUN.OP = 1,
6172GETGLOBFUN.OP = 1,
6173GETSYMFUN.OP = 1,
6174GETBUILTIN.OP = 1,
6175GETINTLBUILTIN.OP = 1,
6176CHECKFUN.OP = 0,
6177MAKEPROM.OP = 1,
6178DOMISSING.OP = 0,
6179SETTAG.OP = 1,
6180DODOTS.OP = 0,
6181PUSHARG.OP = 0,
6182PUSHCONSTARG.OP = 1,
6183PUSHNULLARG.OP = 0,
6184PUSHTRUEARG.OP = 0,
6185PUSHFALSEARG.OP = 0,
6186CALL.OP = 1,
6187CALLBUILTIN.OP = 1,
6188CALLSPECIAL.OP = 1,
6189MAKECLOSURE.OP = 1,
6190UMINUS.OP = 1,
6191UPLUS.OP = 1,
6192ADD.OP = 1,
6193SUB.OP = 1,
6194MUL.OP = 1,
6195DIV.OP = 1,
6196EXPT.OP = 1,
6197SQRT.OP = 1,
6198EXP.OP = 1,
6199EQ.OP = 1,
6200NE.OP = 1,
6201LT.OP = 1,
6202LE.OP = 1,
6203GE.OP = 1,
6204GT.OP = 1,
6205AND.OP = 1,
6206OR.OP = 1,
6207NOT.OP = 1,
6208DOTSERR.OP = 0,
6209STARTASSIGN.OP = 1,
6210ENDASSIGN.OP = 1,
6211STARTSUBSET.OP = 2,
6212DFLTSUBSET.OP = 0,
6213STARTSUBASSIGN.OP = 2,
6214DFLTSUBASSIGN.OP = 0,
6215STARTC.OP = 2,
6216DFLTC.OP = 0,
6217STARTSUBSET2.OP = 2,
6218DFLTSUBSET2.OP = 0,
6219STARTSUBASSIGN2.OP = 2,
6220DFLTSUBASSIGN2.OP = 0,
6221DOLLAR.OP = 2,
6222DOLLARGETS.OP = 2,
6223ISNULL.OP = 0,
6224ISLOGICAL.OP = 0,
6225ISINTEGER.OP = 0,
6226ISDOUBLE.OP = 0,
6227ISCOMPLEX.OP = 0,
6228ISCHARACTER.OP = 0,
6229ISSYMBOL.OP = 0,
6230ISOBJECT.OP = 0,
6231ISNUMERIC.OP = 0,
6232VECSUBSET.OP = 1,
6233MATSUBSET.OP = 1,
6234VECSUBASSIGN.OP = 1,
6235MATSUBASSIGN.OP = 1,
6236AND1ST.OP = 2,
6237AND2ND.OP = 1,
6238OR1ST.OP = 2,
6239OR2ND.OP = 1,
6240GETVAR_MISSOK.OP = 1,
6241DDVAL_MISSOK.OP = 1,
6242VISIBLE.OP = 0,
6243SETVAR2.OP = 1,
6244STARTASSIGN2.OP = 1,
6245ENDASSIGN2.OP = 1,
6246SETTER_CALL.OP = 2,
6247GETTER_CALL.OP = 1,
6248SWAP.OP = 0,
6249DUP2ND.OP = 0,
6250SWITCH.OP = 4,
6251RETURNJMP.OP = 0,
6252STARTSUBSET_N.OP = 2,
6253STARTSUBASSIGN_N.OP = 2,
6254VECSUBSET2.OP = 1,
6255MATSUBSET2.OP = 1,
6256VECSUBASSIGN2.OP = 1,
6257MATSUBASSIGN2.OP = 1,
6258STARTSUBSET2_N.OP = 2,
6259STARTSUBASSIGN2_N.OP = 2,
6260SUBSET_N.OP = 2,
6261SUBSET2_N.OP = 2,
6262SUBASSIGN_N.OP = 2,
6263SUBASSIGN2_N.OP = 2,
6264LOG.OP = 1,
6265LOGBASE.OP = 1,
6266MATH1.OP = 2,
6267DOTCALL.OP = 2,
6268COLON.OP = 1,
6269SEQALONG.OP = 1,
6270SEQLEN.OP = 1,
6271BASEGUARD.OP = 2,
6272INCLNK.OP = 0,
6273DECLNK.OP = 0,
6274DECLNK_N.OP = 1,
6275INCLNKSTK.OP = 0,
6276DECLNKSTK.OP = 0
6277)
6278@
6279
6280<<opcode names>>=
6281Opcodes.names <- names(Opcodes.argc)
6282@ %def Opcodes.names
6283
6284
6285\section{Implementation file}
6286%% Benchmark code:
6287% sf <- function(x) {
6288%     s <- 0
6289%     for (y in x)
6290%         s <- s + y
6291%     s
6292% }
6293
6294% sfc <- cmpfun(sf)
6295
6296% x <- 1 : 1000000
6297
6298% system.time(sf(x))
6299% system.time(sfc(x))
6300
6301
6302%% **** need header/copyright/license stuff here
6303<<cmp.R>>=
6304#  Automatically generated from ../noweb/compiler.nw.
6305#
6306#  File src/library/compiler/R/cmp.R
6307#  Part of the R package, https://www.R-project.org
6308#  Copyright (C) 2001-2014 Luke Tierney
6309#
6310#  This program is free software; you can redistribute it and/or modify
6311#  it under the terms of the GNU General Public License as published by
6312#  the Free Software Foundation; either version 2 of the License, or
6313#  (at your option) any later version.
6314#
6315#  This program is distributed in the hope that it will be useful,
6316#  but WITHOUT ANY WARRANTY; without even the implied warranty of
6317#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
6318#  GNU General Public License for more details.
6319#
6320#  A copy of the GNU General Public License is available at
6321#  https://www.R-project.org/Licenses/
6322
6323##
6324## Compiler options
6325##
6326
6327<<compiler options data base>>
6328
6329<<[[getCompilerOption]] function>>
6330
6331
6332##
6333## General Utilities
6334##
6335
6336<<[[pasteExpr]] function>>
6337
6338<<[[dots.or.missing]] function>>
6339
6340<<[[any.dots]] function>>
6341
6342<<[[is.ddsym]] function>>
6343
6344
6345<<[[missingArgs]] function>>
6346
6347
6348##
6349## Environment utilities
6350##
6351
6352<<[[frameTypes]] function>>
6353
6354<<[[findHomeNS]] function>>
6355
6356<<[[packFrameName]] function>>
6357
6358<<[[nsName]] function>>
6359
6360
6361##
6362## Finding possible local variables
6363##
6364
6365<<[[getAssignedVar]] function>>
6366
6367<<[[findLocals1]] function>>
6368
6369<<[[findLocalsList1]] function>>
6370
6371<<[[findLocals]] function>>
6372
6373<<[[findLocalsList]] function>>
6374
6375
6376##
6377## Compilation environment implementation
6378##
6379
6380<<[[makeCenv]] function>>
6381
6382<<[[addCenvVars]] function>>
6383
6384<<[[addCenvFrame]] function>>
6385
6386<<[[findCenvVar]] function>>
6387
6388<<[[isBaseVar]] function>>
6389
6390<<[[funEnv]] function>>
6391
6392<<[[findLocVar]] function>>
6393
6394<<[[findFunDef]] function>>
6395
6396<<[[findVar]] function>>
6397
6398
6399##
6400## Constant folding
6401##
6402
6403<<[[maxConstSize]] and [[constModes]] definitions>>
6404
6405<<[[constNames]] definition>>
6406
6407<<[[checkConst]] function>>
6408
6409<<[[constantFoldSym]] function>>
6410
6411<<[[getFoldFun]] function>>
6412
6413<<[[constantFoldCall]] function>>
6414
6415<<[[constantFold]] function>>
6416
6417<<[[foldFuns]] definition>>
6418
6419<<[[languageFuns]] definition>>
6420
6421
6422##
6423## Opcode constants
6424##
6425
6426<<opcode argument counts>>
6427
6428<<opcode names>>
6429
6430<<opcode definitions>>
6431
6432
6433##
6434## Code buffer implementation
6435##
6436
6437<<source location tracking functions>>
6438
6439<<[[make.codeBuf]] function>>
6440
6441<<[[codeBufCode]] function>>
6442
6443<<[[genCode]] function>>
6444
6445
6446##
6447## Compiler contexts
6448##
6449
6450<<[[make.toplevelContext]] function>>
6451
6452<<[[make.callContext]] function>>
6453
6454<<[[make.promiseContext]] function>>
6455
6456<<[[make.functionContext]] function>>
6457
6458<<[[make.nonTailCallContext]] function>>
6459
6460<<[[make.argContext]] function>>
6461
6462<<[[make.noValueContext]] function>>
6463
6464<<[[make.loopContext]] function>>
6465
6466
6467##
6468## Compiler top level
6469##
6470
6471<<[[cmp]] function>>
6472
6473<<[[cmpConst]] function>>
6474
6475<<[[cmpSym]] function>>
6476
6477<<[[cmpCall]] function>>
6478
6479<<[[cmpCallSymFun]] function>>
6480
6481<<[[cmpCallExprFun]] function>>
6482
6483<<[[cmpCallArgs]] function>>
6484
6485<<[[cmpConstArg]]>>
6486
6487<<[[checkCall]] function>>
6488
6489## **** need to handle ... and ..n arguments specially
6490## **** separate call opcode for calls with named args?
6491## **** for (a in e[[-1]]) ... goes into infinite loop
6492
6493<<[[cmpTag]] function>>
6494
6495<<[[mayCallBrowser]] function>>
6496
6497<<[[mayCallBrowserList]] function>>
6498
6499##
6500## Inlining mechanism
6501##
6502
6503<<inline handler implementation>>
6504
6505## tryInline implements the rule permitting inlining as they stand now:
6506## Inlining is controlled by the optimize compiler option, with possible
6507## values 0, 1, 2, 3.
6508
6509<<[[getInlineInfo]] function>>
6510
6511<<[[tryInline]] function>>
6512
6513
6514##
6515## Inline handlers for some SPECIAL functions
6516##
6517
6518<<inlining handler for [[function]]>>
6519
6520<<inlining handler for left brace function>>
6521
6522<<inlining handler for [[if]]>>
6523
6524<<inlining handler for [[&&]]>>
6525
6526<<inlining handler for [[||]]>>
6527
6528
6529##
6530## Inline handlers for assignment expressions
6531##
6532
6533<<setter inlining mechanism>>
6534
6535<<getter inlining mechanism>>
6536
6537<<[[cmpAssign]] function>>
6538
6539<<[[flattenPlace]] function>>
6540
6541<<[[cmpGetterCall]] function>>
6542
6543<<[[checkAssign]] function>>
6544
6545<<[[cmpSymbolAssign]] function>>
6546
6547<<[[cmpComplexAssign]] function>>
6548
6549<<[[cmpSetterCall]] function>>
6550
6551<<[[getAssignFun]] function>>
6552
6553<<[[cmpSetterDispatch]] function>>
6554
6555<<inlining handlers for [[<-]], [[=]], and [[<<-]]>>
6556
6557<<setter inline handler for [[$<-]]>>
6558
6559<<setter inline handlers for [[ [<- ]] and [[ [[<- ]]>>
6560
6561<<[[cmpGetterDispatch]] function>>
6562
6563<<getter inline handler for [[$]]>>
6564
6565<<getter inline handlers for [[[]] and [[[[]]>>
6566
6567
6568##
6569## Inline handlers for loops
6570##
6571
6572<<inlining handlers for [[next]] and [[break]]>>
6573
6574<<[[isLoopStopFun]] function>>
6575
6576<<[[isLoopTopFun]] function>>
6577
6578<<[[checkSkipLoopCntxtList]] function>>
6579
6580<<[[checkSkipLoopCntxt]] function>>
6581
6582<<inlining handler for [[repeat]] loops>>
6583
6584<<[[cmpRepeatBody]] function>>
6585
6586<<inlining handler for [[while]] loops>>
6587
6588<<[[cmpWhileBody]] function>>
6589
6590<<inlining handler for [[for]] loops>>
6591
6592<<[[cmpForBody]] function>>
6593
6594
6595##
6596## Inline handlers for one and two argument primitives
6597##
6598
6599<<[[cmpPrim1]] function>>
6600
6601<<[[checkNeedsInc]] function>>
6602
6603<<[[cmpPrim2]] function>>
6604
6605<<inline handlers for [[+]] and [[-]]>>
6606
6607<<inline handlers for [[*]] and [[/]]>>
6608
6609<<inline handlers for [[^]], [[exp]], and [[sqrt]]>>
6610
6611<<inline handler for [[log]]>>
6612
6613<<list of one argument math functions>>
6614
6615<<[[cmpMath1]] function>>
6616
6617<<inline one argument math functions>>
6618
6619<<inline handlers for comparison operators>>
6620
6621<<inline handlers for [[&]] and [[|]]>>
6622
6623<<inline handler for [[!]]>>
6624
6625
6626##
6627## Inline handlers for the left parenthesis function
6628##
6629
6630<<inlining handler for [[(]]>>
6631
6632
6633##
6634## Inline handlers for general BUILTIN and SPECIAL functions
6635##
6636
6637<<[[cmpBuiltin]] function>>
6638
6639<<[[cmpBuiltinArgs]] function>>
6640
6641<<[[cmpSpecial]] function>>
6642
6643<<inlining handler for [[.Internal]]>>
6644
6645
6646##
6647## Inline handlers for subsetting and related operators
6648##
6649
6650<<[[cmpDispatch]] function>>
6651
6652<<inlining handlers for some dispatching SPECIAL functions>>
6653
6654<<inlining handler for [[$]]>>
6655
6656
6657##
6658## Inline handler for local() and return() functions
6659##
6660
6661<<inlining handler for [[local]] function>>
6662
6663<<inlining handler for [[return]] function>>
6664
6665
6666##
6667## Inline handlers for the family of is.xyz primitives
6668##
6669
6670<<[[cmpIs]] function>>
6671
6672<<inlining handlers for [[is.xyz]] functions>>
6673
6674
6675##
6676## Default inline handlers for BUILTIN and SPECIAL functions
6677##
6678
6679<<install default inlining handlers>>
6680
6681
6682##
6683## Inline handlers for some .Internal functions
6684##
6685
6686<<[[simpleFormals]] function>>
6687
6688<<[[simpleArgs]] function>>
6689
6690<<[[is.simpleInternal]] function>>
6691
6692<<[[inlineSimpleInternalCall]] function>>
6693
6694<<[[cmpSimpleInternal]] function>>
6695
6696<<inline safe simple [[.Internal]] functions from [[base]]>>
6697
6698<<inline safe simple [[.Internal]] functions from [[stats]]>>
6699
6700
6701##
6702## Inline handler for switch
6703##
6704
6705<<[[findActionIndex]] function>>
6706
6707<<inline handler for [[switch]]>>
6708
6709
6710##
6711## Inline handler for .Call
6712##
6713
6714<<inline handler for [[.Call]]>>
6715
6716
6717##
6718## Inline handlers for generating integer sequences
6719##
6720
6721<<inline handlers for integer sequences>>
6722
6723
6724##
6725## Inline handlers to control warnings
6726##
6727
6728<<[[cmpMultiColon]] function>>
6729
6730<<inlining handlers for [[::]] and [[:::]]>>
6731
6732<<setter inlining handler for [[@<-]]>>
6733
6734<<inlining handler for [[with]]>>
6735
6736<<inlining handler for [[require]]>>
6737
6738
6739##
6740## Compiler warnings
6741##
6742
6743<<[[suppressAll]] function>>
6744
6745<<[[suppressNoSuperAssignVar]] function>>
6746
6747<<[[suppressUndef]] function>>
6748
6749<<[[notifyLocalFun]] function>>
6750
6751<<[[notifyUndefFun]] function>>
6752
6753<<[[notifyUndefVar]] function>>
6754
6755<<[[notifyNoSuperAssignVar]] function>>
6756
6757<<[[notifyWrongArgCount]] function>>
6758
6759<<[[notifyWrongDotsUse]] function>>
6760
6761<<[[notifyWrongBreakNext]] function>>
6762
6763<<[[notifyBadCall]] function>>
6764
6765<<[[notifyBadAssignFun]] function>>
6766
6767<<[[notifyMultipleSwitchDefaults]] function>>
6768
6769<<[[notifyNoSwitchcases]] function>>
6770
6771<<[[notifyAssignSyntacticFun]] function>>
6772
6773<<[[notifyCompilerError]] function>>
6774
6775
6776##
6777## Compiler interface
6778##
6779
6780<<[[compile]] function>>
6781
6782<<[[cmpfun]] function>>
6783
6784<<[[tryCmpfun]] function>>
6785
6786<<[[tryCompile]] function>>
6787
6788<<[[cmpframe]] function>>
6789
6790<<[[cmplib]] function>>
6791
6792<<[[cmpfile]] function>>
6793
6794<<[[loadcmp]] function>>
6795
6796<<[[enableJIT]] function>>
6797
6798<<[[compilePKGS]] function>>
6799
6800<<[[setCompilerOptions]] function>>
6801
6802<<[[.onLoad]] function>>
6803
6804<<[[checkCompilerOptions]] function>>
6805
6806
6807##
6808## Disassembler
6809##
6810
6811<<[[bcDecode]] function>>
6812
6813<<[[disassemble]] function>>
6814
6815
6816##
6817## Experimental Utilities
6818##
6819
6820<<[[bcprof]] function>>
6821
6822<<[[asm]] function>>
6823
6824
6825##
6826## Improved subset and subassign handling
6827##
6828
6829<<[[cmpIndices]] function>>
6830
6831<<[[cmpSubsetDispatch]] function>>
6832
6833<<inline handlers for subsetting>>
6834
6835<<[[cmpSubassignDispatch]] function>>
6836
6837<<inline handlers for subassignment>>
6838
6839<<[[cmpSubsetGetterDispatch]] function>>
6840
6841<<inline handlers for subset getters>>
6842@
6843\end{document}
6844