1% -*- mode: ess-noweb; ess-noweb-code-mode: R-mode -*- 2 3\documentclass[11pt]{article} 4\usepackage{hyperref} 5\usepackage[headings]{fullpage} 6\usepackage{verbatim} 7\usepackage{noweb} 8 9% This is a minor modification to the new verbatim environment that uses 10% the same size as noweb's code size and indents the same amount at 11% noweb's code chunks. I redefine verbatim instead of defining my own 12% environment since the html converter only seems to understand 13% verbatim, not new definitions. 14\makeatletter 15\addto@hook{\every@verbatim}{\nowebsize\setlength{\leftmargin}{50mm}} 16\def\verbatim@processline{\hspace{\codemargin}\the\verbatim@line\par} 17\makeatother 18 19% The following try to prevent wasteful page breaks 20\def\nwendcode{\endtrivlist \endgroup} 21\let\nwdocspar=\par 22 23\pagestyle{noweb} 24\bibliographystyle{plain} 25 26\noweboptions{noidentxref,longchunks,smallcode} 27 28\title{A Byte Code Compiler for R} 29\author{Luke Tierney\\ 30 Department of Statistics and Actuarial Science\\ 31 University of Iowa} 32 33\begin{document} 34\maketitle 35This document presents the current implementation of the byte code 36compiler for R. The compiler produces code for a virtual machine that 37is then executed by a virtual machine runtime system. The virtual 38machine is a stack based machine. Thus instructions for the virtual 39machine take arguments off a stack and may leave one or more results 40on the stack. Byte code objects consists of an integer vector 41representing instruction opcodes and operands, and a generic vector 42representing a constant pool. The compiler is implemented almost 43entirely in R, with just a few support routines in C to manage 44compiled code objects. 45 46The virtual machine instruction set is designed to allow much of the 47interpreter internals to be re-used. In particular, for now the 48mechanism for calling functions of all types from compiled code 49remains the same as the function calling mechanism for interpreted 50code. There are opportunities for efficiency improvements through 51using a different mechanism for calls from compiled functions to 52compiled functions, or changing the mechanism for both interpreted and 53compiled code; this will be explored in future work. 54 55The style used by the compiler for building up code objects is 56imperative: A code buffer object is created that contains buffers for 57the instruction stream and the constant pool. Instructions and 58constants are written to the buffer as the compiler processes an 59expression tree, and at the end a code object is constructed. A more 60functional design in which each compiler step returns a modified code 61object might be more elegant in principle, but it would be more 62difficult to make efficient. 63 64A multi-pass compiler in which a first pass produces an intermediate 65representation, subsequent passes optimize the intermediate 66representation, and a final pass produces actual code would also be 67useful and might be able to produce better code. A future version of 68the compiler may use this approach. But for now to keep things simple 69a single pass is used. 70 71%% **** Some peephole optimization is probably possible, and at least 72%% **** some constant folding could be done on the bytecode, but more 73%% **** sophisticated optimizations like inlining or R code would require 74%% **** a more suitable intermediate representation. 75 76%% **** I _think_ conversion from stack-based byte code to register-based 77%% **** code is reasonably straight forward but I haven't thought it 78%% **** through thoroughly yet. 79 80 81\section{The compiler interface} 82The compiler can be used either explicitly by calling certain 83functions to carry out compilations, or implicitly by enabling 84compilation to occur automatically at certain points. 85 86\subsection{Explicit compilation} 87The primary functions for explicit compilation are [[compile]], 88[[cmpfun]], and [[cmpfile]]. 89 90The [[compile]] function compiles an expression and returns a byte code 91object, which can then be passed to [[eval]]. A simple example is 92\begin{verbatim} 93> library(compiler) 94> compile(quote(1+3)) 95<bytecode: 0x25ba070> 96> eval(compile(quote(1+3))) 97[1] 4 98\end{verbatim} 99 100A closure can be compiled using [[cmpfun]]. If the function [[f]] is 101defined as 102\begin{verbatim} 103f <- function(x) { 104 s <- 0.0 105 for (y in x) 106 s <- s + y 107 s 108} 109\end{verbatim} 110then a compiled version is produced by 111\begin{verbatim} 112fc <- cmpfun(f) 113\end{verbatim} 114We can then compare the performance of the interpreted and compiled 115versions: 116\begin{verbatim} 117> x <- as.double(1 : 10000000) 118> system.time(f(x)) 119 user system elapsed 120 6.470 0.010 6.483 121> system.time(fc(x)) 122 user system elapsed 123 1.870 0.000 1.865 124\end{verbatim} 125 126A source file can be compiled with [[cmpfile]]. For now, the resulting 127file has to then be loaded with [[loadcmp]]. In the future it may 128make sense to allow [[source]] to either load a pre-compiled file or 129to optionally compile while sourcing. 130 131\subsection{Implicit compilation} 132Implicit compilation can be used to compile packages as they are 133installed or for just-in-time (JIT) compilation of functions or 134expressions. The mechanism for enabling these is experimental and 135likely to change. 136 137For now, compilation of packages requires the use of lazy loading and can be 138enabled either by calling [[compilePKGS]] with argument [[TRUE]] or by 139starting R with the environment variable [[_R_COMPILE_PKGS_]] set to a 140positive integer value. These settings are used internally during R build 141to compile the base package (and tools, utils, methods, etc) and by 142[[R CMD INSTALL]]. 143Functions are compiled as they are written to the lazy loading database. 144Compilation of packages should only be enabled for that time, because it 145adds noticeable time and space overhead to any serialization. 146 147In a UNIX-like environment, for example, installing a package with 148\begin{verbatim} 149env R_COMPILE_PKGS=1 R CMD INSTALL foo.tar.gz 150\end{verbatim} 151will internally enable package compilation using [[compilePKGS]]. 152 153If R is installed from source then the base and required packages can 154be compiled on installation using 155\begin{verbatim} 156make bytecode 157\end{verbatim} 158This does not require setting the [[_R_COMPILE_PKGS_]] environment variable. 159 160JIT compilation can be enabled from within R by calling [[enableJIT]] 161with a non-negative integer argument or by starting R with the 162environment variable [[R_ENABLE_JIT]] set to a non-negative integer. 163The possible values of the argument to [[enableJIT]] and their 164meanings are 165\begin{itemize} 166\item[0] turn off JIT 167\item[1] compile closures before they are called the first time 168\item[2] same as 1, plus compile closures before duplicating (useful 169 for packages that store closures in lists, like lattice) 170\item[3] same as 2, plus compile all [[for()]], [[while()]], and 171 [[repeat()]] loops before executing. 172\end{itemize} 173R may initially be somewhat sluggish if JIT is enabled and base and 174recommended packages have not been pre-compiled as almost everything 175will initially need some compilation. 176 177 178\section{The basic compiler} 179This section presents the basic compiler for compiling R expressions 180to byte code objects. 181 182 183\subsection{The compiler top level} 184R expressions consist of function calls, variable references, and 185literal constants. To create a byte code object representing an R 186expression the compiler has to walk the expression tree and emit code 187for the different node types in encounters. The code emitted may 188depend on the environment in which the expression will be evaluated as 189well as various compiler option settings. 190 191The simplest function in the top level compiler interface is the 192function [[compile]]. This function requires an expression argument 193and takes three optional arguments: an environment, a list of 194options and source code reference. The default environment is the global 195environment. By default, the source reference argument is [[NULL]] and the 196source reference is taken from the [[srcref]] attribute of the expression 197argument. 198<<[[compile]] function>>= 199compile <- function(e, env = .GlobalEnv, options = NULL, srcref = NULL) { 200 cenv <- makeCenv(env) 201 cntxt <- make.toplevelContext(cenv, options) 202 cntxt$env <- addCenvVars(cenv, findLocals(e, cntxt)) 203 if (mayCallBrowser(e, cntxt)) 204 ## NOTE: compilation will be attempted repeatedly 205 e 206 else if (is.null(srcref)) 207 genCode(e, cntxt) 208 else 209 genCode(e, cntxt, loc = list(expr = e, srcref = srcref)) 210} 211@ %def compile 212The supplied environment is converted into a compilation environment 213data structure. This compilation environment and any options 214provided are then used to construct a compiler context. The function 215[[genCode]] is then used to generate a byte code object for the 216expression and the constructed compilation context. 217 218Compilation environments are described in Section 219\ref{sec:environments} and compiler contexts in Section 220\ref{sec:contexts}. The [[genCode]] function is defined as 221<<[[genCode]] function>>= 222genCode <- function(e, cntxt, gen = NULL, loc = NULL) { 223 cb <- make.codeBuf(e, loc) 224 if (is.null(gen)) 225 cmp(e, cb, cntxt, setloc = FALSE) 226 else 227 gen(cb, cntxt) 228 codeBufCode(cb, cntxt) 229} 230@ %def genCode 231[[genCode]] creates a code buffer, fills the code buffer, and then 232calls [[codeBufCode]] to extract and return the byte code object. In 233the most common case [[genCode]] uses the low level recursive 234compilation function [[cmp]], described in Section \ref{subsec:cmp}, 235to generate the code. For added flexibility it can be given a 236generator function that emits code into the code buffer based on the 237provided context. This is used in Section \ref{sec:loops} for 238compilation of loop bodies in loops that require an explicit loop context 239(and a long jump in the byte-code interpreter). 240 241 242\subsection{Basic code buffer interface} 243Code buffers are used to accumulate the compiled code and related 244constant values. A code buffer [[cb]] is a list containing a number 245of closures used to manipulate the content of the code buffer. In 246this section two closures are used, [[putconst]] and [[putcode]]. 247 248The closure [[cb$putconst]] is used to enter constants into the 249constant pool. It takes a single argument, an arbitrary R object to 250be entered into the constant pool, and returns an integer index into 251the pool. The [[cb$putcode]] closure takes an instruction opcode and 252any operands the opcode requires and emits them into the code buffer. 253The operands are typically constant pool indices or labels, to be 254introduced in Section \ref{sec:codebuf}. 255 256As an example, the [[GETVAR]] instruction takes one operand, the index 257in the constant pool of a symbol. The opcode for this instruction is 258[[GETVAR.OP]]. The instruction retrieves the symbol from the constant 259pool, looks up its value in the current environment, and pushes the 260value on the stack. If [[sym]] is a variable with value of a symbol, 261then code to enter the symbol in the constant pool and emit an 262instruction to get its value would be 263<<example of emitting a [[GETVAR]] instruction>>= 264ci <- cb$putconst(sym) 265cb$putcode(GETVAR.OP, ci) 266@ %def 267 268The complete code buffer implementation is given in Section 269\ref{sec:codebuf}. 270 271 272\subsection{The recursive code generator} 273\label{subsec:cmp} 274The function [[cmp]] is the basic code generation function. It 275recursively traverses the expression tree and emits code as it visits 276each node in the tree. 277 278Before generating code for an expression the function [[cmp]] attempts 279to determine the value of the expression by constant folding using the 280function [[constantFold]]. If constant folding is successful then 281[[contantFold]] returns a named list containing a [[value]] element. 282Otherwise it returns [[NULL]]. If constant folding is successful, 283then the result is compiled as a constant. Otherwise, the standard 284code generation process is used. 285%% **** comment on alternative of doing constant folding as an 286%% **** optimization on the bytecode or an intermediate representation? 287 288In the interpreter there are four types of objects that are not 289treated as constants, i.e. as evaluating to themselves: function calls 290of type [["language"]], variable references of type [["symbol"]], 291promises, and byte code objects. Neither promises nor byte code 292objects should appear as literals in code so an error is signaled for 293those. The language, symbol, and constant cases are each handled by 294their own code generators. 295%% **** promises do appear in the expressions generated by the 296%% **** interpreter for evaluating complex assignment expressions 297<<generate code for expression [[e]]>>= 298if (typeof(e) == "language") 299 cmpCall(e, cb, cntxt) 300else if (typeof(e) == "symbol") 301 cmpSym(e, cb, cntxt, missingOK) 302else if (typeof(e) == "bytecode") 303 cntxt$stop(gettext("cannot compile byte code literals in code"), 304 cntxt, loc = cb$savecurloc()) 305else if (typeof(e) == "promise") 306 cntxt$stop(gettext("cannot compile promise literals in code"), 307 cntxt, loc = cb$savecurloc()) 308else 309 cmpConst(e, cb, cntxt) 310@ 311The function [[cmp]] is then defined as 312<<[[cmp]] function>>= 313cmp <- function(e, cb, cntxt, missingOK = FALSE, setloc = TRUE) { 314 if (setloc) { 315 sloc <- cb$savecurloc() 316 cb$setcurexpr(e) 317 } 318 ce <- constantFold(e, cntxt, loc = cb$savecurloc()) 319 if (is.null(ce)) { 320 <<generate code for expression [[e]]>> 321 } 322 else 323 cmpConst(ce$value, cb, cntxt) 324 if (setloc) 325 cb$restorecurloc(sloc) 326} 327@ %def cmp 328The call code generator [[cmpCall]] will recursively call [[cmp]]. 329%% **** should promises/byte code produce compiler errors or runtime errors?? 330 331 332\subsection{Compiling constant expressions} 333The constant code generator [[cmpConst]] is the simplest of the three 334generators. A simplified generator can be defined as 335<<simplified [[cmpConst]] function>>= 336cmpConst <- function(val, cb, cntxt) { 337 ci <- cb$putconst(val) 338 cb$putcode(LDCONST.OP, ci) 339 if (cntxt$tailcall) cb$putcode(RETURN.OP) 340} 341@ %def cmpConst 342This function enters the constant in the constant pool using the 343closure [[cb$putconst]]. The value returned by this closure is an 344index for the constant in the constant pool. Then the code generator 345emits an instruction to load the constant at the specified constant 346pool index and push it onto the stack. If the expression appears in 347tail position then a [[RETURN]] instruction is emitted as well. 348%% **** explain tail position here?? 349 350Certain constant values, such as [[TRUE]], [[FALSE]], and [[NULL]] 351appear very often in code. It may be useful to provide and use special 352instructions for loading these. The resulting code will have slightly 353smaller constant pools and may be a little faster, though the 354difference is likely to be small. A revised definition of 355[[cmpConst]] that makes use of instructions for loading these 356particular values is given by 357<<[[cmpConst]] function>>= 358cmpConst <- function(val, cb, cntxt) { 359 if (identical(val, NULL)) 360 cb$putcode(LDNULL.OP) 361 else if (identical(val, TRUE)) 362 cb$putcode(LDTRUE.OP) 363 else if (identical(val, FALSE)) 364 cb$putcode(LDFALSE.OP) 365 else { 366 ci <- cb$putconst(val) 367 cb$putcode(LDCONST.OP, ci) 368 } 369 if (cntxt$tailcall) cb$putcode(RETURN.OP) 370} 371@ %def cmpConst 372It might be useful to handle other constants in a similar way, such as 373[[NA]] or small integer values; this may be done in the future. 374%% **** check out if small integers is worth doing. 375%% **** mention peephole optimization as alternative 376 377The implementation marks values in the constant pool as read-only 378after they are loaded. In the past, all values were duplicated as they 379were retrieved from the constant pool as a precaution against bad 380package code: several packages in the wild assumed that an expression 381[[TRUE]], for example, appearing in code would result in a freshly 382allocated value that could be freely modified in [[.C]] calls. 383 384 385\subsection{Compiling variable references} 386The function [[cmpSym]] handles compilation of variable 387references. For standard variables this involves entering the symbol 388in the constant pool, emitting code to look up the value of the 389variable at the specified constant pool location in the current 390environment, and, if necessary, emitting a [[RETURN]] instruction. 391 392In addition to standard variables there is the ellipsis variable 393[[...]] and the accessors [[..1]], [[..2]], and so on that need to be 394considered. The ellipsis variable can only appear as an argument in 395function calls, so [[cmp]], like the interpreter [[eval]] itself, 396should not encounter it. The interpreter signals an error if it does 397encounter a [[...]] variable, and the compiler emits code that does 398the same at runtime. The compiler also emits a warning at compile 399time. Variables representing formal parameters may not have values 400provided in their calls, i.e. may have missing values. In some cases 401this should signal an error; in others the missing value can be passed 402on (for example in expressions of the form [[x[]]]). To support this, 403[[cmpSym]] takes an optional argument for allowing missing argument 404values. 405<<[[cmpSym]] function>>= 406cmpSym <- function(sym, cb, cntxt, missingOK = FALSE) { 407 if (sym == "...") { 408 notifyWrongDotsUse("...", cntxt, loc = cb$savecurloc()) 409 cb$putcode(DOTSERR.OP) 410 } 411 else if (is.ddsym(sym)) { 412 <<emit code for [[..n]] variable references>> 413 } 414 else { 415 <<emit code for standard variable references>> 416 } 417} 418@ %def cmpSym 419 420References to [[..n]] variables are also only appropriate when a 421[[...]] variable is available, so a warning is given if that is not 422the case. The virtual machine provides instructions [[DDVAL]] and 423[[DDVAL_MISSOK]] for the case where missing arguments are not allowed 424and for the case where they are, and the appropriate instruction is 425used based on the [[missingOK]] argument to [[cmpSym]]. 426<<emit code for [[..n]] variable references>>= 427if (! findLocVar("...", cntxt)) 428 notifyWrongDotsUse(sym, cntxt, loc = cb$savecurloc()) 429ci <- cb$putconst(sym) 430if (missingOK) 431 cb$putcode(DDVAL_MISSOK.OP, ci) 432else 433 cb$putcode(DDVAL.OP, ci) 434if (cntxt$tailcall) cb$putcode(RETURN.OP) 435@ %def 436 437There are also two instructions available for obtaining the value of a 438general variable from the current environment, one that allows missing 439values and one that does not. 440<<emit code for standard variable references>>= 441if (! findVar(sym, cntxt)) 442 notifyUndefVar(sym, cntxt, loc = cb$savecurloc()) 443ci <- cb$putconst(sym) 444if (missingOK) 445 cb$putcode(GETVAR_MISSOK.OP, ci) 446else 447 cb$putcode(GETVAR.OP, ci) 448if (cntxt$tailcall) cb$putcode(RETURN.OP) 449@ %def 450 451For now, these instructions only take an index in the constant pool 452for the symbol as operands, not any information about where the 453variable can be found within the environment. This approach to 454obtaining the value of variables requires a search of the current 455environment for every variable reference. In a less dynamic language 456it would be possible to compute locations of variable bindings within 457an environment at compile time and to choose environment 458representations that allow constant time access to any variable's 459value. Since bindings in R can be added or removed at runtime this 460would require a semantic change that would need some form of 461declaration to make legitimate. Another approach that may be worth 462exploring is some sort of caching mechanism in which the location of 463each variable is stored when it is first found by a full search, and 464that cached location is used until an event occurs that forces 465flushing of the cache. If such events are rare, as they typically are, 466then this may be effective. 467%% **** need to look into caching strategies 468 469%% **** looks like a simple cache of the local frame speeds up sum and 470%% **** Neal's em by about 10% (just lookup, not assignment -- with 471%% **** assignment should be a bit better) 472 473%% **** Is it really necessary for bcEval to save/restore stack tops? 474%% **** Shouldn't that happen automatically? 475%% **** Is it possible to have closure calling stay in the same bc? 476%% **** maybe at least for promises? 477 478 479\subsection{Compiling function calls} 480Conceptually, the R function calling mechanism uses lazy evaluation of 481arguments. Thus calling a function involves three steps: 482\begin{itemize} 483\item finding the function to call 484\item packaging up the argument expressions into deferred evaluation 485 objects, or promises 486\item executing the call 487\end{itemize} 488Code for this process is generated by the function [[cmpCall]]. A 489simplified version is defined as 490<<simplified [[cmpCall]] function>>= 491cmpCall <- function(call, cb, cntxt) { 492 cntxt <- make.callContext(cntxt, call) 493 fun <- call[[1]] 494 args <- call[-1] 495 if (typeof(fun) == "symbol") 496 cmpCallSymFun(fun, args, call, cb, cntxt) 497 else 498 cmpCallExprFun(fun, args, call, cb, cntxt) 499} 500@ %def cmpCall 501 502Call expressions in which the function is represented by a symbol are 503compiled by [[cmpCallSymFun]]. This function emits a [[GETFUN]] 504instruction and then compiles the arguments. 505<<[[cmpCallSymFun]] function>>= 506maybeNSESymbols <- c("bquote") 507cmpCallSymFun <- function(fun, args, call, cb, cntxt) { 508 ci <- cb$putconst(fun) 509 cb$putcode(GETFUN.OP, ci) 510 nse <- as.character(fun) %in% maybeNSESymbols 511 <<compile arguments and emit [[CALL]] instruction>> 512} 513@ %def cmpCallSymFun 514The [[GETFUN]] instruction takes a constant pool index of the symbol 515as an operand, looks for a function binding to the symbol in the 516current environment, places it on the stack, and prepares the stack 517for handling function call arguments. 518%% **** need a fun cache and a var cache??? 519 520Argument compilation is carried out by the function [[cmpCallArgs]], 521presented in Section \ref{subsec:callargs}, and is followed by emitting code 522to execute the call and, if necessary, return a result. Calls to functions 523listed in [[maybeNSESymbols]] get their arguments uncompiled. Currently 524this is only the case of [[bquote]], which does not evaluate its argument 525[[expr]] normally, but modifies the expression first (non-standard 526evaluation). Compiling such argument could result in warnings, because the 527argument may not be a valid R expression (e.g. when it contains [[.()]] 528subexpressions in complex assignments), and the generated code would be 529irrelevant (yet not used). Not compiling an argument that will in fact be 530evaluated normally is safe, hence the code is not differentiating between 531individual function arguments nor is it checking whether [[bquote]] is the 532one from the [[base]] package. 533 534<<compile arguments and emit [[CALL]] instruction>>= 535cmpCallArgs(args, cb, cntxt, nse) 536ci <- cb$putconst(call) 537cb$putcode(CALL.OP, ci) 538if (cntxt$tailcall) cb$putcode(RETURN.OP) 539@ %def 540The call expression itself is stored in the constant pool and is 541available to the [[CALL]] instruction. 542 543Calls in which the function is represented by an expression other than 544a symbol are handled by [[cmpCallExprFun]]. This emits code to 545evaluate the expression, leaving the value in the stack, and then 546emits a [[CHECKFUN]] instruction. This instruction checks that the 547value on top of the stack is a function and prepares the stack for 548receiving call arguments. Generation of argument code and the 549[[CALL]] instruction are handled as for symbol function calls. 550<<[[cmpCallExprFun]] function>>= 551cmpCallExprFun <- function(fun, args, call, cb, cntxt) { 552 ncntxt <- make.nonTailCallContext(cntxt) 553 cmp(fun, cb, ncntxt) 554 cb$putcode(CHECKFUN.OP) 555 nse <- FALSE 556 <<compile arguments and emit [[CALL]] instruction>> 557} 558@ %def cmpCallExprFun 559 560The actual definition of [[cmpCall]] is a bit more complex than the 561simplified one given above: 562<<[[cmpCall]] function>>= 563cmpCall <- function(call, cb, cntxt, inlineOK = TRUE) { 564 sloc <- cb$savecurloc() 565 cb$setcurexpr(call) 566 cntxt <- make.callContext(cntxt, call) 567 fun <- call[[1]] 568 args <- call[-1] 569 if (typeof(fun) == "symbol") { 570 if (! (inlineOK && tryInline(call, cb, cntxt))) { 571 <<check the call to a symbol function>> 572 cmpCallSymFun(fun, args, call, cb, cntxt) 573 } 574 } 575 else { 576 <<hack for handling [[break()]] and [[next()]] expressions>> 577 cmpCallExprFun(fun, args, call, cb, cntxt) 578 } 579 cb$restorecurloc(sloc) 580} 581@ %def cmpCall 582The main addition is the use of a [[tryInline]] function which tries 583to generate more efficient code for particular functions. The 584[[inlineOK]] argument can be used to disable inlining. This function 585returns [[TRUE]] if it has handled code generation and [[FALSE]] if it 586has not. Code will be generated by the inline mechanism if inline 587handlers for the particular function are available and the 588optimization level permits their use. Details of the inlining 589mechanism are given in Section \ref{sec:inlining}. 590 591In addition to the inlining mechanism, some checking of the call is 592carried out for symbol calls. The checking code is 593<<check the call to a symbol function>>= 594if (findLocVar(fun, cntxt)) 595 notifyLocalFun(fun, cntxt, loc = cb$savecurloc()) 596else { 597 def <- findFunDef(fun, cntxt) 598 if (is.null(def)) 599 notifyUndefFun(fun, cntxt, loc = cb$savecurloc()) 600 else 601 checkCall(def, call, 602 function(w) notifyBadCall(w, cntxt, loc = cb$savecurloc())) 603} 604@ 605and [[checkCall]] is defined as 606<<[[checkCall]] function>>= 607## **** figure out how to handle multi-line deparses 608## **** e.g. checkCall(`{`, quote({})) 609## **** better design would capture error object, wrap it up, and pass it on 610## **** use approach from codetools to capture partial argument match 611## **** warnings if enabled? 612checkCall <- function(def, call, signal = warning) { 613 if (typeof(def) %in% c("builtin", "special")) 614 def <- args(def) 615 if (typeof(def) != "closure" || any.dots(call)) 616 NA 617 else { 618 msg <- tryCatch({match.call(def, call); NULL}, 619 error = function(e) conditionMessage(e)) 620 if (! is.null(msg)) { 621 emsg <- gettextf("possible error in '%s': %s", 622 deparse(call, 20)[1], msg) 623 if (! is.null(signal)) signal(emsg) 624 FALSE 625 } 626 else TRUE 627 } 628} 629@ %def checkCall 630 631Finally, for calls where the function is an expression a hack is 632currently needed for dealing with the way the parser currently parses 633expressions of the form [[break()]] and [[next()]]. To be able to 634compile as many [[break]] and [[next]] calls as possible as simple 635[[GOTO]] instructions these need to be handled specially to avoid 636placing things on the stack. A better solution would probably be to 637modify the parser to make expressions of the form [[break()]] be 638syntax errors. 639<<hack for handling [[break()]] and [[next()]] expressions>>= 640## **** this hack is needed for now because of the way the 641## **** parser handles break() and next() calls 642if (typeof(fun) == "language" && typeof(fun[[1]]) == "symbol" && 643 as.character(fun[[1]]) %in% c("break", "next")) 644 return(cmp(fun, cb, cntxt)) 645@ 646 647 648\subsection{Compiling call arguments} 649\label{subsec:callargs} 650Function calls can contain four kinds of arguments: 651\begin{itemize} 652\item missing arguments 653\item [[...]] arguments 654\item general expressions 655\end{itemize} 656In the first and third cases the arguments can also be named. The argument 657compilation function [[cmpCallArgs]] loops over the argument lists and 658handles each of the three cases, in addition to signaling errors for 659arguments that are literal bytecode or promise objects. When [[nse]] is 660[[TRUE]] (non-standard evaluation), promises will only get uncompiled 661expressions. 662<<[[cmpCallArgs]] function>>= 663cmpCallArgs <- function(args, cb, cntxt, nse = FALSE) { 664 names <- names(args) 665 pcntxt <- make.promiseContext(cntxt) 666 for (i in seq_along(args)) { 667 a <- args[[i]] 668 n <- names[[i]] 669 <<compile missing argument>> 670 <<compile [[...]] argument>> 671 <<signal an error for promise or bytecode argument>> 672 <<compile a general argument>> 673 } 674} 675@ %def cmpCallArgs 676 677The missing argument case is handled by 678<<compile missing argument>>= 679if (missing(a)) { ## better test for missing?? 680 cb$putcode(DOMISSING.OP) 681 cmpTag(n, cb) 682} 683@ %def 684Computations on the language related to missing arguments are tricky. 685The use of [[missing]] is a little odd, but for now at least it does 686work. 687 688An ellipsis argument [[...]] is handled by the [[DODOTS]] instruction: 689<<compile [[...]] argument>>= 690else if (is.symbol(a) && a == "...") { 691 if (! findLocVar("...", cntxt)) 692 notifyWrongDotsUse("...", cntxt, loc = cb$savecurloc()) 693 cb$putcode(DODOTS.OP) 694} 695@ %def 696A warning is issued if no [[...]] argument is visible. 697 698As in [[cmp]], errors are signaled for literal bytecode or promise 699values as arguments. 700<<signal an error for promise or bytecode argument>>= 701else if (typeof(a) == "bytecode") 702 cntxt$stop(gettext("cannot compile byte code literals in code"), 703 cntxt, loc = cb$savecurloc()) 704else if (typeof(a) == "promise") 705 cntxt$stop(gettext("cannot compile promise literals in code"), 706 cntxt, loc = cb$savecurloc()) 707@ %def 708 709A general non-constant argument expression is compiled to a separate 710byte code object which is stored in the constant pool. The compiler 711then emits a [[MAKEPROM]] instruction that uses the stored code 712object. Promises are not needed for literal constant arguments as 713these are self-evaluating. Within the current implementation both the 714evaluation process and use of [[substitute]] will work properly if 715constants are placed directly in the argument list rather than being 716wrapped in promises. This could also be done in the interpreter, 717though the benefit is less clear as a runtime determination of whether 718an argument is a constant would be needed. This may still be cheap 719enough compared to the cost of allocating a promise to be worth doing. 720Constant folding in [[cmp]] may also produce more constants, but 721promises are needed in this case in order for [[substitute]] to work 722properly. These promises could be created as evaluated promises, 723though it is not clean how much this would gain. 724<<compile a general argument>>= 725else { 726 if (is.symbol(a) || typeof(a) == "language") { 727 if (nse) 728 ci <- cb$putconst(a) 729 else 730 ci <- cb$putconst(genCode(a, pcntxt, loc = cb$savecurloc())) 731 cb$putcode(MAKEPROM.OP, ci) 732 } 733 else 734 cmpConstArg(a, cb, cntxt) 735 cmpTag(n, cb) 736} 737@ %def 738%% **** look into using evaluated promises for constant folded arguments 739%% **** then we would use a variant of this: 740% else { 741% ca <- constantFold(a, cntxt) 742% if (is.null(ca)) { 743% if (is.symbol(a) || typeof(a) == "language") { 744% ci <- cb$putconst(genCode(a, pcntxt)) 745% cb$putcode(MAKEPROM.OP, ci) 746% } 747% else 748% cmpConstArg(a, cb, cntxt) 749% } 750% else 751% cmpConstArg(ca$value, cb, cntxt) 752% cmpTag(n, cb) 753% } 754 755For calls to closures the [[MAKEPROM]] instruction retrieves the code 756object, creates a promise from the code object and the current 757environment, and pushes the promise on the argument stack. For calls 758to functions of type [[BULTIN]] the [[MAKEPROM]] instruction actually 759executes the code object in the current environment and pushes the 760resulting value on the stack. For calls to functions of type 761[[SPECIAL]] the [[MAKEPROM]] instruction does nothing as these calls use 762only the call expression. 763 764Constant arguments are compiled by [[cmpConstArg]]. Again there are 765special instructions for the common special constants [[NULL]], 766[[TRUE]], and [[FALSE]]. 767<<[[cmpConstArg]]>>= 768cmpConstArg <- function(a, cb, cntxt) { 769 if (identical(a, NULL)) 770 cb$putcode(PUSHNULLARG.OP) 771 else if (identical(a, TRUE)) 772 cb$putcode(PUSHTRUEARG.OP) 773 else if (identical(a, FALSE)) 774 cb$putcode(PUSHFALSEARG.OP) 775 else { 776 ci <- cb$putconst(a) 777 cb$putcode(PUSHCONSTARG.OP, ci) 778 } 779} 780@ %def cmpConstArg 781 782Code to install names for named arguments is generated by [[cmpTag]]: 783<<[[cmpTag]] function>>= 784cmpTag <- function(n, cb) { 785 if (! is.null(n) && n != "") { 786 ci <- cb$putconst(as.name(n)) 787 cb$putcode(SETTAG.OP, ci) 788 } 789} 790@ %def cmpTag 791 792The current implementation allocates a linked list of call arguments, 793stores tags in the list cells, and allocates promises. Alternative 794implementations that avoid some or all allocation are worth exploring. 795Also worth exploring is having an instruction specifically for calls that 796do not require matching of named arguments to formal arguments, since 797cases that use only order of arguments, not names, are quite common 798and are known at compile time. In the case of calls to functions with 799definitions known at compile time matching of named arguments to 800formal ones could also be done at compile time. 801 802 803\subsection{Discussion} 804The framework presented in this section, together with some support 805functions, is actually able to compile any legal R code. But this is 806somewhat deceptive. The R implementation, and the [[CALL]] opcode, 807support three kinds of functions: closures (i.e. R-level functions), 808primitive functions of type [[BUILTIN]], and primitive functions of 809type [[SPECIAL]]. Primitives of type [[BUILTIN]] always evaluate 810their arguments in order, so creating promises is not necessary and in 811fact the [[MAKEPROM]] instruction does not do so --- if the function 812to be called is a [[BUILTIN]] then [[MAKEPROM]] runs the code for 813computing the argument in the current environment and pushes the value 814on the stack. On the other hand, primitive functions of type 815[[SPECIAL]] use the call expression and evaluate bits of it as 816needed. As a result, they will be running interpreted code. Since 817core functions like the sequencing function [[{]] and the conditional 818 evaluation function [[if]] are of type [[SPECIAL]] this means most 819 non-trivial code will be run by the standard interpreter. This will 820 be addressed by defining inlining rules that allow functions like 821 [[{]] and [[if]] to be compiled properly. 822 823 824\section{The code buffer} 825\label{sec:codebuf} 826The code buffer is a collection of closures that accumulate code and 827constants in variables in their defining environment. For a code 828buffer [[cb]] the closures [[cb$putcode]] and [[cb$putconst]] write an 829instruction sequence and a constant, respectively, into the code 830buffer. The closures [[cb$code]] and [[cb$consts]] extract the code 831vector and the constant pool. 832 833The function [[make.codeBuf]] creates a set of closures for managing 834the instruction stream buffer and the constant pool buffer and returns 835a list of these closures for use by the compilation functions. In 836addition, the expression to be compiled into the code buffer is stored 837as the first constant in the constant pool; this can be used to 838retrieve the source code for a compiled expression. 839<<[[make.codeBuf]] function>>= 840make.codeBuf <- function(expr, loc = NULL) { 841 <<source location tracking implementation>> 842 <<instruction stream buffer implementation>> 843 <<constant pool buffer implementation>> 844 <<label management interface>> 845 cb <- list(code = getcode, 846 const = getconst, 847 putcode = putcode, 848 putconst = putconst, 849 makelabel = makelabel, 850 putlabel = putlabel, 851 patchlabels = patchlabels, 852 setcurexpr = setcurexpr, 853 setcurloc = setcurloc, 854 commitlocs = commitlocs, 855 savecurloc = savecurloc, 856 restorecurloc = restorecurloc) 857 cb$putconst(expr) ## insert expression as first constant. 858 ## NOTE: this will also insert the srcref directly into the constant 859 ## pool 860 cb 861} 862@ %def make.codeBuf 863 864The instruction stream buffer uses a list structure and a count of 865elements in use, and doubles the size of the list to make room for new 866code when necessary. By convention the first entry is a byte code 867version number; if the interpreter sees a byte code version number it 868cannot handle then it falls back to interpreting the uncompiled 869expression. The doubling strategy is needed to avoid quadratic 870compilation times for large instruction streams. 871<<instruction stream buffer implementation>>= 872codeBuf <- list(.Internal(bcVersion())) 873codeCount <- 1 874putcode <- function(...) { 875 new <- list(...) 876 newLen <- length(new) 877 while (codeCount + newLen > length(codeBuf)) { 878 codeBuf <<- c(codeBuf, vector("list", length(codeBuf))) 879 if (exprTrackingOn) 880 exprBuf <<- c(exprBuf, vector("integer", length(exprBuf))) 881 if (srcrefTrackingOn) 882 srcrefBuf <<- c(srcrefBuf, vector("integer", length(srcrefBuf))) 883 } 884 codeRange <- (codeCount + 1) : (codeCount + newLen) 885 codeBuf[codeRange] <<- new 886 887 if (exprTrackingOn) { ## put current expression into the constant pool 888 ei <- putconst(curExpr) 889 exprBuf[codeRange] <<- ei 890 } 891 if (srcrefTrackingOn) { ## put current srcref into the constant pool 892 si <- putconst(curSrcref) 893 srcrefBuf[codeRange] <<- si 894 } 895 896 codeCount <<- codeCount + newLen 897} 898getcode <- function() as.integer(codeBuf[1 : codeCount]) 899@ %def 900 901The constant pool is accumulated into a list buffer. The zero-based 902index of the constant in the pool is returned by the insertion 903function. Values are only entered once; if a value is already in the 904pool, as determined by [[identical]], its existing index is returned. 905Again a size-doubling strategy is used for the buffer. [[.Internal]] 906functions are used both for performance reasons and to prevent 907duplication of the constants. 908<<constant pool buffer implementation>>= 909constBuf <- vector("list", 1) 910constCount <- 0 911putconst <- function(x) { 912 if (constCount == length(constBuf)) 913 constBuf <<- .Internal(growconst(constBuf)) 914 i <- .Internal(putconst(constBuf, constCount, x)) 915 if (i == constCount) 916 constCount <<- constCount + 1 917 i 918} 919getconst <- function() 920 .Internal(getconst(constBuf, constCount)) 921@ %def 922 923The compiler maintains a mapping from code to source locations. For each 924value in the code buffer (instruction and operand) there is a source code 925reference ([[srcref]]) and the corresponding expression (AST). The code 926buffer implementation remembers the current location (source reference and 927expression), which can be set by [[setcurloc]], [[setcurexpr]] or 928[[restorecurloc]] and retrieved by [[savecurloc]]. In addition to emitting 929code, [[putconst]] also copies the current location information into the 930constant pool and records the resulting constant indices in a source 931reference buffer and expression buffer. When the final code is extracted 932using [[codeBufCode]], the source reference and expression buffers are 933copied into the constant pool as vectors indexed by code offset (program 934counter). 935<<source location tracking functions>>= 936extractSrcref <- function(sref, idx) { 937 if (is.list(sref) && length(sref) >= idx) 938 sref[[idx]] 939 else if (is.integer(sref) && length(sref) >= 6) 940 sref 941 else 942 NULL 943} 944getExprSrcref <- function(expr) { 945 sattr <- attr(expr, "srcref") 946 extractSrcref(sattr, 1) 947} 948# if block is a block srcref, get its idx'th entry 949# if block is a single srcref, return this srcref 950getBlockSrcref <- function(block, idx) { 951 extractSrcref(block, idx) 952} 953addLocString <- function(msg, loc) { 954 if (is.null(loc$srcref)) 955 msg 956 else 957 paste0(msg, " at ", utils::getSrcFilename(loc$srcref), ":", 958 utils::getSrcLocation(loc$srcref, "line")) 959} 960@ %def 961 962<<source location tracking implementation>>= 963exprTrackingOn <- TRUE 964srcrefTrackingOn <- TRUE 965 966if (is.null(loc)) { 967 curExpr <- expr 968 curSrcref <- getExprSrcref(expr) 969} else { 970 curExpr <- loc$expr 971 curSrcref <- loc$srcref 972} 973 974if (is.null(curSrcref)) 975 ## when top-level srcref is null, we speculate there will be no 976 ## source references within the compiled expressions either, 977 ## disabling the tracking makes the resulting constant pool 978 ## smaller 979 srcrefTrackingOn <- FALSE 980 981exprBuf <- NA ## exprBuf will have the same length as codeBuf 982srcrefBuf <- NA ## srcrefBuf will have the same length as codeBuf 983 984if (!exprTrackingOn) { 985 curExpr <- NULL 986 exprBuf <- NULL 987} 988if (!srcrefTrackingOn) { 989 curSrcref <- NULL 990 srcrefBuf <- NULL 991} 992 993## set the current expression 994## also update the srcref according to expr, if expr has srcref attribute 995## (note: never clears current srcref) 996setcurexpr <- function(expr) { 997 if (exprTrackingOn) { 998 curExpr <<- expr 999 } 1000 if (srcrefTrackingOn) { 1001 sref <- getExprSrcref(expr) 1002 if (!is.null(sref) && srcrefTrackingOn) 1003 curSrcref <<- sref 1004 } 1005} 1006## unconditionally sets the current expression and srcrefs 1007setcurloc <- function(expr, sref) { 1008 if (exprTrackingOn) 1009 curExpr <<- expr 1010 if (srcrefTrackingOn) 1011 curSrcref <<- sref 1012} 1013## add location information (current expressions, srcrefs) to the constant pool 1014commitlocs <- function() { 1015 if (exprTrackingOn) { 1016 exprs <- exprBuf[1:codeCount] 1017 class(exprs) <- "expressionsIndex" 1018 putconst(exprs) 1019 } 1020 1021 if (srcrefTrackingOn) { 1022 srefs <- srcrefBuf[1:codeCount] 1023 class(srefs) <- "srcrefsIndex" 1024 putconst(srefs) 1025 } 1026 1027 ## these entries will be at the end of the constant pool, assuming only the compiler 1028 ## uses these two classes 1029 NULL 1030} 1031savecurloc <- function() { 1032 list(expr = curExpr, srcref = curSrcref) 1033} 1034restorecurloc <- function(saved) { 1035 if (exprTrackingOn) curExpr <<- saved$expr 1036 if (srcrefTrackingOn) curSrcref <<- saved$srcref 1037} 1038@ %def 1039 1040 1041Labels are used for identifying targets for branching instruction. The 1042label management interface creates new labels with [[makelabel]] as 1043character strings that are unique within the buffer. These labels can 1044then be included as operands in branching instructions. The 1045[[putlabel]] function records the current code position as the value 1046of the label. 1047<<label management interface>>= 1048idx <- 0 1049labels <- vector("list") 1050makelabel <- function() { idx <<- idx + 1; paste0("L", idx) } 1051putlabel <- function(name) labels[[name]] <<- codeCount 1052@ 1053 1054Once code generation is complete the symbolic labels in the code 1055stream need to be converted to numerical offset values. This is done 1056by [[patchlabels]]. Labels can appear directly in the instruction 1057stream and in lists that have been placed in the instruction stream; 1058this is used for the [[SWITCH]] instruction. 1059<<label management interface>>= 1060patchlabels <- function(cntxt) { 1061 offset <- function(lbl) { 1062 if (is.null(labels[[lbl]])) 1063 cntxt$stop(gettextf("no offset recorded for label \"%s\"", lbl), 1064 cntxt) 1065 labels[[lbl]] 1066 } 1067 for (i in 1 : codeCount) { 1068 v <- codeBuf[[i]] 1069 if (is.character(v)) 1070 codeBuf[[i]] <<- offset(v) 1071 else if (typeof(v) == "list") { 1072 off <- as.integer(lapply(v, offset)) 1073 ci <- putconst(off) 1074 codeBuf[[i]] <<- ci 1075 } 1076 } 1077} 1078@ %def 1079 1080The contents of the code buffer is extracted into a code object by 1081calling [[codeBufCode]]: 1082<<[[codeBufCode]] function>>= 1083codeBufCode <- function(cb, cntxt) { 1084 cb$patchlabels(cntxt) 1085 cb$commitlocs() 1086 .Internal(mkCode(cb$code(), cb$const())) 1087} 1088@ %def codeBufCode 1089 1090 1091\section{Compiler contexts} 1092\label{sec:contexts} 1093The compiler context object [[cntxt]] carries along information about 1094whether the expression appears in tail position and should be followed 1095by a return or, whether the result is ignored, or whether the 1096expression is contained in a loop. The context object also contains 1097current compiler option settings as well as functions used to issue 1098warnings or signal errors. 1099 1100 1101\subsection{Top level contexts} 1102Top level compiler functions start by creating a top level context. 1103The constructor for top level contexts takes as arguments the current 1104compilation environment, described in Section \ref{sec:environments}, 1105and a list of option values used to override default option settings. 1106The [[toplevel]] field will be set to [[FALSE]] for compiling 1107expressions, such as function arguments, that do not appear at top 1108level. Top level expressions are assumed to be in tail position, so 1109the [[tailcall]] field is initialized as [[TRUE]]. The 1110[[needRETURNJMP]] specifies whether a call to the [[return]] function 1111can use the [[RETURN]] instruction or has to use a [[longjmp]] via the 1112[[RETURNJMP]] instruction. Initially using a simple [[RETURN]] is 1113safe; this is set set to [[TRUE]] when compiling promises ad certain 1114loops where [[RETURNJMP]] is needed. 1115<<[[make.toplevelContext]] function>>= 1116make.toplevelContext <- function(cenv, options = NULL) 1117 structure(list(toplevel = TRUE, 1118 tailcall = TRUE, 1119 needRETURNJMP = FALSE, 1120 env = cenv, 1121 optimize = getCompilerOption("optimize", options), 1122 suppressAll = getCompilerOption("suppressAll", options), 1123 suppressNoSuperAssignVar = 1124 getCompilerOption("suppressNoSuperAssignVar", options), 1125 suppressUndefined = getCompilerOption("suppressUndefined", 1126 options), 1127 call = NULL, 1128 stop = function(msg, cntxt, loc = NULL) 1129 stop(simpleError(addLocString(msg, loc), cntxt$call)), 1130 warn = function(x, cntxt, loc = NULL) 1131 cat(paste("Note:", addLocString(x, loc), "\n")) 1132 ), 1133 class = "compiler_context") 1134@ %def make.toplevelContext 1135Errors are signaled using a version of [[stop]] that uses the current 1136call in the compilation context. The default would be to use the call 1137in the compiler code where the error was raised, and that would not be 1138meaningful to the end user. Ideally [[warn]] should do something 1139similar and also use the condition system, but for now it just prints 1140a simple message to standard output. 1141%% **** look into adding source info to errors/warnings 1142%% **** comment on class of context object 1143 1144 1145\subsection{Other compiler contexts} 1146The [[cmpCall]] function creates a new context for each call it 1147compiles. The new context is the current context with the [[call]] 1148entry replaced by the current call --- this is be useful for issuing 1149meaningful warning and error messages. 1150<<[[make.callContext]] function>>= 1151make.callContext <- function(cntxt, call) { 1152 cntxt$call <- call 1153 cntxt 1154} 1155@ %def make.callContext 1156 1157Non-tail-call contexts are used when a value is being computed for use 1158in a subsequent computation. The constructor returns a new context 1159that is the current context with the tailcall field set to [[FALSE]]. 1160<<[[make.nonTailCallContext]] function>>= 1161make.nonTailCallContext <- function(cntxt) { 1162 cntxt$tailcall <- FALSE 1163 cntxt 1164} 1165@ %def make.nonTailCallContext 1166A no value context is used in cases where the computed value will be 1167ignored. For now this is identical to a non-tail-call context, but it 1168may eventually be useful to distinguish the two situations. This is 1169used mainly for expressions other than the final one in [[{]] calls 1170 and for compiling the bodies of loops. 1171%% **** can avoid generating push/pop pairs if novalue = TRUE is used 1172%% **** might simplify tailcall/RETURN stuff?? 1173<<[[make.noValueContext]] function>>= 1174make.noValueContext <- function(cntxt) { 1175 cntxt$tailcall <- FALSE 1176 cntxt 1177} 1178@ %def make.noValueContext 1179 1180The compiler context for compiling a function is a new toplevel 1181context using the function environment and the current compiler 1182options settings. 1183%% **** copy other compiler options; maybe cntxt$options$optimize?? 1184<<[[make.functionContext]] function>>= 1185make.functionContext <- function(cntxt, forms, body) { 1186 nenv <- funEnv(forms, body, cntxt) 1187 ncntxt <- make.toplevelContext(nenv) 1188 ncntxt$optimize <- cntxt$optimize 1189 ncntxt$suppressAll <- cntxt$suppressAll 1190 ncntxt$suppressNoSuperAssignVar <- cntxt$suppressNoSuperAssignVar 1191 ncntxt$suppressUndefined <- cntxt$suppressUndefined 1192 ncntxt 1193} 1194@ %def make.functionContext 1195 1196The context for compiling the body of a loop is a no value context 1197with the loop information available. 1198<<[[make.loopContext]] function>>= 1199make.loopContext <- function(cntxt, loop.label, end.label) { 1200 ncntxt <- make.noValueContext(cntxt) 1201 ncntxt$loop <- list(loop = loop.label, end = end.label, gotoOK = TRUE) 1202 ncntxt 1203} 1204@ %def make.loopContext 1205 1206The initial loop context allows [[break]] and [[next]] calls to be 1207implemented as [[GOTO]] instructions. This is OK for calls that are 1208in top level position relative to the loop. Calls that occur in 1209promises or in other contexts where the stack has changed from the 1210loop top level state need stack unwinding and cannot be implemented as 1211[[GOTO]] instructions. These should should be compiled with contexts 1212that have the [[loop$gotoOK]] field set to [[FALSE]]. The promise 1213context does this for promises and the argument context for other 1214settings. The promise context also sets [[needRETURNJMP]] to [[TRUE]] 1215since a [[return]] call that is triggered by forcing a promise 1216requires a [[longjmp]] to return from the appropriate function. 1217<<[[make.argContext]] function>>= 1218make.argContext <- function(cntxt) { 1219 cntxt$toplevel <- FALSE 1220 cntxt$tailcall <- FALSE 1221 if (! is.null(cntxt$loop)) 1222 cntxt$loop$gotoOK <- FALSE 1223 cntxt 1224} 1225@ %def make.argContext 1226<<[[make.promiseContext]] function>>= 1227make.promiseContext <- function(cntxt) { 1228 cntxt$toplevel <- FALSE 1229 cntxt$tailcall <- TRUE 1230 cntxt$needRETURNJMP <- TRUE 1231 if (! is.null(cntxt$loop)) 1232 cntxt$loop$gotoOK <- FALSE 1233 cntxt 1234} 1235@ %def make.promiseContext 1236%% pull out gotoOK chunk 1237 1238 1239\subsection{Compiler options} 1240Default compiler options are maintained in an environment. For now, 1241the supported options are [[optimize]], which is initialized to level 12422, and two options for controlling compiler messages. The 1243[[suppressAll]] option, if [[TRUE]], suppresses all notifications. 1244The [[suppressNoSuperAssignVar]] option, if [[TRUE]], suppresses 1245notifications about missing binding for a super-assigned variable. 1246The [[suppressUndefined]] option can be [[TRUE]] to suppress all 1247notifications about undefined variables and functions, or it can be a 1248character vector of the names of variables for which warnings should 1249be suppressed. 1250<<compiler options data base>>= 1251compilerOptions <- new.env(hash = TRUE, parent = emptyenv()) 1252compilerOptions$optimize <- 2 1253compilerOptions$suppressAll <- TRUE 1254compilerOptions$suppressNoSuperAssignVar <- FALSE 1255compilerOptions$suppressUndefined <- 1256 c(".Generic", ".Method", ".Random.seed", ".self") 1257@ %def compilerOptions 1258 1259Options are retrieved with the [[getCompilerOption]] function. 1260<<[[getCompilerOption]] function>>= 1261getCompilerOption <- function(name, options = NULL) { 1262 if (name %in% names(options)) 1263 options[[name]] 1264 else 1265 get(name, compilerOptions) 1266} 1267@ %def getCompilerOption 1268 1269The [[suppressAll]] function determines whether a context has its 1270[[supressAll]] property set to [[TRUE]]. 1271<<[[suppressAll]] function>>= 1272suppressAll <- function(cntxt) 1273 identical(cntxt$suppressAll, TRUE) 1274@ %def suppressAll 1275The [[suppressNoSuperAssignVar]] function determines whether a context has 1276its [[suppressNoSuperAssignVar]] property set to [[TRUE]]. 1277<<[[suppressNoSuperAssignVar]] function>>= 1278suppressNoSuperAssignVar <- function(cntxt) 1279 isTRUE(cntxt$suppressNoSuperAssignVar) 1280@ %def suppressNoSuperAssignVar 1281The [[suppressUndef]] function determines whether undefined variable 1282or function definition notifications for a particular variable should 1283be suppressed in a particular compiler context. 1284<<[[suppressUndef]] function>>= 1285suppressUndef <- function(name, cntxt) { 1286 if (identical(cntxt$suppressAll, TRUE)) 1287 TRUE 1288 else { 1289 suppress <- cntxt$suppressUndefined 1290 if (is.null(suppress)) 1291 FALSE 1292 else if (identical(suppress, TRUE)) 1293 TRUE 1294 else if (is.character(suppress) && as.character(name) %in% suppress) 1295 TRUE 1296 else FALSE 1297 } 1298} 1299@ %def suppressUndef 1300 1301At some point we will need mechanisms for setting default options from 1302the interpreter and in package meta-data. A declaration mechanism for 1303adjusting option settings locally will also be needed. 1304 1305 1306\subsection{Compiler notifications} 1307Compiler notifications are currently sent by calling the context's 1308[[warn]] function, which in turn prints a message to standard output. 1309It would be better to use an approach based on the condition system, 1310and this will be done eventually. The use of separate notification 1311functions for each type of issue signaled is a step in this direction. 1312 1313Undefined function and undefined variable notifications are issued by 1314[[notifyUndefFun]] and [[notifyUndefVar]]. These both use 1315[[suppressUndef]] to determine whether the notification should be 1316suppressed in the current context. 1317<<[[notifyUndefFun]] function>>= 1318notifyUndefFun <- function(fun, cntxt, loc = NULL) { 1319 if (! suppressUndef(fun, cntxt)) { 1320 msg <- gettextf("no visible global function definition for '%s'", 1321 as.character(fun)) 1322 cntxt$warn(msg, cntxt, loc) 1323 } 1324} 1325@ %def notifyUndefFun 1326<<[[notifyUndefVar]] function>>= 1327notifyUndefVar <- function(var, cntxt, loc = NULL) { 1328 if (! suppressUndef(var, cntxt)) { 1329 msg <- gettextf("no visible binding for global variable '%s'", 1330 as.character(var)) 1331 cntxt$warn(msg, cntxt, loc) 1332 } 1333} 1334@ %def notifyUndefVar 1335 1336Codetools currently optionally notifies about use of local 1337functions. This is of course not an error but may sometimes be the 1338result of a mis-spelling. For now the compiler does not notify about 1339these, but this could be changed by redefining [[notifyLocalFun]] . 1340<<[[notifyLocalFun]] function>>= 1341notifyLocalFun <- function(fun, cntxt, loc = NULL) { 1342 if (! suppressAll(cntxt)) 1343 NULL 1344} 1345@ %def notifyLocalFun 1346 1347Warnings about possible improper use of [[...]] and [[..n]] variables 1348are sent by [[notifyWrongDotsUse]]. 1349<<[[notifyWrongDotsUse]] function>>= 1350notifyWrongDotsUse <- function(var, cntxt, loc = NULL) { 1351 if (! suppressAll(cntxt)) { 1352 msg <- paste(var, "may be used in an incorrect context") 1353 cntxt$warn(msg, cntxt, loc) 1354 } 1355} 1356@ %def notifyWrongDotsUse 1357 1358Wrong argument count issues are signaled by [[notifyWrongArgCount]]. 1359<<[[notifyWrongArgCount]] function>>= 1360notifyWrongArgCount <- function(fun, cntxt, loc = NULL) { 1361 if (! suppressAll(cntxt)) { 1362 msg <- gettextf("wrong number of arguments to '%s'", 1363 as.character(fun)) 1364 cntxt$warn(msg, cntxt, loc) 1365 } 1366} 1367@ %def notifyWrongArgCount 1368Other issues with calls that do not match their definitions are 1369signaled by [[notifyBadCall]]. Ideally these should be broken down 1370more finely, but that would require some rewriting of the error 1371signaling in [[match.call]]. 1372<<[[notifyBadCall]] function>>= 1373notifyBadCall <- function(w, cntxt, loc = NULL) { 1374 if (! suppressAll(cntxt)) 1375 cntxt$warn(w, cntxt, loc) 1376} 1377@ %def notifyBadCall 1378 1379[[break]] or [[next]] calls that occur in a context where no loop is 1380visible will most likely result in runtime errors, and 1381[[notifyWrongBreakNext]] is used to signal such cases. 1382<<[[notifyWrongBreakNext]] function>>= 1383notifyWrongBreakNext <- function(fun, cntxt, loc = NULL) { 1384 if (! suppressAll(cntxt)) { 1385 msg <- paste(fun, "used in wrong context: no loop is visible") 1386 cntxt$warn(msg, cntxt, loc) 1387 } 1388} 1389@ %def notifyWrongBreakNext 1390 1391Several issues can arise in assignments. For super-assignments a 1392target variable should be defined; otherwise there will be a runtime 1393warning. 1394<<[[notifyNoSuperAssignVar]] function>>= 1395notifyNoSuperAssignVar <- function(symbol, cntxt, loc = NULL) { 1396 if (! suppressAll(cntxt) && ! suppressNoSuperAssignVar(cntxt)) { 1397 msg <- gettextf("no visible binding for '<<-' assignment to '%s'", 1398 as.character(symbol)) 1399 cntxt$warn(msg, cntxt, loc) 1400 } 1401} 1402@ %def notifyNoSuperAssignVar 1403If the compiler detects an invalid function in a complex assignment 1404then this is signaled at compile time; a corresponding error would 1405occur at runtime. 1406%% **** should put function/call into message 1407<<[[notifyBadAssignFun]] function>>= 1408notifyBadAssignFun <- function(fun, cntxt, loc = NULL) { 1409 if (! suppressAll(cntxt)) { 1410 msg <- gettext("invalid function in complex assignment") 1411 cntxt$warn(msg, cntxt, loc) 1412 } 1413} 1414@ %def notifyBadAssignFun 1415 1416In [[switch]] calls it is an error if a character selector argument is 1417used and there are multiple default alternatives. The compiler 1418signals a possible problem with [[notifyMultipleSwitchDefaults]] if 1419there are some named cases but more than one unnamed ones. 1420<<[[notifyMultipleSwitchDefaults]] function>>= 1421notifyMultipleSwitchDefaults <- function(ndflt, cntxt, loc = NULL) 1422 if (! suppressAll(cntxt)) { 1423 msg <- gettext("more than one default provided in switch() call") 1424 cntxt$warn(msg, cntxt, loc) 1425 } 1426@ %def notifyMultipleSwitchDefaults 1427 1428<<[[notifyNoSwitchcases]] function>>= 1429notifyNoSwitchcases <- function(cntxt, loc = NULL) 1430 if (! suppressAll(cntxt)) { 1431 msg <- gettext("'switch' with no alternatives") 1432 cntxt$warn(msg, cntxt, loc) 1433 } 1434@ %def notifyNoSwitchcases 1435 1436The compiler signals when it encounters that a special syntactic function, 1437such as [[for]], has been assigned to. 1438<<[[notifyAssignSyntacticFun]] function>>= 1439notifyAssignSyntacticFun <- function(funs, cntxt, loc = NULL) { 1440 if (! suppressAll(cntxt)) { 1441 msg <- ngettext(length(funs), 1442 "local assignment to syntactic function: ", 1443 "local assignments to syntactic functions: ") 1444 cntxt$warn(paste(msg, paste(funs, collapse = ", ")), 1445 cntxt, loc) 1446 } 1447} 1448@ %def notifyAssignSyntacticFun 1449 1450When the compiler encounters an error during JIT or package 1451compilation, it catches the error and returns the original uncompiled 1452code letting the AST interpreter handle it. This can happen due to a 1453compiler bug or when the code being compiled violates certain 1454assumptions made by the compiler (such as a certain discipline on 1455frame types in the evaluation environment, as checked in 1456[[frameTypes]]). The compiler will notify about catching such errors 1457via [[notifyCompilerError]]. 1458 1459<<[[notifyCompilerError]] function>>= 1460notifyCompilerError <- function(msg) 1461 if (!compilerOptions$suppressAll) 1462 cat(paste(gettext("Error: compilation failed - "), msg, "\n")) 1463@ %def notifyCompilerError 1464 1465 1466\section{Compilation environments} 1467\label{sec:environments} 1468%% **** lambda lifting/eliminating variables not captured 1469%% **** defer SETVAR until needed; avoid GETVAR if still in register 1470%% **** Could preserve semantics by pre-test to check that fun in env 1471%% **** is inlined one. Would need to make efficient somehow, e.g 1472%% **** increment counter each time one of inlined names is assigned 1473%% **** to and only check when count has changed. 1474At this point the compiler will essentially use the interpreter to 1475evaluate an expression of the form 1476\begin{verbatim} 1477if (x > 0) log(x) else 0 1478\end{verbatim} 1479since [[if]] is a [[SPECIAL]] function. To make further improvements 1480the compiler needs to be able to implement the [[if]] expression in 1481terms of conditional and unconditional branch instructions. It might 1482then also be useful to implement [[>]] and [[log]] with special 1483virtual machine instructions. To be able to do this, the compiler 1484needs to know that [[if]], [[>]], and [[log]] refer to the standard 1485versions of these functions in the base package. While this is very 1486likely, it is not guaranteed. 1487 1488R is a very dynamic language. Functions defined in the base and other 1489packages could be shadowed at runtime by definitions in loaded user 1490packages or by local definitions within a function. It is even 1491possible for user code to redefine the functions in the base package, 1492though this is discouraged by binding locking and would be poor 1493programming practice. Finally, it is possible for functions called 1494prior to evaluating an expression like the one above to reach into 1495their calling environment and add new definitions of [[log]] or [[if]] 1496that wound then be used in evaluating this expression. Again this is 1497not common and generally not a good idea outside of a debugging 1498context. 1499 1500Ideally the compiler should completely preserve semantics of the 1501language implemented by the interpreter. While this is possible it 1502would significantly complicate the compiler and the compiled code, and 1503carry at least some runtime penalty. The approach taken here is 1504therefore to permit the compiler to inline some functions when they 1505are not visibly shadowed in the compiled code. What the compiler is 1506permitted to do is determined by the setting of an optimization level. 1507The details are desctibed in Section \ref{sec:inlining}. 1508 1509For the compiler to be able to decide whether is can inline a function 1510it needs to be able to determine whether there are any local variable 1511that might shadow a variable from a base package. This requires adding 1512environment information to the compilation process. 1513 1514 1515\subsection{Representing compilation environments} 1516When compiling an expression the compiler needs to take into account 1517an evaluation environment, which would typically be a toplevel 1518environment, along with local variable definitions discovered during 1519the compilation process. The evaluation environment should not be 1520modified, so the local variables need to be considered in addition to 1521ones defined in the evaluation environment. If an expression 1522\begin{verbatim} 1523{ x <- 1; x + 2 } 1524\end{verbatim} 1525is compiled for evaluation in the global environment then existing 1526definitions in the global environment as well as the new definition 1527for [[x]] need to be taken into account. To address this the 1528compilation environment is a list of two components, an environment 1529and a list of character vectors. The environment consists of one 1530frame for each level of local variables followed by the top level 1531evaluation environment. The list of character vectors consist of one 1532element for each frame for which local variables have been discovered 1533by the compiler. For efficiency the compilation environment structure 1534also includes a character vector [[ftype]] classifying each frame as a 1535local, namespace, or global frame. 1536<<[[makeCenv]] function>>= 1537## Create a new compiler environment 1538## **** need to explain the structure 1539makeCenv <- function(env) { 1540 structure(list(extra = list(character(0)), 1541 env = env, 1542 ftypes = frameTypes(env)), 1543 class = "compiler_environment") 1544} 1545@ %def makeCenv 1546 1547When an expression is to be compiled in a particular environment a 1548first step is to identify any local variable definitions and add these 1549to the top level frame. 1550<<[[addCenvVars]] function>>= 1551## Add vars to the top compiler environment frame 1552addCenvVars <- function(cenv, vars) { 1553 cenv$extra[[1]] <- union(cenv$extra[[1]], vars) 1554 cenv 1555} 1556@ %def addCenvVars 1557 1558When compiling a function a new frame is added to the compilation 1559environment. Typically a number of local variables are added 1560immediately, so an optional [[vars]] argument is provided so this 1561can be done without an additional call to [[addCenvVars]]. 1562<<[[addCenvFrame]] function>>= 1563## Add a new frame to a compiler environment 1564addCenvFrame <- function(cenv, vars) { 1565 cenv$extra <- c(list(character(0)), cenv$extra) 1566 cenv$env <- new.env(parent = cenv$env) 1567 cenv$ftypes <- c("local", cenv$ftypes) 1568 if (missing(vars)) 1569 cenv 1570 else 1571 addCenvVars(cenv, vars) 1572} 1573@ %def addCenvFrame 1574 1575The compilation environment is queried by calling [[findCenvVar]]. 1576%% **** change name to findCenvVarInfo or some such?? 1577If a binding for the specified variable is found then [[findCenvVar]] 1578returns a list containing information about the binding. If no 1579binding is found then [[NULL]] is returned. 1580<<[[findCenvVar]] function>>= 1581## Find binding information for a variable (character or name). 1582## If a binding is found, return a list containing components 1583## ftype -- one of "local", "namespace", "global" 1584## value -- current value if available 1585## frame -- frame containing the binding (not useful for "local" variables) 1586## index -- index of the frame (1 for top, 2, for next one, etc.) 1587## Return NULL if no binding is found. 1588## **** drop the index, maybe value, to reduce cost? (query as needed?) 1589findCenvVar <- function(var, cenv) { 1590 if (typeof(var) == "symbol") 1591 var <- as.character(var) 1592 extra <- cenv$extra 1593 env <- cenv$env 1594 frame <- NULL 1595 <<search [[extra]] entries and environment frames>> 1596 <<search the remaining environment frames if necessary>> 1597 <<create the [[findCenvVar]] result>> 1598} 1599@ %def findCenvVar 1600 1601The initial search for a matching binding proceeds down each frame for 1602which there is also an entry in [[extra]], searching the [[extra]] 1603entry before the environment frame. 1604<<search [[extra]] entries and environment frames>>= 1605for (i in seq_along(cenv$extra)) { 1606 if (var %in% extra[[i]] || exists(var, env, inherits = FALSE)) { 1607 frame <- env 1608 break 1609 } 1610 else 1611 env <- parent.env(env) 1612} 1613@ %def 1614If [[frame]] is still [[NULL]] after the initial search then the 1615remaining environment frames from the evaluation environment for which 1616there are no corresponding entries in [[extra]] are searched. 1617<<search the remaining environment frames if necessary>>= 1618if (is.null(frame)) { 1619 empty <- emptyenv() 1620 while (! identical(env, empty)) { 1621 i <- i + 1 1622 if (exists(var, env, inherits = FALSE)) { 1623 frame <- env 1624 break 1625 } 1626 else 1627 env <- parent.env(env) 1628 } 1629} 1630@ %def 1631 1632If a binding frame is found then the result consists of a list 1633containing the frame, the frame type, the value if available, and the 1634frame index. The value is not looked up for [[...]] variables. A 1635promise to compute the value is stored in an environment in the 1636result. This avoids computing the value in some cases where doing so 1637may fail or produce unwanted side effects. 1638<<create the [[findCenvVar]] result>>= 1639if (! is.null(frame)) { 1640 if (exists(var, frame, inherits = FALSE) && var != "...") { 1641 value <- new.env(parent = emptyenv()) 1642 delayedAssign("value", get(var, frame, inherits = FALSE), 1643 assign.env = value) 1644 } 1645 else 1646 value <- NULL 1647 list(frame = frame, ftype = cenv$ftypes[i], value = value, index = i) 1648} 1649else 1650 NULL 1651@ %def 1652 1653Useful functions for querying the environment associated with a 1654compilation context are [[findVar]], [[findLocVar]], and 1655[[findFunDef]]. The function [[findVar]] returns [[TRUE]] is a binding 1656for the specified variable is visible and [[FALSE]] otherwise. 1657<<[[findVar]] function>>= 1658findVar <- function(var, cntxt) { 1659 cenv <- cntxt$env 1660 info <- findCenvVar(var, cenv) 1661 ! is.null(info) 1662} 1663@ %def findVar 1664[[findLocVar]] returns [[TRUE]] only if a local binding is found. 1665<<[[findLocVar]] function>>= 1666## test whether a local version of a variable might exist 1667findLocVar <- function(var, cntxt) { 1668 cenv <- cntxt$env 1669 info <- findCenvVar(var, cenv) 1670 ! is.null(info) && info$ftype == "local" 1671} 1672@ %def findLocVar 1673[[findFunDef]] returns a function definition if one is available for 1674the specified name and [[NULL]] otherwise. 1675<<[[findFunDef]] function>>= 1676## **** should this check for local functions as well? 1677findFunDef <- function(fun, cntxt) { 1678 cenv <- cntxt$env 1679 info <- findCenvVar(fun, cenv) 1680 if (! is.null(info$value) && is.function(info$value$value)) 1681 info$value$value 1682 else 1683 NULL 1684} 1685@ %def findFunDef 1686 1687 1688\subsection{Identifying possible local variables} 1689For the compiler to be able to know that it can optimize a reference 1690to a particular global function or variable it needs to be able to 1691determine that that variable will not be shadowed by a local 1692definition at runtime. R semantics do not allow certain 1693identification of local variables. If a function body consist of the 1694two lines 1695\begin{verbatim} 1696if (x) y <- 1 1697y 1698\end{verbatim} 1699then whether the variable [[y]] in the second line is local or global 1700depends on the value of [[x]]. Lazy evaluation of arguments also 1701means what whether and when an assignment in a function argument 1702occurs can be uncertain. 1703 1704The approach taken by the compiler is to conservatively identify all 1705variables that might be created within an expression, such as a 1706function body, and consider those to be potentially local variables 1707that inhibit optimizations. This ignores runtime creation of new 1708variables, but as already mentioned that is generally not good 1709programming practice. 1710 1711Variables are created by the assignment operators [[<-]] and [[=]] and 1712by [[for]] loops. In addition, calls to [[assign]] and 1713[[delayedAssign]] with a literal character name argument are 1714considered to create potential local variables if the environment 1715argument is missing, which means the assignment is in the current 1716environment. 1717 1718A simple approach for identifying all local variables created within 1719an expression is given by 1720<<findlocals0>>= 1721findLocals0 <- function(e, cntxt) { 1722 if (typeof(e) == "language") { 1723 if (typeof(e[[1]]) %in% c("symbol", "character")) 1724 switch(as.character(e[[1]]), 1725 <<[[findLocals0]] switch clauses>> 1726 findLocalsList0(e[-1], cntxt)) 1727 else findLocalsList0(e, cntxt) 1728 } 1729 else character(0) 1730} 1731 1732findLocalsList0 <- function(elist, cntxt) 1733 unique(unlist(lapply(elist, findLocals0, cntxt))) 1734@ %def findLocals0 findLocalsList0 1735 1736For assignment expressions the assignment variable is added to any 1737variables found in the value expression. 1738<<[[findLocals0]] switch clauses>>= 1739"=" =, 1740"<-" = unique(c(getAssignedVar(e, cntxt), 1741 findLocalsList0(e[-1], cntxt))), 1742@ %def 1743The assigned variable is determined by [[getAssignedVar]]: 1744<<[[getAssignedVar]] function>>= 1745getAssignedVar <- function(e, cntxt) { 1746 v <- e[[2]] 1747 if (missing(v)) 1748 cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt) 1749 else if (typeof(v) %in% c("symbol", "character")) 1750 as.character(v) 1751 else { 1752 while (typeof(v) == "language") { 1753 if (length(v) < 2) 1754 cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt) 1755 v <- v[[2]] 1756 if (missing(v)) 1757 cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt) 1758 } 1759 if (typeof(v) != "symbol") 1760 cntxt$stop(gettextf("bad assignment: %s", pasteExpr(e)), cntxt) 1761 as.character(v) 1762 } 1763} 1764@ %def getAssignedVar 1765 1766For [[for]] loops the loop variable is added to any variables found in 1767the sequence and body expressions. 1768<<[[findLocals0]] switch clauses>>= 1769"for" = unique(c(as.character(e[2]), 1770 findLocalsList0(e[-2], cntxt))), 1771@ %def 1772 1773The variable in [[assign]] and [[delayedAssign]] expressions is 1774considered local if it is an explicit character string and there is no 1775environment argument. 1776<<[[findLocals0]] switch clauses>>= 1777"delayedAssign" =, 1778"assign" = if (length(e) == 3 && 1779 is.character(e[[2]]) && 1780 length(e[[2]]) == 1) 1781 c(e[[2]], findLocals0(e[[3]], cntxt)) 1782 else findLocalsList0(e[1], cntxt), 1783@ %def 1784 1785Variables defined within local functions created by [[function]] 1786expressions do not shadow globals within the containing expression and 1787therefore [[function]] expressions do not contribute any new local 1788variables. Similarly, [[local]] calls without an environment argument 1789create a new environment for evaluating their expression and do not 1790add new local variables. If an environment argument is present then 1791this might be the current environment and so assignments in the 1792expression are considered to create possible local variables. 1793Finally, [[~]], [[expression]], and [[quote]] do not 1794evaluate their arguments and so do not contribute new local variables. 1795<<[[findLocals0]] switch clauses>>= 1796"function" = character(0), 1797"~" = character(0), 1798"local" = if (length(e) == 2) character(0) 1799 else findLocalsList0(e[-1], cntxt), 1800"expression" =, 1801"quote" = character(0), 1802@ %def findLocals0 1803Other functions, for example [[Quote]] from the [[methods]] package, 1804are also known to not evaluate their arguments but these do not often 1805contain assignment expressions and so ignoring them only slightly 1806increases the degree of conservatism in this approach. 1807 1808A problem with this simple implementation is that it assumes that all 1809of the functions named in the [[switch]] correspond to the bindings in 1810the base package. This is reasonable for the ones that are 1811syntactically special, but not for [[expression]], [[local]] and 1812[[quote]]. These might be shadowed by local definitions in a 1813surrounding function. To allow for this we can add an optional 1814variable [[shadowed]] for providing a character vector of names of 1815variables with shadowing local definitions. 1816 1817The more sophisticated implementaiton is also slightly optimized to avoid 1818recursive calls. [[findLocals1]] now, instead of searching through the full 1819transitive closure of language objects, only searches from the first, but 1820returns what remains to be searched. The variables found are stored into an 1821environment, which avoids some extra calls and assures that each variable is 1822listed at most once. 1823<<[[findLocals1]] function>>= 1824addVar <- function(v, vars) assign(v, 1, envir = vars) 1825findLocals1 <- function(e, shadowed = character(0), cntxt, vars) { 1826 if (typeof(e) == "language") { 1827 if (typeof(e[[1]]) %in% c("symbol", "character")) { 1828 v <- as.character(e[[1]]) 1829 switch(v, 1830 <<[[findLocals1]] switch clauses>> 1831 e[-1]) 1832 } 1833 else e 1834 } 1835 else NULL 1836} 1837@ %def findLocals1 1838%% **** merge into single chunk?? 1839<<[[findLocalsList1]] function>>= 1840findLocalsList1 <- function(elist, shadowed, cntxt) { 1841 todo <- elist 1842 vars <- new.env() 1843 while(length(todo) > 0) { 1844 newtodo <- list() 1845 lapply(todo, function(e) 1846 lapply(findLocals1(e, shadowed, cntxt, vars), 1847 function(x) 1848 if (typeof(x) == "language") 1849 newtodo <<- append(newtodo, x)) 1850 ) 1851 todo <- newtodo 1852 } 1853 ls(vars, all.names=T) 1854} 1855 1856@ %def findLocalsList1 1857The handling of assignment operators, [[for]] loops, [[function]] and 1858[[~]] expressions is analogous to the approach in [[findLocals0]]. 1859<<[[findLocals1]] switch clauses>>= 1860"=" =, 1861"<-" = { addVar(getAssignedVar(e, cntxt), vars); e[-1] }, 1862 1863"for" = { addVar(as.character(e[2]), vars); e[-2] }, 1864 1865"delayedAssign" =, 1866"assign" = if (length(e) == 3 && 1867 is.character(e[[2]]) && 1868 length(e[[2]]) == 1) { 1869 1870 addVar(e[[2]], vars); list(e[[3]]) 1871 } 1872 else e[1], 1873"function" = character(0), 1874"~" = character(0), 1875@ %def 1876The rules for ignoring assignments in [[local]], [[expression]], and 1877[[quote]] calls are only applied if there are no shadowing 1878definitions. 1879<<[[findLocals1]] switch clauses>>= 1880"local" = if (! v %in% shadowed && length(e) == 2) 1881 NULL 1882 else e[-1], 1883"expression" =, 1884"quote" = if (! v %in% shadowed) 1885 NULL 1886 else e[-1], 1887@ %def 1888The assignment functions could also be shadowed, but this is not very 1889common, and assuming that they are not errs in the conservative 1890direction. 1891 1892This approach can handle the case where [[quote]] or one of the other 1893non-syntactic functions is shadowed by an outer definition but does not 1894handle assignments that occur in the expression itself. For example, 1895in 1896\begin{verbatim} 1897function (f, x, y) { 1898 local <- f 1899 local(x <- y) 1900 x 1901} 1902\end{verbatim} 1903the reference to [[x]] in the third line has to be considered 1904potentially local. To deal with this multiple passes are needed. The 1905first pass assumes that [[expression]], [[local]] or [[quote]] might 1906be shadowed by local assignments. If no assignments to some of them 1907are visible, then a second pass can be used in which they are assumed 1908not to be shadowed. This can be iterated to convergence. It is also 1909useful to check before returning whether any of the syntactically 1910special variables has been assigned to. If so, so a warning is 1911issued. 1912%% **** look into speeding up findLocalsList 1913<<[[findLocalsList]] function>>= 1914findLocalsList <- function(elist, cntxt) { 1915 initialShadowedFuns <- c("expression", "local", "quote") 1916 shadowed <- Filter(function(n) ! isBaseVar(n, cntxt), initialShadowedFuns) 1917 specialSyntaxFuns <- c("~", "<-", "=", "for", "function") 1918 sf <- initialShadowedFuns 1919 nsf <- length(sf) 1920 repeat { 1921 vals <- findLocalsList1(elist, sf, cntxt) 1922 redefined <- sf %in% vals 1923 last.nsf <- nsf 1924 sf <- unique(c(shadowed, sf[redefined])) 1925 nsf <- length(sf) 1926 ## **** need to fix the termination condition used in codetools!!! 1927 if (last.nsf == nsf) { 1928 rdsf <- vals %in% specialSyntaxFuns 1929 if (any(rdsf)) 1930 ## cannot get location info (source reference) here 1931 notifyAssignSyntacticFun(vals[rdsf], cntxt) 1932 return(vals) 1933 } 1934 } 1935} 1936@ %def findLocalsList 1937<<[[findLocals]] function>>= 1938findLocals <- function(e, cntxt) 1939 findLocalsList(list(e), cntxt) 1940@ %def findLocals 1941 1942Standard definitions for all functions in [[initialShadowedFuns]] are 1943in the base package and [[isBaseVar]] checks the compilation 1944environment to see whether the specified variable's definition comes 1945from that package either via a namespace or the global environment. 1946<<[[isBaseVar]] function>>= 1947isBaseVar <- function(var, cntxt) { 1948 info <- getInlineInfo(var, cntxt) 1949 (! is.null(info) && 1950 (identical(info$frame, .BaseNamespaceEnv) || 1951 identical(info$frame, baseenv()))) 1952} 1953@ %def isBaseVar 1954The use of [[getInlineInfo]], defined in Section \ref{sec:inlining}, means 1955that the setting of the [[optimize]] compiler option will influence whether 1956a variable should be considered to be from the base package or not. 1957It might also be useful to warn about assignments to other functions. 1958 1959When a [[function]] expression is compiled, its body and default 1960arguments need to be compiled using a compilation environment that 1961contains a new frame for the function that contains variables for the 1962arguments and any assignments in the body and the default expressions. 1963[[funEnv]] creates such an environment. 1964<<[[funEnv]] function>>= 1965## augment compiler environment with function args and locals 1966funEnv <- function(forms, body, cntxt) { 1967 cntxt$env <- addCenvFrame(cntxt$env, names(forms)) 1968 locals <- findLocalsList(c(forms, body), cntxt) 1969 addCenvVars(cntxt$env, locals) 1970} 1971@ %def funEnv 1972 1973 1974\section{The inlining mechanism} 1975\label{sec:inlining} 1976To allow for inline coding of calls to some functions the [[cmpCall]] 1977function calls the [[tryInline]] function. The [[tryInline]] function 1978will either generate code for the call and return [[TRUE]], or it will 1979decline to do so and return [[FALSE]], in which case the standard code 1980generation process for a function call is used. 1981 1982The function [[tryInline]] calls [[getInlineInfo]] to determine 1983whether inlining is permissible given the current environment and 1984optimization settings. There are four possible optimization levels: 1985\begin{description} 1986\item[Level 0:] No inlining. 1987\item[Level 1:] Functions in the base packages found through a 1988 namespace that are not shadowed by function arguments or visible 1989 local assignments may be inlined. 1990\item[Level 2:] In addition to the inlining permitted by Level 1, 1991 functions that are syntactically special or are considered core 1992 language functions and are found via the global environment at 1993 compile time may be inlined. Other functions in the base packages 1994 found via the global environment may be inlined with a guard that 1995 ensures at runtime that the inlined function has not been masked; 1996 if it has, then the call in handled by the AST interpreter. 1997\item[Level 3:] Any function in the base packages found via the global 1998 environment may be inlined. 1999 %% **** should there be an explicit list of functions where inlining 2000 %% **** is OK here?? 2001\end{description} 2002The syntactically special and core language functions are 2003<<[[languageFuns]] definition>>= 2004languageFuns <- c("^", "~", "<", "<<-", "<=", "<-", "=", "==", ">", ">=", 2005 "|", "||", "-", ":", "!", "!=", "/", "(", "[", "[<-", "[[", 2006 "[[<-", "{", "@", "$", "$<-", "*", "&", "&&", "%/%", "%*%", 2007 "%%", "+", 2008 "::", ":::", "@<-", 2009 "break", "for", "function", "if", "next", "repeat", "while", 2010 "local", "return", "switch") 2011@ %def languageFuns 2012%% **** local, return, and switch are dubious here 2013%% **** if we allow them, should we also allow a few others, like .Internal? 2014The default optimization level is Level 2. Future versions of the 2015compiler may allow some functions to be explicitly excluded from 2016inlining and may provide a means for allowing user-defined functions 2017to be declared eligible for inlining. 2018 2019If inlining is permissible then the result returned by 2020[[getInlineInfo]] contains the packages associated with the specified 2021variable in the current environment. The variable name and package are 2022then looked up in a data base of handlers. If a handler is found then 2023the handler is called. The handler can either generate code and 2024return [[TRUE]] or decline to and return [[FALSE]]. If inlining is 2025not possible then [[getInlineInfo]] returns [[NULL]] and [[tryInline]] 2026returns [[FALSE]]. 2027%% **** think about adding GETNSFUN to use when inlining is OK 2028<<[[tryInline]] function>>= 2029tryInline <- function(e, cb, cntxt) { 2030 name <- as.character(e[[1]]) 2031 info <- getInlineInfo(name, cntxt, guardOK = TRUE) 2032 if (is.null(info)) 2033 FALSE 2034 else { 2035 h <- getInlineHandler(name, info$package) 2036 if (! is.null(h)) { 2037 if (info$guard) { 2038 <<inline with a guard instruction>> 2039 } 2040 else h(e, cb, cntxt) 2041 } 2042 else FALSE 2043 } 2044} 2045@ %def tryInline 2046If a guard instruction is needed then the instruction is emitted that 2047will check validity of the inlined function at runtime; if the inlined 2048code is not valid the guard instruction will evaluate the call in the 2049AST interpreter and jump over the inlined code. The inlined code is 2050handled as a non-tail-call; if the call is in tail position, then a 2051return instruction is emitted. 2052<<inline with a guard instruction>>= 2053tailcall <- cntxt$tailcall 2054if (tailcall) cntxt$tailcall <- FALSE 2055expridx <- cb$putconst(e) 2056endlabel <- cb$makelabel() 2057cb$putcode(BASEGUARD.OP, expridx, endlabel) 2058if (! h(e, cb, cntxt)) 2059 cmpCall(e, cb, cntxt, inlineOK = FALSE) 2060cb$putlabel(endlabel) 2061if (tailcall) cb$putcode(RETURN.OP) 2062TRUE 2063@ 2064 2065The function [[getInlineInfo]] implements the optimization rules 2066described at the beginning of this section. 2067<<[[getInlineInfo]] function>>= 2068noInlineSymbols <- c("standardGeneric") 2069 2070getInlineInfo <- function(name, cntxt, guardOK = FALSE) { 2071 optimize <- cntxt$optimize 2072 if (optimize > 0 && ! (name %in% noInlineSymbols)) { 2073 info <- findCenvVar(name, cntxt$env) 2074 if (is.null(info)) 2075 NULL 2076 else { 2077 ftype <- info$ftype 2078 frame <- info$frame 2079 if (ftype == "namespace") { 2080 <<fixup for a namespace import frame>> 2081 info$package <- nsName(findHomeNS(name, frame, cntxt)) 2082 info$guard <- FALSE 2083 info 2084 } 2085 else if (ftype == "global" && 2086 (optimize >= 3 || 2087 (optimize >= 2 && name %in% languageFuns))) { 2088 info$package <- packFrameName(frame) 2089 info$guard <- FALSE 2090 info 2091 } 2092 else if (guardOK && ftype == "global" && 2093 packFrameName(frame) == "base") { 2094 info$package <- packFrameName(frame) 2095 info$guard <- TRUE 2096 info 2097 } 2098 else NULL 2099 } 2100 } 2101 else NULL 2102} 2103@ %def getInlineInfo 2104The code for finding the home namespace from a namespace import frame 2105is needed here to deal with the fact that a namespace may not be 2106registered when this function is called, so the mechanism used in 2107[[findHomeNS]] to locate the namespace to which an import frame 2108belongs may not work. 2109<<fixup for a namespace import frame>>= 2110if (! isNamespace(frame)) { 2111 ## should be the import frame of the current topenv 2112 top <- topenv(cntxt$env$env) 2113 if (! isNamespace(top) || 2114 ! identical(frame, parent.env(top))) 2115 cntxt$stop(gettext("bad namespace import frame")) 2116 frame <- top 2117} 2118@ 2119 2120For this version of the compiler the inline handler data base is 2121managed as an environment in which handlers are entered and looked up 2122by name. For now it is assumed that a name can only appear associated 2123with one package and an error is signaled if an attempt is made to 2124redefine a handler for a given name for a different package than an 2125existing definition. This can easily be changed if it should prove too 2126restrictive. 2127%% **** note on haveInlineHandler 2128%% **** allow same name in multiple packages? 2129<<inline handler implementation>>= 2130inlineHandlers <- new.env(hash = TRUE, parent = emptyenv()) 2131 2132setInlineHandler <- function(name, h, package = "base") { 2133 if (exists(name, inlineHandlers, inherits = FALSE)) { 2134 entry <- get(name, inlineHandlers) 2135 if (entry$package != package) { 2136 fmt <- "handler for '%s' is already defined for another package" 2137 stop(gettextf(fmt, name), domain = NA) 2138 } 2139 } 2140 entry <- list(handler = h, package = package) 2141 assign(name, entry, inlineHandlers) 2142} 2143 2144getInlineHandler <- function(name, package = "base") { 2145 if (exists(name, inlineHandlers, inherits = FALSE)) { 2146 hinfo <- get(name, inlineHandlers) 2147 if (hinfo$package == package) 2148 hinfo$handler 2149 else NULL 2150 } 2151 else NULL 2152} 2153 2154haveInlineHandler <- function(name, package = "base") { 2155 if (exists(name, inlineHandlers, inherits = FALSE)) { 2156 hinfo <- get(name, inlineHandlers) 2157 package == hinfo$package 2158 } 2159 else FALSE 2160} 2161@ %def inlineHandlers getInlineHandler setInlineHandler haveInlineHandler 2162 2163 2164\section{Default inlining rules for primitives} 2165This section defines generic handlers for [[BUILTIN]] and [[SPECIAL]] 2166functions. These are installed programmatically for all [[BUILTIN]] 2167and [[SPECIAL]] functions. The following sections present more 2168specialized handlers for a range of functions that are installed in 2169place of the default ones. 2170<<install default inlining handlers>>= 2171local({ 2172 <<install default [[SPECIAL]] handlers>> 2173 <<install default [[BUILTIN]] handlers>> 2174}) 2175@ 2176The handler installations are wrapped in a [[local]] call to reduce 2177environment pollution. 2178 2179 2180\subsection{[[BUILTIN]] functions} 2181Calls to functions known at compile time to be of type [[BUILTIN]] can 2182be handled more efficiently. The interpreter evaluates all arguments 2183for [[BUILTIN]] functions before calling the function, so the compiler 2184can evaluate the arguments in line without the need for creating 2185promises. 2186 2187A generic handler for inlining a call to a [[BUILIN]] function is 2188provided by [[cmpBuiltin]]. For now, the handler returns [[FALSE]] if 2189the call contains missing arguments, which are currently not allowed 2190in [[BUILTIN]] functions, or [[...]] arguments. The handling of 2191[[...]] arguments should be improved. 2192%% **** look into improving handling ... arguments to BUILTINs? 2193For [[BUILTIN]] functions the function to call is pushed on the stack 2194with the [[GETBUILTIN]] instruction. The [[internal]] argument allows 2195[[cmpBuiltin]] to be used with [[.Internal]] functions of type 2196[[BUILTIN]] as well; this is used in the handler for [[.Internal]] 2197defined in Section \ref{subsec:.Internal}. 2198<<[[cmpBuiltin]] function>>= 2199cmpBuiltin <- function(e, cb, cntxt, internal = FALSE) { 2200 fun <- e[[1]] 2201 args <- e[-1] 2202 names <- names(args) 2203 if (dots.or.missing(args)) 2204 FALSE 2205 else { 2206 ci <- cb$putconst(fun) 2207 if (internal) 2208 cb$putcode(GETINTLBUILTIN.OP, ci) 2209 else 2210 cb$putcode(GETBUILTIN.OP, ci) 2211 cmpBuiltinArgs(args, names, cb, cntxt) 2212 ci <- cb$putconst(e) 2213 cb$putcode(CALLBUILTIN.OP, ci) 2214 if (cntxt$tailcall) cb$putcode(RETURN.OP) 2215 TRUE 2216 } 2217} 2218@ %def cmpBuiltin 2219 2220Argument evaluation code is generated by [[cmpBuiltinArgs]]. In the 2221context of [[BUILTIN]] functions missing arguments are currently not 2222allowed. But to allow [[cmpBuiltinArgs]] to be used in other contexts 2223missing arguments are supported if the optional argument [[missingOK]] 2224is [[TRUE]]. 2225%% **** should this warn/stop if there are missings and missingOK is FALSE?? 2226%% **** can this be adjusted so error messages match the interpreter? 2227%% **** for f <- function(x, y) x + y compare errors for f(1,) and cmpfun(f)(1,) 2228%% **** test code for constant folding is needed (sym and non-sym) 2229<<[[cmpBuiltinArgs]] function>>= 2230cmpBuiltinArgs <- function(args, names, cb, cntxt, missingOK = FALSE) { 2231 ncntxt <- make.argContext(cntxt) 2232 for (i in seq_along(args)) { 2233 a <- args[[i]] 2234 n <- names[[i]] 2235 <<compile missing [[BUILTIN]] argument>> 2236 ## **** handle ... here ?? 2237 <<signal an error for promise or bytecode argument>> 2238 <<compile a general [[BUILTIN]] argument>> 2239 } 2240} 2241@ %def cmpBuiltinArgs 2242 2243Missing argument code is generated by 2244<<compile missing [[BUILTIN]] argument>>= 2245if (missing(a)) { 2246 if (missingOK) { 2247 cb$putcode(DOMISSING.OP) 2248 cmpTag(n, cb) 2249 } 2250 else 2251 cntxt$stop(gettext("missing arguments are not allowed"), cntxt, 2252 loc = cb$savecurloc()) 2253} 2254@ 2255The error case should not be reached as [[cmpBuiltinArgs]] should not 2256be called with missing arguments unless [[missingOK]] is [[TRUE]]. 2257 2258The code for general arguments handles symbols separately to allow for 2259the case when missing values are acceptable. Constant folding is 2260tried first since the constant folding code in [[cmp]] is not reached 2261in this case. Constant folding is needed here since it doesn't go 2262through [[cmp]]. 2263<<compile a general [[BUILTIN]] argument>>= 2264else { 2265 if (is.symbol(a)) { 2266 ca <- constantFold(a, cntxt, loc = cb$savecurloc()) 2267 if (is.null(ca)) { 2268 cmpSym(a, cb, ncntxt, missingOK) 2269 cb$putcode(PUSHARG.OP) 2270 } 2271 else 2272 cmpConstArg(ca$value, cb, cntxt) 2273 } 2274 else if (typeof(a) == "language") { 2275 cmp(a, cb, ncntxt) 2276 cb$putcode(PUSHARG.OP) 2277 } 2278 else 2279 cmpConstArg(a, cb, cntxt) 2280 cmpTag(n, cb) 2281} 2282@ %def 2283Handling the constant case separately is not really necessary but 2284makes the code a bit cleaner. 2285 2286Default handlers for all [[BUILTIN]] functions in the [[base]] package 2287are installed programmatically by 2288<<install default [[BUILTIN]] handlers>>= 2289for (b in basevars[types == "builtin"]) 2290 if (! haveInlineHandler(b, "base")) 2291 setInlineHandler(b, cmpBuiltin) 2292@ %def 2293 2294 2295\subsection{[[SPECIAL]] functions} 2296Calls to functions known to be of type [[SPECIAL]] can also be 2297compiled somewhat more efficiently by the [[cmpSpecial]] function: 2298<<[[cmpSpecial]] function>>= 2299cmpSpecial <- function(e, cb, cntxt) { 2300 fun <- e[[1]] 2301 if (typeof(fun) == "character") 2302 fun <- as.name(fun) 2303 ci <- cb$putconst(e) 2304 cb$putcode(CALLSPECIAL.OP, ci) 2305 if (cntxt$tailcall) 2306 cb$putcode(RETURN.OP) 2307 TRUE 2308} 2309@ %def cmpSpecial 2310 2311This handler is installed for all [[SPECIAL]] functions in the base 2312package with 2313<<install default [[SPECIAL]] handlers>>= 2314basevars <- ls('package:base', all.names = TRUE) 2315types <- sapply(basevars, function(n) typeof(get(n))) 2316for (s in basevars[types == "special"]) 2317 if (! haveInlineHandler(s, "base")) 2318 setInlineHandler(s, cmpSpecial) 2319@ %def 2320 2321 2322\section{Some simple inlining handlers} 2323This section presents inlining handlers for a number of core primitive 2324functions. With these additions the compiler will begin to show some 2325performance improvements. 2326 2327\subsection{The left brace sequencing function} 2328The inlining handler for [[{]] needs to consider that a pair of braces 2329[[{]] and [[}]] can surround zero, one, or more expressions. A set 2330of empty braces is equivalent to the constant [[NULL]]. If there is 2331more than one expression, then all the values of all expressions other 2332than the last are ignored. These expressions are compiled in a 2333no-value context (currently equivalent to a non-tail-call context), 2334and then code is generated to pop their values off the stack. The 2335final expression is then compiled according to the context in which 2336the braces expression occurs. 2337<<inlining handler for left brace function>>= 2338setInlineHandler("{", function(e, cb, cntxt) { 2339 n <- length(e) 2340 if (n == 1) 2341 cmp(NULL, cb, cntxt) 2342 else { 2343 sloc <- cb$savecurloc() 2344 bsrefs <- attr(e, "srcref") 2345 if (n > 2) { 2346 ncntxt <- make.noValueContext(cntxt) 2347 for (i in 2 : (n - 1)) { 2348 subexp <- e[[i]] 2349 cb$setcurloc(subexp, getBlockSrcref(bsrefs, i)) 2350 cmp(subexp, cb, ncntxt, setloc = FALSE) 2351 cb$putcode(POP.OP) 2352 } 2353 } 2354 subexp <- e[[n]] 2355 cb$setcurloc(subexp, getBlockSrcref(bsrefs, n)) 2356 cmp(subexp, cb, cntxt, setloc = FALSE) 2357 cb$restorecurloc(sloc) 2358 } 2359 TRUE 2360}) 2361@ %def 2362 2363 2364\subsection{The closure constructor function} 2365Compiling of [[function]] expressions is somewhat similar to compiling 2366promises for function arguments. The body of a function is compiled 2367into a separate byte code object and stored in the constant pool 2368together with the formals. Then code is emitted for creating a 2369closure from the formals, compiled body, and the current environment. 2370For now, only the body of functions is compiled, not the 2371default argument expressions. This should be changed in future 2372versions of the compiler. 2373<<inlining handler for [[function]]>>= 2374setInlineHandler("function", function(e, cb, cntxt) { 2375 forms <- e[[2]] 2376 body <- e[[3]] 2377 sref <- if (length(e) > 3) e[[4]] else NULL 2378 ncntxt <- make.functionContext(cntxt, forms, body) 2379 if (mayCallBrowser(body, cntxt)) 2380 return(FALSE) 2381 cbody <- genCode(body, ncntxt, loc = cb$savecurloc()) 2382 ci <- cb$putconst(list(forms, cbody, sref)) 2383 cb$putcode(MAKECLOSURE.OP, ci) 2384 if (cntxt$tailcall) cb$putcode(RETURN.OP) 2385 TRUE 2386}) 2387@ %def 2388 2389 2390\subsection{The left parenthesis function} 2391In R an expression of the form [[(expr)]] is interpreted as a call to 2392the function [[(]] with the argument [[expr]]. Parentheses are used 2393to guide the parser, and for the most part [[(expr)]] is equivalent to 2394[[expr]]. There are two exceptions: 2395\begin{itemize} 2396\item Since [[(]] is a function an expression of the form [[(...)]] is 2397 legal whereas just [[...]] may not be, depending on the context. A 2398 runtime error will occur unless the [[...]] argument expands to 2399 exactly one non-missing argument. 2400\item In tail position a call to [[(]] sets the visible flag to 2401 [[TRUE]]. So at top level for example the result of an assignment 2402 expression [[x <- 1]] would not be printed, but the result of [[(x 2403 <- 1]] would be printed. It is not clear that this feature really 2404 needs to be preserved within functions --- it could be made a 2405 feature of the read-eval-print loop --- but for now it is a feature 2406 of the interpreter that the compiler should preserve. 2407\end{itemize} 2408 2409The inlining handler for [[(]] calls handles a [[...]] argument case 2410or a case with fewer or more than one argument as a generic 2411[[BUILTIN]] call. If the expression is in tail position then the 2412argument is compiled in a non-tail-call context, a [[VISIBLE]] 2413instruction is emitted to set the visible flag to [[TRUE]], and a 2414[[RETURN]] instruction is emitted. If the expression is in non-tail 2415position, then code for the argument is generated in the current context. 2416<<inlining handler for [[(]]>>= 2417setInlineHandler("(", function(e, cb, cntxt) { 2418 if (any.dots(e)) 2419 cmpBuiltin(e, cb, cntxt) ## punt 2420 else if (length(e) != 2) { 2421 notifyWrongArgCount("(", cntxt, loc = cb$savecurloc()) 2422 cmpBuiltin(e, cb, cntxt) ## punt 2423 } 2424 else if (cntxt$tailcall) { 2425 ncntxt <- make.nonTailCallContext(cntxt) 2426 cmp(e[[2]], cb, ncntxt) 2427 cb$putcode(VISIBLE.OP) 2428 cb$putcode(RETURN.OP) 2429 TRUE 2430 } 2431 else { 2432 cmp(e[[2]], cb, cntxt) 2433 TRUE 2434 } 2435}) 2436@ %def 2437 2438 2439\subsection{The [[.Internal]] function} 2440\label{subsec:.Internal} 2441One frequently used [[SPECIAL]] function is [[.Internal]]. When the 2442[[.Internal]] function called is of type [[BUILTIN]] it is useful to 2443compile the call as for a [[BUILTIN]] function. For [[.Internal]] 2444functions of type [[SPECIAL]] there is less of an advantage, and so 2445the [[.Internal]] expression is compiled with [[cmpSpecial]]. It may 2446be useful to introduce a [[GETINTLSPECIAL]] instruction and handle 2447these analogously to [[.Internal]] functions of type [[BUILTIN]]. The 2448handler is assigned to the variable [[cmpDotInternalCall]] to allow 2449its use in inlining. 2450%% **** look into adding GETINTLSPECIAL?? 2451<<inlining handler for [[.Internal]]>>= 2452cmpDotInternalCall <- function(e, cb, cntxt) { 2453 ee <- e[[2]] 2454 sym <- ee[[1]] 2455 if (.Internal(is.builtin.internal(sym))) 2456 cmpBuiltin(ee, cb, cntxt, internal = TRUE) 2457 else 2458 cmpSpecial(e, cb, cntxt) 2459} 2460 2461setInlineHandler(".Internal", cmpDotInternalCall) 2462@ %def cmpDotInternalCall 2463 2464 2465\subsection{The [[local]] function} 2466While [[local]] is currently implemented as a closure, because of its 2467importance relative to local variable determination it is a good idea 2468to inline it as well. The current semantics are such that the 2469interpreter treats 2470\begin{verbatim} 2471local(expr) 2472\end{verbatim} 2473essentially the same as 2474\begin{verbatim} 2475(function() expr)() 2476\end{verbatim} 2477There may be some minor differences related to what the [[sys.xyz]] 2478functions return. An instance of this was found in the [[RefManageR]] 2479package which used [[parent.frame(2)]] to access the environment from which 2480[[local]] was invoked. In this case, the use of [[parent.frame]] was 2481unnecessary (and [[local]] was not needed either); the maintainer 2482accepted a patch fixing this. The code pattern in the package was 2483\begin{verbatim} 2484MakeBibLaTeX <- function(docstyle = "text") local({ 2485 docstyle <- get("docstyle", parent.frame(2)) 2486 sortKeys <- function() 42 2487 environment() 2488}) 2489\end{verbatim} 2490and the suggested fix was 2491\begin{verbatim} 2492MakeBibLaTeX <- function(docstyle = "text") { 2493 sortKeys <- function() 42 2494 environment() 2495} 2496\end{verbatim} 2497So the compiler handles one argument [[local]] calls by making this 2498conversion and compiling the result. 2499%% **** add to language manual? 2500<<inlining handler for [[local]] function>>= 2501setInlineHandler("local", function(e, cb, cntxt) { 2502 if (length(e) == 2) { 2503 ee <- as.call(list(as.call(list( 2504 as.name("function"), NULL, e[[2]], NULL)))) 2505 cmp(ee, cb, cntxt) 2506 TRUE 2507 } 2508 else FALSE 2509}) 2510@ %def 2511The interpreter couls, and probably should, be modified to handle this 2512case of a [[local]] call expression in the same way as the compiler. 2513 2514\subsection{The [[return]] function} 2515\label{subsec:return} 2516A call to [[return]] causes a return from the associated function 2517call, as determined by the lexical context in which the [[return]] 2518expression is defined. If the [[return]] is captured in a closure and 2519is executed within a callee then this requires a [[longjmp]]. A 2520[[longjmp]] is also needed if the [[return]] call occurs within a loop 2521that is compiled to a separate code object to support a [[setjmp]] for 2522[[break]] or [[next]] calls. The [[RETURNJMP]] instruction is 2523provided for this purpose. In all other cases an ordinary [[RETURN]] 2524instruction can be used. 2525%% **** if function body code was tagged as such then changing from 2526%% **** RETURNJMP to RETURN could be done by post-processing the 2527%% **** bytecode or by putcode 2528[[return]] calls with [[...]], which may be legal if [[...]] contains 2529only one argument, or missing arguments or more than one argument, 2530which will produce runtime errors, are compiled as generic [[SPECIAL]] 2531calls. 2532<<inlining handler for [[return]] function>>= 2533setInlineHandler("return", function(e, cb, cntxt) { 2534 if (dots.or.missing(e) || length(e) > 2) 2535 cmpSpecial(e, cb, cntxt) ## **** punt for now 2536 else { 2537 if (length(e) == 1) 2538 val <- NULL 2539 else 2540 val <- e[[2]] 2541 ncntxt <- make.nonTailCallContext(cntxt) 2542 cmp(val, cb, ncntxt) 2543 if (cntxt$needRETURNJMP) 2544 cb$putcode(RETURNJMP.OP) 2545 else 2546 cb$putcode(RETURN.OP) 2547 } 2548 TRUE 2549}) 2550@ 2551 2552 2553\section{Branching and labels} 2554The code generated so far is straight line code without conditional or 2555unconditional branches. To implement conditional evaluation 2556constructs and loops we need to add conditional and unconditional 2557branching instructions. These make use of the labels mechanism 2558provided by the code buffer. 2559 2560 2561\subsection{Inlining handler for [[if]] expressions} 2562Using the labels mechanism we can implement an inlining handler for 2563[[if]] expressions. The first step extracts the components of the 2564expression. An [[if]] expression with no [[else]] clause will 2565invisibly return [[NULL]] if the test is [[FALSE]], but the visible 2566flag setting only matters if the [[if]] expression is in tail 2567position. So the case of no [[else]] clause will be handled slightly 2568differently in tail and non-tail contexts. 2569%% **** In no value contexts it would be good to avoid pushing and 2570%% **** immediately popping constants. Alternatively a peephole optimizer 2571%% **** could clean these up. 2572%% **** Should there be error checking for either two or three arguments here?? 2573<<[[if]] inline handler body>>= 2574test <- e[[2]] 2575then.expr <- e[[3]] 2576if (length(e) == 4) { 2577 have.else.expr <- TRUE 2578 else.expr <- e[[4]] 2579} 2580else have.else.expr <- FALSE 2581@ %def 2582 2583To deal with use of [[if (FALSE) ...]] for commenting out code and of 2584[[if (is.R()) ... else ...]] for handling both R and Splus code it is 2585useful to attempt to constant-fold the test. If this succeeds and 2586produces either [[TRUE]] or [[FALSE]] then only the appropriate branch 2587is compiled and the handler returns [[TRUE]]. 2588<<[[if]] inline handler body>>= 2589ct <- constantFold(test, cntxt, loc = cb$savecurloc()) 2590if (! is.null(ct) && is.logical(ct$value) && length(ct$value) == 1 2591 && ! is.na(ct$value)) { 2592 if (ct$value) 2593 cmp(then.expr, cb, cntxt) 2594 else if (have.else.expr) 2595 cmp(else.expr, cb, cntxt) 2596 else if (cntxt$tailcall) { 2597 cb$putcode(LDNULL.OP) 2598 cb$putcode(INVISIBLE.OP) 2599 cb$putcode(RETURN.OP) 2600 } 2601 else cb$putcode(LDNULL.OP) 2602 return(TRUE) 2603} 2604@ 2605 2606Next, the test code is compiled, a label for the start of code for the 2607[[else]] clause is generated, and a conditional branch instruction 2608that branches to the [[else]] label if the test fails is emitted. 2609This is followed by code for the consequent (test is [[TRUE]]) 2610expression. The [[BRIFNOT]] takes two operand, the constant pool 2611index for the call and the label to branch to if the value on the 2612stack is [[FALSE]]. The call is used if an error needs to be signaled 2613for an improper test result on the stack. 2614<<[[if]] inline handler body>>= 2615ncntxt <- make.nonTailCallContext(cntxt) 2616cmp(test, cb, ncntxt) 2617callidx <- cb$putconst(e) 2618else.label <- cb$makelabel() 2619cb$putcode(BRIFNOT.OP, callidx, else.label) 2620cmp(then.expr, cb, cntxt) 2621@ 2622 2623The code for the alternative [[else]] expression will be placed after 2624the code for the consequent expression. If the [[if]] expression 2625appears in tail position then the code for the consequent will end with 2626a [[RETURN]] instruction and there is no need to jump over the 2627following instructions for the [[else]] expression. All that is needed 2628is to record the value of the label for the [[else]] clause and to 2629emit the code for the [[else]] clause. If no [[else]] clause was 2630provided then that code arranges for the value [[NULL]] to be returned 2631invisibly. 2632<<[[if]] inline handler body>>= 2633if (cntxt$tailcall) { 2634 cb$putlabel(else.label) 2635 if (have.else.expr) 2636 cmp(else.expr, cb, cntxt) 2637 else { 2638 cb$putcode(LDNULL.OP) 2639 cb$putcode(INVISIBLE.OP) 2640 cb$putcode(RETURN.OP) 2641 } 2642} 2643@ %def 2644On the other hand, if the [[if]] expression is not in tail position 2645then a label for the next instruction after the [[else]] expression 2646code is needed, and the consequent expression code needs to end with a 2647[[GOTO]] instruction to that label. If the expression does not 2648include an [[else]] clause then the alternative code just places 2649[[NULL]] on the stack. 2650<<[[if]] inline handler body>>= 2651else { 2652 end.label <- cb$makelabel() 2653 cb$putcode(GOTO.OP, end.label) 2654 cb$putlabel(else.label) 2655 if (have.else.expr) 2656 cmp(else.expr, cb, cntxt) 2657 else 2658 cb$putcode(LDNULL.OP) 2659 cb$putlabel(end.label) 2660} 2661@ %def 2662 2663The resulting handler definition is 2664<<inlining handler for [[if]]>>= 2665setInlineHandler("if", function(e, cb, cntxt) { 2666 ## **** test for missing, ... 2667 <<[[if]] inline handler body>> 2668 TRUE 2669}) 2670@ %def 2671%% **** need some assembly code examples?? 2672 2673 2674\subsection{Inlining handlers for [[&&]] and [[||]] expressions} 2675In many languages it is possible to convert the expression [[a && b]] 2676to an equivalent [[if]] expression of the form 2677\begin{verbatim} 2678if (a) { if (b) TRUE else FALSE } 2679\end{verbatim} 2680Similarly, in these languages the expression [[a || b]] is equivalent 2681to 2682\begin{verbatim} 2683if (a) TRUE else if (b) TRUE else FALSE 2684\end{verbatim} 2685Compilation of these expressions is thus reduced to compiling [[if]] 2686expressions. 2687 2688Unfortunately, because of the possibility of [[NA]] values, these 2689equivalencies do not hold in R. In R, [[NA || TRUE]] should evaluate 2690to [[TRUE]] and [[NA && FALSE]] to [[FALSE]]. This is handled by 2691introducing special instructions [[AND1ST]] and [[AND2ND]] for [[&&]] 2692expressions and [[OR1ST]] and [[OR2ND]] for [[||]]. 2693 2694The code generator for [[&&]] expressions generates code to evaluate 2695the first argument and then emits an [[AND1ST]] instruction. The 2696[[AND1ST]] instruction has one operand, the label for the instruction 2697following code for the second argument. If the value on the stack 2698produced by the first argument is [[FALSE]] then [[AND1ST]] jumps to 2699the label and skips evaluation of the second argument; the value of 2700the expression is [[FALSE]]. The code for the second argument is 2701generated next, followed by an [[AND2ND]] instruction. This removes 2702the values of the two arguments to [[&&]] from the stack and pushes 2703the value of the expression onto the stack. A [[RETURN]] instruction 2704is generated if the [[&&]] expression was in tail position. 2705%% **** check over all uses of argContext vs nonTailCallContext 2706%% **** The first argument can use nonTailCallCOntext because nothing 2707%% **** is on the stack yet. The second one has to use argContext. 2708%% **** This wouldn't be an issue if break/next could reset the stack 2709%% **** before the jump. 2710<<inlining handler for [[&&]]>>= 2711setInlineHandler("&&", function(e, cb, cntxt) { 2712 ## **** arity check?? 2713 ncntxt <- make.argContext(cntxt) 2714 callidx <- cb$putconst(e) 2715 label <- cb$makelabel() 2716 cmp(e[[2]], cb, ncntxt) 2717 cb$putcode(AND1ST.OP, callidx, label) 2718 cmp(e[[3]], cb, ncntxt) 2719 cb$putcode(AND2ND.OP, callidx) 2720 cb$putlabel(label) 2721 if (cntxt$tailcall) 2722 cb$putcode(RETURN.OP) 2723 TRUE 2724}) 2725@ %def 2726 2727The code generator for [[||]] expressions is analogous. 2728<<inlining handler for [[||]]>>= 2729setInlineHandler("||", function(e, cb, cntxt) { 2730 ## **** arity check?? 2731 ncntxt <- make.argContext(cntxt) 2732 callidx <- cb$putconst(e) 2733 label <- cb$makelabel() 2734 cmp(e[[2]], cb, ncntxt) 2735 cb$putcode(OR1ST.OP, callidx, label) 2736 cmp(e[[3]], cb, ncntxt) 2737 cb$putcode(OR2ND.OP, callidx) 2738 cb$putlabel(label) 2739 if (cntxt$tailcall) 2740 cb$putcode(RETURN.OP) 2741 TRUE 2742}) 2743@ %def 2744 2745 2746\section{Loops} 2747\label{sec:loops} 2748In principle code for [[repeat]] and [[while]] loops can be generated 2749using just [[GOTO]] and [[BRIFNOT]] instructions. [[for]] loops 2750require a little more to manage the loop variable and termination. A 2751complication arises due to the need to support [[break]] and [[next]] 2752calls in the context of lazy evaluation of arguments: if a [[break]] 2753or [[next]] expression appears in a function argument that is compiled 2754as a closure, then the expression may be evaluated deep inside a 2755series of nested function calls and require a non-local jump. A 2756similar issue arises for calls to the [[return]] function as described 2757in Section \ref{subsec:return}. 2758 2759To support these non-local jumps the interpreter sets up a [[setjmp]] 2760context for each loop, and [[break]] and [[next]] use [[longjmp]] to 2761transfer control. In general, compiled loops need to use a similar 2762approach. For now, this is implemented by the [[STARTLOOPCNTXT]] and 2763[[ENDLOOPCNTXT]] instructions. The [[STARTLOOPCNTXT]] instructions 2764takes two operands, a flag indicating whether the loop is a [[for]] 2765loop or not, and a label which points after the loop. The interpreter 2766jumps to this label in case of a non-local jump implementing 2767[[break]]. The loop body should end with a call to [[ENDLOOPCNTXT]], 2768which takes one operand indicating whether this is a [[for]] loop or 2769not. [[ENDLOOPCNTXT]] terminates the context established by 2770[[STARTLOOPCNTXT]] and pops it off the context stack. The context 2771data is stored on the byte code interpreter stack; in the case of a 2772[[for]] loop some loop state information is duplicated on the stack by 2773[[STARTLOOPCNTXT]] and removed again by [[ENDLOOPCNTXT]]. The byte 2774code intepreter stores the [[pc]] in a slot in the [[RCNTXT]] 2775structure so it is available after a [[longjmp]] triggered by a 2776[[break]] for retrieving the label on the [[ENDLOOPCNTXT]] 2777instruction. An alternative would be to add separate 2778[[STARTFORLOOPCNTXT]] and [[ENDFORLOOPCNTXT]] instructions. Then the 2779[[pc]] or the label could be stored on the note stack. 2780 2781At least with some assumptions it is often possible to implement 2782[[break]] and [[next]] calls as simple [[GOTO]]s. If all [[break]] 2783and [[next]] calls in a loop can be implemented using [[GOTO]]s then 2784the loop context is not necessary. The mechanism to enable the 2785simpler code generation is presented in Section 2786\ref{subsec:skipcntxt}. 2787 2788The current engine implementation executes one [[setjmp]] per 2789[[STARTLOOPCNTXT]] and uses nested calls to [[bceval]] to run the 2790code. Eventually we should be able to reduce the need for nested 2791[[bceval]] calls and to arrange that [[setjmp]] buffers be reused for 2792multiple purposes. 2793 2794 2795\subsection{[[repeat]] loops} 2796The simplest loop in R is the [[repeat]] loop. The code generator is 2797defined as 2798<<inlining handler for [[repeat]] loops>>= 2799setInlineHandler("repeat", function(e, cb, cntxt) { 2800 body <- e[[2]] 2801 <<generate context and body for [[repeat]] loop>> 2802 <<generate [[repeat]] and [[while]] loop wrap-up code>> 2803 TRUE 2804}) 2805@ %def 2806 2807If a loop context is not needed then the code for the loop body is 2808just written to the original code buffer. The [[else]] clause in the 2809code chunk below generates the code for the general case. The need for 2810using [[RETURNJMP]] for [[return]] calls is indicated by setting the 2811[[needRETURNJMP]] flag in the compiler context to [[TRUE]]. 2812<<generate context and body for [[repeat]] loop>>= 2813if (checkSkipLoopCntxt(body, cntxt)) 2814 cmpRepeatBody(body, cb, cntxt) 2815else { 2816 cntxt$needRETURNJMP <- TRUE ## **** do this a better way 2817 ljmpend.label <- cb$makelabel() 2818 cb$putcode(STARTLOOPCNTXT.OP, 0, ljmpend.label) 2819 cmpRepeatBody(body, cb, cntxt) 2820 cb$putlabel(ljmpend.label) 2821 cb$putcode(ENDLOOPCNTXT.OP, 0) 2822} 2823@ %def 2824 2825The loop body uses two labels. [[loop.label]] marks the top of the 2826loop and is the target of the [[GOTO]] instruction at the end of the 2827body. This label is also used by [[next]] expressions that do not 2828require [[longjmp]]s. The [[end.loop]] label is placed after the 2829[[GOTO]] instruction and is used by [[break]] expressions that do not 2830require [[longjmp]]s. The body is compiled in a context that makes 2831these labels available, and the value left on the stack is removed by 2832a [[POP]] instruction. The [[POP]] instruction is followed by a 2833[[GOTO]] instruction that returns to the top of the loop. 2834<<[[cmpRepeatBody]] function>>= 2835cmpRepeatBody <- function(body, cb, cntxt) { 2836 loop.label <- cb$makelabel() 2837 end.label <- cb$makelabel() 2838 cb$putlabel(loop.label) 2839 lcntxt <- make.loopContext(cntxt, loop.label, end.label) 2840 cmp(body, cb, lcntxt) 2841 cb$putcode(POP.OP) 2842 cb$putcode(GOTO.OP, loop.label) 2843 cb$putlabel(end.label) 2844} 2845@ %def cmpRepeatBody 2846 2847 2848The wrap-up code for the loop places the [[NULL]] value of the loop 2849expression on the stack and emits [[INVISIBLE]] and [[RETURN]] 2850instructions to return the value if the loop appears in tail position. 2851<<generate [[repeat]] and [[while]] loop wrap-up code>>= 2852cb$putcode(LDNULL.OP) 2853if (cntxt$tailcall) { 2854 cb$putcode(INVISIBLE.OP) 2855 cb$putcode(RETURN.OP) 2856} 2857@ %def 2858 2859The [[break]] and [[next]] code generators emit [[GOTO]] instructions 2860if the loop information is available and the [[gotoOK]] compiler 2861context flag is [[TRUE]]. A warning is issued if no loop is visible 2862in the compilation context. 2863<<inlining handlers for [[next]] and [[break]]>>= 2864setInlineHandler("break", function(e, cb, cntxt) { 2865 if (is.null(cntxt$loop)) { 2866 notifyWrongBreakNext("break", cntxt, loc = cb$savecurloc()) 2867 cmpSpecial(e, cb, cntxt) 2868 } 2869 else if (cntxt$loop$gotoOK) { 2870 cb$putcode(GOTO.OP, cntxt$loop$end) 2871 TRUE 2872 } 2873 else cmpSpecial(e, cb, cntxt) 2874}) 2875 2876setInlineHandler("next", function(e, cb, cntxt) { 2877 if (is.null(cntxt$loop)) { 2878 notifyWrongBreakNext("next", cntxt, loc = cb$savecurloc()) 2879 cmpSpecial(e, cb, cntxt) 2880 } 2881 else if (cntxt$loop$gotoOK) { 2882 cb$putcode(GOTO.OP, cntxt$loop$loop) 2883 TRUE 2884 } 2885 else cmpSpecial(e, cb, cntxt) 2886}) 2887@ %def 2888 2889 2890\subsection{[[while]] loops} 2891%% could just compile repeat{ if (condition) body else break } ?? 2892The structure for the [[while]] loop code generator is similar to the 2893structure of the [[repeat]] code generator: 2894<<inlining handler for [[while]] loops>>= 2895setInlineHandler("while", function(e, cb, cntxt) { 2896 cond <- e[[2]] 2897 body <- e[[3]] 2898 <<generate context and body for [[while]] loop>> 2899 <<generate [[repeat]] and [[while]] loop wrap-up code>> 2900 TRUE 2901}) 2902@ %def 2903The context and body generation chunk is similar as well. The 2904expression stored in the code object isn't quite right as what is 2905compiled includes both the test and the body, but this code object 2906should not be externally visible. 2907<<generate context and body for [[while]] loop>>= 2908if (checkSkipLoopCntxt(cond, cntxt) && checkSkipLoopCntxt(body, cntxt)) 2909 cmpWhileBody(e, cond, body, cb, cntxt) 2910else { 2911 cntxt$needRETURNJMP <- TRUE ## **** do this a better way 2912 ljmpend.label <- cb$makelabel() 2913 cb$putcode(STARTLOOPCNTXT.OP, 0, ljmpend.label) 2914 cmpWhileBody(e, cond, body, cb, cntxt) 2915 cb$putlabel(ljmpend.label) 2916 cb$putcode(ENDLOOPCNTXT.OP, 0) 2917} 2918@ %def 2919 2920Again two labels are used, one at the top of the loop and one at the 2921end. The [[loop.label]] is followed by code for the test. Next is a 2922[[BRIFNOT]] instruction that jumps to the end of the loop if the value 2923left on the stack by the test is [[FALSE]]. This is followed by the 2924code for the body, a [[POP]] instruction, and a [[GOTO]] instruction 2925that jumps to the top of the loop. Finally, the [[end.label]] is 2926recorded. 2927<<[[cmpWhileBody]] function>>= 2928cmpWhileBody <- function(call, cond, body, cb, cntxt) { 2929 loop.label <- cb$makelabel() 2930 end.label <- cb$makelabel() 2931 cb$putlabel(loop.label) 2932 lcntxt <- make.loopContext(cntxt, loop.label, end.label) 2933 cmp(cond, cb, lcntxt) 2934 callidx <- cb$putconst(call) 2935 cb$putcode(BRIFNOT.OP, callidx, end.label) 2936 cmp(body, cb, lcntxt) 2937 cb$putcode(POP.OP) 2938 cb$putcode(GOTO.OP, loop.label) 2939 cb$putlabel(end.label) 2940} 2941@ cmpWhileBody 2942 2943 2944\subsection{[[for]] loops} 2945%% could compile repeat { if (stepfor) body else break } and peephole a bit ?? 2946Code generation for [[for]] loops is a little more complex because of 2947the need to manage the loop variable value and stepping through the 2948sequence. Code for [[for]] loops uses three additional instructions. 2949[[STARTFOR]] takes the constant pool index of the call, the constant 2950pool index of the loop variable symbol, and the label of the start 2951instruction as operands. It finds the sequence to iterate over on the 2952stack and places information for accessing the loop variable binding 2953and stepping the sequence on the stack before jumping to the label. 2954The call is used if an error for an improper for loop sequence needs 2955to be signaled. The [[STEPFOR]] instruction takes a label for the top 2956of the loop as its operand. If there are more elements in the 2957sequence then [[STEPFOR]] advances the position within the sequence, 2958sets the loop variable, and jumps to the top of the loop. Otherwise 2959it drops through to the next instruction. Finally [[ENDFOR]] cleans 2960up the loop information stored on the stack by [[STARTFOR]] and leaves 2961the [[NULL]] loop value on the stack. 2962 2963The inlining handler for a [[for]] loop starts out by checking the 2964loop variable and issuing a warning if it is not a symbol. The code 2965generator then declines to inline the loop expression. This means it 2966is compiled as a generic function call and will signal an error at 2967runtime. An alternative would be do generate code to signal the error 2968as is done with improper use of [[...]] arguments. After checking the 2969symbol, code to compute the sequence to iterate over is generated. 2970From then on the structure is similar to the structure of the other 2971loop code generators. 2972%% **** do cmpSpecial instead of returning FALSE?? 2973<<inlining handler for [[for]] loops>>= 2974setInlineHandler("for", function(e, cb, cntxt) { 2975 sym <- e[[2]] 2976 seq <- e[[3]] 2977 body <- e[[4]] 2978 if (! is.name(sym)) { 2979 ## not worth warning here since the parser should not allow this 2980 return(FALSE) 2981 } 2982 ncntxt <- make.nonTailCallContext(cntxt) 2983 cmp(seq, cb, ncntxt) 2984 ci <- cb$putconst(sym) 2985 callidx <- cb$putconst(e) 2986 <<generate context and body for [[for]] loop>> 2987 <<generate [[for]] loop wrap-up code>> 2988 TRUE 2989}) 2990@ %def 2991 2992When a [[setjmp]] context is needed, the label given to [[STARTFOR]] 2993is just the following instruction, which is a [[STARTLOOPCNTXT]] 2994instruction. If the context is not needed then the label for the 2995[[STARTFOR]] instruction will be the loop's [[STEPFOR]] instruction; 2996if the context is needed then the first instruction in the code object 2997for the body will be a [[GOTO]] instruction that jumps to the 2998[[STEPFOR]] instruction. This design means the stepping and the jump 2999can be handled by one instruction instead of two, a step instruction 3000and a [[GOTO]]. 3001<<generate context and body for [[for]] loop>>= 3002if (checkSkipLoopCntxt(body, cntxt)) 3003 cmpForBody(callidx, body, ci, cb, cntxt) 3004else { 3005 cntxt$needRETURNJMP <- TRUE ## **** do this a better way 3006 ctxt.label <- cb$makelabel() 3007 cb$putcode(STARTFOR.OP, callidx, ci, ctxt.label) 3008 cb$putlabel(ctxt.label) 3009 ljmpend.label <- cb$makelabel() 3010 cb$putcode(STARTLOOPCNTXT.OP, 1, ljmpend.label) 3011 cmpForBody(NULL, body, NULL, cb, cntxt) 3012 cb$putlabel(ljmpend.label) 3013 cb$putcode(ENDLOOPCNTXT.OP, 1) 3014} 3015@ %def 3016 3017The body code generator takes an additional argument, the index of the 3018loop label. For the case where a [[setjmp]] context is needed this 3019argument is [[NULL]], and the first instruction generated is a 3020[[GOTO]] targeting the [[STEPFOR]] instruction. This is labeled by 3021the [[loop.label]] label, since this will also be the target used by a 3022[[next]] expression. An additional label, [[body.label]] is needed for 3023the top of the loop, which is used by [[STEPFOR]] if there are more 3024loop elements to process. When the [[ci]] argument is not [[NULL]] 3025code is being generated for the case without a [[setjmp]] context, and 3026the first instruction is the [[STARTFOR]] instruction which 3027initializes the loop and jumps to [[loop.label]] at the [[STEPLOOP]] 3028instruction. 3029<<[[cmpForBody]] function>>= 3030cmpForBody <- function(callidx, body, ci, cb, cntxt) { 3031 body.label <- cb$makelabel() 3032 loop.label <- cb$makelabel() 3033 end.label <- cb$makelabel() 3034 if (is.null(ci)) 3035 cb$putcode(GOTO.OP, loop.label) 3036 else 3037 cb$putcode(STARTFOR.OP, callidx, ci, loop.label) 3038 cb$putlabel(body.label) 3039 lcntxt <- make.loopContext(cntxt, loop.label, end.label) 3040 cmp(body, cb, lcntxt) 3041 cb$putcode(POP.OP) 3042 cb$putlabel(loop.label) 3043 cb$putcode(STEPFOR.OP, body.label) 3044 cb$putlabel(end.label) 3045} 3046@ %def cmpForBody 3047 3048The wrap-up code issues an [[ENDFOR]] instruction instead of the 3049[[LDNULL]] instruction used for [[repeat]] and [[while]] loops. 3050<<generate [[for]] loop wrap-up code>>= 3051cb$putcode(ENDFOR.OP) 3052if (cntxt$tailcall) { 3053 cb$putcode(INVISIBLE.OP) 3054 cb$putcode(RETURN.OP) 3055} 3056@ %def 3057 3058 3059\subsection{Avoiding runtime loop contexts} 3060\label{subsec:skipcntxt} 3061When all uses of [[break]] or [[next]] in a loop occur only in top 3062level contexts then all [[break]] and [[next]] calls can be 3063implemented with simple [[GOTO]] instructions and a [[setjmp]] context 3064for the loop is not needed. Top level contexts are the loop body 3065itself and argument expressions in top level calls to [[if]], [[{]], 3066and [[(]]. The [[switch]] functions will eventually be included as well. 3067%% **** need to add switch to the top level functions 3068%% **** may not be OK if switch uses vmpSpecial because of ... arg 3069The function [[checkSkipLoopContxt]] recursively traverses an 3070expression tree to determine whether all relevant uses of [[break]] or 3071[[next]] are safe to compile as [[GOTO]] instructions. The search 3072returns [[FALSE]] if a [[break]] or [[next]] call occurs in an unsafe 3073place. The search stops and returns [[TRUE]] for any expression that 3074cannot contain relevant [[break]] or [[next]] calls. These stop 3075expressions are calls to the three loop functions and to [[function]]. 3076Calls to functions like [[quote]] that are known not to evaluate their 3077arguments could also be included among the stop functions but this 3078doesn't seem particularly worth while at this time. Loops that include a 3079call to [[eval]] (or [[evalq]], [[source]]) are compiled with context to support a 3080programming pattern present e.g. in package [[Rmpi]]: a server application 3081is implemented using an infinite loop, which evaluates de-serialized code 3082received from the client; the server shuts down when it receives a 3083serialized version of [[break]]. 3084 3085The recursive checking function is defined as 3086<<[[checkSkipLoopCntxt]] function>>= 3087checkSkipLoopCntxt <- function(e, cntxt, breakOK = TRUE) { 3088 if (typeof(e) == "language") { 3089 fun <- e[[1]] 3090 if (typeof(fun) == "symbol") { 3091 fname <- as.character(fun) 3092 if (! breakOK && fname %in% c("break", "next")) 3093 FALSE 3094 else if (isLoopStopFun(fname, cntxt)) 3095 TRUE 3096 else if (isLoopTopFun(fname, cntxt)) 3097 checkSkipLoopCntxtList(e[-1], cntxt, breakOK) 3098 else if (fname %in% c("eval", "evalq", "source")) 3099 FALSE 3100 else 3101 checkSkipLoopCntxtList(e[-1], cntxt, FALSE) 3102 } 3103 else 3104 checkSkipLoopCntxtList(e, cntxt, FALSE) 3105 } 3106 else TRUE 3107} 3108@ %def checkSkipLoopCntxt 3109A version that operates on a list of expressions is given by 3110<<[[checkSkipLoopCntxtList]] function>>= 3111checkSkipLoopCntxtList <- function(elist, cntxt, breakOK) { 3112 for (a in as.list(elist)) 3113 if (! missing(a) && ! checkSkipLoopCntxt(a, cntxt, breakOK)) 3114 return(FALSE) 3115 TRUE 3116} 3117@ %def checkSkipLoopCntxtList 3118 3119The stop functions are identified by [[isLoopStopFun]]. This uses 3120[[isBaseVar]] to ensure that interpreting a reference to a stop 3121function name as referring to the corresponding function in the 3122[[base]] package is permitted by the current optimization settings. 3123%% **** could also stop for quote() and some others. 3124<<[[isLoopStopFun]] function>>= 3125isLoopStopFun <- function(fname, cntxt) 3126 (fname %in% c("function", "for", "while", "repeat") && 3127 isBaseVar(fname, cntxt)) 3128@ %def isLoopStopFun 3129 3130The top level functions are identified by [[isLoopTopFun]]. Again the 3131compilation context is consulted to ensure that candidate can be 3132assumed to be from the [[base]] package. 3133%% **** eventually add "switch" 3134<<[[isLoopTopFun]] function>>= 3135isLoopTopFun <- function(fname, cntxt) 3136 (fname %in% c("(", "{", "if") && 3137 isBaseVar(fname, cntxt)) 3138@ %def isLoopTopFun 3139 3140The [[checkSkipLoopCntxt]] function does not check whether calls to 3141[[break]] or [[next]] are indeed calls to the [[base]] functions. 3142Given the special syntactic nature of [[break]] and [[next]] this is 3143very unlikely to cause problems, but if it does it will result in some 3144safe loops being considered unsafe and so errs in the conservative 3145direction. 3146 3147 3148\section{More inlining} 3149 3150\subsection{Basic arithmetic expressions} 3151The addition and subtraction functions [[+]] and [[-]] are [[BUILTIN]] 3152functions that can both be called with one or two arguments. 3153Multiplication and division functions [[*]] and [[/]] require two 3154arguments. Since code generation for all one arguments cases and all 3155two argument cases is very similar these are abstracted out into 3156functions [[cmpPrim1]] and [[cmpPrim2]]. 3157 3158The code generators for addition and subtraction are given by 3159<<inline handlers for [[+]] and [[-]]>>= 3160setInlineHandler("+", function(e, cb, cntxt) { 3161 if (length(e) == 3) 3162 cmpPrim2(e, cb, ADD.OP, cntxt) 3163 else 3164 cmpPrim1(e, cb, UPLUS.OP, cntxt) 3165}) 3166 3167setInlineHandler("-", function(e, cb, cntxt) { 3168 if (length(e) == 3) 3169 cmpPrim2(e, cb, SUB.OP, cntxt) 3170 else 3171 cmpPrim1(e, cb, UMINUS.OP, cntxt) 3172}) 3173@ %def 3174The code generators for multiplication and division are 3175<<inline handlers for [[*]] and [[/]]>>= 3176setInlineHandler("*", function(e, cb, cntxt) 3177 cmpPrim2(e, cb, MUL.OP, cntxt)) 3178 3179setInlineHandler("/", function(e, cb, cntxt) 3180 cmpPrim2(e, cb, DIV.OP, cntxt)) 3181@ %def 3182 3183Code for instructions corresponding to calls to a [[BUILTIN]] function 3184with one argument are generated by [[cmpPrim1]]. The generator 3185produces code for a generic [[BUILTIN]] call using [[cmpBuiltin]] if 3186if there are any missing or [[...]] arguments or if the number of 3187arguments is not equal to one. Otherwise code for the argument is 3188generated in a non-tail-call context, and the instruction provided as 3189the [[op]] argument is emitted followed by a [[RETURN]] instruction 3190for an expression in tail position. The [[op]] instructions take the 3191call as operand for use in error message and for internal dispatching. 3192<<[[cmpPrim1]] function>>= 3193cmpPrim1 <- function(e, cb, op, cntxt) { 3194 if (dots.or.missing(e[-1])) 3195 cmpBuiltin(e, cb, cntxt) 3196 else if (length(e) != 2) { 3197 notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc()) 3198 cmpBuiltin(e, cb, cntxt) 3199 } 3200 else { 3201 ncntxt <- make.nonTailCallContext(cntxt) 3202 cmp(e[[2]], cb, ncntxt); 3203 ci <- cb$putconst(e) 3204 cb$putcode(op, ci) 3205 if (cntxt$tailcall) 3206 cb$putcode(RETURN.OP) 3207 TRUE 3208 } 3209} 3210@ %def cmpPrim1 3211 3212Code generation for the two argument case is similar, except that the 3213second argument has to be compiled with an argument context since the 3214stack already has the value of the first argument on it and that would 3215need to be popped before a jump. 3216<<[[cmpPrim2]] function>>= 3217cmpPrim2 <- function(e, cb, op, cntxt) { 3218 if (dots.or.missing(e[-1])) 3219 cmpBuiltin(e, cb, cntxt) 3220 else if (length(e) != 3) { 3221 notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc()) 3222 cmpBuiltin(e, cb, cntxt) 3223 } 3224 else { 3225 needInc <- checkNeedsInc(e[[3]], cntxt) 3226 ncntxt <- make.nonTailCallContext(cntxt) 3227 cmp(e[[2]], cb, ncntxt); 3228 if (needInc) cb$putcode(INCLNK.OP) 3229 ncntxt <- make.argContext(cntxt) 3230 cmp(e[[3]], cb, ncntxt) 3231 if (needInc) cb$putcode(DECLNK.OP) 3232 ci <- cb$putconst(e) 3233 cb$putcode(op, ci) 3234 if (cntxt$tailcall) 3235 cb$putcode(RETURN.OP) 3236 TRUE 3237 } 3238} 3239@ %def cmpPrim2 3240 3241The [[INCLNK]] and [[DECLNK]] instructions are used to protect 3242evaluated arguents on the stack from modifications during evaluation 3243of subsequent arguments. These instructions can be omitted if the 3244subsequent argument evaluations cannot modify values on the stack. 3245With changes to stack protection this is mechanism is no longer 3246needed, so the check just returns [[FALSE]]. 3247<<[[checkNeedsInc]] function>>= 3248checkNeedsInc <- function(e, cntxt) 3249 return(FALSE) 3250@ %def checkNeedsInc 3251<<old [[checkNeedsInc]] function>>= 3252checkNeedsInc <- function(e, cntxt) { 3253 type <- typeof(e) 3254 if (type %in% c("language", "bytecode", "promise")) 3255 TRUE 3256 else FALSE ## symbols and constants 3257} 3258@ %def checkNeedsInc 3259 3260Calls to the power function [[^]] and the functions [[exp]] and 3261[[sqrt]] can be compiled using [[cmpPrim1]] and [[cmpPrim2]] as well: 3262<<inline handlers for [[^]], [[exp]], and [[sqrt]]>>= 3263setInlineHandler("^", function(e, cb, cntxt) 3264 cmpPrim2(e, cb, EXPT.OP, cntxt)) 3265 3266setInlineHandler("exp", function(e, cb, cntxt) 3267 cmpPrim1(e, cb, EXP.OP, cntxt)) 3268 3269setInlineHandler("sqrt", function(e, cb, cntxt) 3270 cmpPrim1(e, cb, SQRT.OP, cntxt)) 3271@ 3272 3273The [[log]] function is currently defined as a [[SPECIAL]]. The 3274default inline handler action is therefore to use [[cmpSpecial]]. For 3275calls with one unnamed argument the [[LOG.OP]] instruction is 3276used. For two unnamed arguments [[LOGBASE.OP]] is used. It might be 3277useful to introduce instructions for [[log2]] and [[log10]] as well 3278but this has not been done yet. 3279<<inline handler for [[log]]>>= 3280setInlineHandler("log", function(e, cb, cntxt) { 3281 if (dots.or.missing(e) || ! is.null(names(e)) || 3282 length(e) < 2 || length(e) > 3) 3283 cmpSpecial(e, cb, cntxt) 3284 else { 3285 ci <- cb$putconst(e) 3286 ncntxt <- make.nonTailCallContext(cntxt) 3287 cmp(e[[2]], cb, ncntxt); 3288 if (length(e) == 2) 3289 cb$putcode(LOG.OP, ci) 3290 else { 3291 needInc <- checkNeedsInc(e[[3]], cntxt) 3292 if (needInc) cb$putcode(INCLNK.OP) 3293 ncntxt <- make.argContext(cntxt) 3294 cmp(e[[3]], cb, ncntxt) 3295 if (needInc) cb$putcode(DECLNK.OP) 3296 cb$putcode(LOGBASE.OP, ci) 3297 } 3298 if (cntxt$tailcall) 3299 cb$putcode(RETURN.OP) 3300 TRUE 3301 } 3302}) 3303@ 3304 3305A number of one argument math functions are handled by the interpreter 3306using the function [[math1]] in [[arithmetic.c]]. The [[MATH1.OP]] 3307instruction handles these for compuled code. The instruction takes two 3308operands, an index for the call expression in the constant table, and 3309an index for the function to be called in a table of function 3310pointers. The table of names in the byte code compiler has to match 3311the function pointer array in the byte code interpreter. It would 3312have been possible to use the same indices as the offset values used 3313in [[names.c]], but keeping this consistent seemed more challenging. 3314<<list of one argument math functions>>= 3315## Keep the order consistent with the order in the internal byte code 3316## interpreter! 3317math1funs <- c("floor", "ceiling", "sign", 3318 "expm1", "log1p", 3319 "cos", "sin", "tan", "acos", "asin", "atan", 3320 "cosh", "sinh", "tanh", "acosh", "asinh", "atanh", 3321 "lgamma", "gamma", "digamma", "trigamma", 3322 "cospi", "sinpi", "tanpi") 3323@ %def math1funs 3324 3325The code generation is done by [[cmpMath1]]: 3326<<[[cmpMath1]] function>>= 3327cmpMath1 <- function(e, cb, cntxt) { 3328 if (dots.or.missing(e[-1])) 3329 cmpBuiltin(e, cb, cntxt) 3330 else if (length(e) != 2) { 3331 notifyWrongArgCount(e[[1]], cntxt, loc = cb$savecurloc()) 3332 cmpBuiltin(e, cb, cntxt) 3333 } 3334 else { 3335 name <- as.character(e[[1]]) 3336 idx <- match(name, math1funs) - 1 3337 if (is.na(idx)) 3338 cntxt$stop( 3339 paste(sQuote(name), "is not a registered math1 function"), 3340 cntxt, loc = cb$savecurloc()) 3341 ncntxt <- make.nonTailCallContext(cntxt) 3342 cmp(e[[2]], cb, ncntxt); 3343 ci <- cb$putconst(e) 3344 cb$putcode(MATH1.OP, ci, idx) 3345 if (cntxt$tailcall) 3346 cb$putcode(RETURN.OP) 3347 TRUE 3348 } 3349} 3350@ %def cmpMath1 3351The generators are installed by 3352<<inline one argument math functions>>= 3353for (name in math1funs) 3354 setInlineHandler(name, cmpMath1) 3355@ 3356 3357\subsection{Logical operators} 3358Two argument instructions are provided for the comparison operators 3359and code for them can be generated using [[cmpPrim2]]: 3360<<inline handlers for comparison operators>>= 3361setInlineHandler("==", function(e, cb, cntxt) 3362 cmpPrim2(e, cb, EQ.OP, cntxt)) 3363 3364setInlineHandler("!=", function(e, cb, cntxt) 3365 cmpPrim2(e, cb, NE.OP, cntxt)) 3366 3367setInlineHandler("<", function(e, cb, cntxt) 3368 cmpPrim2(e, cb, LT.OP, cntxt)) 3369 3370setInlineHandler("<=", function(e, cb, cntxt) 3371 cmpPrim2(e, cb, LE.OP, cntxt)) 3372 3373setInlineHandler(">=", function(e, cb, cntxt) 3374 cmpPrim2(e, cb, GE.OP, cntxt)) 3375 3376setInlineHandler(">", function(e, cb, cntxt) 3377 cmpPrim2(e, cb, GT.OP, cntxt)) 3378@ %def 3379 3380The vectorized [[&]] and [[|]] functions are handled similarly: 3381<<inline handlers for [[&]] and [[|]]>>= 3382setInlineHandler("&", function(e, cb, cntxt) 3383 cmpPrim2(e, cb, AND.OP, cntxt)) 3384 3385setInlineHandler("|", function(e, cb, cntxt) 3386 cmpPrim2(e, cb, OR.OP, cntxt)) 3387@ %def 3388 3389The negation operator [[!]] takes only one argument and code for calls 3390to it are generated using [[cmpPrim1]]: 3391<<inline handler for [[!]]>>= 3392setInlineHandler("!", function(e, cb, cntxt) 3393 cmpPrim1(e, cb, NOT.OP, cntxt)) 3394@ %def 3395 3396%% **** do log() somewhere around here? 3397%% **** is log(x,) == log(x)??? 3398%% **** is log(,y) allowed? 3399 3400 3401\subsection{Subsetting and related operations} 3402\label{subsec:subset} 3403Current R semantics are such that the subsetting operator [[[]] and a 3404number of others may not evaluate some of their arguments if S3 or S4 3405methods are available. S-plus has different semantics---there the 3406subsetting operator is guaranteed to evaluate its arguments. 3407% In the case of the concatenation function [[c]] it is not clear 3408% whether these semantics are worth preserving; changing [[c]] to a 3409% [[BUILTIN]] seems to cause no problems on [[CRAN]] and [[BioC]] 3410% packages tested. 3411For subsetting there are [[CRAN]] packages that use non-standard 3412evaluation of their arguments ([[igraph]] is one example), so this 3413probably can no longer be changed. 3414 3415The compiler preserve these semantics. To do so subsetting is 3416implemented in terms of two instructions, [[STARTSUBSET]] and 3417[[DFLTSUBSET]]. The object being subsetted is evaluated and placed on 3418the stack. [[STARTSUBSET]] takes a constant table index for the 3419expression and a label operand as operands and examines the object on 3420the stack. If an internal S3 or S4 dispatch succeeds then the 3421receiver object is removed and the result is placed on the stack and a 3422jump to the label is carried out. If the dispatch fails then code to 3423evaluate and execute the arguments is executed followed by a 3424[[DFLTSUBSET]] instruction. 3425This pattern is used for several other operations and is abstracted 3426into the code generation function [[cmpDispatch]]. Code for subsetting 3427and other operations is then generated by 3428<<inlining handlers for some dispatching SPECIAL functions>>= 3429# **** this is now handled differently; see "Improved subset ..." 3430# setInlineHandler("[", function(e, cb, cntxt) 3431# cmpDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, e, cb, cntxt)) 3432 3433# **** c() is now a BUILTIN 3434# setInlineHandler("c", function(e, cb, cntxt) 3435# cmpDispatch(STARTC.OP, DFLTC.OP, e, cb, cntxt, FALSE)) 3436 3437# **** this is now handled differently; see "Improved subset ..." 3438# setInlineHandler("[[", function(e, cb, cntxt) 3439# cmpDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, e, cb, cntxt)) 3440@ 3441 3442The [[cmpDispatch]] function takes the two opcodes as arguments. It 3443declines to handle cases with [[...]] arguments in the call or with a 3444missing first argument --- these will be handled as calls to a 3445[[SPECIAL]] primitive. For the case handled it generates code for the 3446first argument, followed by a call to the first [[start.op]] 3447instruction. The operands for the [[start.op]] are a constant pool 3448index for the expression and a label for the instruction following the 3449[[dflt.op]] instruction that allows skipping over the default case 3450code. The default case code consists of code to compute and push the 3451arguments followed by the [[dflt.op]] instruction. 3452<<[[cmpDispatch]] function>>= 3453cmpDispatch <- function(start.op, dflt.op, e, cb, cntxt, missingOK = TRUE) { 3454 if ((missingOK && any.dots(e)) || 3455 (! missingOK && dots.or.missing(e)) || 3456 length(e) == 1) 3457 cmpSpecial(e, cb, cntxt) ## punt 3458 else { 3459 ne <- length(e) 3460 oe <- e[[2]] 3461 if (missing(oe)) 3462 cmpSpecial(e, cb, cntxt) ## punt 3463 else { 3464 ncntxt <- make.argContext(cntxt) 3465 cmp(oe, cb, ncntxt) 3466 ci <- cb$putconst(e) 3467 end.label <- cb$makelabel() 3468 cb$putcode(start.op, ci, end.label) 3469 if (ne > 2) 3470 cmpBuiltinArgs(e[-(1:2)], names(e)[-(1:2)], cb, cntxt, 3471 missingOK) 3472 cb$putcode(dflt.op) 3473 cb$putlabel(end.label) 3474 if (cntxt$tailcall) cb$putcode(RETURN.OP) 3475 TRUE 3476 } 3477 } 3478} 3479@ %def cmpDispatch 3480%% **** The implementation currently implies that arguments to things 3481%% **** with S4 methods may be evaluated more than once if dispatch 3482%% **** does not happen. It would be better to rewrite this so if 3483%% **** arguments are evaluated we stay with the interpreted version 3484%% **** all the way. This requires a bit of refactoring of 3485%% **** DispatchOrEval code to get it to work. But it should not 3486%% **** affect the compiler. 3487%% **** 3488%% **** There may be some merit to always go with the interpreted code 3489%% **** if the receiver has the object bit set -- that way the 3490%% **** sequence could be done as 3491%% **** 3492%% **** if (object bit set) 3493%% **** CALLSPECIAL 3494%% **** else 3495%% **** do default thing 3496%% **** 3497%% **** and in some cases the object bit test can be hoisted. 3498 3499The [[$]] function is simpler to implement since its selector argument 3500is never evaluated. The [[DOLLAR]] instruction takes the object to 3501extract a component from off the stack and takes a constant index 3502argument specifying the selection symbol. 3503%% signal warning if selector is not a symbol or a string?? 3504%% also decline if any missing args? 3505<<inlining handler for [[$]]>>= 3506setInlineHandler("$", function(e, cb, cntxt) { 3507 if (any.dots(e) || length(e) != 3) 3508 cmpSpecial(e, cb, cntxt) 3509 else { 3510 sym <- if (is.character(e[[3]]) && length(e[[3]]) == 1 3511 && e[[3]] != "") 3512 as.name(e[[3]]) else e[[3]] 3513 if (is.name(sym)) { 3514 ncntxt <- make.argContext(cntxt) 3515 cmp(e[[2]], cb, ncntxt) 3516 ci <- cb$putconst(e) 3517 csi <- cb$putconst(sym) 3518 cb$putcode(DOLLAR.OP, ci, csi) 3519 if (cntxt$tailcall) cb$putcode(RETURN.OP) 3520 TRUE 3521 } 3522 else cmpSpecial(e, cb, cntxt) 3523 } 3524}) 3525@ %def 3526 3527 3528\subsection{Inlining simple [[.Internal]] functions} 3529A number of functions are defined as simple wrappers around 3530[[.Internal]] calls. One example is [[dnorm]], which is currently 3531defined as 3532\begin{verbatim} 3533dnorm <- function (x, mean = 0, sd = 1, log = FALSE) 3534 .Internal(dnorm(x, mean, sd, log)) 3535\end{verbatim} 3536The implementation of [[.Internal]] functions can be of type 3537[[BUILTIN]] or [[SPECIAL]]. The [[dnorm]] implementation is of type 3538[[BUILTIN]], so its arguments are guaranteed to be evaluated in order, 3539and this particular function doe not depend on the position of its 3540calls in the evaluation stack. As a result, a call of the form 3541\begin{verbatim} 3542dnorm(2, 1) 3543\end{verbatim} 3544can be replaced by the call 3545\begin{verbatim} 3546.Internal(dnorm(2, 1, 1, FALSE)) 3547\end{verbatim} 3548%% **** except for error messages maybe?? 3549This can result in considerable speed-up since it avoids the overhead 3550of the call to the wrapper function. 3551 3552The substitution of a call to the wrapper with a [[.Internal]] call 3553can be done by a function [[inlineSimpleInternalCall]] defined as 3554<<[[inlineSimpleInternalCall]] function>>= 3555inlineSimpleInternalCall <- function(e, def) { 3556 if (! dots.or.missing(e) && is.simpleInternal(def)) { 3557 forms <- formals(def) 3558 b <- body(def) 3559 if (typeof(b) == "language" && length(b) == 2 && b[[1]] == "{") 3560 b <- b[[2]] 3561 icall <- b[[2]] 3562 defaults <- forms ## **** could strip missings but OK not to? 3563 cenv <- c(as.list(match.call(def, e, F))[-1], defaults) 3564 subst <- function(n) 3565 if (typeof(n) == "symbol") cenv[[as.character(n)]] else n 3566 args <- lapply(as.list(icall[-1]), subst) 3567 as.call(list(quote(.Internal), as.call(c(icall[[1]], args)))) 3568 } 3569 else NULL 3570} 3571@ %def inlineSimpleInternalCall 3572 3573Code for an inlined simple internal function can then be generated by 3574[[cmpSimpleInternal]]: 3575<<[[cmpSimpleInternal]] function>>= 3576cmpSimpleInternal <- function(e, cb, cntxt) { 3577 if (any.dots(e)) 3578 FALSE 3579 else { 3580 name <- as.character(e[[1]]) 3581 def <- findFunDef(name, cntxt) 3582 if (! checkCall(def, e, NULL)) return(FALSE) 3583 call <- inlineSimpleInternalCall(e, def) 3584 if (is.null(call)) 3585 FALSE 3586 else 3587 cmpDotInternalCall(call, cb, cntxt) 3588 } 3589} 3590@ %def cmpSimpleInternal 3591 3592<<inline safe simple [[.Internal]] functions from [[base]]>>= 3593safeBaseInternals <- c("atan2", "besselY", "beta", "choose", 3594 "drop", "inherits", "is.vector", "lbeta", "lchoose", 3595 "nchar", "polyroot", "typeof", "vector", "which.max", 3596 "which.min", "is.loaded", "identical", 3597 "match", "rep.int", "rep_len") 3598 3599for (i in safeBaseInternals) setInlineHandler(i, cmpSimpleInternal) 3600@ %def safeBaseInternals 3601 3602%% **** nextn would also be OK with a broader definition of 'safe' 3603<<inline safe simple [[.Internal]] functions from [[stats]]>>= 3604safeStatsInternals <- c("dbinom", "dcauchy", "dgeom", "dhyper", "dlnorm", 3605 "dlogis", "dnorm", "dpois", "dunif", "dweibull", 3606 "fft", "mvfft", "pbinom", "pcauchy", 3607 "pgeom", "phyper", "plnorm", "plogis", "pnorm", 3608 "ppois", "punif", "pweibull", "qbinom", "qcauchy", 3609 "qgeom", "qhyper", "qlnorm", "qlogis", "qnorm", 3610 "qpois", "qunif", "qweibull", "rbinom", "rcauchy", 3611 "rgeom", "rhyper", "rlnorm", "rlogis", "rnorm", 3612 "rpois", "rsignrank", "runif", "rweibull", 3613 "rwilcox", "ptukey", "qtukey") 3614 3615for (i in safeStatsInternals) setInlineHandler(i, cmpSimpleInternal, "stats") 3616@ %def 3617 3618It is possible to automate the process of identifying functions with 3619the simple wrapper form and with [[.Internal]] implementations of type 3620[[BUILTIN]], and the function [[simpleInternals]] produces a list of 3621such candidates for a given package on the search path. But 3622determining whether such a candidate can be safely inlined needs to be 3623done manually. Most can, but some, such as [[sys.call]], cannot since 3624they depend on their position on the call stack (removing the wrapper 3625call that the implementation expects would change the result). 3626Nevertheless, [[simpleInternals]] is useful for providing a list of 3627candidates to screen. The [[is.simpleInternal]] function can be used 3628in test code to check that the assumption made in the compiler is 3629valid. The implementation is 3630<<[[simpleInternals]] function>>= 3631simpleInternals <- function(pos = "package:base") { 3632 names <- ls(pos = pos, all.names = TRUE) 3633 if (length(names) == 0) 3634 character(0) 3635 else { 3636 fn <- function(n) 3637 is.simpleInternal(get(n, pos = pos)) 3638 names[sapply(names, fn)] 3639 } 3640} 3641@ %def simpleInternals 3642 3643<<[[is.simpleInternal]] function>>= 3644is.simpleInternal <- function(def) { 3645 if (typeof(def) == "closure" && simpleFormals(def)) { 3646 b <- body(def) 3647 if (typeof(b) == "language" && length(b) == 2 && b[[1]] == "{") 3648 b <- b[[2]] 3649 if (typeof(b) == "language" && 3650 typeof(b[[1]]) == "symbol" && 3651 b[[1]] == ".Internal") { 3652 icall <- b[[2]] 3653 ifun <- icall[[1]] 3654 typeof(ifun) == "symbol" && 3655 .Internal(is.builtin.internal(as.name(ifun))) && 3656 simpleArgs(icall, names(formals(def))) 3657 } 3658 else FALSE 3659 } 3660 else FALSE 3661} 3662@ %def is.simpleInternal 3663 3664<<[[simpleFormals]] function>>= 3665simpleFormals <- function(def) { 3666 forms <- formals(def) 3667 if ("..." %in% names(forms)) 3668 return(FALSE) 3669 for (d in as.list(forms)) { 3670 if (! missing(d)) { 3671 ## **** check constant folding 3672 if (typeof(d) %in% c("symbol", "language", "promise", "bytecode")) 3673 return(FALSE) 3674 } 3675 } 3676 TRUE 3677} 3678@ %def simpleFormals 3679 3680<<[[simpleArgs]] function>>= 3681simpleArgs <- function(icall, fnames) { 3682 for (a in as.list(icall[-1])) { 3683 if (missing(a)) 3684 return(FALSE) 3685 else if (typeof(a) == "symbol") { 3686 if (! (as.character(a) %in% fnames)) 3687 return(FALSE) 3688 } 3689 else if (typeof(a) %in% c("language", "promise", "bytecode")) 3690 return(FALSE) 3691 } 3692 TRUE 3693} 3694@ %def simpleArgs 3695 3696 3697\subsection{Inlining [[is.xyz]] functions} 3698Most of the [[is.xyz]] functions in [[base]] are simple [[BUILTIN]]s 3699that do not do internal dispatch. They have simple instructions 3700defined for them and are compiled in a common way. [[cmpIs]] abstract 3701out the common compilation process. 3702<<[[cmpIs]] function>>= 3703cmpIs <- function(op, e, cb, cntxt) { 3704 if (any.dots(e) || length(e) != 2) 3705 cmpBuiltin(e, cb, cntxt) 3706 else { 3707 ## **** check that the function is a builtin somewhere?? 3708 s<-make.argContext(cntxt) 3709 cmp(e[[2]], cb, s) 3710 cb$putcode(op) 3711 if (cntxt$tailcall) cb$putcode(RETURN.OP) 3712 TRUE 3713 } 3714} 3715@ %def cmpIs 3716 3717Inlining handlers are then defined by 3718<<inlining handlers for [[is.xyz]] functions>>= 3719setInlineHandler("is.character", function(e, cb, cntxt) 3720 cmpIs(ISCHARACTER.OP, e, cb, cntxt)) 3721setInlineHandler("is.complex", function(e, cb, cntxt) 3722 cmpIs(ISCOMPLEX.OP, e, cb, cntxt)) 3723setInlineHandler("is.double", function(e, cb, cntxt) 3724 cmpIs(ISDOUBLE.OP, e, cb, cntxt)) 3725setInlineHandler("is.integer", function(e, cb, cntxt) 3726 cmpIs(ISINTEGER.OP, e, cb, cntxt)) 3727setInlineHandler("is.logical", function(e, cb, cntxt) 3728 cmpIs(ISLOGICAL.OP, e, cb, cntxt)) 3729setInlineHandler("is.name", function(e, cb, cntxt) 3730 cmpIs(ISSYMBOL.OP, e, cb, cntxt)) 3731setInlineHandler("is.null", function(e, cb, cntxt) 3732 cmpIs(ISNULL.OP, e, cb, cntxt)) 3733setInlineHandler("is.object", function(e, cb, cntxt) 3734 cmpIs(ISOBJECT.OP, e, cb, cntxt)) 3735setInlineHandler("is.symbol", function(e, cb, cntxt) 3736 cmpIs(ISSYMBOL.OP, e, cb, cntxt)) 3737@ %def 3738 3739At present [[is.numeric]], [[is.matrix]], and [[is.array]] do internal 3740dispatching so we just handle them as ordinary [[BUILTIN]]s. It might 3741be worth defining virtual machine instructions for them as well. 3742 3743 3744\subsection{Inline handler for calling C functions} 3745The [[.Call]] interface is now the preferred interface for calling C 3746functions and is also used in base packages like [[stat]]. The 3747[[DOTCALL.OP]] instruction allows these calls to be made without 3748allocating a list of arguments---the arguments are accumulated on the 3749stack. For now only 16 or fewer arguments are handled; more arguments, 3750and cases with named arguments, are handled by the standard [[.Call]] 3751[[BUILTIN]]. 3752<<inline handler for [[.Call]]>>= 3753setInlineHandler(".Call", function(e, cb, cntxt) { 3754 nargsmax <- 16 ## should match DOTCALL_MAX in eval.c 3755 if (dots.or.missing(e[-1]) || ! is.null(names(e)) || 3756 length(e) < 2 || length(e) > nargsmax + 2) 3757 cmpBuiltin(e, cb, cntxt) ## punt 3758 else { 3759 ncntxt <- make.nonTailCallContext(cntxt) 3760 cmp(e[[2]], cb, ncntxt); 3761 nargs <- length(e) - 2 3762 if (nargs > 0) { 3763 ncntxt <- make.argContext(cntxt) 3764 for (a in as.list(e[-(1:2)])) 3765 cmp(a, cb, ncntxt); 3766 } 3767 ci <- cb$putconst(e) 3768 cb$putcode(DOTCALL.OP, ci, nargs) 3769 if (cntxt$tailcall) 3770 cb$putcode(RETURN.OP) 3771 TRUE 3772 } 3773}) 3774@ 3775 3776 3777\subsection{Inline handlers for generating integer sequences} 3778The colon operator and the [[BUILTIN]] functions [[seq_along]] and 3779[[seq_len]] generate sequences (the sequence might not be integers if 3780long vectors are involved or the colon operator is given no-integer 3781arguments). The [[COLON.OP]], [[SEQALONG.OP]], and [[SEQLEN.OP]] 3782instructions implement these operations in byte code. This allows an 3783implementation in which the result stored on the stack is not a fully 3784realized sequence but only a recipe that the [[for]] loop, for 3785example, can use to run the loop without generating the sequence. 3786This is optionally implemented in the byte code interpreter. It would 3787also be possible to allow the compact sequence representation to be 3788stored in variables, etc., but this would require more extensive 3789changes. 3790<<inline handlers for integer sequences>>= 3791setInlineHandler(":", function(e, cb, cntxt) 3792 cmpPrim2(e, cb, COLON.OP, cntxt)) 3793 3794setInlineHandler("seq_along", function(e, cb, cntxt) 3795 cmpPrim1(e, cb, SEQALONG.OP, cntxt)) 3796 3797setInlineHandler("seq_len", function(e, cb, cntxt) 3798 cmpPrim1(e, cb, SEQLEN.OP, cntxt)) 3799@ 3800 3801\subsection{Inlining handlers for controlling warnings} 3802The inlining handlers in this section do not actually affect code 3803generation. Their purpose is to suppress warnings. 3804 3805Compiling calls to the [[::]] and [[:::]] functions without special 3806handling would generate undefined variable warnings for the arguments. 3807This is avoided by converting the arguments from symbols to strings, 3808which these functions would do anyway at runtime, and then compiling 3809the modified calls. The common process is handled by [[cmpMultiColon]]. 3810<<[[cmpMultiColon]] function>>= 3811cmpMultiColon <- function(e, cb, cntxt) { 3812 if (! dots.or.missing(e) && length(e) == 3) { 3813 goodType <- function(a) 3814 typeof(a) == "symbol" || 3815 (typeof(a) == "character" && length(a) == 1) 3816 fun <- e[[1]] 3817 x <- e[[2]] 3818 y <- e[[3]] 3819 if (goodType(x) && goodType(y)) { 3820 args <- list(as.character(x), as.character(y)) 3821 cmpCallSymFun(fun, args, e, cb, cntxt) 3822 TRUE 3823 } 3824 else FALSE 3825 } 3826 else FALSE 3827} 3828@ %def cmpMultiColon 3829Code generators are then registered by 3830<<inlining handlers for [[::]] and [[:::]]>>= 3831setInlineHandler("::", cmpMultiColon) 3832setInlineHandler(":::", cmpMultiColon) 3833@ 3834 3835Calls to with will often generate spurious undefined variable warning 3836for variables appearing in the expression argument. A crude approach 3837is to compile the entire call with undefined variable warnings 3838suppressed. 3839<<inlining handler for [[with]]>>= 3840setInlineHandler("with", function(e, cb, cntxt) { 3841 cntxt$suppressUndefined <- TRUE 3842 cmpCallSymFun(e[[1]], e[-1], e, cb, cntxt) 3843 TRUE 3844}) 3845@ 3846 3847A similar issue arises for [[require]], where an unquoted argument is 3848often used. 3849<<inlining handler for [[require]]>>= 3850setInlineHandler("require", function(e, cb, cntxt) { 3851 cntxt$suppressUndefined <- TRUE 3852 cmpCallSymFun(e[[1]], e[-1], e, cb, cntxt) 3853 TRUE 3854}) 3855@ 3856 3857 3858\section{The [[switch]] function} 3859The [[switch]] function has somewhat awkward semantics that vary 3860depending on whether the value of the first argument is a character 3861string or is numeric. For a string all or all but one of the 3862alternatives must be named, and empty case arguments are allowed and 3863result in falling through to the next non-empty case. In the numeric 3864case selecting an empty case produces an error. If there is more than 3865one alternative case and no cases are named then a character selector 3866argument will produce an error, so one can assume that a numeric 3867switch is intended. But a [[switch]] with named arguments can be used 3868with a numeric selector, so it is not in general possible to determine 3869the intended type of the [[switch]] call from the structure of the 3870call alone. The compiled code therefore has to allow for both 3871possibilities. 3872 3873The inlining handler goes through a number of steps collecting and 3874processing information computed from the call and finally emits code 3875for the non-empty alternatives. If the [[switch]] expression appears 3876in tail position then each alternative will end in a [[RETURN]] 3877instruction. If the call is not in tail position then each 3878alternative will end with a [[GOTO]] than jumps to a label placed 3879after the code for the final alternative. 3880<<inline handler for [[switch]]>>= 3881setInlineHandler("switch", function(e, cb, cntxt) { 3882 if (length(e) < 2 || any.dots(e)) 3883 cmpSpecial(e, cb, cntxt) 3884 else { 3885 ## **** check name on EXPR, if any, partially matches EXPR? 3886 <<extract the [[switch]] expression components>> 3887 3888 <<collect information on named alternatives>> 3889 3890 <<create the labels>> 3891 3892 <<create the map from names to labels for a character switch>> 3893 3894 <<emit code for the [[EXPR]] argument>> 3895 3896 <<emit the switch instruction>> 3897 3898 <<emit error code for empty alternative in numerical switch>> 3899 3900 <<emit code for the default case>> 3901 3902 <<emit code for non-empty alternatives>> 3903 3904 if (! cntxt$tailcall) 3905 cb$putlabel(endLabel) 3906 } 3907 TRUE 3908}) 3909@ 3910 3911The first step in processing the [[switch]] expression is to extract 3912the selector expression [[expr]] and the case expressions, to identify 3913which, if any, of the cases are empty, and to extract the names of the 3914cases as [[nm]]. A warning is issued if there are no cases. If there 3915is only one case and that case is not named then setting [[nm = ""]] 3916allows this situation to be processed by code used when names are 3917present. 3918<<extract the [[switch]] expression components>>= 3919expr <- e[[2]] 3920cases <-e[-c(1, 2)] 3921 3922if (is.null(cases)) 3923 notifyNoSwitchcases(cntxt, loc = cb$savecurloc()) 3924 3925miss <- missingArgs(cases) 3926nm <- names(cases) 3927 3928## allow for corner cases like switch(x, 1) which always 3929## returns 1 if x is a character scalar. 3930if (is.null(nm) && length(cases) == 1) 3931 nm <- "" 3932@ %def 3933 3934The next step in the case where some cases are named is to check for a 3935default expression. If there is more than one expression then the 3936[[switch]] is compiled by [[cmpSpecial]]. This avoids having to 3937reproduce the runtime error that would be generated if the [[switch]] 3938is called with a character selector. 3939%% **** would probably be better to not punt though -- then we could 3940%% **** allow break/next to use GOTO 3941<<collect information on named alternatives>>= 3942## collect information on named alternatives and check for 3943## multiple default cases. 3944if (! is.null(nm)) { 3945 haveNames <- TRUE 3946 ndflt <- sum(nm == "") 3947 if (ndflt > 1) { 3948 notifyMultipleSwitchDefaults(ndflt, cntxt, loc = cb$savecurloc()) 3949 ## **** punt back to interpreted version for now to get 3950 ## **** runtime error message for multiple defaults 3951 cmpSpecial(e, cb, cntxt) 3952 return(TRUE) 3953 } 3954 if (ndflt > 0) 3955 haveCharDflt <- TRUE 3956 else 3957 haveCharDflt <- FALSE 3958} 3959else { 3960 haveNames <- FALSE 3961 haveCharDflt <- FALSE 3962} 3963@ %def 3964 3965Next the labels are generated. [[missLabel]] will be the label for 3966code that signals an error if a numerical selector expression chooses 3967a case with an empty argument. The label [[dfltLabel]] will be for 3968code that invisibly procures the value [[NULL]], which is the default 3969case for a numerical selector argument and also for a character 3970selector when no unnamed default case is provided. All non-empty cases 3971are given their own labels, and [[endLabel]] is generated if it will 3972be needed as the [[GOTO]] target for a [[switch]] expression that is 3973not in tail position. 3974<<create the labels>>= 3975## create the labels 3976if (any(miss)) 3977 missLabel <- cb$makelabel() 3978dfltLabel <- cb$makelabel() 3979 3980lab <- function(m) 3981 if (m) missLabel 3982 else cb$makelabel() 3983labels <- c(lapply(miss, lab), list(dfltLabel)) 3984 3985if (! cntxt$tailcall) 3986 endLabel <- cb$makelabel() 3987@ %def 3988 3989When there are named cases a map from the case names to the 3990corresponding code labels is constructed next. If no unnamed default 3991was provided one is added that uses the [[dfltLabel]]. 3992<<create the map from names to labels for a character switch>>= 3993## create the map from names to labels for a character switch 3994if (haveNames) { 3995 unm <- unique(nm[nm != ""]) 3996 if (haveCharDflt) 3997 unm <- c(unm, "") 3998 nlabels <- labels[unlist(lapply(unm, findActionIndex, nm, miss))] 3999 ## if there is no unnamed case to act as a default for a 4000 ## character switch then the numeric default becomes the 4001 ## character default as well. 4002 if (! haveCharDflt) { 4003 unm <- c(unm, "") 4004 nlabels <- c(nlabels, list(dfltLabel)) 4005 } 4006} 4007else { 4008 unm <- NULL 4009 nlabels <- NULL 4010} 4011@ %def 4012The computation of the index of the appropriate label for a given name 4013is carried out by [[findActionIndex]]. 4014%% **** rewrite this to directly return the label? 4015<<[[findActionIndex]] function>>= 4016findActionIndex <- function(name, nm, miss) { 4017 start <- match(name, nm) 4018 aidx <- c(which(! miss), length(nm) + 1) 4019 min(aidx[aidx >= start]) 4020} 4021@ %def findActionIndex 4022 4023At this point we are ready to start emitting code into the code 4024buffer. First code to compute the selector is emitted. As with the 4025condition for an [[if]] expression a non-tail-call context is used. 4026<<emit code for the [[EXPR]] argument>>= 4027## compile the EXPR argument 4028ncntxt <- make.nonTailCallContext(cntxt) 4029cmp(expr, cb, ncntxt) 4030@ %def 4031 4032The switch instruction takes the selector off the stack and four 4033operands from the instruction stream: the call index, an index for the 4034names, or [[NULL]] if there are none, and indices for the labels for a 4035character selector and for a numeric selector. At this point lists of 4036labels are placed in the instruction buffer. At code extraction time 4037these will be replaced by indices for numeric offset vectors by the 4038[[patchlables]] function of the code buffer. 4039<<emit the switch instruction>>= 4040## emit the SWITCH instruction 4041cei <- cb$putconst(e) 4042if (haveNames) { 4043 cni <- cb$putconst(unm) 4044 cb$putcode(SWITCH.OP, cei, cni, nlabels, labels) 4045} 4046else { 4047 cni <- cb$putconst(NULL) 4048 cb$putcode(SWITCH.OP, cei, cni, cni, labels) 4049} 4050@ %def 4051 4052If there are empty alternatives then code to signal an error for a 4053numeric selector that chooses one of these is needed and is 4054identified by the label [[missLabel]]. 4055<<emit error code for empty alternative in numerical switch>>= 4056## emit code to signal an error if a numeric switch hist an 4057## empty alternative (fall through, as for character, might 4058## make more sense but that isn't the way switch() works) 4059if (any(miss)) { 4060 cb$putlabel(missLabel) 4061 cmp(quote(stop("empty alternative in numeric switch")), cb, cntxt) 4062} 4063@ %def 4064 4065Code for the numeric default case, corresponding to [[dfltLabel]], 4066places [[NULL]] on the stack, and for a [[switch]] in tail position 4067this is followed by an [[INVISIBLE]] and a [[RETURN]] instruction. 4068<<emit code for the default case>>= 4069## emit code for the default case 4070cb$putlabel(dfltLabel) 4071cb$putcode(LDNULL.OP) 4072if (cntxt$tailcall) { 4073 cb$putcode(INVISIBLE.OP) 4074 cb$putcode(RETURN.OP) 4075} 4076else 4077 cb$putcode(GOTO.OP, endLabel) 4078@ %def 4079 4080Finally the labels and code for the non-empty alternatives are written 4081to the code buffer. In non-tail position the code is followed by a 4082[[GOTO]] instruction that jumps to [[endLabel]]. The final case does 4083not need this [[GOTO]]. 4084%% **** maybe try to drop the final GOTO 4085<<emit code for non-empty alternatives>>= 4086## emit code for the non-empty alternatives 4087for (i in seq_along(cases)) { 4088 if (! miss[i]) { 4089 cb$putlabel(labels[[i]]) 4090 cmp(cases[[i]], cb, cntxt) 4091 if (! cntxt$tailcall) 4092 cb$putcode(GOTO.OP, endLabel) 4093 } 4094} 4095@ %def 4096 4097 4098\section{Assignments expressions} 4099R supports simple assignments in which the left-hand side of the 4100assignment expression is a symbol and complex assignments of the form 4101\begin{verbatim} 4102f(x) <- v 4103\end{verbatim} 4104or 4105\begin{verbatim} 4106g(f(x)) <- v 4107\end{verbatim} 4108The second form is sometimes called a nested complex assignment. 4109Ordinary assignment creates or modifies a binding in the current 4110environment. Superassignment via the [[<<-]] operator modifies a 4111binding in a containing environment. 4112 4113Assignment expressions are compiled by [[cmpAssign]]. This function 4114checks the form of the assignment expression and, for well formed 4115expressions then uses [[cmpSymbolAssign]] for simple assignments and 4116[[cmpComplexAssign]] for complex assignments. 4117 4118For now, a temporary hack is needed to address a discrepancy between 4119byte code and AST code that can be caused by assignments in arguments 4120to primitives. The root issue is that we are not recording referenced 4121to arguments that have been evaluated. Once that is addressed we can 4122remove this hack. 4123<<temporary hack to deal with assignments in arguments issue>>= 4124## if (! cntxt$toplevel) 4125## return(cmpSpecial(e, cb, cntxt)) 4126@ 4127<<[[cmpAssign]] function>>= 4128cmpAssign <- function(e, cb, cntxt) { 4129 <<temporary hack to deal with assignments in arguments issue>> 4130 if (! checkAssign(e, cntxt, loc = cb$savecurloc())) 4131 return(cmpSpecial(e, cb, cntxt)) 4132 superAssign <- as.character(e[[1]]) == "<<-" 4133 lhs <- e[[2]] 4134 value <- e[[3]] 4135 symbol <- as.name(getAssignedVar(e, cntxt)) 4136 if (superAssign && ! findVar(symbol, cntxt)) 4137 notifyNoSuperAssignVar(symbol, cntxt, loc = cb$savecurloc()) 4138 if (is.name(lhs) || is.character(lhs)) 4139 cmpSymbolAssign(symbol, value, superAssign, cb, cntxt) 4140 else if (typeof(lhs) == "language") 4141 cmpComplexAssign(symbol, lhs, value, superAssign, cb, cntxt) 4142 else cmpSpecial(e, cb, cntxt) # punt for now 4143} 4144@ %def cmpAssign 4145 4146The code generators for the assignment operators [[<-]] and [[=]] and 4147the superassignment operator [[<<-]] are registered by 4148<<inlining handlers for [[<-]], [[=]], and [[<<-]]>>= 4149setInlineHandler("<-", cmpAssign) 4150setInlineHandler("=", cmpAssign) 4151setInlineHandler("<<-", cmpAssign) 4152@ %def 4153 4154The function [[checkAssign]] is used to check that an assignment 4155expression is well-formed. 4156<<[[checkAssign]] function>>= 4157checkAssign <- function(e, cntxt, loc = NULL) { 4158 if (length(e) != 3) 4159 FALSE 4160 else { 4161 place <- e[[2]] 4162 if (typeof(place) == "symbol" || 4163 (typeof(place) == "character" && length(place) == 1)) 4164 TRUE 4165 else { 4166 <<check left hand side call>> 4167 } 4168 } 4169} 4170@ %def checkAssign 4171A valid left hand side call must have a function that is either a 4172symbol or is of the form [[foo::bar]] or [[foo:::bar]], and the first 4173argument must be a symbol or another valid left hand side call. A 4174[[while]] loop is used to unravel nested calls. 4175<<check left hand side call>>= 4176while (typeof(place) == "language") { 4177 fun <- place[[1]] 4178 if (typeof(fun) != "symbol" && 4179 ! (typeof(fun) == "language" && length(fun) == 3 && 4180 typeof(fun[[1]]) == "symbol" && 4181 as.character(fun[[1]]) %in% c("::", ":::"))) { 4182 notifyBadAssignFun(fun, cntxt, loc) 4183 return(FALSE) 4184 } 4185 place = place[[2]] 4186} 4187if (typeof(place) == "symbol") 4188 TRUE 4189else FALSE 4190@ %def 4191 4192\subsection{Simple assignment expressions} 4193%% **** handle fun defs specially for message purposes?? 4194Code for assignment to a symbol is generated by [[cmpSymbolAssign]]. 4195<<[[cmpSymbolAssign]] function>>= 4196cmpSymbolAssign <- function(symbol, value, superAssign, cb, cntxt) { 4197 <<compile the right hand side value expression>> 4198 <<emit code for the symbol assignment instruction>> 4199 <<for tail calls return the value invisibly>> 4200 TRUE 4201} 4202@ %def cmpSymbolAssign 4203 4204A non-tail-call context is used to generate code for the right hand 4205side value expression. 4206<<compile the right hand side value expression>>= 4207ncntxt <- make.nonTailCallContext(cntxt) 4208cmp(value, cb, ncntxt) 4209@ %def 4210 4211The [[SETVAR]] and [[SETVAR2]] instructions assign the value on the 4212stack to the symbol specified by its constant pool index operand. The 4213[[SETVAR]] instruction is used by ordinary assignment to assign in the 4214local frame, and [[SETVAR2]] for superassignments. 4215<<emit code for the symbol assignment instruction>>= 4216ci <- cb$putconst(symbol) 4217if (superAssign) 4218 cb$putcode(SETVAR2.OP, ci) 4219else 4220 cb$putcode(SETVAR.OP, ci) 4221@ %def 4222The super-assignment case does not need to check for and warn about a 4223missing binding since this is done in [[cmpAssign]]. 4224 4225The [[SETVAR]] and [[SETVAR2]] instructions leave the value on the 4226stack as the value of the assignment expression; if the expression 4227appears in tail position then this value is returned with the visible 4228flag set to [[FALSE]]. 4229<<for tail calls return the value invisibly>>= 4230if (cntxt$tailcall) { 4231 cb$putcode(INVISIBLE.OP) 4232 cb$putcode(RETURN.OP) 4233} 4234@ %def 4235 4236 4237\subsection{Complex assignment expressions} 4238\label{subsec:complexassign} 4239It seems somehow appropriate at this point to mention that the code in 4240[[eval.c]] implementing the interpreter semantics starts with the 4241following comment: 4242\begin{verbatim} 4243 /* 4244 * Assignments for complex LVAL specifications. This is the stuff that 4245 * nightmares are made of ... 4246\end{verbatim} 4247 4248There are some issues with the semantics for complex assignment as 4249implemented by the interpreter: 4250\begin{itemize} 4251\item With the current approach the following legal, though strange, 4252 code fails: 4253<<inner assignment trashes temporary>>= 4254f <-function(x, y) x 4255`f<-` <- function(x, y, value) { y; x} 4256x <- 1 4257y <- 2 4258f(x, y[] <- 1) <- 3 4259@ %def 4260 The reason is that the current left hand side object is maintained in 4261 a variable [[*tmp*]], and processing the assignment in the second 4262 argument first overwrites the value of [[*tmp*]] and then removes 4263 [[*tmp*]] before the first argument is evaluated. Using evaluated 4264 promises as arguments, as is done for the right hand side value, 4265 solves this. 4266 4267\item The current approach of using a temporary variable [[*tmp*]] to 4268 hold the evaluated LHS object requires an internal cleanup context 4269 to ensure that the variable is removed in the event of a non-local 4270 exit. Implementing this in the compiler would introduce significant 4271 overhead. 4272 4273\item The asymmetry of handling the pre-evaluated right hand side 4274 value via an evaluated promise and the pre-evaluated left hand side 4275 via a temporary variable makes the code harder to understand and the 4276 semantics harder to explain. 4277 4278\item Using promises in an expression passed to eval means promises 4279 can leak out into R via sys.call. This is something we have tried to 4280 avoid and should try to avoid so we can have the freedom to 4281 implement lazy evaluation differently if that seems useful. [It may 4282 be possible at some point to avoid allocation of promise objects in 4283 compiled code.] The compiler can avoid this by using promises only 4284 in the argument lists passed to function calls, not in the call 4285 expressions. A similar change could be made in the interpreter but 4286 it would have a small runtime penalty for constructing an expression 4287 in addition to an argument list I would prefer to avoid that for now 4288 until the compiler has been turned on by default. 4289 4290\item The current approach of installing the intermediate RHS value as 4291 the expression for the RHS promise in nested complex assignments has 4292 several drawbacks: 4293 \begin{itemize} 4294 \item it can produce huge expressions. 4295 4296 \item the result is misleading if the intermediate RHS value is a 4297 symbol or a language object. 4298 4299 \item to maintain this in compiled code it would be necessary to 4300 construct the assignment function call expression at runtime even 4301 though it is usually not needed (or it would require significant 4302 rewriting to allow on-demand computation of the call). If *vtmp* 4303 is used as a marker for the expression and documented as not a 4304 real variable then the call can be constructed at compile time. 4305 \end{itemize} 4306 4307\item In nested complex assignments the additional arguments of the 4308 inner functions are evaluated twice. This is illustrated by running 4309 this code: 4310<<multiple evaluation of arguments in assignments>>= 4311f <- function(x, y) {y ; x } 4312`f<-` <- function(x, y, value) { y; x } 4313g <- function(x, y) {y ; x } 4314`g<-` <- function(x, y, value) { y; x } 4315x <- 1 4316y <- 2 4317f(g(x, print(y)), y) <- 3 4318@ %def 4319 This is something we have lived with, and I don't propose to change 4320 it at this time. But it would be good to be able to change it in the 4321 future. 4322\end{itemize} 4323 4324Because of these issues the compiler implements slightly different 4325semantics for complex assignment than the current intepreter. 4326\emph{Evaluation} semantics should be identical; the difference arises 4327in how intermediate values are managed and has some effect on results 4328produced by [[substitute]]. In particular, no intermediate [[*tmp*]] 4329value is used and therefore no cleanup frame is needed. This does 4330mean that uses of the form 4331\begin{verbatim} 4332 eval(substitute(<first arg>), parent.frame()) 4333\end{verbatim} 4334will no longer work. In tests of most of CRAN and BioC this directly 4335affected only one function, [[$.proto]] in the [[proto]] package, and 4336indirectly about 30 packages using proto failed. I looked at the 4337[[$.proto]] implementation, and it turned out that the 4338[[eval(substitute())]] approach used there could be replaced by 4339standard evaluation using lexical scope. This produces better code, 4340and the result works with both the current R interpreter and compiled 4341code (proto and all the dependent packages pass check with this 4342change). The [[proto]] maintainer has changed [[proto]] along these 4343lines. It would be good to soon change the interpreter to also use 4344evaluated promises in place of the [[*tmp*]] variable to bring the 4345compiled and interpreted semantics closer together. 4346 4347Complex assignment expressions are compiled by [[cmpComplexAssign]]. 4348<<[[cmpComplexAssign]] function>>= 4349cmpComplexAssign <- function(symbol, lhs, value, superAssign, cb, cntxt) { 4350 <<select complex assignment instructions>> 4351 <<protect the stack during a non-top-level complex assignment>> 4352 <<compile the right hand side value expression>> 4353 <<compile the left hand side call>> 4354 <<unprotect the stack after a not-top-level complex assignment>> 4355 <<for tail calls return the value invisibly>> 4356 TRUE; 4357} 4358@ %def cmpComplexAssign 4359 4360Assignment code is bracketed by a start and an end instruction. 4361<<compile the left hand side call>>= 4362csi <- cb$putconst(symbol) 4363cb$putcode(startOP, csi) 4364 4365<<compile code to compute left hand side values>> 4366<<compile code to compute right hand side values>> 4367 4368cb$putcode(endOP, csi) 4369@ %def 4370The appropriate instructions [[startOP]] and [[endOP]] depend on 4371whether the assignment is an ordinary assignment or a superassignment. 4372<<select complex assignment instructions>>= 4373if (superAssign) { 4374 startOP <- STARTASSIGN2.OP 4375 endOP <- ENDASSIGN2.OP 4376} 4377else { 4378 if (! findVar(symbol, cntxt)) 4379 notifyUndefVar(symbol, cntxt, loc = cb$savecurloc()) 4380 startOP <- STARTASSIGN.OP 4381 endOP <- ENDASSIGN.OP 4382} 4383@ %def 4384An undefined variable notification is issued for ordinary assignment, 4385since this will produce a runtime error. For superassignment 4386[[cmpAssign]] has already checked for an undefined left-hand-side 4387variable and issued a notification if none was found. 4388 4389The start instructions obtain the initial value of the left-hand-side 4390variable and in the case of standard assignment assign it in the local 4391frame if it is not assigned there already. They also prepare the stack 4392for the assignment process. The stack invariant maintained by the 4393assignment process is that the current right hand side value is on the 4394top, followed by the evaluated left hand side values, the binding 4395cell, and the original right hand side value. Thus the start 4396instruction leaves the right hand side value on the top, then the 4397value of the left hand side variable, the binding cell, and again the 4398right hand side value on the stack. 4399 4400The end instruction finds the final right hand side value followed by 4401the original right hand side value on the top of the stack. The final 4402value is removed and assigned to the appropriate variable binding. 4403The original right hand side value is left on the top of the stack as 4404the value of the assignment expression. 4405 4406Evaluating a nested complex assignment involves evaluating a sequence 4407of expressions to obtain the left hand sides to modify, and then 4408evaluating a sequence of corresponding calls to replacement functions 4409in the opposite order. The function [[flattenPlace]] returns a list 4410of the expressions that need to be considered, with [[*tmp*]] in place 4411of the current left hand side argument. For example, for an assignment 4412of the form [[f(g(h(x, k), j), i) <- v]] this produces 4413\begin{verbatim} 4414> flattenPlace(quote(f(g(h(x, k), j), i)))$places 4415[[1]] 4416f(`*tmp*`, i) 4417 4418[[2]] 4419g(`*tmp*`, j) 4420 4421[[3]] 4422h(`*tmp*`, k) 4423\end{verbatim} 4424The sequence of left hand side values needed consists of the original 4425variable value, which is already on the stack, and the values of 4426[[h(`*tmp*`, k)]] and [[g(`*tmp*`, j)]]. 4427 4428In general the additional evaluations needed are of all but the first 4429expression produced by [[flattenPlace]], evaluated in reverse 4430order. An argument context is used since there are already values on 4431the stack. 4432<<compile code to compute left hand side values>>= 4433ncntxt <- make.argContext(cntxt) 4434flat <- flattenPlace(lhs, cntxt, loc = cb$savecurloc()) 4435flatOrigPlace <- flat$origplaces 4436flatPlace <- flat$places 4437flatPlaceIdxs <- seq_along(flatPlace)[-1] 4438for (i in rev(flatPlaceIdxs)) 4439 cmpGetterCall(flatPlace[[i]], flatOrigPlace[[i]], cb, ncntxt) 4440@ %def 4441The compilation of the individual calls carried out by 4442[[cmpGetterCall]], which is presented in Section \ref{subsec:getter}. 4443Each compilation places the new left hand side value on the top of the 4444stack and then switches it with the value below, which is the original 4445right hand side value, to preserve the stack invariant. 4446 4447The function [[flattenPlace]] is defined as 4448<<[[flattenPlace]] function>>= 4449flattenPlace <- function(place, cntxt, loc = NULL) { 4450 places <- NULL 4451 origplaces <- NULL 4452 while (typeof(place) == "language") { 4453 if (length(place) < 2) 4454 cntxt$stop(gettext("bad assignment 1"), cntxt, loc = loc) 4455 origplaces <- c(origplaces, list(place)) 4456 tplace <- place 4457 tplace[[2]] <- as.name("*tmp*") 4458 places <- c(places, list(tplace)) 4459 place <- place[[2]] 4460 } 4461 if (typeof(place) != "symbol") 4462 cntxt$stop(gettext("bad assignment 2"), cntxt, loc = loc) 4463 list(places = places, origplaces = origplaces) 4464} 4465@ %def flattenPlace 4466 4467After the right hand side values have been computed the stack contains 4468the original right hand side value followed by the left hand side 4469values in the order in which they need to be modified. Code to call 4470the sequence of replacement functions is generated by 4471<<compile code to compute right hand side values>>= 4472cmpSetterCall(flatPlace[[1]], flatOrigPlace[[1]], value, cb, ncntxt) 4473for (i in flatPlaceIdxs) 4474 cmpSetterCall(flatPlace[[i]], flatOrigPlace[[i]], as.name("*vtmp*"), cb, ncntxt) 4475@ %def 4476The first call uses the expression for the original right hand side in 4477its call; all others will use [[*vtmp*]]. Each replacement function 4478call compiled by [[cmpSetterCall]] will remove the top two elements 4479from the stack and then push the new right hand side value on the 4480stack. [[cmpSetterCall]] is described in Section \ref{subsec:setter}. 4481 4482For non-top-level complex assignments values on the stack need to be 4483protected from mutation during the assignment. 4484<<protect the stack during a non-top-level complex assignment>>= 4485if (! cntxt$toplevel) cb$putcode(INCLNKSTK.OP) 4486@ 4487<<unprotect the stack after a not-top-level complex assignment>>= 4488if (! cntxt$toplevel) cb$putcode(DECLNKSTK.OP) 4489@ 4490 4491 4492\subsection{Compiling setter calls} 4493\label{subsec:setter} 4494Setter calls, or calls to replacement functions, in compiled 4495assignment expressions find stack that contains the current right hand 4496side value on the top followed by the current left hand side value. 4497Some replacement function calls, such as calls to [[$<-]], are handled 4498by an inlining mechanism described below. The general case when the 4499function is specified by a symbol is handled a [[GETFUN]] instruction 4500to push the function on the stack, pushing any additional arguments on 4501the stack, and using the [[SETTER_CALL]] instruction to execute the 4502call. This instruction adjusts the argument list by inserting as the 4503first argument an evaluated promise for the left hand side value and 4504as the last argument an evaluated promise for the right hand side 4505value; the final argument also has the [[value]] tag. The case where 4506the function is specified in the form [[foo::bar]] or [[foo:::bar]] 4507differs only compiling the function expression and using [[CHECKFUN]] 4508to verify the result and prepare the stack. 4509<<[[cmpSetterCall]] function>>= 4510cmpSetterCall <- function(place, origplace, vexpr, cb, cntxt) { 4511 afun <- getAssignFun(place[[1]]) 4512 acall <- as.call(c(afun, as.list(place[-1]), list(value = vexpr))) 4513 acall[[2]] <- as.name("*tmp*") 4514 ncntxt <- make.callContext(cntxt, acall) 4515 sloc <- cb$savecurloc() 4516 cexpr <- as.call(c(afun, as.list(origplace[-1]), list(value = vexpr))) 4517 cb$setcurexpr(cexpr) 4518 if (is.null(afun)) 4519 ## **** warn instead and arrange for cmpSpecial? 4520 ## **** or generate code to signal runtime error? 4521 cntxt$stop(gettext("invalid function in complex assignment"), 4522 loc = cb$savecurloc()) 4523 else if (typeof(afun) == "symbol") { 4524 if (! trySetterInline(afun, place, origplace, acall, cb, ncntxt)) { 4525 ci <- cb$putconst(afun) 4526 cb$putcode(GETFUN.OP, ci) 4527 <<compile additional arguments and call to setter function>> 4528 } 4529 } 4530 else { 4531 cmp(afun, cb, ncntxt) 4532 cb$putcode(CHECKFUN.OP) 4533 <<compile additional arguments and call to setter function>> 4534 } 4535 cb$restorecurloc(sloc) 4536} 4537@ %def cmpSetterCall 4538The common code for compiling additional arguments and issuing the 4539[[SETTER_CALL]] instruction is given by 4540<<compile additional arguments and call to setter function>>= 4541cb$putcode(PUSHNULLARG.OP) 4542cmpCallArgs(place[-c(1, 2)], cb, ncntxt) 4543cci <- cb$putconst(acall) 4544cvi <- cb$putconst(vexpr) 4545cb$putcode(SETTER_CALL.OP, cci, cvi) 4546@ %def 4547The [[PUSHNULL]] instruction places [[NULL]] in the argument list as a 4548first argument to serve as a place holder; [[SETTER_CALL]] replaces 4549this with the evaluated promise for the current left hand side value. 4550 4551The replacement function corresponding to [[fun]] is computed by 4552[[getAssignFun]]. If [[fun]] is a symbol then the assignment function 4553is the symbol followed by [[<-]]. The function [[fun]] can also be an 4554expression of the form [[foo::bar]], in which case the replacement 4555function is the expression [[foo::`bar<-`]]. [[NULL]] is returned if 4556[[fun]] does not fit into one of these two cases. 4557<<[[getAssignFun]] function>>= 4558getAssignFun <- function(fun) { 4559 if (typeof(fun) == "symbol") 4560 as.name(paste0(fun, "<-")) 4561 else { 4562 ## check for and handle foo::bar(x) <- y assignments here 4563 if (typeof(fun) == "language" && length(fun) == 3 && 4564 (as.character(fun[[1]]) %in% c("::", ":::")) && 4565 typeof(fun[[2]]) == "symbol" && typeof(fun[[3]]) == "symbol") { 4566 afun <- fun 4567 afun[[3]] <- as.name(paste0(fun[[3]],"<-")) 4568 afun 4569 } 4570 else NULL 4571 } 4572} 4573@ %def getAssignFun 4574 4575To produce more efficient code some replacement function calls can be 4576inlined and use specialized instructions. The most important of these 4577are [[$<-]], [[[<-]], and [[[[<-]]. An inlining mechanism similar to 4578the one described in Section \ref{sec:inlining} is used for this 4579purpose. A separate mechanism is needed because of the fact that in 4580the present context two arguments, the left hand side and right hand 4581side values, are already on the stack. 4582<<setter inlining mechanism>>= 4583setterInlineHandlers <- new.env(hash = TRUE, parent = emptyenv()) 4584 4585setSetterInlineHandler <- function(name, h, package = "base") { 4586 if (exists(name, setterInlineHandlers, inherits = FALSE)) { 4587 entry <- get(name, setterInlineHandlers) 4588 if (entry$package != package) { 4589 fmt <- "handler for '%s' is already defined for another package" 4590 stop(gettextf(fmt, name), domain = NA) 4591 } 4592 } 4593 entry <- list(handler = h, package = package) 4594 assign(name, entry, setterInlineHandlers) 4595} 4596 4597getSetterInlineHandler <- function(name, package = "base") { 4598 if (exists(name, setterInlineHandlers, inherits = FALSE)) { 4599 hinfo <- get(name, setterInlineHandlers) 4600 if (hinfo$package == package) 4601 hinfo$handler 4602 else NULL 4603 } 4604 else NULL 4605} 4606 4607trySetterInline <- function(afun, place, origplace, call, cb, cntxt) { 4608 name <- as.character(afun) 4609 info <- getInlineInfo(name, cntxt) 4610 if (is.null(info)) 4611 FALSE 4612 else { 4613 h <- getSetterInlineHandler(name, info$package) 4614 if (! is.null(h)) 4615 h(afun, place, origplace, call, cb, cntxt) 4616 else FALSE 4617 } 4618} 4619@ %def 4620 4621The inline handler for [[$<-]] replacement calls uses the 4622[[DOLLARGETS]] instruction. The handler declines to handle cases that 4623would produce runtime errors; these are compiled by the generic 4624mechanism. 4625%% **** might be useful to signal a warning at compile time 4626<<setter inline handler for [[$<-]]>>= 4627setSetterInlineHandler("$<-", function(afun, place, origplace, call, cb, cntxt) { 4628 if (any.dots(place) || length(place) != 3) 4629 FALSE 4630 else { 4631 sym <- place[[3]] 4632 if (is.character(sym)) 4633 sym <- as.name(sym) 4634 if (is.name(sym)) { 4635 ci <- cb$putconst(call) 4636 csi <- cb$putconst(sym) 4637 cb$putcode(DOLLARGETS.OP, ci, csi) 4638 TRUE 4639 } 4640 else FALSE 4641 } 4642}) 4643@ %def 4644 4645The replacement functions [[[<-]] and [[[[<-]]] are implemented as 4646[[SPECIAL]] functions that do internal dispatching. They are 4647therefore compiled along the same lines as their corresponding 4648accessor functions as described in Section \ref{subsec:subset}. The 4649common pattern is implemented by [[cmpSetterDispatch]]. 4650<<[[cmpSetterDispatch]] function>>= 4651cmpSetterDispatch <- function(start.op, dflt.op, afun, place, call, cb, cntxt) { 4652 if (any.dots(place)) 4653 FALSE ## punt 4654 else { 4655 ci <- cb$putconst(call) 4656 end.label <- cb$makelabel() 4657 cb$putcode(start.op, ci, end.label) 4658 if (length(place) > 2) { 4659 args <- place[-(1:2)] 4660 cmpBuiltinArgs(args, names(args), cb, cntxt, TRUE) 4661 } 4662 cb$putcode(dflt.op) 4663 cb$putlabel(end.label) 4664 TRUE 4665 } 4666} 4667@ %def cmpSetterDispatch 4668The two inlining handlers are then defined as 4669<<setter inline handlers for [[ [<- ]] and [[ [[<- ]]>>= 4670# **** this is now handled differently; see "Improved subset ..." 4671# setSetterInlineHandler("[<-", function(afun, place, origplace, call, cb, cntxt) 4672# cmpSetterDispatch(STARTSUBASSIGN.OP, DFLTSUBASSIGN.OP, 4673# afun, place, call, cb, cntxt)) 4674 4675# setSetterInlineHandler("[[<-", function(afun, place, origplace, call, cb, cntxt) 4676# cmpSetterDispatch(STARTSUBASSIGN2.OP, DFLTSUBASSIGN2.OP, 4677# afun, place, call, cb, cntxt)) 4678@ %def 4679 4680An inline handler is defined for [[@<-]] in order to suppress spurious 4681warnings about the slot name symbol. A call in which the slot is 4682specified by a symbol is converted to one using a string instead, and 4683is then compiled by a recursive call to [[cmpSetterCall]]; the handler 4684will decline in this second call and the default compilation strategy 4685will be used. 4686<<setter inlining handler for [[@<-]]>>= 4687setSetterInlineHandler("@<-", function(afun, place, origplace, acall, cb, cntxt) { 4688 if (! dots.or.missing(place) && length(place) == 3 && 4689 typeof(place[[3]]) == "symbol") { 4690 place[[3]] <- as.character(place[[3]]) 4691 vexpr <- acall[[length(acall)]] 4692 cmpSetterCall(place, origplace, vexpr, cb, cntxt) 4693 TRUE 4694 } 4695 else FALSE 4696}) 4697@ 4698 4699 4700\subsection{Compiling getter calls} 4701\label{subsec:getter} 4702Getter calls within an assignment also need special handling because 4703of the left hand side argument being on the stack already and because 4704of the need to restore the stack invariant. There are again two cases 4705for installing the getter function on the stack. These are then 4706followed by common code for handling the additional arguments and the 4707call. 4708<<[[cmpGetterCall]] function>>= 4709cmpGetterCall <- function(place, origplace, cb, cntxt) { 4710 ncntxt <- make.callContext(cntxt, place) 4711 sloc <- cb$savecurloc() 4712 cb$setcurexpr(origplace) 4713 fun <- place[[1]] 4714 if (typeof(fun) == "symbol") { 4715 if (! tryGetterInline(place, cb, ncntxt)) { 4716 ci <- cb$putconst(fun) 4717 cb$putcode(GETFUN.OP, ci) 4718 <<compile additional arguments and call to getter function>> 4719 } 4720 } 4721 else { 4722 cmp(fun, cb, ncntxt) 4723 cb$putcode(CHECKFUN.OP) 4724 <<compile additional arguments and call to getter function>> 4725 } 4726 cb$restorecurloc(sloc) 4727} 4728@ %def cmpGetterCall 4729In the common code, as in setter calls a [[NULL]] is placed on the 4730argument stack as a place holder for the left hand side promise. Then 4731the additional arguments are placed on the stack and the 4732[[GETTER-CALL]] instruction is issued. This instruction installs the 4733evaluated promise with the left hand side value as the first argument 4734and executes the call. The call will leave the next right left hand 4735side on the top of the stack. A [[SWAP]] instruction then switches 4736the top two stack entries. This leaves the original right hand side 4737value on top followed by the new left hand side value returned by the 4738getter call and any other left hand side values produced by earlier 4739getter call. 4740<<compile additional arguments and call to getter function>>= 4741cb$putcode(PUSHNULLARG.OP) 4742cmpCallArgs(place[-c(1, 2)], cb, ncntxt) 4743cci <- cb$putconst(place) 4744cb$putcode(GETTER_CALL.OP, cci) 4745cb$putcode(SWAP.OP) 4746@ %def 4747 4748Again an inlining mechanism is needed to handle calls to functions like 4749[[$]] and [[[]]. These are able to use the same instructions as the 4750inline handlers in Section \ref{subsec:subset} for ordinary calls to 4751[[$]] and [[[]] but require some additional work to deal with 4752maintaining the stack invariant. 4753 4754The inlining mechanism itself is analogous to the general one and the 4755one for inlining setter calls. 4756<<getter inlining mechanism>>= 4757getterInlineHandlers <- new.env(hash = TRUE, parent = emptyenv()) 4758 4759setGetterInlineHandler <- function(name, h, package = "base") { 4760 if (exists(name, getterInlineHandlers, inherits = FALSE)) { 4761 entry <- get(name, getterInlineHandlers) 4762 if (entry$package != package) { 4763 fmt <- "handler for '%s' is already defined for another package" 4764 stop(gettextf(fmt, name), domain = NA) 4765 } 4766 } 4767 entry <- list(handler = h, package = package) 4768 assign(name, entry, getterInlineHandlers) 4769} 4770 4771getGetterInlineHandler <- function(name, package = "base") { 4772 if (exists(name, getterInlineHandlers, inherits = FALSE)) { 4773 hinfo <- get(name, getterInlineHandlers) 4774 if (hinfo$package == package) 4775 hinfo$handler 4776 else NULL 4777 } 4778 else NULL 4779} 4780 4781tryGetterInline <- function(call, cb, cntxt) { 4782 name <- as.character(call[[1]]) 4783 info <- getInlineInfo(name, cntxt) 4784 if (is.null(info)) 4785 FALSE 4786 else { 4787 h <- getGetterInlineHandler(name, info$package) 4788 if (! is.null(h)) 4789 h(call, cb, cntxt) 4790 else FALSE 4791 } 4792} 4793@ %def 4794 4795The inline handler for [[$]] in a getter context uses the [[DUP2ND]] 4796instruction to push the second value on the stack, the previous left 4797hand side value, onto the stack. The [[DOLLAR]] instruction pops this 4798value, computes the component for this value and the symbol in the 4799constant pool, and pushes the result on the stack. A [[SWAP]] 4800instruction then interchanges this value with the next value, which is 4801the original right hand side value, thus restoring the stack 4802invariant. 4803<<getter inline handler for [[$]]>>= 4804setGetterInlineHandler("$", function(call, cb, cntxt) { 4805 if (any.dots(call) || length(call) != 3) 4806 FALSE 4807 else { 4808 sym <- call[[3]] 4809 if (is.character(sym)) 4810 sym <- as.name(sym) 4811 if (is.name(sym)) { 4812 ci <- cb$putconst(call) 4813 csi <- cb$putconst(sym) 4814 cb$putcode(DUP2ND.OP) 4815 cb$putcode(DOLLAR.OP, ci, csi) 4816 cb$putcode(SWAP.OP) 4817 TRUE 4818 } 4819 else FALSE 4820 } 4821}) 4822@ %def 4823 4824Calls to [[[]] and [[[[]] again need two instructions to support the 4825internal dispatch. The general pattern is implemented in 4826[[cmpGetterDispatch]]. A [[DUP2ND]] instruction is used to place the 4827first argument for the call on top of the stack, code analogous to the 4828code for ordinary calls to [[[]] and [[[[]] is used to make the call, 4829and this is followed by a [[SWAP]] instruction to rearrange the stack. 4830<<[[cmpGetterDispatch]] function>>= 4831cmpGetterDispatch <- function(start.op, dflt.op, call, cb, cntxt) { 4832 if (any.dots(call)) 4833 FALSE ## punt 4834 else { 4835 ci <- cb$putconst(call) 4836 end.label <- cb$makelabel() 4837 cb$putcode(DUP2ND.OP) 4838 cb$putcode(start.op, ci, end.label) 4839 if (length(call) > 2) { 4840 args <- call[-(1:2)] 4841 cmpBuiltinArgs(args, names(args), cb, cntxt, TRUE) 4842 } 4843 cb$putcode(dflt.op) 4844 cb$putlabel(end.label) 4845 cb$putcode(SWAP.OP) 4846 TRUE 4847 } 4848} 4849@ %def cmpGetterDispatch 4850The two inline handlers are then defined as 4851<<getter inline handlers for [[[]] and [[[[]]>>= 4852# **** this is now handled differently; see "Improved subset ..." 4853# setGetterInlineHandler("[", function(call, cb, cntxt) 4854# cmpGetterDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, call, cb, cntxt)) 4855 4856# setGetterInlineHandler("[[", function(call, cb, cntxt) 4857# cmpGetterDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, call, cb, cntxt)) 4858@ %def 4859 4860 4861\section{Constant folding} 4862A very valuable compiler optimization is constant folding. For 4863example, an expression for computing a normal density function may 4864include the code 4865\begin{verbatim} 48661 / sqrt(2 * pi) 4867\end{verbatim} 4868The interpreter would have to evaluate this expression each time it is 4869needed, but a compiler can often compute the value once at compile 4870time. 4871 4872The constant folding optimization can be applied at various points in 4873the compilation process: It can be applied to the source code before 4874code generation or to the generated code in a separate optimization 4875phase. For now, constant folding is applied during the code 4876generation phase. 4877 4878The [[constantFold]] function examines its expression argument and 4879handles each expression type by calling an appropriate function. 4880<<[[constantFold]] function>>= 4881## **** rewrite using switch?? 4882constantFold <- function(e, cntxt, loc = NULL) { 4883 type = typeof(e) 4884 if (type == "language") 4885 constantFoldCall(e, cntxt) 4886 else if (type == "symbol") 4887 constantFoldSym(e, cntxt) 4888 else if (type == "promise") 4889 cntxt$stop(gettext("cannot constant fold literal promises"), 4890 cntxt, loc = loc) 4891 else if (type == "bytecode") 4892 cntxt$stop(gettext("cannot constant fold literal bytecode objects"), 4893 cntxt, loc = loc) 4894 else checkConst(e) 4895} 4896@ %def constantFold 4897%% **** warn and return NULL instead of calling stop?? 4898 4899The [[checkConst]] function decides whether a value is a constant that 4900is small enough and simple enough to enter into the constant pool. If 4901so, then [[checkConst]] wraps the value in a list as the [[value]] 4902component. If not, then [[NULL]] is returned. 4903<<[[checkConst]] function>>= 4904checkConst <- function(e) { 4905 if (mode(e) %in% constModes && length(e) <= maxConstSize) 4906 list(value = e) 4907 else 4908 NULL 4909} 4910@ %def checkConst 4911The maximal size and acceptable modes are defined by 4912<<[[maxConstSize]] and [[constModes]] definitions>>= 4913maxConstSize <- 10 4914 4915constModes <- c("numeric", "logical", "NULL", "complex", "character") 4916@ %def maxConstSize constModes 4917 4918For now, constant folding is only applied for a particular set of 4919variables and functions defined in the base package. The constant 4920folding code uses [[isBaseVar]] to determine whether a variable can be 4921assumed to reference the corresponding base variable given the 4922current compilation environment and optimization setting. 4923[[constantFoldSym]] is applied to base variables in the [[constNames]] 4924list. 4925<<[[constantFoldSym]] function>>= 4926## Assumes all constants will be defined in base. 4927## Eventually allow other packages to define constants. 4928## Any variable with locked binding could be used if type is right. 4929## Allow local declaration of optimize, notinline declaration. 4930constantFoldSym <- function(var, cntxt) { 4931 var <- as.character(var) 4932 if (var %in% constNames && isBaseVar(var, cntxt)) 4933 checkConst(get(var, .BaseNamespaceEnv)) 4934 else NULL 4935} 4936@ %def constantFoldSym 4937<<[[constNames]] definition>>= 4938constNames <- c("pi", "T", "F") 4939@ %def constNames 4940 4941Call expressions are handled by determining whether the function 4942called is eligible for constant folding, attempting to constant fold 4943the arguments, and calling the folding function. The result is the 4944passed to [[checkConst]]. If an error or a warning occurs in the call to the 4945folding function then [[constantFoldCall]] returns [[NULL]]. 4946<<[[constantFoldCall]] function>>= 4947constantFoldCall <- function(e, cntxt) { 4948 fun <- e[[1]] 4949 if (typeof(fun) == "symbol") { 4950 ffun <- getFoldFun(fun, cntxt) 4951 if (! is.null(ffun)) { 4952 args <- as.list(e[-1]) 4953 for (i in seq_along(args)) { 4954 a <- args[[i]] 4955 if (missing(a)) 4956 return(NULL) 4957 val <- constantFold(a, cntxt) 4958 if (! is.null(val)) 4959 args[i] <- list(val$value) ## **** in case value is NULL 4960 else return(NULL) 4961 } 4962 modes <- unlist(lapply(args, mode)) 4963 if (all(modes %in% constModes)) { 4964 tryCatch(checkConst(do.call(ffun, args)), 4965 error = function(e) NULL, warning = function(w) NULL) 4966 ## **** issue warning?? 4967 } 4968 else NULL 4969 } 4970 else NULL 4971 } 4972 else NULL 4973} 4974@ %def constantFoldCall 4975%% **** separate out and explain argument processing chunk (maybe also the call) 4976 4977The functions in the base package eligible for constant folding are 4978<<[[foldFuns]] definition>>= 4979foldFuns <- c("+", "-", "*", "/", "^", "(", 4980 ">", ">=", "==", "!=", "<", "<=", "||", "&&", "!", 4981 "|", "&", "%%", 4982 "c", "rep", ":", 4983 "abs", "acos", "acosh", "asin", "asinh", "atan", "atan2", 4984 "atanh", "ceiling", "choose", "cos", "cosh", "exp", "expm1", 4985 "floor", "gamma", "lbeta", "lchoose", "lgamma", "log", "log10", 4986 "log1p", "log2", "max", "min", "prod", "range", "round", 4987 "seq_along", "seq.int", "seq_len", "sign", "signif", 4988 "sin", "sinh", "sqrt", "sum", "tan", "tanh", "trunc", 4989 "baseenv", "emptyenv", "globalenv", 4990 "Arg", "Conj", "Im", "Mod", "Re", 4991 "is.R") 4992@ %def foldFuns 4993[[getFoldFun]] checks the called function against this list and 4994whether the binding for the variable can be assumed to be from the 4995base package. If then returns the appropriate function from the base 4996package or [[NULL]]. 4997<<[[getFoldFun]] function>>= 4998## For now assume all foldable functions are in base 4999getFoldFun <- function(var, cntxt) { 5000 var <- as.character(var) 5001 if (var %in% foldFuns && isBaseVar(var, cntxt)) { 5002 val <- get(var, .BaseNamespaceEnv) 5003 if (is.function(val)) 5004 val 5005 else 5006 NULL 5007 } 5008 else NULL 5009} 5010@ %def getFoldFun 5011 5012 5013\section{More top level functions} 5014\subsection{Compiling closures} 5015The function [[cmpfun]] is for compiling a closure. The body is 5016compiled with [[genCode]] and combined with the closure's formals and 5017environment to form a compiled closure. The [[.Internal]] function 5018[[bcClose]] does this. Some additional fiddling is needed if the 5019closure is an S4 generic. The need for the [[asS4]] bit seems a bit 5020odd but it is apparently needed at this point. 5021<<[[cmpfun]] function>>= 5022cmpfun <- function(f, options = NULL) { 5023 type <- typeof(f) 5024 if (type == "closure") { 5025 cntxt <- make.toplevelContext(makeCenv(environment(f)), options) 5026 ncntxt <- make.functionContext(cntxt, formals(f), body(f)) 5027 if (mayCallBrowser(body(f), ncntxt)) 5028 return(f) 5029 if (typeof(body(f)) != "language" || body(f)[1] != "{") 5030 loc <- list(expr = body(f), srcref = getExprSrcref(f)) 5031 else 5032 loc <- NULL 5033 b <- genCode(body(f), ncntxt, loc = loc) 5034 val <- .Internal(bcClose(formals(f), b, environment(f))) 5035 attrs <- attributes(f) 5036 if (! is.null(attrs)) 5037 attributes(val) <- attrs 5038 if (isS4(f)) ## **** should this really be needed?? 5039 val <- asS4(val) 5040 val 5041 } 5042 else if (type == "builtin" || type == "special") 5043 f 5044 else stop("cannot compile a non-function") 5045} 5046@ %def cmpfun 5047 5048For use in compiling packages and in JIT compilation it is useful to 5049have a variant that returns the uncompiled function if there is an 5050error during compilation. 5051<<[[tryCmpfun]] function>>= 5052tryCmpfun <- function(f) 5053 tryCatch(cmpfun(f), error = function(e) { 5054 notifyCompilerError(paste(e$message, "at", deparse(e$call))) 5055 f 5056 }) 5057@ %def tryCmpfun 5058 5059A similar utility for expressions for use in JIT compilation of loops: 5060<<[[tryCompile]] function>>= 5061tryCompile <- function(e, ...) 5062 tryCatch(compile(e, ...), error = function(err) { 5063 notifyCompilerError(paste(err$message, "at", deparse(err$call))) 5064 e 5065 }) 5066@ %def tryCompile 5067 5068If a function contains a call to [[browser]], it should not be compiled, 5069because the byte-code interpreter does not support command-by-command 5070execution ("n"). This function explores the AST of a closure to find out if 5071it may contain a call to [[browser]]: 5072 5073<<[[mayCallBrowser]] function>>= 5074mayCallBrowser <- function(e, cntxt) { 5075 if (typeof(e) == "language") { 5076 fun <- e[[1]] 5077 if (typeof(fun) == "symbol") { 5078 fname <- as.character(fun) 5079 if (fname == "browser") ## not checking isBaseVar to err on the 5080 ## positive 5081 TRUE 5082 else if (fname == "function" && isBaseVar(fname, cntxt)) 5083 FALSE 5084 else 5085 mayCallBrowserList(e[-1], cntxt) 5086 } 5087 else 5088 mayCallBrowserList(e, cntxt) 5089 } 5090 else FALSE 5091} 5092@ %def mayCallBrowser 5093 5094A version that operates on a list of expressions is 5095<<[[mayCallBrowserList]] function>>= 5096mayCallBrowserList <- function(elist, cntxt) { 5097 for (a in as.list(elist)) 5098 if (! missing(a) && mayCallBrowser(a, cntxt)) 5099 return(TRUE) 5100 FALSE 5101} 5102@ %def mayCallBrowserList 5103 5104 5105\subsection{Compiling and loading files} 5106A file can be compiled with [[cmpfile]] and loaded with [[loadcmp]]. 5107[[cmpfile]] reads in the expressions, compiles them, and serializes 5108the list of compiled expressions by calling the [[.Internal]] function 5109[[save.to.file]]. 5110<<[[cmpfile]] function>>= 5111cmpfile <- function(infile, outfile, ascii = FALSE, env = .GlobalEnv, 5112 verbose = FALSE, options = NULL, version = NULL) { 5113 if (! is.environment(env) || ! identical(env, topenv(env))) 5114 stop("'env' must be a top level environment") 5115 <<create [[outfile]] if argument is missing>> 5116 <<check that [[infile]] and [[outfile]] are not the same>> 5117 forms <- parse(infile) 5118 nforms <- length(forms) 5119 srefs <- attr(forms, "srcref") 5120 if (nforms > 0) { 5121 expr.needed <- 1000 5122 expr.old <- getOption("expressions") 5123 if (expr.old < expr.needed) { 5124 options(expressions = expr.needed) 5125 on.exit(options(expressions = expr.old)) 5126 } 5127 cforms <- vector("list", nforms) 5128 cenv <- makeCenv(env) 5129 cntxt <- make.toplevelContext(cenv, options) 5130 cntxt$env <- addCenvVars(cenv, findLocalsList(forms, cntxt)) 5131 for (i in 1:nforms) { 5132 e <- forms[[i]] 5133 sref <- srefs[[i]] 5134 if (verbose) { 5135 if (typeof(e) == "language" && e[[1]] == "<-" && 5136 typeof(e[[3]]) == "language" && e[[3]][[1]] == "function") 5137 cat(paste0("compiling function \"", e[[2]], "\"\n")) 5138 else 5139 cat(paste("compiling expression", deparse(e, 20)[1], 5140 "...\n")) 5141 } 5142 if (!mayCallBrowser(e, cntxt)) 5143 cforms[[i]] <- genCode(e, cntxt, 5144 loc = list(expr = e, srcref = sref)) 5145 } 5146 cat(gettextf("saving to file \"%s\" ... ", outfile)) 5147 .Internal(save.to.file(cforms, outfile, ascii, version)) 5148 cat(gettext("done"), "\n", sep = "") 5149 } 5150 else warning("empty input file; no output written"); 5151 invisible(NULL) 5152} 5153@ %def cmpfile 5154 5155The default output file name is the base name of the input file with a 5156[[.Rc]] extension. 5157<<create [[outfile]] if argument is missing>>= 5158if (missing(outfile)) { 5159 basename <- sub("\\.[a-zA-Z0-9]$", "", infile) 5160 outfile <- paste0(basename, ".Rc") 5161} 5162@ %def 5163As a precaution it is useful to check that [[infile]] and [[outfile]] 5164are not the same and signal an error if they are. 5165<<check that [[infile]] and [[outfile]] are not the same>>= 5166if (infile == outfile) 5167 stop("input and output file names are the same") 5168@ %def 5169 5170The [[loadcmp]] reads in the serialized list of expressions 5171using the [[.Internal]] function [[load.from.file]]. The compiled 5172expressions are then evaluated in the global environment. 5173<<[[loadcmp]] function>>= 5174loadcmp <- function (file, envir = .GlobalEnv, chdir = FALSE) { 5175 if (!(is.character(file) && file.exists(file))) 5176 stop(gettextf("file '%s' does not exist", file), domain = NA) 5177 exprs <- .Internal(load.from.file(file)) 5178 if (length(exprs) == 0) 5179 return(invisible()) 5180 if (chdir && (path <- dirname(file)) != ".") { 5181 owd <- getwd() 5182 on.exit(setwd(owd), add = TRUE) 5183 setwd(path) 5184 } 5185 for (i in exprs) { 5186 eval(i, envir) 5187 } 5188 invisible() 5189} 5190@ %def loadcmp 5191[[loadcmp]] is the analog to [[source]] for compiled files. 5192 5193Two additional functions that are currently not exported or used are 5194[[cmpframe]] and [[cmplib]]. They should probably be removed. 5195<<[[cmpframe]] function>>= 5196cmpframe <- function(inpos, file) { 5197 expr.needed <- 1000 5198 expr.old <- getOption("expressions") 5199 if (expr.old < expr.needed) 5200 options(expressions = expr.needed) 5201 on.exit(options(expressions = expr.old)) 5202 5203 attach(NULL, name="<compiled>") 5204 inpos <- inpos + 1 5205 outpos <- 2 5206 on.exit(detach(pos=outpos), add=TRUE) 5207 5208 for (f in ls(pos = inpos, all.names = TRUE)) { 5209 def <- get(f, pos = inpos) 5210 if (typeof(def) == "closure") { 5211 cat(gettextf("compiling '%s'", f), "\n", sep = "") 5212 fc <- cmpfun(def) 5213 assign(f, fc, pos=outpos) 5214 } 5215 } 5216 cat(gettextf("saving to file \"%s\" ... ", file)) 5217 save(list = ls(pos = outpos, all.names = TRUE), file = file) 5218 cat(gettext("done"), "\n", sep = "") 5219} 5220@ %def cmpframe 5221 5222<<[[cmplib]] function>>= 5223cmplib <- function(package, file) { 5224 package <- as.character(substitute(package)) 5225 pkgname <- paste("package", package, sep = ":") 5226 pos <- match(pkgname, search()); 5227 if (missing(file)) 5228 file <- paste0(package,".Rc") 5229 if (is.na(pos)) { 5230 library(package, character.only = TRUE) 5231 pos <- match(pkgname, search()); 5232 on.exit(detach(pos=match(pkgname, search()))) 5233 } 5234 cmpframe(pos, file) 5235} 5236@ %def cmplib 5237 5238 5239\subsection{Enabling implicit compilation} 5240<<[[enableJIT]] function>>= 5241enableJIT <- function(level) 5242 .Internal(enableJIT(level)) 5243@ %def enableJIT 5244<<[[compilePKGS]] function>>= 5245compilePKGS <- function(enable) 5246 .Internal(compilePKGS(enable)) 5247@ %def compilePKGS 5248 5249 5250\subsection{Setting compiler options} 5251The [[setCompilerOptions]] function provides a means for users to 5252adjust the default compiler option values. This interface is 5253experimental and may change. 5254<<[[setCompilerOptions]] function>>= 5255setCompilerOptions <- function(...) { 5256 options <- list(...) 5257 nm <- names(options) 5258 for (n in nm) 5259 if (! exists(n, compilerOptions)) 5260 stop(gettextf("'%s' is not a valid compiler option", n), 5261 domain = NA) 5262 old <- list() 5263 newOptions <- as.list(compilerOptions) # copy options 5264 for (n in nm) { 5265 op <- options[[n]] 5266 switch(n, 5267 optimize = { 5268 op <- as.integer(op) 5269 if (length(op) == 1 && 0 <= op && op <= 3) { 5270 old <- c(old, list(optimize = 5271 compilerOptions$optimize)) 5272 newOptions$optimize <- op 5273 } 5274 }, 5275 suppressAll = { 5276 if (identical(op, TRUE) || identical(op, FALSE)) { 5277 old <- c(old, list(suppressAll = 5278 compilerOptions$suppressAll)) 5279 newOptions$suppressAll <- op 5280 } 5281 }, 5282 suppressNoSuperAssignVar = { 5283 if (isTRUE(op) || isFALSE(op)) { 5284 old <- c(old, list( 5285 suppressNoSuperAssignVar = 5286 compilerOptions$suppressNoSuperAssignVar)) 5287 newOptions$suppressNoSuperAssignVar <- op 5288 } 5289 }, 5290 suppressUndefined = { 5291 if (identical(op, TRUE) || identical(op, FALSE) || 5292 is.character(op)) { 5293 old <- c(old, list(suppressUndefined = 5294 compilerOptions$suppressUndefined)) 5295 newOptions$suppressUndefined <- op 5296 } 5297 }) 5298 } 5299 jitEnabled <- enableJIT(-1) 5300 if (checkCompilerOptions(jitEnabled, newOptions)) 5301 for(n in names(newOptions)) # commit the new options 5302 assign(n, newOptions[[n]], compilerOptions) 5303 invisible(old) 5304} 5305@ %def 5306 5307For now, a [[.onLoad]] function is used to allow all warning to be 5308suppressed. This is probably useful for building packages, since the 5309way lazy loading is done means variables defined in shared libraries 5310are not available and produce a raft of warnings. The [[.onLoad]] 5311function also allows undefined variables to be suppressed and the 5312optimization level to be specified using environment variables. 5313<<[[.onLoad]] function>>= 5314.onLoad <- function(libname, pkgname) { 5315 envAsLogical <- function(varName) { 5316 value = Sys.getenv(varName) 5317 if (value == "") 5318 NA 5319 else 5320 switch(value, 5321 "1"=, "TRUE"=, "true"=, "True"=, "yes"=, "Yes"= TRUE, 5322 "0"=, "FALSE"=,"false"=,"False"=, "no"=, "No" = FALSE, 5323 stop(gettextf("invalid environment variable value: %s==%s", 5324 varName, value))) 5325 } 5326 val <- envAsLogical("R_COMPILER_SUPPRESS_ALL") 5327 if (!is.na(val)) 5328 setCompilerOptions(suppressAll = val) 5329 val <- envAsLogical("R_COMPILER_SUPPRESS_UNDEFINED") 5330 if (!is.na(val)) 5331 setCompilerOptions(suppressUndefined = val) 5332 val <- envAsLogical("R_COMPILER_SUPPRESS_NO_SUPER_ASSIGN_VAR") 5333 if (!is.na(val)) 5334 setCompilerOptions(suppressNoSuperAssignVar = val) 5335 if (Sys.getenv("R_COMPILER_OPTIMIZE") != "") 5336 tryCatch({ 5337 lev <- as.integer(Sys.getenv("R_COMPILER_OPTIMIZE")) 5338 if (0 <= lev && lev <= 3) 5339 setCompilerOptions(optimize = lev) 5340 }, error = function(e) e, warning = function(w) w) 5341} 5342@ %def .onLoad 5343 5344When [[enableJIT]] is set to 3, loops should be compiled before executing. 5345However, if the [[optimize]] option is set to 0 or 1, a compiled loop will 5346call to the same primitive function as is used by the AST interpretter (e.g. 5347[[do_for]]), and the compilation would run into infinite recursion. 5348[[checkCompilerOptions]] will detect invalid combinations of [[enableJIT]] 5349and [[optimize]] and report a warning. 5350%% **** could also change the interface and atomically set enableJIT and optimize 5351<<[[checkCompilerOptions]] function>>= 5352checkCompilerOptions <- function(jitEnabled, options = NULL) { 5353 optimize <- getCompilerOption("optimize", options) 5354 if (jitEnabled <= 2 || optimize >= 2) 5355 TRUE 5356 else { 5357 stop(gettextf( 5358 "invalid compiler options: optimize(==%d)<2 and jitEnabled(==%d)>2", 5359 optimize, jitEnabled)) 5360 FALSE 5361 } 5362} 5363@ %def checkCompilerOptions 5364 5365 5366\subsection{Disassembler} 5367A minimal disassembler is provided by [[disassemble]]. This is 5368primarily useful for debugging the compiler. A more readable output 5369representation might be nice to have. It would also probably make 5370sense to give the result a class and write a print method. 5371<<[[disassemble]] function>>= 5372disassemble <- function(code) { 5373 .CodeSym <- as.name(".Code") 5374 disasm.const<-function(x) 5375 if (typeof(x)=="list" && length(x) > 0 && identical(x[[1]], .CodeSym)) 5376 disasm(x) else x 5377 disasm <-function(code) { 5378 code[[2]]<-bcDecode(code[[2]]) 5379 code[[3]]<-lapply(code[[3]], disasm.const) 5380 code 5381 } 5382 if (typeof(code)=="closure") { 5383 code <- .Internal(bodyCode(code)) 5384 if (typeof(code) != "bytecode") 5385 stop("function is not compiled") 5386 } 5387 dput(disasm(.Internal(disassemble(code)))) 5388} 5389@ %def disassemble 5390 5391The [[.Internal]] function [[disassemble]] extracts the numeric code 5392vector and constant pool. The function [[bcDecode]] uses the 5393[[Opcodes.names]] array to translate the numeric opcodes into symbolic 5394ones. At this point not enough information is available in a 5395reasonable place to also convert labels back to symbolic form. 5396<<[[bcDecode]] function>>= 5397bcDecode <- function(code) { 5398 n <- length(code) 5399 ncode <- vector("list", n) 5400 ncode[[1]] <- code[1] # version number 5401 i <- 2 5402 while (i <= n) { 5403 name<-Opcodes.names[code[i]+1] 5404 argc<-Opcodes.argc[[code[i]+1]] 5405 ncode[[i]] <- as.name(name) 5406 i<-i+1 5407 if (argc > 0) 5408 for (j in 1:argc) { 5409 ncode[[i]]<-code[i] 5410 i<-i+1 5411 } 5412 } 5413 ncode 5414} 5415@ %def bcDecode 5416 5417 5418\section{Improved subset and sub-assignment handling} 5419This section describes changes that allow subset and subassign 5420operations to inmost case be handled without allocating list of the 5421index arguments --- the arguments are passed on the stack instead. 5422The function [[cmpSubsetDispatch]] is analogous to [[cmpDispatch]] 5423described above. the [[dflt.op]] argument passed information about the 5424instruction to be emitted. For instructions designed for a particular 5425number of arguments the [[rank]] component is [[FALSE]] and no index 5426count is emitted; this is used for [[VECSUBSET.OP]] and 5427[[MATSUBSET.OP]] instructions. If [[rank]] is [[TRUE]], then the 5428number of indices is emitted as an operand; this is used by the 5429[[SUBSET_N.OP]] instruction. 5430<<[[cmpSubsetDispatch]] function>>= 5431cmpSubsetDispatch <- function(start.op, dflt.op, e, cb, cntxt) { 5432 if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3) 5433 cntxt$stop(gettext("cannot compile this expression"), cntxt, 5434 loc = cb$savecurloc()) 5435 else { 5436 oe <- e[[2]] 5437 if (missing(oe)) 5438 cntxt$stop(gettext("cannot compile this expression"), cntxt, 5439 loc = cb$savecurloc()) 5440 ncntxt <- make.argContext(cntxt) 5441 ci <- cb$putconst(e) 5442 label <- cb$makelabel() 5443 cmp(oe, cb, ncntxt) 5444 cb$putcode(start.op, ci, label) 5445 indices <- e[-c(1, 2)] 5446 cmpIndices(indices, cb, ncntxt) 5447 if (dflt.op$rank) cb$putcode(dflt.op$code, ci, length(indices)) 5448 else cb$putcode(dflt.op$code, ci) 5449 cb$putlabel(label) 5450 if (cntxt$tailcall) cb$putcode(RETURN.OP) 5451 TRUE 5452 } 5453} 5454@ %def cmpSubsetDispatch 5455 5456Index expressions are compiled by 5457<<[[cmpIndices]] function>>= 5458cmpIndices <- function(indices, cb, cntxt) { 5459 n <- length(indices) 5460 needInc <- FALSE 5461 for (i in seq_along(indices)) 5462 if (i > 1 && checkNeedsInc(indices[[i]], cntxt)) { 5463 needInc <- TRUE 5464 break 5465 } 5466 for (i in seq_along(indices)) { 5467 cmp(indices[[i]], cb, cntxt, TRUE) 5468 if (needInc && i < n) cb$putcode(INCLNK.OP) 5469 } 5470 if (needInc) { 5471 if (n == 2) cb$putcode(DECLNK.OP) 5472 else if (n > 2) cb$putcode(DECLNK_N.OP, n - 1) 5473 } 5474} 5475@ %def cmpIndices 5476This adds instructions to increment and later decrement link counts on 5477previously computed values to prevent later computations from 5478modifying earlier ones. Eventually it should be possible to eliminate 5479some of these increment/decrement instructions in an optimization 5480phase. 5481 5482The subsetting handlers fall back to using [[cmpDispatch]] if there 5483are any named arguments or if an error would need to be signaled (we 5484could issue a compiler warning at this point as well). If all 5485arguments are unnamed and there are no dots then [[cmpSubsetDispatch]] 5486is used; the instruction emitted depends on the argument count. 5487<<inline handlers for subsetting>>= 5488setInlineHandler("[", function(e, cb, cntxt) { 5489 if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3) 5490 cmpDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, e, cb, cntxt) ## punt 5491 else { 5492 nidx <- length(e) - 2; 5493 if (nidx == 1) 5494 dflt.op <- list(code = VECSUBSET.OP, rank = FALSE) 5495 else if (nidx == 2) 5496 dflt.op <- list(code = MATSUBSET.OP, rank = FALSE) 5497 else 5498 dflt.op <- list(code = SUBSET_N.OP, rank = TRUE) 5499 cmpSubsetDispatch(STARTSUBSET_N.OP, dflt.op, e, cb, cntxt) 5500 } 5501}) 5502 5503setInlineHandler("[[", function(e, cb, cntxt) { 5504 if (dots.or.missing(e) || ! is.null(names(e)) || length(e) < 3) 5505 cmpDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, e, cb, cntxt) ## punt 5506 else { 5507 nidx <- length(e) - 2; 5508 if (nidx == 1) 5509 dflt.op <- list(code = VECSUBSET2.OP, rank = FALSE) 5510 else if (nidx == 2) 5511 dflt.op <- list(code = MATSUBSET2.OP, rank = FALSE) 5512 else 5513 dflt.op <- list(code = SUBSET2_N.OP, rank = TRUE) 5514 cmpSubsetDispatch(STARTSUBSET2_N.OP, dflt.op, e, cb, cntxt) 5515 } 5516}) 5517@ 5518 5519Similarly, [[cmpSubassignDispatch]] is a variant of 5520[[cmpSetterDispatch]] that passes index arguments on the stack and 5521emits an index count if necessary. 5522<<[[cmpSubassignDispatch]] function>>= 5523cmpSubassignDispatch <- function(start.op, dflt.op, afun, place, call, cb, 5524 cntxt) { 5525 if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3) 5526 cntxt$stop(gettext("cannot compile this expression"), cntxt, 5527 loc = cb$savecurloc()) 5528 else { 5529 ci <- cb$putconst(call) 5530 label <- cb$makelabel() 5531 cb$putcode(start.op, ci, label) 5532 indices <- place[-c(1, 2)] 5533 cmpIndices(indices, cb, cntxt) 5534 if (dflt.op$rank) cb$putcode(dflt.op$code, ci, length(indices)) 5535 else cb$putcode(dflt.op$code, ci) 5536 cb$putlabel(label) 5537 TRUE 5538 } 5539} 5540@ %def cmpSubassignDispatch 5541 5542Again the handlers fall back to [[cmpSetterDispatch]] if there are 5543named arguments or other complication. 5544<<inline handlers for subassignment>>= 5545setSetterInlineHandler("[<-", function(afun, place, origplace, call, cb, cntxt) { 5546 if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3) 5547 cmpSetterDispatch(STARTSUBASSIGN.OP, DFLTSUBASSIGN.OP, 5548 afun, place, call, cb, cntxt) ## punt 5549 else { 5550 nidx <- length(place) - 2 5551 if (nidx == 1) 5552 dflt.op <- list(code = VECSUBASSIGN.OP, rank = FALSE) 5553 else if (nidx == 2) 5554 dflt.op <- list(code = MATSUBASSIGN.OP, rank = FALSE) 5555 else 5556 dflt.op <- list(code = SUBASSIGN_N.OP, rank = TRUE) 5557 cmpSubassignDispatch(STARTSUBASSIGN_N.OP, dflt.op, afun, place, call, 5558 cb, cntxt) 5559 } 5560}) 5561 5562setSetterInlineHandler("[[<-", function(afun, place, origplace, call, cb, cntxt) { 5563 if (dots.or.missing(place) || ! is.null(names(place)) || length(place) < 3) 5564 cmpSetterDispatch(STARTSUBASSIGN2.OP, DFLTSUBASSIGN2.OP, 5565 afun, place, call, cb, cntxt) ## punt 5566 else { 5567 nidx <- length(place) - 2 5568 if (nidx == 1) 5569 dflt.op <- list(code = VECSUBASSIGN2.OP, rank = FALSE) 5570 else if (nidx == 2) 5571 dflt.op <- list(code = MATSUBASSIGN2.OP, rank = FALSE) 5572 else 5573 dflt.op <- list(code = SUBASSIGN2_N.OP, rank = TRUE) 5574 cmpSubassignDispatch(STARTSUBASSIGN2_N.OP, dflt.op, afun, place, call, 5575 cb, cntxt) 5576 } 5577}) 5578@ 5579 5580Similarly, again, [[cmpSubsetGetterDispatch]] is a variant of 5581[[cmpGetterDispatch]] that passes index arguments on the stack. 5582<<[[cmpSubsetGetterDispatch]] function>>= 5583cmpSubsetGetterDispatch <- function(start.op, dflt.op, call, cb, cntxt) { 5584 if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3) 5585 cntxt$stop(gettext("cannot compile this expression"), cntxt, 5586 loc = cb$savecurloc()) 5587 else { 5588 ci <- cb$putconst(call) 5589 end.label <- cb$makelabel() 5590 cb$putcode(DUP2ND.OP) 5591 cb$putcode(start.op, ci, end.label) 5592 indices <- call[-c(1, 2)] 5593 cmpIndices(indices, cb, cntxt) 5594 if (dflt.op$rank) 5595 cb$putcode(dflt.op$code, ci, length(indices)) 5596 else 5597 cb$putcode(dflt.op$code, ci) 5598 cb$putlabel(end.label) 5599 cb$putcode(SWAP.OP) 5600 TRUE 5601 } 5602} 5603@ %def cmpSubsetGetterDispatch 5604 5605And again the handlers fall back to [[cmpGetterDispatch]] if necessary. 5606<<inline handlers for subset getters>>= 5607setGetterInlineHandler("[", function(call, cb, cntxt) { 5608 if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3) 5609 cmpGetterDispatch(STARTSUBSET.OP, DFLTSUBSET.OP, call, cb, cntxt) 5610 else { 5611 nidx <- length(call) - 2; 5612 if (nidx == 1) 5613 dflt.op <- list(code = VECSUBSET.OP, rank = FALSE) 5614 else if (nidx == 2) 5615 dflt.op <- list(code = MATSUBSET.OP, rank = FALSE) 5616 else 5617 dflt.op <- list(code = SUBSET_N.OP, rank = TRUE) 5618 cmpSubsetGetterDispatch(STARTSUBSET_N.OP, dflt.op, call, cb, cntxt) 5619 } 5620}) 5621 5622setGetterInlineHandler("[[", function(call, cb, cntxt) { 5623 if (dots.or.missing(call) || ! is.null(names(call)) || length(call) < 3) 5624 cmpGetterDispatch(STARTSUBSET2.OP, DFLTSUBSET2.OP, call, cb, cntxt) 5625 else { 5626 nidx <- length(call) - 2; 5627 if (nidx == 1) 5628 dflt.op <- list(code = VECSUBSET2.OP, rank = FALSE) 5629 else if (nidx == 2) 5630 dflt.op <- list(code = MATSUBSET2.OP, rank = FALSE) 5631 else 5632 dflt.op <- list(code = SUBSET2_N.OP, rank = TRUE) 5633 cmpSubsetGetterDispatch(STARTSUBSET2_N.OP, dflt.op, call, cb, cntxt) 5634 } 5635}) 5636@ 5637 5638\section{Discussion and future directions} 5639Despite its long gestation period this compiler should be viewed as a 5640first pass at creating a byte code compiler for R. The compiler 5641itself is very simple in design as a single pass compiler with no 5642separate optimization phases. Similarly the virtual machine uses a 5643very simple stack design. While the compiler already achieves some 5644useful performance improvements on loop-intensive code, more can be 5645achieved with more sophisticated approaches. This will be explored in 5646future work. 5647 5648A major objective of this first version was to reproduce R's 5649interpreted semantics with as few departures as possible while at the 5650same time optimizing a number of aspect of the execution process. The 5651inlining rules controlled by an optimization level setting seem to 5652provide a good way of doing this, and the default optimization setting 5653seems to be reasonably effective. Mechanisms for adjusting the 5654default settings via declarations will be explored and added in the 5655near future. 5656 5657Future versions of the compiler and the engine will explore a number 5658of alternative designs. Switching to a register-based virtual machine 5659will be explored fairly soon. Preliminary experiments suggest that 5660this can provide significant improvements in the case of tight loops 5661by allowing allocation of intermediate results to be avoided in many 5662cases. It may be possible at least initially to keep the current 5663compiler ant just translate the stack-based machine code to a 5664register-based code. 5665 5666Another direction that will be explored is whether sequences of 5667arithmetic and other numerical operations can be fused and possibly 5668vectorized. Again preliminary experiments are promising, but more 5669exploration is needed. 5670 5671Other improvements to be examined may affect interpreted code as much 5672as compiled code. These include more efficient environment 5673representations and more efficient calling conventions. 5674 5675%% **** add some benchmarks 5676%% **** comment on engine 5677 5678%% **** lots of other builtins, specials, and .Internals 5679%% **** controlling compiler warnings 5680%% **** merging in codetools features 5681%% **** put in install-time tests of assumptions about BUILTINs, etc. 5682 5683%% **** switch to register-based engine 5684%% **** make function calls more efficient 5685%% **** try to stay within a single bceval call 5686 5687%% **** think about optimizing things like mean? 5688 5689%% **** jit that compiles all expressions?? 5690%% **** can it record code for expr/env pairs or some such? 5691%% **** can it inline primitives at that point? 5692 5693%% **** Stuff to think about: 5694%% **** alternate environment representation for compiler 5695%% **** optimizing function calls in general 5696%% **** avoiding matching for BOA calls 5697 5698%% **** Think about different ways of handling environments. Should every op 5699%% **** return a new env object that includes (possible) new local vars? 5700 5701%% **** Useful to be able to distinguish ... in args from assigned-to ... 5702 5703%% **** matrix subsetting has to be slow because of the way dim is stored. 5704%% **** might make sense to explicitly compile as multiple operations and 5705%% **** invariant hoisting out of loops for loops? 5706 5707%% **** look at tail call optimization -- pass call and parent.frame?? 5708%% **** eliminating variables not needed? 5709%% **** alternate builtin call implementations? 5710 5711%% **** install compiled promises in code body?? 5712 5713%% **** catch errors at dispatching of inliners; fall back to runtime error 5714 5715 5716\appendix 5717\section{General utility functions} 5718This appendix provides a few general utility functions. 5719 5720The utility function [[pasteExpr]] is used in the error messages. 5721%% **** use elipsis instead of collapse?? 5722%% **** use error context or catch errors? 5723%% **** maybe don't need expression if we catch errors? 5724<<[[pasteExpr]] function>>= 5725pasteExpr <- function(e, prefix = "\n ") { 5726 de <- deparse(e) 5727 if (length(de) == 1) sQuote(de) 5728 else paste(prefix, deparse(e), collapse="") 5729} 5730@ %def pasteExpr 5731 5732The function [[dots.or.missing]] checks the argument list for any 5733missing or [[...]] arguments: 5734<<[[dots.or.missing]] function>>= 5735dots.or.missing <- function(args) { 5736 for (i in 1:length(args)) { 5737 a <-args[[i]] 5738 if (missing(a)) return(TRUE) #**** better test? 5739 if (typeof(a) == "symbol" && a == "...") return(TRUE) 5740 } 5741 return(FALSE) 5742} 5743@ %def dots.or.missing 5744 5745The function [[any.dots]] is defined as 5746<<[[any.dots]] function>>= 5747any.dots <- function(args) { 5748 for (i in 1:length(args)) { 5749 a <-args[[i]] 5750 if (! missing(a) && typeof(a) == "symbol" && a == "...") 5751 return(TRUE) 5752 } 5753 return(FALSE) 5754} 5755@ %def any.dots 5756 5757The utility function [[is.ddsym]] is used to recognize symbols of the 5758form [[..1]], [[..2]], and so on. 5759<<[[is.ddsym]] function>>= 5760is.ddsym <- function(name) { 5761 (is.symbol(name) || is.character(name)) && 5762 length(grep("^\\.\\.[0-9]+$", as.character(name))) != 0 5763} 5764@ %def is.ddsym 5765 5766[[missingArgs]] takes an argument list for a call a logical vector 5767indicating for each argument whether it is empty (missing) or not. 5768<<[[missingArgs]] function>>= 5769missingArgs <- function(args) { 5770 val <- logical(length(args)) 5771 for (i in seq_along(args)) { 5772 a <- args[[i]] 5773 if (missing(a)) 5774 val[i] <- TRUE 5775 else 5776 val[i] <- FALSE 5777 } 5778 val 5779} 5780@ %def missingArgs 5781 5782 5783 5784\section{Environment utilities} 5785This appendix presents some utilities for computations on environments. 5786 5787The function [[frameTypes]] takes an environment argument and returns 5788a character vector with elements for each frame in the environment 5789classifying the frame as local, namespace, or global. The environment 5790is assumed to be a standard evaluation environment that contains 5791[[.GlobalEnv]] as one of its parents. It does this by computing the 5792number of local, namespace, and global frames and then generating the 5793result using [[rep]]. 5794<<[[frameTypes]] function>>= 5795frameTypes <- function(env) { 5796 top <- topenv(env) 5797 empty <- emptyenv() 5798 <<find the number [[nl]] of local frames>> 5799 <<find the number [[nn]] of namespace frames>> 5800 <<find the number [[ng]] of global frames>> 5801 rep(c("local", "namespace", "global"), c(nl, nn, ng)) 5802} 5803@ %def frameTypes 5804The number of local frames is computes by marching down the parent 5805frames with [[parent.env]] until the top level environment is reached. 5806<<find the number [[nl]] of local frames>>= 5807nl <- 0 5808while (! identical(env, top)) { 5809 if (isNamespace(env)) 5810 stop("namespace found within local environments") 5811 env <- parent.env(env) 5812 nl <- nl + 1 5813 if (identical(env, empty)) 5814 stop("not a proper evaluation environment") 5815} 5816@ %def 5817The number of namespace frames is computed by continuing down the 5818parent frames until [[.GlobalEnv]] is reached. 5819<<find the number [[nn]] of namespace frames>>= 5820nn <- 0 5821if (isNamespace(env)) { 5822 while (! identical(env, .GlobalEnv)) { 5823 if (!isNamespace(env)) { 5824 name <- attr(env, "name") 5825 if (!is.character(name) || !startsWith(name, "imports:")) 5826 stop("non-namespace found within namespace environments") 5827 } 5828 env <- parent.env(env) 5829 nn <- nn + 1 5830 if (identical(env, empty)) 5831 stop("not a proper evaluation environment") 5832 } 5833} 5834@ %def 5835Finally the number of global frames is computed by continuing until 5836the empty environment is reached. An alternative would be to compute 5837the length of the result returned by [[search]] 5838<<find the number [[ng]] of global frames>>= 5839ng <- 0 5840while (! identical(env, empty)) { 5841 if (isNamespace(env)) 5842 stop("namespace found within global environments") 5843 env <- parent.env(env) 5844 ng <- ng + 1 5845} 5846@ 5847 5848The function [[findHomeNS]] takes a variable name and a namespace 5849frame, or a namespace imports frame, and returns the namespace frame 5850in which the variable was originally defined, if any. The code assumes 5851that renaming has not been used (it may no longer be supported in the 5852namespace implementation in any case). Just in case, an attempt is 5853made to check for renaming. The result returned is the namaspace 5854frame for the namespace in which the variable was defined or [[NULL]] 5855if the variable was not defined in the specified namespace or one of 5856its imports, or if the home namespace cannot be determined. 5857<<[[findHomeNS]] function>>= 5858## Given a symbol name and a namespace environment (or a namespace 5859## imports environment) find the namespace in which the symbol's value 5860## was originally defined. Returns NULL if the symbol is not found via 5861## the namespace. 5862findHomeNS <- function(sym, ns, cntxt) { 5863 <<if [[ns]] is an imports frame find the corresponding namespace>> 5864 if (exists(sym, ns, inherits = FALSE)) 5865 ns 5866 else if (exists(".__NAMESPACE__.", ns, inherits = FALSE)) { 5867 <<search the imports for [[sym]]>> 5868 NULL 5869 } 5870 else NULL 5871} 5872@ %def findHomeNS 5873 5874If the [[ns]] argument is not a namespace frame it should be the 5875imports frame of a namespace. Such an imports frame should have a 5876[[name]] attribute or the form [["imports:foo"]] it it is associated 5877with namespace [["foo"]]. This is used to find the namespace frame 5878that owns the imports frame in this case, and this frames is then 5879assigned to [[ns]]. 5880<<if [[ns]] is an imports frame find the corresponding namespace>>= 5881if (! isNamespace(ns)) { 5882 ## As a convenience this allows for 'ns' to be the imports fame 5883 ## of a namespace. It appears that these now have a 'name' 5884 ## attribute of the form 'imports:foo' if 'foo' is the 5885 ## namespace. 5886 name <- attr(ns, "name") 5887 if (is.null(name)) 5888 cntxt$stop("'ns' must be a namespace or a namespace imports environment", 5889 cntxt) 5890 ns <- getNamespace(sub("imports:", "", attr(ns, "name"))) 5891} 5892@ %def 5893 5894The imports are searched in reverse order since in the case of name 5895conflicts the last one imported will take precedence. Full imports 5896via an [[import]] directive have to be handled differently than 5897selective imports created with [[importFrom]] directives. 5898<<search the imports for [[sym]]>>= 5899imports <- get(".__NAMESPACE__.", ns)$imports 5900for (i in rev(seq_along(imports))) { 5901 iname <- names(imports)[i] 5902 ins <- getNamespace(iname) 5903 if (identical(imports[[i]], TRUE)) { 5904 <<search in a full import>> 5905 } 5906 else { 5907 <<search in a selective import>> 5908 } 5909} 5910@ %def 5911 5912If an entry in the [[imports]] specification for the import source 5913namespace [[ins]] has value [[TRUE]], then all exports of the [[ins]] 5914have been imported. If [[sym]] is in the exports then the result of a 5915recursive call to [[findHomeNS]] is returned. 5916<<search in a full import>>= 5917if (identical(ins, .BaseNamespaceEnv)) 5918 exports <- .BaseNamespaceEnv 5919else 5920 exports <- get(".__NAMESPACE__.", ins)$exports 5921if (exists(sym, exports, inherits = FALSE)) 5922 return(findHomeNS(sym, ins, cntxt)) 5923@ %def 5924 5925For selective imports the [[imports]] entry is a named character 5926vector mapping export name to import name. In the absence of renaming 5927the names should match the values; if this is not the case [[NULL]] is 5928returned. Otherwise, a match results again in returning a recursive 5929call to [[findHomeNS]]. 5930<<search in a selective import>>= 5931exports <- imports[[i]] 5932pos <- match(sym, names(exports), 0) 5933if (pos) { 5934 ## If renaming has been used things get too 5935 ## confusing so return NULL. (It is not clear if 5936 ## renaming this is still supported by the 5937 ## namespace code.) 5938 if (sym == exports[pos]) 5939 return(findHomeNS(sym, ins, cntxt)) 5940 else 5941 return(NULL) 5942} 5943@ 5944 5945Given a package package frame from the global environment the function 5946[[packFrameName]] returns the associated package name, which is 5947computed from the [[name]] attribute. 5948%% **** might be good the check the name is of the form package:foo 5949<<[[packFrameName]] function>>= 5950packFrameName <- function(frame) { 5951 fname <- attr(frame, "name") 5952 if (is.character(fname)) 5953 sub("package:", "", fname) 5954 else if (identical(frame , baseenv())) 5955 "base" 5956 else "" 5957} 5958@ %def packFrameName 5959 5960For a namespace frame the function [[nsName]] retrieves the namespace 5961name from the namespace information structure. 5962<<[[nsName]] function>>= 5963nsName <- function(ns) { 5964 if (identical(ns, .BaseNamespaceEnv)) 5965 "base" 5966 else { 5967 name <- ns$.__NAMESPACE__.$spec["name"] 5968 if (is.character(name)) 5969 as.character(name) ## strip off names 5970 else "" 5971 } 5972} 5973@ %def nsName 5974 5975 5976\section{Experimental utilities} 5977 5978This section presents two experimental utililities that, for now, are 5979not exported. The first is a simple byte code profiler. This requires 5980that the file [[eval.c]] be compiled with [[BC_PROFILING]] enabled, 5981which on [[gcc]]-compatible compilers will disable threaded code. The 5982byte code profiler uses the profile timer to record the active byte 5983code instruction at interrupt time. The function [[bcprof]] runs the 5984profiler while evaluating its argument expression and returns a 5985summary of the counts. 5986 5987<<[[bcprof]] function>>= 5988bcprof <- function(expr) { 5989 .Internal(bcprofstart()) 5990 expr 5991 .Internal(bcprofstop()) 5992 val <- structure(.Internal(bcprofcounts()), 5993 names = Opcodes.names) 5994 hits <- sort(val[val > 0], decreasing = TRUE) 5995 pct <- round(100 * hits / sum(hits), 1) 5996 data.frame(hits = hits, pct = pct) 5997} 5998@ %def bcprof 5999 6000The second utility is a simple interface to the code building 6001mechanism that may help with experimenting with code optimizations. 6002<<[[asm]] function>>= 6003asm <- function(e, gen, env = .GlobalEnv, options = NULL) { 6004 cenv <- makeCenv(env) 6005 cntxt <- make.toplevelContext(cenv, options) 6006 cntxt$env <- addCenvVars(cenv, findLocals(e, cntxt)) 6007 genCode(e, cntxt, gen = gen) 6008} 6009@ %def asm 6010 6011\section{Opcode constants} 6012\subsection{Symbolic opcode names} 6013<<opcode definitions>>= 6014BCMISMATCH.OP <- 0 6015RETURN.OP <- 1 6016GOTO.OP <- 2 6017BRIFNOT.OP <- 3 6018POP.OP <- 4 6019DUP.OP <- 5 6020PRINTVALUE.OP <- 6 6021STARTLOOPCNTXT.OP <- 7 6022ENDLOOPCNTXT.OP <- 8 6023DOLOOPNEXT.OP <- 9 6024DOLOOPBREAK.OP <- 10 6025STARTFOR.OP <- 11 6026STEPFOR.OP <- 12 6027ENDFOR.OP <- 13 6028SETLOOPVAL.OP <- 14 6029INVISIBLE.OP <- 15 6030LDCONST.OP <- 16 6031LDNULL.OP <- 17 6032LDTRUE.OP <- 18 6033LDFALSE.OP <- 19 6034GETVAR.OP <- 20 6035DDVAL.OP <- 21 6036SETVAR.OP <- 22 6037GETFUN.OP <- 23 6038GETGLOBFUN.OP <- 24 6039GETSYMFUN.OP <- 25 6040GETBUILTIN.OP <- 26 6041GETINTLBUILTIN.OP <- 27 6042CHECKFUN.OP <- 28 6043MAKEPROM.OP <- 29 6044DOMISSING.OP <- 30 6045SETTAG.OP <- 31 6046DODOTS.OP <- 32 6047PUSHARG.OP <- 33 6048PUSHCONSTARG.OP <- 34 6049PUSHNULLARG.OP <- 35 6050PUSHTRUEARG.OP <- 36 6051PUSHFALSEARG.OP <- 37 6052CALL.OP <- 38 6053CALLBUILTIN.OP <- 39 6054CALLSPECIAL.OP <- 40 6055MAKECLOSURE.OP <- 41 6056UMINUS.OP <- 42 6057UPLUS.OP <- 43 6058ADD.OP <- 44 6059SUB.OP <- 45 6060MUL.OP <- 46 6061DIV.OP <- 47 6062EXPT.OP <- 48 6063SQRT.OP <- 49 6064EXP.OP <- 50 6065EQ.OP <- 51 6066NE.OP <- 52 6067LT.OP <- 53 6068LE.OP <- 54 6069GE.OP <- 55 6070GT.OP <- 56 6071AND.OP <- 57 6072OR.OP <- 58 6073NOT.OP <- 59 6074DOTSERR.OP <- 60 6075STARTASSIGN.OP <- 61 6076ENDASSIGN.OP <- 62 6077STARTSUBSET.OP <- 63 6078DFLTSUBSET.OP <- 64 6079STARTSUBASSIGN.OP <- 65 6080DFLTSUBASSIGN.OP <- 66 6081STARTC.OP <- 67 6082DFLTC.OP <- 68 6083STARTSUBSET2.OP <- 69 6084DFLTSUBSET2.OP <- 70 6085STARTSUBASSIGN2.OP <- 71 6086DFLTSUBASSIGN2.OP <- 72 6087DOLLAR.OP <- 73 6088DOLLARGETS.OP <- 74 6089ISNULL.OP <- 75 6090ISLOGICAL.OP <- 76 6091ISINTEGER.OP <- 77 6092ISDOUBLE.OP <- 78 6093ISCOMPLEX.OP <- 79 6094ISCHARACTER.OP <- 80 6095ISSYMBOL.OP <- 81 6096ISOBJECT.OP <- 82 6097ISNUMERIC.OP <- 83 6098VECSUBSET.OP <- 84 6099MATSUBSET.OP <- 85 6100VECSUBASSIGN.OP <- 86 6101MATSUBASSIGN.OP <- 87 6102AND1ST.OP <- 88 6103AND2ND.OP <- 89 6104OR1ST.OP <- 90 6105OR2ND.OP <- 91 6106GETVAR_MISSOK.OP <- 92 6107DDVAL_MISSOK.OP <- 93 6108VISIBLE.OP <- 94 6109SETVAR2.OP <- 95 6110STARTASSIGN2.OP <- 96 6111ENDASSIGN2.OP <- 97 6112SETTER_CALL.OP <- 98 6113GETTER_CALL.OP <- 99 6114SWAP.OP <- 100 6115DUP2ND.OP <- 101 6116SWITCH.OP <- 102 6117RETURNJMP.OP <- 103 6118STARTSUBSET_N.OP <- 104 6119STARTSUBASSIGN_N.OP <- 105 6120VECSUBSET2.OP <- 106 6121MATSUBSET2.OP <- 107 6122VECSUBASSIGN2.OP <- 108 6123MATSUBASSIGN2.OP <- 109 6124STARTSUBSET2_N.OP <- 110 6125STARTSUBASSIGN2_N.OP <- 111 6126SUBSET_N.OP <- 112 6127SUBSET2_N.OP <- 113 6128SUBASSIGN_N.OP <- 114 6129SUBASSIGN2_N.OP <-115 6130LOG.OP <- 116 6131LOGBASE.OP <- 117 6132MATH1.OP <- 118 6133DOTCALL.OP <- 119 6134COLON.OP <- 120 6135SEQALONG.OP <- 121 6136SEQLEN.OP <- 122 6137BASEGUARD.OP <- 123 6138INCLNK.OP <- 124 6139DECLNK.OP <- 125 6140DECLNK_N.OP <- 126 6141INCLNKSTK.OP <- 127 6142DECLNKSTK.OP <- 128 6143@ 6144 6145\subsection{Instruction argument counts and names} 6146<<opcode argument counts>>= 6147Opcodes.argc <- list( 6148BCMISMATCH.OP = 0, 6149RETURN.OP = 0, 6150GOTO.OP = 1, 6151BRIFNOT.OP = 2, 6152POP.OP = 0, 6153DUP.OP = 0, 6154PRINTVALUE.OP = 0, 6155STARTLOOPCNTXT.OP = 2, 6156ENDLOOPCNTXT.OP = 1, 6157DOLOOPNEXT.OP = 0, 6158DOLOOPBREAK.OP = 0, 6159STARTFOR.OP = 3, 6160STEPFOR.OP = 1, 6161ENDFOR.OP = 0, 6162SETLOOPVAL.OP = 0, 6163INVISIBLE.OP = 0, 6164LDCONST.OP = 1, 6165LDNULL.OP = 0, 6166LDTRUE.OP = 0, 6167LDFALSE.OP = 0, 6168GETVAR.OP = 1, 6169DDVAL.OP = 1, 6170SETVAR.OP = 1, 6171GETFUN.OP = 1, 6172GETGLOBFUN.OP = 1, 6173GETSYMFUN.OP = 1, 6174GETBUILTIN.OP = 1, 6175GETINTLBUILTIN.OP = 1, 6176CHECKFUN.OP = 0, 6177MAKEPROM.OP = 1, 6178DOMISSING.OP = 0, 6179SETTAG.OP = 1, 6180DODOTS.OP = 0, 6181PUSHARG.OP = 0, 6182PUSHCONSTARG.OP = 1, 6183PUSHNULLARG.OP = 0, 6184PUSHTRUEARG.OP = 0, 6185PUSHFALSEARG.OP = 0, 6186CALL.OP = 1, 6187CALLBUILTIN.OP = 1, 6188CALLSPECIAL.OP = 1, 6189MAKECLOSURE.OP = 1, 6190UMINUS.OP = 1, 6191UPLUS.OP = 1, 6192ADD.OP = 1, 6193SUB.OP = 1, 6194MUL.OP = 1, 6195DIV.OP = 1, 6196EXPT.OP = 1, 6197SQRT.OP = 1, 6198EXP.OP = 1, 6199EQ.OP = 1, 6200NE.OP = 1, 6201LT.OP = 1, 6202LE.OP = 1, 6203GE.OP = 1, 6204GT.OP = 1, 6205AND.OP = 1, 6206OR.OP = 1, 6207NOT.OP = 1, 6208DOTSERR.OP = 0, 6209STARTASSIGN.OP = 1, 6210ENDASSIGN.OP = 1, 6211STARTSUBSET.OP = 2, 6212DFLTSUBSET.OP = 0, 6213STARTSUBASSIGN.OP = 2, 6214DFLTSUBASSIGN.OP = 0, 6215STARTC.OP = 2, 6216DFLTC.OP = 0, 6217STARTSUBSET2.OP = 2, 6218DFLTSUBSET2.OP = 0, 6219STARTSUBASSIGN2.OP = 2, 6220DFLTSUBASSIGN2.OP = 0, 6221DOLLAR.OP = 2, 6222DOLLARGETS.OP = 2, 6223ISNULL.OP = 0, 6224ISLOGICAL.OP = 0, 6225ISINTEGER.OP = 0, 6226ISDOUBLE.OP = 0, 6227ISCOMPLEX.OP = 0, 6228ISCHARACTER.OP = 0, 6229ISSYMBOL.OP = 0, 6230ISOBJECT.OP = 0, 6231ISNUMERIC.OP = 0, 6232VECSUBSET.OP = 1, 6233MATSUBSET.OP = 1, 6234VECSUBASSIGN.OP = 1, 6235MATSUBASSIGN.OP = 1, 6236AND1ST.OP = 2, 6237AND2ND.OP = 1, 6238OR1ST.OP = 2, 6239OR2ND.OP = 1, 6240GETVAR_MISSOK.OP = 1, 6241DDVAL_MISSOK.OP = 1, 6242VISIBLE.OP = 0, 6243SETVAR2.OP = 1, 6244STARTASSIGN2.OP = 1, 6245ENDASSIGN2.OP = 1, 6246SETTER_CALL.OP = 2, 6247GETTER_CALL.OP = 1, 6248SWAP.OP = 0, 6249DUP2ND.OP = 0, 6250SWITCH.OP = 4, 6251RETURNJMP.OP = 0, 6252STARTSUBSET_N.OP = 2, 6253STARTSUBASSIGN_N.OP = 2, 6254VECSUBSET2.OP = 1, 6255MATSUBSET2.OP = 1, 6256VECSUBASSIGN2.OP = 1, 6257MATSUBASSIGN2.OP = 1, 6258STARTSUBSET2_N.OP = 2, 6259STARTSUBASSIGN2_N.OP = 2, 6260SUBSET_N.OP = 2, 6261SUBSET2_N.OP = 2, 6262SUBASSIGN_N.OP = 2, 6263SUBASSIGN2_N.OP = 2, 6264LOG.OP = 1, 6265LOGBASE.OP = 1, 6266MATH1.OP = 2, 6267DOTCALL.OP = 2, 6268COLON.OP = 1, 6269SEQALONG.OP = 1, 6270SEQLEN.OP = 1, 6271BASEGUARD.OP = 2, 6272INCLNK.OP = 0, 6273DECLNK.OP = 0, 6274DECLNK_N.OP = 1, 6275INCLNKSTK.OP = 0, 6276DECLNKSTK.OP = 0 6277) 6278@ 6279 6280<<opcode names>>= 6281Opcodes.names <- names(Opcodes.argc) 6282@ %def Opcodes.names 6283 6284 6285\section{Implementation file} 6286%% Benchmark code: 6287% sf <- function(x) { 6288% s <- 0 6289% for (y in x) 6290% s <- s + y 6291% s 6292% } 6293 6294% sfc <- cmpfun(sf) 6295 6296% x <- 1 : 1000000 6297 6298% system.time(sf(x)) 6299% system.time(sfc(x)) 6300 6301 6302%% **** need header/copyright/license stuff here 6303<<cmp.R>>= 6304# Automatically generated from ../noweb/compiler.nw. 6305# 6306# File src/library/compiler/R/cmp.R 6307# Part of the R package, https://www.R-project.org 6308# Copyright (C) 2001-2014 Luke Tierney 6309# 6310# This program is free software; you can redistribute it and/or modify 6311# it under the terms of the GNU General Public License as published by 6312# the Free Software Foundation; either version 2 of the License, or 6313# (at your option) any later version. 6314# 6315# This program is distributed in the hope that it will be useful, 6316# but WITHOUT ANY WARRANTY; without even the implied warranty of 6317# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 6318# GNU General Public License for more details. 6319# 6320# A copy of the GNU General Public License is available at 6321# https://www.R-project.org/Licenses/ 6322 6323## 6324## Compiler options 6325## 6326 6327<<compiler options data base>> 6328 6329<<[[getCompilerOption]] function>> 6330 6331 6332## 6333## General Utilities 6334## 6335 6336<<[[pasteExpr]] function>> 6337 6338<<[[dots.or.missing]] function>> 6339 6340<<[[any.dots]] function>> 6341 6342<<[[is.ddsym]] function>> 6343 6344 6345<<[[missingArgs]] function>> 6346 6347 6348## 6349## Environment utilities 6350## 6351 6352<<[[frameTypes]] function>> 6353 6354<<[[findHomeNS]] function>> 6355 6356<<[[packFrameName]] function>> 6357 6358<<[[nsName]] function>> 6359 6360 6361## 6362## Finding possible local variables 6363## 6364 6365<<[[getAssignedVar]] function>> 6366 6367<<[[findLocals1]] function>> 6368 6369<<[[findLocalsList1]] function>> 6370 6371<<[[findLocals]] function>> 6372 6373<<[[findLocalsList]] function>> 6374 6375 6376## 6377## Compilation environment implementation 6378## 6379 6380<<[[makeCenv]] function>> 6381 6382<<[[addCenvVars]] function>> 6383 6384<<[[addCenvFrame]] function>> 6385 6386<<[[findCenvVar]] function>> 6387 6388<<[[isBaseVar]] function>> 6389 6390<<[[funEnv]] function>> 6391 6392<<[[findLocVar]] function>> 6393 6394<<[[findFunDef]] function>> 6395 6396<<[[findVar]] function>> 6397 6398 6399## 6400## Constant folding 6401## 6402 6403<<[[maxConstSize]] and [[constModes]] definitions>> 6404 6405<<[[constNames]] definition>> 6406 6407<<[[checkConst]] function>> 6408 6409<<[[constantFoldSym]] function>> 6410 6411<<[[getFoldFun]] function>> 6412 6413<<[[constantFoldCall]] function>> 6414 6415<<[[constantFold]] function>> 6416 6417<<[[foldFuns]] definition>> 6418 6419<<[[languageFuns]] definition>> 6420 6421 6422## 6423## Opcode constants 6424## 6425 6426<<opcode argument counts>> 6427 6428<<opcode names>> 6429 6430<<opcode definitions>> 6431 6432 6433## 6434## Code buffer implementation 6435## 6436 6437<<source location tracking functions>> 6438 6439<<[[make.codeBuf]] function>> 6440 6441<<[[codeBufCode]] function>> 6442 6443<<[[genCode]] function>> 6444 6445 6446## 6447## Compiler contexts 6448## 6449 6450<<[[make.toplevelContext]] function>> 6451 6452<<[[make.callContext]] function>> 6453 6454<<[[make.promiseContext]] function>> 6455 6456<<[[make.functionContext]] function>> 6457 6458<<[[make.nonTailCallContext]] function>> 6459 6460<<[[make.argContext]] function>> 6461 6462<<[[make.noValueContext]] function>> 6463 6464<<[[make.loopContext]] function>> 6465 6466 6467## 6468## Compiler top level 6469## 6470 6471<<[[cmp]] function>> 6472 6473<<[[cmpConst]] function>> 6474 6475<<[[cmpSym]] function>> 6476 6477<<[[cmpCall]] function>> 6478 6479<<[[cmpCallSymFun]] function>> 6480 6481<<[[cmpCallExprFun]] function>> 6482 6483<<[[cmpCallArgs]] function>> 6484 6485<<[[cmpConstArg]]>> 6486 6487<<[[checkCall]] function>> 6488 6489## **** need to handle ... and ..n arguments specially 6490## **** separate call opcode for calls with named args? 6491## **** for (a in e[[-1]]) ... goes into infinite loop 6492 6493<<[[cmpTag]] function>> 6494 6495<<[[mayCallBrowser]] function>> 6496 6497<<[[mayCallBrowserList]] function>> 6498 6499## 6500## Inlining mechanism 6501## 6502 6503<<inline handler implementation>> 6504 6505## tryInline implements the rule permitting inlining as they stand now: 6506## Inlining is controlled by the optimize compiler option, with possible 6507## values 0, 1, 2, 3. 6508 6509<<[[getInlineInfo]] function>> 6510 6511<<[[tryInline]] function>> 6512 6513 6514## 6515## Inline handlers for some SPECIAL functions 6516## 6517 6518<<inlining handler for [[function]]>> 6519 6520<<inlining handler for left brace function>> 6521 6522<<inlining handler for [[if]]>> 6523 6524<<inlining handler for [[&&]]>> 6525 6526<<inlining handler for [[||]]>> 6527 6528 6529## 6530## Inline handlers for assignment expressions 6531## 6532 6533<<setter inlining mechanism>> 6534 6535<<getter inlining mechanism>> 6536 6537<<[[cmpAssign]] function>> 6538 6539<<[[flattenPlace]] function>> 6540 6541<<[[cmpGetterCall]] function>> 6542 6543<<[[checkAssign]] function>> 6544 6545<<[[cmpSymbolAssign]] function>> 6546 6547<<[[cmpComplexAssign]] function>> 6548 6549<<[[cmpSetterCall]] function>> 6550 6551<<[[getAssignFun]] function>> 6552 6553<<[[cmpSetterDispatch]] function>> 6554 6555<<inlining handlers for [[<-]], [[=]], and [[<<-]]>> 6556 6557<<setter inline handler for [[$<-]]>> 6558 6559<<setter inline handlers for [[ [<- ]] and [[ [[<- ]]>> 6560 6561<<[[cmpGetterDispatch]] function>> 6562 6563<<getter inline handler for [[$]]>> 6564 6565<<getter inline handlers for [[[]] and [[[[]]>> 6566 6567 6568## 6569## Inline handlers for loops 6570## 6571 6572<<inlining handlers for [[next]] and [[break]]>> 6573 6574<<[[isLoopStopFun]] function>> 6575 6576<<[[isLoopTopFun]] function>> 6577 6578<<[[checkSkipLoopCntxtList]] function>> 6579 6580<<[[checkSkipLoopCntxt]] function>> 6581 6582<<inlining handler for [[repeat]] loops>> 6583 6584<<[[cmpRepeatBody]] function>> 6585 6586<<inlining handler for [[while]] loops>> 6587 6588<<[[cmpWhileBody]] function>> 6589 6590<<inlining handler for [[for]] loops>> 6591 6592<<[[cmpForBody]] function>> 6593 6594 6595## 6596## Inline handlers for one and two argument primitives 6597## 6598 6599<<[[cmpPrim1]] function>> 6600 6601<<[[checkNeedsInc]] function>> 6602 6603<<[[cmpPrim2]] function>> 6604 6605<<inline handlers for [[+]] and [[-]]>> 6606 6607<<inline handlers for [[*]] and [[/]]>> 6608 6609<<inline handlers for [[^]], [[exp]], and [[sqrt]]>> 6610 6611<<inline handler for [[log]]>> 6612 6613<<list of one argument math functions>> 6614 6615<<[[cmpMath1]] function>> 6616 6617<<inline one argument math functions>> 6618 6619<<inline handlers for comparison operators>> 6620 6621<<inline handlers for [[&]] and [[|]]>> 6622 6623<<inline handler for [[!]]>> 6624 6625 6626## 6627## Inline handlers for the left parenthesis function 6628## 6629 6630<<inlining handler for [[(]]>> 6631 6632 6633## 6634## Inline handlers for general BUILTIN and SPECIAL functions 6635## 6636 6637<<[[cmpBuiltin]] function>> 6638 6639<<[[cmpBuiltinArgs]] function>> 6640 6641<<[[cmpSpecial]] function>> 6642 6643<<inlining handler for [[.Internal]]>> 6644 6645 6646## 6647## Inline handlers for subsetting and related operators 6648## 6649 6650<<[[cmpDispatch]] function>> 6651 6652<<inlining handlers for some dispatching SPECIAL functions>> 6653 6654<<inlining handler for [[$]]>> 6655 6656 6657## 6658## Inline handler for local() and return() functions 6659## 6660 6661<<inlining handler for [[local]] function>> 6662 6663<<inlining handler for [[return]] function>> 6664 6665 6666## 6667## Inline handlers for the family of is.xyz primitives 6668## 6669 6670<<[[cmpIs]] function>> 6671 6672<<inlining handlers for [[is.xyz]] functions>> 6673 6674 6675## 6676## Default inline handlers for BUILTIN and SPECIAL functions 6677## 6678 6679<<install default inlining handlers>> 6680 6681 6682## 6683## Inline handlers for some .Internal functions 6684## 6685 6686<<[[simpleFormals]] function>> 6687 6688<<[[simpleArgs]] function>> 6689 6690<<[[is.simpleInternal]] function>> 6691 6692<<[[inlineSimpleInternalCall]] function>> 6693 6694<<[[cmpSimpleInternal]] function>> 6695 6696<<inline safe simple [[.Internal]] functions from [[base]]>> 6697 6698<<inline safe simple [[.Internal]] functions from [[stats]]>> 6699 6700 6701## 6702## Inline handler for switch 6703## 6704 6705<<[[findActionIndex]] function>> 6706 6707<<inline handler for [[switch]]>> 6708 6709 6710## 6711## Inline handler for .Call 6712## 6713 6714<<inline handler for [[.Call]]>> 6715 6716 6717## 6718## Inline handlers for generating integer sequences 6719## 6720 6721<<inline handlers for integer sequences>> 6722 6723 6724## 6725## Inline handlers to control warnings 6726## 6727 6728<<[[cmpMultiColon]] function>> 6729 6730<<inlining handlers for [[::]] and [[:::]]>> 6731 6732<<setter inlining handler for [[@<-]]>> 6733 6734<<inlining handler for [[with]]>> 6735 6736<<inlining handler for [[require]]>> 6737 6738 6739## 6740## Compiler warnings 6741## 6742 6743<<[[suppressAll]] function>> 6744 6745<<[[suppressNoSuperAssignVar]] function>> 6746 6747<<[[suppressUndef]] function>> 6748 6749<<[[notifyLocalFun]] function>> 6750 6751<<[[notifyUndefFun]] function>> 6752 6753<<[[notifyUndefVar]] function>> 6754 6755<<[[notifyNoSuperAssignVar]] function>> 6756 6757<<[[notifyWrongArgCount]] function>> 6758 6759<<[[notifyWrongDotsUse]] function>> 6760 6761<<[[notifyWrongBreakNext]] function>> 6762 6763<<[[notifyBadCall]] function>> 6764 6765<<[[notifyBadAssignFun]] function>> 6766 6767<<[[notifyMultipleSwitchDefaults]] function>> 6768 6769<<[[notifyNoSwitchcases]] function>> 6770 6771<<[[notifyAssignSyntacticFun]] function>> 6772 6773<<[[notifyCompilerError]] function>> 6774 6775 6776## 6777## Compiler interface 6778## 6779 6780<<[[compile]] function>> 6781 6782<<[[cmpfun]] function>> 6783 6784<<[[tryCmpfun]] function>> 6785 6786<<[[tryCompile]] function>> 6787 6788<<[[cmpframe]] function>> 6789 6790<<[[cmplib]] function>> 6791 6792<<[[cmpfile]] function>> 6793 6794<<[[loadcmp]] function>> 6795 6796<<[[enableJIT]] function>> 6797 6798<<[[compilePKGS]] function>> 6799 6800<<[[setCompilerOptions]] function>> 6801 6802<<[[.onLoad]] function>> 6803 6804<<[[checkCompilerOptions]] function>> 6805 6806 6807## 6808## Disassembler 6809## 6810 6811<<[[bcDecode]] function>> 6812 6813<<[[disassemble]] function>> 6814 6815 6816## 6817## Experimental Utilities 6818## 6819 6820<<[[bcprof]] function>> 6821 6822<<[[asm]] function>> 6823 6824 6825## 6826## Improved subset and subassign handling 6827## 6828 6829<<[[cmpIndices]] function>> 6830 6831<<[[cmpSubsetDispatch]] function>> 6832 6833<<inline handlers for subsetting>> 6834 6835<<[[cmpSubassignDispatch]] function>> 6836 6837<<inline handlers for subassignment>> 6838 6839<<[[cmpSubsetGetterDispatch]] function>> 6840 6841<<inline handlers for subset getters>> 6842@ 6843\end{document} 6844