\input texinfo
@c %**start of header
@setfilename R-ints.info
@settitle R Internals
@setchapternewpage on
@c %**end of header

@c @documentencoding ISO-8859-1

@syncodeindex fn vr

@dircategory Programming
@direntry
* R Internals: (R-ints). R Internals.
@end direntry

@finalout

@include R-defs.texi
@include version.texi

@copying
This manual is for R, version @value{VERSION}.

@Rcopyright{1999}

@quotation
@permission{}
@end quotation
@end copying

@titlepage
@title R Internals
@subtitle Version @value{VERSION}
@author R Core Team
@page
@vskip 0pt plus 1filll
@insertcopying
@end titlepage

@ifplaintext
@insertcopying
@end ifplaintext

@c @ifnothtml
@contents
@c @end ifnothtml

@ifnottex
@node Top, R Internal Structures, (dir), (dir)
@top R Internals

This is a guide to the internal structures of @R{} and coding standards for
the core team working on @R{} itself.

@insertcopying

@end ifnottex

@menu
* R Internal Structures::
* .Internal vs .Primitive::
* Internationalization in the R sources::
* Package Structure::
* Files::
* Graphics Devices::
* GUI consoles::
* Tools::
* R coding standards::
* Testing R code::
* Use of TeX dialects::
* Current and future directions::
* Function and variable index::
* Concept index::
@end menu
@c Could have (autogenerated) @detailmenu here ..

@node R Internal Structures, .Internal vs .Primitive, Top, Top
@chapter R Internal Structures

This chapter is the beginnings of documentation about @R{} internal
structures. It is written for the core team and others studying the
code in the @file{src/main} directory.

It is a work-in-progress and should be checked against the current
version of the source code. Versions for @R{} 2.x.y contain historical
comments about when features were introduced: this version is for the
3.x.y series.

@menu
* SEXPs::
* Environments and variable lookup::
* Attributes::
* Contexts::
* Argument evaluation::
* Autoprinting::
* The write barrier::
* Serialization Formats::
* Encodings for CHARSXPs::
* The CHARSXP cache::
* Warnings and errors::
* S4 objects::
* Memory allocators::
* Internal use of global and base environments::
* Modules::
* Visibility::
* Lazy loading::
@end menu

@node SEXPs, Environments and variable lookup, R Internal Structures, R Internal Structures
@section SEXPs

@cindex SEXP
@cindex SEXPREC
What @R{} users think of as @emph{variables} or @emph{objects} are
symbols which are bound to a value. The value can be thought of as
either a @code{SEXP} (a pointer), or the structure it points to, a
@code{SEXPREC} (and there are alternative forms used for vectors, namely
@code{VECSXP} pointing to @code{VECTOR_SEXPREC} structures).
So the basic building blocks of @R{} objects are often called
@emph{nodes}, meaning @code{SEXPREC}s or @code{VECTOR_SEXPREC}s.

Note that the internal structure of the @code{SEXPREC} is not made
available to @R{} Extensions: rather @code{SEXP} is an opaque pointer,
and the internals can only be accessed by the functions provided.

@cindex node
Both types of node structure begin with a 64-bit @code{sxpinfo} header
followed by three pointers (to the attributes and the previous and next
node in a doubly-linked list), and then some further
fields. On a 32-bit platform a node@footnote{strictly, a @code{SEXPREC}
node; @code{VECTOR_SEXPREC} nodes are slightly smaller but followed by
data in the node.} occupies 32 bytes: on a 64-bit platform typically 56
bytes (depending on alignment constraints).

The first five bits of the @code{sxpinfo} header specify one of up to 32
@code{SEXPTYPE}s.

@menu
* SEXPTYPEs::
* Rest of header::
* The 'data'::
* Allocation classes::
@end menu

@node SEXPTYPEs, Rest of header, SEXPs, SEXPs
@subsection SEXPTYPEs

@cindex SEXPTYPE
Currently @code{SEXPTYPE}s 0:10 and 13:25 are in use. Values 11 and 12
were used for internal factors and ordered factors and have since been
withdrawn. Note that the @code{SEXPTYPE} numbers are stored in
@code{save}d objects and that the ordering of the types is used, so the
gap cannot easily be reused.

@cindex SEXPTYPE table
@quotation
@multitable {no} {SPECIALSXPXXX} {S4 classes not of simple type}
@headitem no @tab SEXPTYPE@tab Description
@item @code{0} @tab @code{NILSXP} @tab @code{NULL}
@item @code{1} @tab @code{SYMSXP} @tab symbols
@item @code{2} @tab @code{LISTSXP} @tab pairlists
@item @code{3} @tab @code{CLOSXP} @tab closures
@item @code{4} @tab @code{ENVSXP} @tab environments
@item @code{5} @tab @code{PROMSXP} @tab promises
@item @code{6} @tab @code{LANGSXP} @tab language objects
@item @code{7} @tab @code{SPECIALSXP} @tab special functions
@item @code{8} @tab @code{BUILTINSXP} @tab builtin functions
@item @code{9} @tab @code{CHARSXP} @tab internal character strings
@item @code{10} @tab @code{LGLSXP} @tab logical vectors
@item @code{13} @tab @code{INTSXP} @tab integer vectors
@item @code{14} @tab @code{REALSXP} @tab numeric vectors
@item @code{15} @tab @code{CPLXSXP} @tab complex vectors
@item @code{16} @tab @code{STRSXP} @tab character vectors
@item @code{17} @tab @code{DOTSXP} @tab dot-dot-dot object
@item @code{18} @tab @code{ANYSXP} @tab make ``any'' args work
@item @code{19} @tab @code{VECSXP} @tab list (generic vector)
@item @code{20} @tab @code{EXPRSXP} @tab expression vector
@item @code{21} @tab @code{BCODESXP} @tab byte code
@item @code{22} @tab @code{EXTPTRSXP} @tab external pointer
@item @code{23} @tab @code{WEAKREFSXP} @tab weak reference
@item @code{24} @tab @code{RAWSXP} @tab raw vector
@item @code{25} @tab @code{S4SXP} @tab S4 classes not of simple type
@end multitable
@end quotation

@cindex atomic vector type
Many of these will be familiar from @R{} level: the atomic vector types
are @code{LGLSXP}, @code{INTSXP}, @code{REALSXP}, @code{CPLXSXP},
@code{STRSXP} and @code{RAWSXP}. Lists are @code{VECSXP} and names
(also known as symbols) are @code{SYMSXP}. Pairlists (@code{LISTSXP},
the name going back to the origins of @R{} as a Scheme-like language)
are rarely seen at @R{} level, but are for example used for argument
lists. Character vectors are effectively lists all of whose elements
are @code{CHARSXP}, a type that is rarely visible at @R{} level.

@cindex language object
@cindex argument list
Language objects (@code{LANGSXP}) are calls (including formulae and so
on). Internally they are pairlists with first element a
reference@footnote{a pointer to a function or a symbol to look up the
function by name, or a language object to be evaluated to give a
function.} to the function to be called with remaining elements the
actual arguments for the call (and with the tags if present giving the
specified argument names). Although this is not enforced, many places
in the code assume that the pairlist is of length one or more, often
without checking.

@cindex expression
Expressions are of type @code{EXPRSXP}: they are a vector of (usually
language) objects most often seen as the result of @code{parse()}.

@cindex function
The functions are of types @code{CLOSXP}, @code{SPECIALSXP} and
@code{BUILTINSXP}: where @code{SEXPTYPE}s are stored in an integer
these are sometimes lumped into a pseudo-type @code{FUNSXP} with code
99. Functions defined via @code{function} are of type @code{CLOSXP} and
have formals, body and environment.
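
The @R{}-level function @code{typeof()} reports a name for the
@code{SEXPTYPE} of an object, so several rows of the table above can be
checked from the prompt. The following is only an illustrative sketch:
the function @code{f} is an arbitrary example, and the
@code{SEXPTYPE} shown in each comment is the one the table associates
with the reported name.

@example
f <- function(x, y = 2) x + y
typeof(NULL)             # "NULL"        (NILSXP)
typeof(quote(a))         # "symbol"      (SYMSXP)
typeof(pairlist(a = 1))  # "pairlist"    (LISTSXP)
typeof(f)                # "closure"     (CLOSXP)
typeof(globalenv())      # "environment" (ENVSXP)
typeof(quote(f(1)))      # "language"    (LANGSXP)
typeof(`if`)             # "special"     (SPECIALSXP)
typeof(sum)              # "builtin"     (BUILTINSXP)
typeof(1L)               # "integer"     (INTSXP)
typeof(1)                # "double"      (REALSXP)
typeof(list(1))          # "list"        (VECSXP)
typeof(expression(a))    # "expression"  (EXPRSXP)
names(formals(f))        # "x" "y": the formals of the closure
@end example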

@cindex S4 type
The @code{SEXPTYPE} @code{S4SXP} is for S4 objects which do not consist
solely of a simple type such as an atomic vector or function.


@node Rest of header, The 'data', SEXPTYPEs, SEXPs
@subsection Rest of header

Note that the size and structure of the header changed in @R{} 3.5.0:
see earlier editions of this manual for the previous layout.

The @code{sxpinfo} header is defined as a 64-bit C structure by

@example
#define NAMED_BITS 16
struct sxpinfo_struct @{
    SEXPTYPE type      :  5;  /* @r{discussed above} */
    unsigned int scalar:  1;  /* @r{is this a numeric vector of length 1?} */
    unsigned int obj   :  1;  /* @r{is this an object with a class attribute?} */
    unsigned int alt   :  1;  /* @r{is this an @code{ALTREP} object?} */
    unsigned int gp    : 16;  /* @r{general purpose, see below} */
    unsigned int mark  :  1;  /* @r{mark object as `in use' in GC} */
    unsigned int debug :  1;
    unsigned int trace :  1;
    unsigned int spare :  1;  /* @r{debug once and with reference counting} */
    unsigned int gcgen :  1;  /* @r{generation for GC} */
    unsigned int gccls :  3;  /* @r{class of node for GC} */
    unsigned int named : NAMED_BITS; /* @r{used to control copying} */
    unsigned int extra : 32 - NAMED_BITS;
@}; /* Tot: 64 */
@end example

@findex debug bit
The @code{debug} bit is used for closures and environments. For
closures it is set by @code{debug()} and unset by @code{undebug()}, and
indicates that evaluations of the function should be run under the
browser. For environments it indicates whether the browsing is in
single-step mode.

@findex trace bit
The @code{trace} bit is used for functions for @code{trace()} and for
other objects when tracing duplications (see @code{tracemem}).

@findex spare bit
The @code{spare} bit is used for closures to mark them for one-time
debugging.
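
The state of the @code{debug} bit on a closure can be observed from
@R{} code with @code{isdebugged()}. A minimal sketch, using a throwaway
function @code{f} that is not part of the sources:

@example
f <- function(x) x + 1
isdebugged(f)   # FALSE
debug(f)        # sets the debug bit on the closure
isdebugged(f)   # TRUE
undebug(f)      # clears it again
isdebugged(f)   # FALSE
@end example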

@findex named bits
@findex NAMED
@findex SET_NAMED
@cindex copying semantics
The @code{named} field is set and accessed by the @code{SET_NAMED} and
@code{NAMED} macros, and takes values @code{0}, @code{1} and @code{2}, or
possibly higher if @code{NAMEDMAX} is set to a higher value.
[The @code{NAMED} mechanism has been replaced by reference counting.]
@R{} has a `call by value' illusion, so an assignment like
@example
b <- a
@end example

@noindent
appears to make a copy of @code{a} and refer to it as @code{b}.
However, if neither @code{a} nor @code{b} is subsequently altered there
is no need to copy. What really happens is that a new symbol @code{b}
is bound to the same value as @code{a} and the @code{named} field on the
value object is set (in this case to @code{2}). When an object is about
to be altered, the @code{named} field is consulted. A value of @code{2}
or more means that the object must be duplicated before being changed. (Note
that this does not say that it is necessary to duplicate, only that it
should be duplicated whether necessary or not.) A value of @code{0}
means that it is known that no other @code{SEXP} shares data with this
object, and so it may safely be altered. A value of @code{1} is used
for situations like

@example
dim(a) <- c(7, 2)
@end example

@noindent
where in principle two copies of @code{a} exist for the duration of the
computation as (in principle)

@example
a <- `dim<-`(a, c(7, 2))
@end example

@noindent
but for no longer, and so some primitive functions can be optimized to
avoid a copy in this case. [This mechanism is scheduled to be replaced
in @R{} 4.0.0.]

The @code{gp} bits are by definition `general purpose'. We label these
from 0 to 15. Bits 0--5 and bits 14--15 have been used as described below
(mainly from detective work on the sources).

@findex gp bits
@findex LEVELS
@findex SETLEVELS
The bits can be accessed and set by the @code{LEVELS} and
@code{SETLEVELS} macros, whose names appear to date back to the internal
factor and ordered types and are now used in only a few places in the
code. The @code{gp} field is serialized/unserialized for the
@code{SEXPTYPE}s other than @code{NILSXP}, @code{SYMSXP} and
@code{ENVSXP}.

Bits 14 and 15 of @code{gp} are used for `fancy bindings'. Bit 14 is
used to lock a binding or an environment, and bit 15 is used to indicate
an active binding. (For the definition of an `active binding' see the
header comments in file @file{src/main/envir.c}.) Bit 15 is used for an
environment to indicate if it participates in the global cache.

@findex ARGUSED
@findex SET_ARGUSED
The macros @code{ARGUSED} and @code{SET_ARGUSED} are used when matching
actual and formal function arguments, and take the values 0, 1 and 2.

@findex MISSING
@findex SET_MISSING
The macros @code{MISSING} and @code{SET_MISSING} are used for pairlists
of arguments. Four bits are reserved, but only two are used (and
exactly what for is not explained). It seems that bit 0 is used by
@code{matchArgs_NR} to mark missingness on the returned argument list, and
bit 1 is used to mark the use of a default value for an argument copied
to the evaluation frame of a closure.

@findex DDVAL
@findex SET_DDVAL
@cindex ... argument
Bit 0 is used by macros @code{DDVAL} and @code{SET_DDVAL}. This
indicates that a @code{SYMSXP} is one of the symbols @code{..n} which
are implicitly created when @code{...} is processed, and so indicates
that it may need to be looked up in a @code{DOTSXP}.

@findex PRSEEN
@cindex promise
Bit 0 is used for @code{PRSEEN}, a flag to indicate if a promise has
already been seen during the evaluation of the promise (and so to avoid
recursive loops).

Bit 0 is used for @code{HASHASH}, on the @code{PRINTNAME} of the
@code{TAG} of the frame of an environment. (This bit is not serialized
for @code{CHARSXP} objects.)

Bits 0 and 1 are used for weak references (to indicate `ready to
finalize', `finalize on exit').

Bit 0 is used by the condition handling system (on a @code{VECSXP}) to
indicate a calling handler.

Bit 4 is turned on to mark S4 objects.

Bits 1, 2, 3, 5 and 6 are used for a @code{CHARSXP} to denote its
encoding. Bit 1 indicates that the @code{CHARSXP} should be treated as
a set of bytes, not necessarily representing a character in any known
encoding. Bits 2, 3 and 6 are used to indicate that it is known to be
in Latin-1, UTF-8 or @acronym{ASCII} respectively.

Bit 5 for a @code{CHARSXP} indicates that it is hashed by its address,
that is @code{NA_STRING} or is in the @code{CHARSXP} cache (this is not
serialized). Only exceptionally is a @code{CHARSXP} not hashed, and
this should never happen in end-user code.

@node The 'data', Allocation classes, Rest of header, SEXPs
@subsection The `data'

A @code{SEXPREC} is a C structure containing the 64-bit header as
described above, three pointers (to the attributes, previous and next
node) and the node data, a union

@example
union @{
    struct primsxp_struct primsxp;
    struct symsxp_struct symsxp;
    struct listsxp_struct listsxp;
    struct envsxp_struct envsxp;
    struct closxp_struct closxp;
    struct promsxp_struct promsxp;
@} u;
@end example

@noindent
All of these alternatives apart from the first (an @code{int}) are three
pointers, so the union occupies three words.

@cindex vector type
The vector types are @code{RAWSXP}, @code{CHARSXP}, @code{LGLSXP},
@code{INTSXP}, @code{REALSXP}, @code{CPLXSXP}, @code{STRSXP},
@code{VECSXP}, @code{EXPRSXP} and @code{WEAKREFSXP}.
Remember that such
types are a @code{VECTOR_SEXPREC}, which again consists of the header
and the same three pointers, but followed by two integers giving the
length and `true length'@footnote{The only current uses are for hash tables of
environments (@code{VECSXP}s), where @code{length} is the size of the table
and @code{truelength} is the number of primary slots in use, for the
reference hash tables in serialization (@code{VECSXP}s), and for `growable'
vectors (atomic vectors, @code{VECSXP}s and @code{EXPRSXP}s, which are
created by slightly over-committing when enlarging a vector during
subassignment, so that some number of the following enlargements during
subassignment can be performed in place), where @code{truelength} is the
number of slots in use.} of the vector, and then followed by the data
(aligned as required: on most 32-bit systems with a 24-byte
@code{VECTOR_SEXPREC} node the data can follow immediately after the node).
The data are a block of memory of the appropriate length to store `true
length' elements (rounded up to a multiple of 8 bytes, with the 8-byte
blocks being the `Vcells' referred to in the documentation for @code{gc()}).

The `data' for the various types are given in the table below. A lot of
this is interpretation, i.e.@: the types are not checked.

@table @code
@item NILSXP
There is only one object of type @code{NILSXP}, @code{R_NilValue}, with
no data.

@item SYMSXP
Pointers to three nodes, the name, value and internal, accessed by
@code{PRINTNAME} (a @code{CHARSXP}), @code{SYMVALUE} and
@code{INTERNAL}. (If the symbol's value is a @code{.Internal} function,
the last is a pointer to the appropriate @code{SEXPREC}.) Many symbols
have @code{SYMVALUE} @code{R_UnboundValue}.

@item LISTSXP
Pointers to the CAR, CDR (usually a @code{LISTSXP} or @code{NULL}) and
TAG (a @code{SYMSXP} or @code{NULL}).

@item CLOSXP
Pointers to the formals (a pairlist), the body and the environment.

@item ENVSXP
Pointers to the frame, enclosing environment and hash table (@code{NULL} or a
@code{VECSXP}). A frame is a tagged pairlist with tag the symbol and
CAR the bound value.

@item PROMSXP
Pointers to the value, expression and environment (in which to evaluate
the expression). Once a promise has been evaluated, the environment is
set to @code{NULL}.

@item LANGSXP
A special type of @code{LISTSXP} used for function calls. (The CAR
references the function (perhaps via a symbol or language object), and
the CDR the argument list with tags for named arguments.) @R{}-level
documentation references to `expressions' / `language objects' are
mainly @code{LANGSXP}s, but can be symbols (@code{SYMSXP}s) or
expression vectors (@code{EXPRSXP}s).

@item SPECIALSXP
@itemx BUILTINSXP
An integer giving the offset into the table of
primitives/@code{.Internal}s.

@item CHARSXP
@code{length}, @code{truelength} followed by a block of bytes (allowing
for the @code{nul} terminator).

@item LGLSXP
@itemx INTSXP
@code{length}, @code{truelength} followed by a block of C @code{int}s
(which are 32 bits on all @R{} platforms).

@item REALSXP
@code{length}, @code{truelength} followed by a block of C @code{double}s.

@item CPLXSXP
@code{length}, @code{truelength} followed by a block of C99 @code{double
complex}s.

@item STRSXP
@code{length}, @code{truelength} followed by a block of pointers
(@code{SEXP}s pointing to @code{CHARSXP}s).

@item DOTSXP
A special type of @code{LISTSXP} for the value bound to a @code{...}
symbol: a pairlist of promises.

@item ANYSXP
This is used as a place holder for any type: there are no actual objects
of this type.

@item VECSXP
@itemx EXPRSXP
@code{length}, @code{truelength} followed by a block of pointers. These
are internally identical (and identical to @code{STRSXP}) but differ in
the interpretations placed on the elements.

@item BCODESXP
For the `byte-code' objects generated by the compiler.

@item EXTPTRSXP
Has three pointers, to the pointer, the protection value (an @R{} object
which if alive protects this object) and a tag (a @code{SYMSXP}?).

@item WEAKREFSXP
A @code{WEAKREFSXP} is a special @code{VECSXP} of length 4, with
elements @samp{key}, @samp{value}, @samp{finalizer} and @samp{next}.
The @samp{key} is @code{NULL}, an environment or an external pointer,
and the @samp{finalizer} is a function or @code{NULL}.

@item RAWSXP
@code{length}, @code{truelength} followed by a block of bytes.

@item S4SXP
Two unused pointers and a tag.
@end table

@node Allocation classes, , The 'data', SEXPs
@subsection Allocation classes

@cindex allocation classes
As we have seen, the field @code{gccls} in the header is three bits to
label up to 8 classes of nodes. Non-vector nodes are of class 0, and
`small' vector nodes are of classes 1 to 5, with a class for custom
allocator vector nodes 6 and `large' vector nodes being of class 7. The
`small' vector nodes are able to store vector data of up to 8, 16, 32,
64 and 128 bytes: larger vectors are @code{malloc}-ed individually
whereas the `small' nodes are allocated from pages of about 2000
bytes. Vector nodes allocated using custom allocators (via
@code{allocVector3}) are not counted in the gc memory usage statistics
since their memory semantics is not under R's control and may be
non-standard (e.g., memory could be partially shared across nodes).
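
The mapping from data size to node class described above can be written
out as simple arithmetic. The sketch below is an illustration only, not
the allocator's code: @code{small_limits} and @code{node_class} are
hypothetical names, custom allocators (class 6) are ignored, and the
byte limits are those quoted in the paragraph above.

@example
small_limits <- c(8, 16, 32, 64, 128)   # data limits of classes 1 to 5
node_class <- function(nbytes)
    if (nbytes > 128) 7L else which(nbytes <= small_limits)[1]
node_class(8 * 1)      # one double (8 bytes): class 1
node_class(4 * 10)     # ten integers (40 bytes): class 4
node_class(8 * 100)    # one hundred doubles (800 bytes): class 7, `large'
@end example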


@node Environments and variable lookup, Attributes, SEXPs, R Internal Structures
@section Environments and variable lookup

@cindex environment
@cindex variable lookup
What users think of as `variables' are symbols which are bound to
objects in `environments'. The word `environment' is used ambiguously
in @R{} to mean @emph{either} the frame of an @code{ENVSXP} (a pairlist
of symbol-value pairs) @emph{or} an @code{ENVSXP}, a frame plus an
enclosure.

@cindex user databases
There are additional places that `variables' can be looked up, called
`user databases' in comments in the code. These seem undocumented in
the @R{} sources, but apparently refer to the @pkg{RObjectTable} package
at @uref{http://www.omegahat.net/RObjectTables/}.

@cindex base environment
@cindex environment, base
The base environment is special. There is an @code{ENVSXP} environment
with enclosure the empty environment @code{R_EmptyEnv}, but the frame of
that environment is not used. Rather its bindings are part of the
global symbol table, being those symbols in the global symbol table
whose values are not @code{R_UnboundValue}. When @R{} is started the
internal functions are installed (by C code) in the symbol table, with
primitive functions having values and @code{.Internal} functions having
what would be their values in the field accessed by the @code{INTERNAL}
macro. Then @code{.Platform} and @code{.Machine} are computed and the
base package is loaded into the base environment followed by the system
profile.

The frames of environments (and the symbol table) are normally hashed
for faster access (including insertion and deletion).

By default @R{} maintains a (hashed) global cache of `variables' (that
is symbols and their bindings) which have been found, and this refers
only to environments which have been marked to participate, which
consists of the global environment (aka the user workspace), the base
environment plus environments@footnote{Remember that attaching a list or
a saved image actually creates and populates an environment and attaches
that.} which have been @code{attach}ed. When an environment is either
@code{attach}ed or @code{detach}ed, the names of its symbols are flushed
from the cache. The cache is used whenever searching for variables from
the global environment (possibly as part of a recursive search).

@menu
* Search paths::
* Namespaces::
* Hash table::
@end menu

@node Search paths, Namespaces, Environments and variable lookup, Environments and variable lookup
@subsection Search paths

@cindex search path
@Sl{} has the notion of a `search path': the lookup for a `variable'
leads (possibly through a series of frames) to the `session frame', the
`working directory' and then along the search path. The search path is
a series of databases (as returned by @code{search()}) which contain the
system functions (but not necessarily at the end of the path, as by
default the equivalent of packages are added at the end).

@R{} has a variant on the @Sl{} model. There is a search path (also
returned by @code{search()}) which consists of the global environment
(aka user workspace) followed by environments which have been attached
and finally the base environment. Note that unlike @Sl{} it is not
possible to attach environments before the workspace nor after the base
environment.

However, the notion of variable lookup is more general in @R{}, hence
the plural in the title of this subsection.
Since environments have
enclosures, from any environment there is a search path found by looking
in the frame, then the frame of its enclosure and so on. Since loops
are not allowed, this process will eventually terminate: it can
terminate at either the base environment or the empty environment. (It
can be conceptually simpler to think of the search always terminating at
the empty environment, but with an optimization to stop at the base
environment.) So the `search path' describes the chain of environments
which is traversed once the search reaches the global environment.

@node Namespaces, Hash table, Search paths, Environments and variable lookup
@subsection Namespaces

@cindex namespace
Namespaces are environments associated with packages (and once again
the base package is special and will be considered separately). A
package @code{@var{pkg}} defines two environments
@code{namespace:@var{pkg}} and @code{package:@var{pkg}}: it is
@code{package:@var{pkg}} that can be @code{attach}ed and form part of
the search path.

The objects defined by the @R{} code in the package are symbols with
bindings in the @code{namespace:@var{pkg}} environment. The
@code{package:@var{pkg}} environment is populated by selected symbols
from the @code{namespace:@var{pkg}} environment (the exports). The
enclosure of this environment is an environment populated with the
explicit imports from other namespaces, and the enclosure of
@emph{that} environment is the base namespace. (So the illusion of the
imports being in the namespace environment is created via the
environment tree.) The enclosure of the base namespace is the global
environment, so the search from a package namespace goes via the
(explicit and implicit) imports to the standard `search path'.
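
The chain of enclosures that starts at the global environment can be
walked from @R{} code with @code{parent.env()}. The following sketch
uses a hypothetical helper @code{chain} (not part of @R{}) that records
the name of each environment until the empty environment is reached;
the result has one more element than @code{search()} because the empty
environment is included.

@example
chain <- function(env)
    if (identical(env, emptyenv())) "R_EmptyEnv" else
        c(environmentName(env), chain(parent.env(env)))
path <- chain(globalenv())
path[1]                                # "R_GlobalEnv"
path[length(path) - 1]                 # "base"
path[length(path)]                     # "R_EmptyEnv"
length(path) == length(search()) + 1   # TRUE
@end example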

@cindex base namespace
@cindex namespace, base
@findex R_BaseNamespace
The base namespace environment @code{R_BaseNamespace} is another
@code{ENVSXP} that is special-cased. It is effectively the same thing
as the base environment @code{R_BaseEnv} @emph{except} that its
enclosure is the global environment rather than the empty environment:
the internal code diverts lookups in its frame to the global symbol
table.

@node Hash table, , Namespaces, Environments and variable lookup
@subsection Hash table

Environments in @R{} usually have a hash table, and nowadays that is the
default in @code{new.env()}. It is stored as a @code{VECSXP} where
@code{length} is used for the allocated size of the table and
@code{truelength} is the number of primary slots in use---the pointer to
the @code{VECSXP} is part of the header of a @code{SEXP} of type
@code{ENVSXP}, and this points to @code{R_NilValue} if the environment
is not hashed.

For the pros and cons of hashing, see a basic text on Computer Science.

The code to implement hashed environments is in @file{src/main/envir.c}.
Unless set otherwise (e.g.@: by the @code{size} argument of
@code{new.env()}) the initial table size is @code{29}. The table will
be resized by a factor of 1.2 once the load factor (the proportion of
primary slots in use) reaches 85%.

The hash chains are stored as pairlist elements of the @code{VECSXP}:
items are inserted at the front of the pairlist. Hashing is principally
designed for fast searching of environments, which are from time to time
added to but rarely deleted from, so items are not actually deleted but
have their value set to @code{R_UnboundValue}.
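
As a worked example of the resizing rule, the figures quoted above can
be put through the arithmetic for the default table. This is only a
sketch of the numbers: the variable names are arbitrary, and whether the
implementation truncates or rounds the enlarged size is not stated here
and is left open.

@example
size <- 29                   # default initial table size
0.85 * size                  # 24.65 primary slots
ceiling(0.85 * size)         # 25: first slot count at or above 85%
size * 1.2                   # 34.8: nominal table size after one resize
e <- new.env(hash = TRUE, size = 29L)   # request the default size explicitly
@end example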


@node Attributes, Contexts, Environments and variable lookup, R Internal Structures
@section Attributes

@cindex attributes
@findex ATTRIB
@findex SET_ATTRIB
@findex DUPLICATE_ATTRIB
As we have seen, every @code{SEXPREC} has a pointer to the attributes of
the node (default @code{R_NilValue}). The attributes can be
accessed/set by the macros/functions @code{ATTRIB} and
@code{SET_ATTRIB}, but such direct access is normally only used to check
if the attributes are @code{NULL} or to reset them. Otherwise access
goes through the functions @code{getAttrib} and @code{setAttrib} which
impose restrictions on the attributes. One thing to watch is that if
you copy attributes from one object to another you may (un)set the
@code{"class"} attribute and so need to copy the object and S4 bits as
well. There is a macro/function @code{DUPLICATE_ATTRIB} to automate
this.

Note that the `attributes' of a @code{CHARSXP} are used as part of the
management of the @code{CHARSXP} cache: of course @code{CHARSXP}s are
not user-visible but C-level code might look at their attributes.

The code assumes that the attributes of a node are either
@code{R_NilValue} or a pairlist of non-zero length (and this is checked
by @code{SET_ATTRIB}). The attributes are named (via tags on the
pairlist). The replacement function @code{attributes<-} ensures that
@code{"dim"} precedes @code{"dimnames"} in the pairlist. Attribute
@code{"dim"} is one of several that are treated specially: the values are
checked, and any @code{"names"} and @code{"dimnames"} attributes are
removed. Similarly, you cannot set @code{"dimnames"} without having set
@code{"dim"}, and the value assigned must be a list of the correct
length and with elements of the correct lengths (and all zero-length
elements are replaced by @code{NULL}).
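
The special treatment of @code{"dim"} and @code{"dimnames"} described
above is visible at @R{} level. The following is a small illustrative
sketch; the objects @code{x}, @code{y}, @code{z} and @code{res} are
arbitrary.

@example
x <- c(a = 1, b = 2, c = 3, d = 4)
dim(x) <- c(2, 2)
names(x)                      # NULL: setting "dim" removed the names
y <- 1:4
attributes(y) <- list(dimnames = list(NULL, c("p", "q")), dim = c(2, 2))
names(attributes(y))          # "dim" "dimnames", whatever order was given
z <- 1:4
res <- try(dimnames(z) <- list(c("r1", "r2")), silent = TRUE)
inherits(res, "try-error")    # TRUE: "dimnames" cannot be set without "dim"
@end example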

The other attributes which are given special treatment are
@code{"names"}, @code{"class"}, @code{"tsp"}, @code{"comment"} and
@code{"row.names"}. For pairlist-like objects the names are not stored
as an attribute but (as symbols) as the tags: however the @R{} interface
makes them look like conventional attributes, and for one-dimensional
arrays they are stored as the first element of the @code{"dimnames"}
attribute. The C code ensures that the @code{"tsp"} attribute is a
@code{REALSXP}, the frequency is positive and the implied length agrees
with the number of rows of the object being assigned to. Classes and
comments are restricted to character vectors, and assigning a
zero-length comment or class removes the attribute. Setting or removing
a @code{"class"} attribute sets the object bit appropriately. Integer
row names are converted to and from the internal compact representation.

@cindex copying semantics
Care needs to be taken when adding attributes to objects of the types
with non-standard copying semantics. There is only one object of type
@code{NILSXP}, @code{R_NilValue}, and that should never have attributes
(and this is enforced in @code{installAttrib}). For environments,
external pointers and weak references, the attributes should be relevant
to all uses of the object: it is for example reasonable to have a name
for an environment, and also a @code{"path"} attribute for those
environments populated from @R{} code in a package.

@cindex attributes, preserving
@cindex preserving attributes
When should attributes be preserved under operations on an object?
Becker, Chambers & Wilks (1988, pp. 144--6) give some guidance. Scalar
functions (those which operate element-by-element on a vector and whose
output is similar to the input) should preserve attributes (except
perhaps class, and if they do preserve class they need to preserve the
@code{OBJECT} and S4 bits).
Binary operations normally call
@findex copyMostAttrib
@code{copyMostAttrib} to copy most attributes from the longer
argument (and if they are of the same length from both, preferring the
values on the first). Here `most' means all except the @code{names},
@code{dim} and @code{dimnames} which are set appropriately by the code
for the operator.

Subsetting (other than by an empty index) generally drops all attributes
except @code{names}, @code{dim} and @code{dimnames} which are reset as
appropriate. On the other hand, subassignment generally preserves such
attributes even if the length is changed. Coercion drops all
attributes. For example:

@example
> x <- structure(1:8, names=letters[1:8], comm="a comment")
> x[]
a b c d e f g h
1 2 3 4 5 6 7 8
attr(,"comm")
[1] "a comment"
> x[1:3]
a b c
1 2 3
> x[3] <- 3
> x
a b c d e f g h
1 2 3 4 5 6 7 8
attr(,"comm")
[1] "a comment"
> x[9] <- 9
> x
a b c d e f g h
1 2 3 4 5 6 7 8 9
attr(,"comm")
[1] "a comment"
@end example


@node Contexts, Argument evaluation, Attributes, R Internal Structures
@section Contexts

@cindex context
@emph{Contexts} are the internal mechanism used to keep track of where a
computation has got to (and from where), so that control-flow constructs
can work and reasonable information can be produced on error conditions
(such as @emph{via} traceback), and otherwise (the @code{sys.@var{xxx}}
functions).

Execution contexts are a stack of C @code{structs}:

@example
typedef struct RCNTXT @{
    struct RCNTXT *nextcontext; /* @r{The next context up the chain} */
    int callflag;               /* @r{The context `type'} */
    JMP_BUF cjmpbuf;            /* @r{C stack and register information} */
    int cstacktop;              /* @r{Top of the pointer protection stack} */
    int evaldepth;              /* @r{Evaluation depth at inception} */
    SEXP promargs;              /* @r{Promises supplied to closure} */
    SEXP callfun;               /* @r{The closure called} */
    SEXP sysparent;             /* @r{Environment the closure was called from} */
    SEXP call;                  /* @r{The call that effected this context} */
    SEXP cloenv;                /* @r{The environment} */
    SEXP conexit;               /* @r{Interpreted @code{on.exit} code} */
    void (*cend)(void *);       /* @r{C @code{on.exit} thunk} */
    void *cenddata;             /* @r{Data for C @code{on.exit} thunk} */
    char *vmax;                 /* @r{Top of the @code{R_alloc} stack} */
    int intsusp;                /* @r{Interrupts are suspended} */
    SEXP handlerstack;          /* @r{Condition handler stack} */
    SEXP restartstack;          /* @r{Stack of available restarts} */
    struct RPRSTACK *prstack;   /* @r{Stack of pending promises} */
@} RCNTXT, *context;
@end example

@noindent
plus additional fields for the byte-code compiler.
The `types'
are from

@example
enum @{
    CTXT_TOPLEVEL = 0,  /* @r{toplevel context} */
    CTXT_NEXT     = 1,  /* @r{target for @code{next}} */
    CTXT_BREAK    = 2,  /* @r{target for @code{break}} */
    CTXT_LOOP     = 3,  /* @r{@code{break} or @code{next} target} */
    CTXT_FUNCTION = 4,  /* @r{function closure} */
    CTXT_CCODE    = 8,  /* @r{other functions that need error cleanup} */
    CTXT_RETURN   = 12, /* @r{@code{return()} from a closure} */
    CTXT_BROWSER  = 16, /* @r{return target on exit from browser} */
    CTXT_GENERIC  = 20, /* @r{rather, running an S3 method} */
    CTXT_RESTART  = 32, /* @r{a call to @code{restart} was made from a closure} */
    CTXT_BUILTIN  = 64  /* @r{builtin internal function} */
@};
@end example

@noindent
where the @code{CTXT_FUNCTION} bit is on wherever function closures are
involved.

Contexts are created by a call to @code{begincontext} and ended by a
call to @code{endcontext}: code can search up the stack for a
particular type of context via @code{findcontext} (and jump there) or
jump to a specific context via @code{R_JumpToContext}.
@code{R_ToplevelContext} is the `idle' state (normally the command
prompt), and @code{R_GlobalContext} is the top of the stack.

Note that whilst calls to closures set a context, internal functions never
do and primitive builtins only set it when profiling or when they are
interfaces to foreign functions.

The byte-code compiler generates a map of instructions to source references
and expressions at compile time, which allows information on error
conditions to be produced. As an optimization, the byte-code interpreter
then does not set a context in some cases, such as in simple loops or when
inlining simple builtins or wrappers for internal functions.
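
The stack discipline described above can be illustrated with a much
reduced model. This is a sketch only and not the code in the @R{}
sources: the type @code{toy_context} and the functions @code{toy_begin},
@code{toy_end} and @code{toy_find} are hypothetical names invented for
the illustration, and only the type values are taken from the
enumeration above. It shows how a search up the chain with a bit mask
finds the nearest context in which the @code{CTXT_FUNCTION} bit is on.

```c
#include <assert.h>
#include <stddef.h>

/* Context type values, copied from the enumeration in the text. */
enum { CTXT_TOPLEVEL = 0, CTXT_LOOP = 3, CTXT_FUNCTION = 4,
       CTXT_RETURN = 12, CTXT_GENERIC = 20, CTXT_BUILTIN = 64 };

/* Hypothetical, much reduced stand-in for RCNTXT. */
typedef struct toy_context {
    struct toy_context *nextcontext; /* next context up the chain */
    int callflag;                    /* context type */
} toy_context;

static toy_context toy_toplevel = { NULL, CTXT_TOPLEVEL };
static toy_context *toy_global = &toy_toplevel; /* top of the stack */

/* Push a new context of the given type. */
static void toy_begin(toy_context *c, int flags)
{
    c->callflag = flags;
    c->nextcontext = toy_global;
    toy_global = c;
}

/* Pop a context, which must be the current top of the stack. */
static void toy_end(toy_context *c)
{
    assert(c == toy_global);
    toy_global = c->nextcontext;
}

/* Return the nearest context whose type shares a bit with mask,
   or NULL if there is none. */
static toy_context *toy_find(int mask)
{
    for (toy_context *c = toy_global; c != NULL; c = c->nextcontext)
        if (c->callflag & mask)
            return c;
    return NULL;
}
```

Because @code{CTXT_RETURN} and @code{CTXT_GENERIC} both include the
value 4, a search with mask @code{CTXT_FUNCTION} stops at either, while
a loop context (value 3) is skipped.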

@findex UseMethod
@cindex method dispatch
Dispatching from an S3 generic (via @code{UseMethod} or its internal
equivalent) or calling @code{NextMethod} sets the context type to
@code{CTXT_GENERIC}. This is used to set the @code{sysparent} of the
method call to that of the @code{generic}, so the method appears to have
been called in place of the generic rather than from the generic.

The @R{} @code{sys.frame} and @code{sys.call} functions work by counting
calls to closures (type @code{CTXT_FUNCTION}) from either end of the
context stack.

Note that the @code{sysparent} element of the structure is not the same
thing as @code{sys.parent()}. Element @code{sysparent} is primarily
used in managing changes of the function being evaluated, i.e.@: by
@code{Recall} and method dispatch.

@code{CTXT_CCODE} contexts are currently used in @code{cat()},
@code{load()}, @code{scan()} and @code{write.table()} (to close the
connection on error), by @code{PROTECT}, serialization (to recover from
errors, e.g.@: free buffers) and within the error handling code (to
raise the C stack limit and reset some variables).


@node Argument evaluation, Autoprinting, Contexts, R Internal Structures
@section Argument evaluation

@cindex argument evaluation
As we have seen, functions in @R{} come in three types, closures
(@code{SEXPTYPE} @code{CLOSXP}), specials (@code{SPECIALSXP}) and
builtins (@code{BUILTINSXP}). In this section we consider when (and if)
the actual arguments of function calls are evaluated. The rules are
different for the internal (special/builtin) and @R{}-level functions
(closures).

For a call to a closure, the actual and formal arguments are matched and
a matched call (another @code{LANGSXP}) is constructed. This process
first replaces the actual argument list by a list of promises to the
values supplied.
It then constructs a new environment which contains
the names of the formal parameters matched to actual or default values:
all the matched values are promises, the defaults as promises to be
evaluated in the environment just created. That environment is then
used for the evaluation of the body of the function, and promises will
be forced (and hence actual or default arguments evaluated) when they
are encountered.
@findex NAMED
(Evaluating a promise sets @code{NAMED = NAMEDMAX} on its value, so if the
argument was a symbol its binding is regarded as having multiple
references during the evaluation of the closure call.)
[The @code{NAMED} mechanism has been replaced by reference counting.]

If the closure is an S3 generic (that is, contains a call to
@code{UseMethod}) the evaluation process is the same until the
@code{UseMethod} call is encountered. At that point the argument on
which to do dispatch (normally the first) will be evaluated if it has
not been already. If a method has been found which is a closure, a new
evaluation environment is created for it containing the matched
arguments of the method plus any new variables defined so far during the
evaluation of the body of the generic. (Note that this means changes to
the values of the formal arguments in the body of the generic are
discarded when calling the method, but @emph{actual} argument promises
which have been forced retain the values found when they were forced.
On the other hand, missing arguments have values which are promises to
use the default supplied by the method and not by the generic.) If the
method found is a primitive it is called with the matched argument list
of promises (possibly already forced) used for the generic.
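
The behaviour of promises described above (evaluated when first
encountered, and retaining the value found at that point) can be
modelled in a few lines of C. This is a sketch under simplifying
assumptions and not @R{}'s implementation: @code{toy_promise},
@code{toy_force} and @code{toy_count_and_double} are hypothetical names,
and the delayed computation is a C function pointer where a real promise
holds an expression and an environment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a promise: a delayed computation plus a
   cached value. */
typedef struct toy_promise {
    double (*thunk)(void *); /* computes the value when first needed */
    void *env;               /* data the thunk is evaluated with */
    int forced;              /* non-zero once the value is cached */
    double value;            /* cached result, valid if forced */
} toy_promise;

/* Force the promise: evaluate at most once, then reuse the value. */
static double toy_force(toy_promise *p)
{
    assert(p != NULL && p->thunk != NULL);
    if (!p->forced) {
        p->value = p->thunk(p->env);
        p->forced = 1;
    }
    return p->value;
}

/* Example thunk: counts its evaluations and returns twice the count. */
static double toy_count_and_double(void *env)
{
    int *counter = env;
    ++*counter;
    return 2.0 * *counter;
}
```

Forcing the same promise twice runs the thunk once only, which is why a
forced argument keeps its value even if the variables it was computed
from change later.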

@cindex builtin function
@cindex special function
@cindex primitive function
@cindex .Internal function
The essential difference@footnote{There is currently one other
difference: when profiling, builtin functions are counted as function
calls but specials are not.} between special and builtin functions is
that the arguments of specials are not evaluated before the C code is
called, and those of builtins are. Note that being a special/builtin is
separate from being primitive or @code{.Internal}: @code{quote} is a
special primitive, @code{+} is a builtin primitive, @code{cbind} is a
special @code{.Internal} and @code{grep} is a builtin @code{.Internal}.

@cindex generic, internal
@findex DispatchOrEval
Many of the internal functions are internal generics, which for specials
means that they do not evaluate their arguments on call, but the C code
starts with a call to @code{DispatchOrEval}. The latter evaluates the
first argument, and looks for a method based on its class. (If S4
dispatch is on, S4 methods are looked for first, even for S3 classes.)
If it finds a method, it dispatches to that method with a call based on
promises to evaluate the remaining arguments. If no method is found,
the remaining arguments are evaluated before return to the internal
generic.

@cindex generic, group
@findex DispatchGeneric
The other way that internal functions can be generic is to be group
generic. Most such functions are builtins (so immediately evaluate all
their arguments), and all contain a call to the C function
@code{DispatchGeneric}. There are some peculiarities over the number of
arguments for the @code{"Math"} group generic, with some members
allowing only one argument, some having two (with a default for the
second) and @code{trunc} allows one or more but the default method only
accepts one.

@menu
* Missingness::
* Dot-dot-dot arguments::
@end menu

@node Missingness, Dot-dot-dot arguments, Argument evaluation, Argument evaluation
@subsection Missingness

@cindex missingness
Actual arguments to (non-internal) @R{} functions can be fewer than are
required to match the formal arguments of the function. Having
unmatched formal arguments will not matter if the argument is never used
(by lazy evaluation), but when the argument is evaluated, either its
default value is evaluated (within the evaluation environment of the
function) or an error is thrown with a message along the lines of

@example
argument "foobar" is missing, with no default
@end example

@findex MISSING
@findex R_MissingArg
Internally missingness is handled by two mechanisms. The object
@code{R_MissingArg} is used to indicate that a formal argument has no
(default) value. When matching the actual arguments to the formal
arguments, a new argument list is constructed from the formals all of
whose values are @code{R_MissingArg} with the first @code{MISSING} bit
set. Then whenever a formal argument is matched to an actual argument,
the corresponding member of the new argument list has its value set to
that of the matched actual argument, and if that is not
@code{R_MissingArg} the missing bit is unset.

This new argument list is used to form the evaluation frame for the
function, and if named arguments are subsequently given a new value
(before they are evaluated) the missing bit is cleared.

Missingness of arguments can be interrogated via the @code{missing()}
function. An argument is clearly missing if its missing bit is set or
if the value is @code{R_MissingArg}. However, missingness can be passed
on from function to function, for using a formal argument as an actual
argument in a function call does not count as evaluation.
So
@code{missing()} has to examine the value (a promise) of a
not-yet-evaluated formal argument to see if it might be missing, which
might involve investigating a promise and so on @dots{}.

Special primitives also need to handle missing arguments, and in some
cases (e.g.@: @code{log}) that is why they are special and not
builtin. This is usually done by testing if an argument's value is
@code{R_MissingArg}.

@node Dot-dot-dot arguments, , Missingness, Argument evaluation
@subsection Dot-dot-dot arguments

@cindex ... argument
Dot-dot-dot arguments are convenient when writing functions, but
complicate the internal code for argument evaluation.

The formals of a function with a @code{...} argument represent that as a
single argument like any other argument, with tag the symbol
@code{R_DotsSymbol}. When the actual arguments are matched to the
formals, the value of the @code{...} argument is of @code{SEXPTYPE}
@code{DOTSXP}, a pairlist of promises (as used for matched arguments)
but distinguished by the @code{SEXPTYPE}.

Recall that the evaluation frame for a function initially contains the
@code{@var{name}=@var{value}} pairs from the matched call, and hence
this will be true for @code{...} as well. The value of @code{...} is a
(special) pairlist whose elements are referred to by the special symbols
@code{..1}, @code{..2}, @dots{} which have the @code{DDVAL} bit set:
when one of these is encountered it is looked up (via @code{ddfindVar})
in the value of the @code{...} symbol in the evaluation frame.

Values of arguments matched to a @code{...} argument can be missing.

Special primitives may need to handle @code{...} arguments: see for
example the internal code of @code{switch} in file
@file{src/main/builtin.c}.
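
The lookup of @code{..1}, @code{..2}, @dots{} in the value of
@code{...} amounts to walking along a pairlist. The sketch below
illustrates only that walk: @code{toy_cell} and @code{toy_ddfind} are
hypothetical names, an @code{int} stands in for each promise, and what
@R{} itself does when the list is too short is not modelled (the sketch
simply returns a null pointer).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a pairlist cell holding one argument
   matched to `...'. */
typedef struct toy_cell {
    const char *tag;      /* argument name, or NULL if unnamed */
    int value;            /* stands in for the promise */
    struct toy_cell *cdr; /* rest of the list, NULL at the end */
} toy_cell;

/* Return the cell that ..i refers to (i counts from 1), or NULL if the
   list has fewer than i elements. */
static const toy_cell *toy_ddfind(const toy_cell *dots, int i)
{
    assert(i >= 1);
    while (dots != NULL && --i > 0)
        dots = dots->cdr;
    return dots;
}
```

The cost of resolving @code{..@var{i}} in this model grows with
@var{i}, since the list has to be traversed from its head each time.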

@node Autoprinting, The write barrier, Argument evaluation, R Internal Structures
@section Autoprinting

@cindex autoprinting
@findex R_Visible

Whether the returned value of a top-level @R{} expression is printed is
controlled by the global boolean variable @code{R_Visible}. This is set
(to true or false) on entry to all primitive and internal functions
based on the @code{eval} column of the table in file
@file{src/main/names.c}: the appropriate setting can be extracted by the
macro @code{PRIMPRINT}.
@findex PRIMPRINT

@findex invisible
The @R{} primitive function @code{invisible} makes use of this
mechanism: it just sets @code{R_Visible = FALSE} before entry and
returns its argument.

For most functions the intention will be that the setting of
@code{R_Visible} when they are entered is the setting used when they
return, but there need to be exceptions. The @R{} functions
@code{identify}, @code{options}, @code{system} and @code{writeBin}
determine whether the result should be visible from the arguments or
user action. Other functions themselves dispatch functions which may
change the visibility flag: examples@footnote{the other current example
is left brace, which is implemented as a primitive.} are
@code{.Internal}, @code{do.call}, @code{eval}, @code{withVisible},
@code{if}, @code{NextMethod}, @code{Recall}, @code{recordGraphics},
@code{standardGeneric}, @code{switch} and @code{UseMethod}.

`Special' primitive and internal functions evaluate their arguments
internally @emph{after} @code{R_Visible} has been set, and evaluation of
the arguments (e.g.@: an assignment as in PR#9263) can change the value
of the flag.

The @code{R_Visible} flag can also get altered during the evaluation of
a function, with comments in the code about @code{warning},
@code{writeChar} and graphics functions calling @code{GText} (PR#7397).
(Since the C-level function @code{eval} sets @code{R_Visible}, this
could apply to any function calling it. Since it is called when
evaluating promises, even object lookup can change @code{R_Visible}.)
Internal and primitive functions force the documented setting of
@code{R_Visible} on return, unless the C code is allowed to change it
(the exceptions above are indicated by @code{PRIMPRINT} having value 2).

The actual autoprinting is done by @code{PrintValueEnv} in file
@file{print.c}. If the object to be printed has the S4 bit set and S4
methods dispatch is on, @code{show} is called to print the object.
Otherwise, if the object bit is set (so the object has a
@code{"class"} attribute), @code{print} is called to dispatch methods:
for objects without a class the internal code of @code{print.default}
is called.


@node The write barrier, Serialization Formats, Autoprinting, R Internal Structures
@section The write barrier and the garbage collector

@cindex write barrier
@cindex garbage collector
@R{} has long had a generational garbage collector, and bit @code{gcgen}
in the @code{sxpinfo} header is used in the implementation of this.
This is used in conjunction with the @code{mark} bit to identify two
previous generations.

There are three levels of collections. Level 0 collects only the
youngest generation, level 1 collects the two youngest generations and
level 2 collects all generations. After 20 level-0 collections the next
collection is at level 1, and after 5 level-1 collections at level 2.
Further, if a level-@var{n} collection fails to provide 20% free space
(for each of nodes and the vector heap), the next collection will be at
level @var{n+1}. (The @R{}-level function @code{gc()} performs a
level-2 collection.)

A generational collector needs to efficiently `age' the objects,
especially list-like objects (including @code{STRSXP}s). This is done
by ensuring that the elements of a list are regarded as at least as old
as the list @emph{when they are assigned}. This is handled by the
functions @code{SET_VECTOR_ELT} and @code{SET_STRING_ELT}, which is why
they are functions and not macros. Ensuring the integrity of such
operations is termed the @dfn{write barrier} and is done by making the
@code{SEXP} opaque and only providing access via functions (which cannot
be used as lvalues in assignments in C).

All code in @R{} extensions is by default behind the write barrier. The
only way to obtain direct access to the internals of the @code{SEXPREC}s
is to define @samp{USE_RINTERNALS} before including header file
@file{Rinternals.h}; within @R{} itself this is normally done in
@file{Defn.h}. To
enable a check on the way that the access is used, @R{} can be compiled
with flag @option{--enable-strict-barrier} which ensures that header
@file{Defn.h} does not define @samp{USE_RINTERNALS} and hence that
@code{SEXP} is opaque in most of @R{} itself. (There are some necessary
exceptions: foremost in file @file{memory.c} where the accessor
functions are defined and also in file @file{size.c} which needs access
to the sizes of the internal structures.)

For background papers see
@uref{https://homepage.stat.uiowa.edu/~luke/R/barrier.html} and
@uref{https://homepage.stat.uiowa.edu/~luke/R/gengcnotes.html}.
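
The scheduling rule for collection levels given earlier in this section
can be written out as a small state machine. The sketch follows the
rule as stated in the text and is not the code in @file{memory.c}: the
names @code{toy_gc_state} and @code{toy_next_level} are invented for the
illustration, and the test for 20% free space is reduced to a flag
supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Frequencies quoted in the text. */
enum { TOY_LEVEL0_FREQ = 20, TOY_LEVEL1_FREQ = 5 };

/* Hypothetical bookkeeping for the sketch. */
typedef struct toy_gc_state {
    int level0_runs; /* level-0 collections since level 1 was last scheduled */
    int level1_runs; /* level-1 collections since level 2 was last scheduled */
} toy_gc_state;

/* Record a finished collection at `level' and return the level to use
   for the next one.  `enough_free' is non-zero if the collection left
   at least 20% free space. */
static int toy_next_level(toy_gc_state *s, int level, int enough_free)
{
    int next = 0;

    assert(s != NULL && level >= 0 && level <= 2);
    if (level == 0 && ++s->level0_runs >= TOY_LEVEL0_FREQ) {
        s->level0_runs = 0;
        next = 1;
    }
    if (level == 1 && ++s->level1_runs >= TOY_LEVEL1_FREQ) {
        s->level1_runs = 0;
        next = 2;
    }
    /* Too little space recovered: escalate to level + 1.  What follows
       an unproductive level-2 collection is not modelled here. */
    if (!enough_free && level < 2 && next < level + 1)
        next = level + 1;
    return next;
}
```

In this model a run of productive collections reaches level 1 on every
twentieth level-0 collection, whereas a single unproductive collection
promotes the next one immediately.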

@node Serialization Formats, Encodings for CHARSXPs, The write barrier, R Internal Structures
@section Serialization Formats

@cindex serialization
Serialized versions of @R{} objects are used by @code{load}/@code{save}
and also at a slightly lower level by @code{saveRDS}/@code{readRDS} (and
their earlier `internal' dot-name versions) and
@code{serialize}/@code{unserialize}. These differ in what they
serialize to (a file, a connection, a raw vector) and whether they are
intended to serialize a single object or a collection of objects
(typically the workspace). @code{save} writes a header at the beginning
of the file (a single LF-terminated line) which the lower-level versions
do not.

@code{save} and @code{saveRDS} allow various forms of compression, and
@command{gzip} compression is the default (except for @acronym{ASCII}
saves). Compression is applied to the whole file stream, including the
headers, so serialized files can be uncompressed or re-compressed by
external programs. Both @code{load} and @code{readRDS} can read
@command{gzip}, @command{bzip2} and @command{xz} forms of compression
when reading from a file, and @command{gzip} compression when reading
from a connection.

@R{} has used the same serialization format called `version 2' from @R{}
1.4.0 in December 2001 until @R{} 3.5.3 in March 2019. It has been expanded
in back-compatible ways since its inception, for example to support
additional @code{SEXPTYPE}s. Earlier formats are still supported via
@code{load} and @code{save} but such formats are not described here. The
current default serialization format is called `version 3', and was
introduced in @R{} 3.5.0.

@code{save} works by writing a single-line header (typically
@code{RDX2\n} for a binary save: the only other current value is
@code{RDA2\n} for @code{save(ascii=TRUE)}), then creating a tagged
pairlist of the objects to be saved and serializing that single object.
@code{load} reads the header line, unserializes a single object (a
pairlist or a vector list) and assigns the elements of the object in the
specified environment. The header line serves two purposes in @R{}: it
identifies the serialization format so @code{load} can switch to the
appropriate reader code, and the newline @code{\n} allows the detection of files
which have been subjected to a non-binary transfer which re-mapped line
endings. It can also be thought of as a `magic number' in the sense
used by the @command{file} program (although @R{} save files are not yet
by default known to that program).

Serialization in @R{} needs to take into account that objects may
contain references to environments, which then have enclosing
environments and so on. (Environments recognized as package or name
space environments are saved by name.) There are `reference objects'
which are not duplicated on copy and should remain shared on
unserialization. These are weak references, external pointers and
environments other than those associated with packages, namespaces and
the global environment. These are handled via a hash table, and
references after the first are written out as a reference marker indexed
by the table entry.

Version-2 serialization first writes a header indicating the format
(normally @samp{X\n} for an XDR format binary save, but @samp{A\n},
ASCII, and @samp{B\n}, native word-order binary, can also occur) and
then three integers giving the version of the format and two @R{}
versions (packed by the @code{R_Version} macro from @file{Rversion.h}).
(Unserialization interprets the two versions as the version of @R{}
which wrote the file followed by the minimal version of @R{} needed to
read the format.) Serialization then writes out the object recursively
using function @code{WriteItem} in file @file{src/main/serialize.c}.

Some objects are written as if they were @code{SEXPTYPE}s: such
pseudo-@code{SEXPTYPE}s cover @code{R_NilValue}, @code{R_EmptyEnv},
@code{R_BaseEnv}, @code{R_GlobalEnv}, @code{R_UnboundValue},
@code{R_MissingArg} and @code{R_BaseNamespace}.

For all @code{SEXPTYPE}s except @code{NILSXP}, @code{SYMSXP} and
@code{ENVSXP} serialization starts with an integer with the
@code{SEXPTYPE} in bits 0:7@footnote{only bits 0:4 are currently used
for @code{SEXPTYPE}s but values 241:255 are used for
pseudo-@code{SEXPTYPE}s.} followed by the object bit, two bits
indicating if there are any attributes and if there is a tag (for the
pairlist types), an unused bit and then the @code{gp}
field@footnote{Currently the only relevant bits are 0:1, 4, 14:15.} in
bits 12:27. Pairlist-like objects write their attributes (if any), tag
(if any), CAR and then CDR (using tail recursion): other objects write
their attributes after themselves. Atomic vector objects write their
length followed by the data: generic vector-list objects write their
length followed by a call to @code{WriteItem} for each element. The
code for @code{CHARSXP}s special-cases @code{NA_STRING} and writes it as
length @code{-1} with no data. Lengths no more than @code{2^31 - 1} are
written in that way and larger lengths (which only occur on 64-bit
systems) as @code{-1} followed by the upper and lower 32-bits as integers
(regarded as unsigned).

Environments are treated in several ways: as we have seen, some are
written as specific pseudo-@code{SEXPTYPE}s.
Package and namespace
environments are written with pseudo-@code{SEXPTYPE}s followed by the
name. `Normal' environments are written out as @code{ENVSXP}s with an
integer indicating if the environment is locked followed by the
enclosure, frame, `tag' (the hash table) and attributes.

In the `XDR' format integers and doubles are written in big-endian order:
however the format is not fully XDR (as defined in RFC 1832) as byte
quantities (such as the contents of @code{CHARSXP} and @code{RAWSXP}
types) are written as-is and not padded to a multiple of four bytes.

The `ASCII' format writes 7-bit characters. Integers are formatted with
@code{%d} (except that @code{NA_integer_} is written as @code{NA}),
doubles formatted with @code{%.16g} (plus @code{NA}, @code{Inf} and
@code{-Inf}) and bytes with @code{%02x}. Strings are written using
standard escapes (e.g.@: @code{\t} and @code{\013}) for non-printing and
non-@acronym{ASCII} bytes.

Version-3 serialization extends version-2 by support for custom
serialization of @code{ALTREP} framework objects. It also stores the
current native encoding at serialization time, so that unflagged strings can
be converted if unserialized in @R{} running under a different native
encoding.

@node Encodings for CHARSXPs, The CHARSXP cache, Serialization Formats, R Internal Structures
@section Encodings for CHARSXPs

Character data in @R{} are stored in the sexptype @code{CHARSXP}.

There is support for encodings other than that of the current locale, in
particular UTF-8 and the multi-byte encodings used on Windows for CJK
languages. A limited means to indicate the encoding of a @code{CHARSXP}
is @emph{via} two of the `general purpose' bits which are used to declare
the encoding to be either Latin-1 or UTF-8. (Note that it is possible
for a character vector to contain elements in different encodings.)
Both printing and plotting notice the declaration and convert the string
to the current locale (possibly using @code{<xx>} to display in
hexadecimal bytes that are not valid in the current locale). Many (but
not all) of the character manipulation functions will either preserve
the declaration or re-encode the character string.

Strings that refer to the OS such as file names need to be passed
through a wide-character interface on some OSes (e.g.@: Windows).

When are character strings declared to be of known encoding? One way is
to do so directly via @code{Encoding}. The parser declares the encoding
if this is known, either via the @code{encoding} argument to
@code{parse} or from the locale within which parsing is being done at
the @R{} command line. (Other ways are recorded on the help page for
@code{Encoding}.)

It is not necessary to declare the encoding of @acronym{ASCII} strings
as they will work in any locale. @acronym{ASCII} strings should never
have a marked encoding, as any encoding will be ignored when entering
such strings into the @code{CHARSXP} cache.

The rationale behind considering only UTF-8 and Latin-1 was that most
systems are capable of producing UTF-8 strings and this is the nearest
we have to a universal format. For those that do not (for example those
lacking a powerful enough @code{iconv}), it is likely that they work in
Latin-1, the old @R{} assumption. Then the parser can return a
UTF-8-encoded string if it encounters a @samp{\uxxxx} escape for a
Unicode point that cannot be represented in the current charset. (This
needs MBCS support, and was only enabled@footnote{See define
@code{USE_UTF8_IF_POSSIBLE} in file @file{src/main/gram.c}.} on
Windows.) This is now enabled for all platforms, and a @samp{\uxxxx} or
@samp{\Uxxxxxxxx} escape ensures that the parsed string will be marked
as UTF-8.
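
The treatment of @acronym{ASCII} strings described earlier in this
section (any declared encoding is ignored for them) comes down to
testing that no byte has its top bit set. The following is a minimal
sketch of that test, not the code used by @R{}: @code{toy_enc},
@code{toy_is_ascii} and @code{toy_effective_encoding} are hypothetical
names, and the three encoding values are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding marks for this sketch. */
typedef enum { TOY_NATIVE, TOY_LATIN1, TOY_UTF8 } toy_enc;

/* Non-zero if all len bytes of s are 7-bit ASCII. */
static int toy_is_ascii(const char *s, size_t len)
{
    assert(s != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        if ((unsigned char) s[i] > 0x7F)
            return 0;
    return 1;
}

/* The mark that would be kept: a declared encoding is dropped when the
   content is pure ASCII. */
static toy_enc toy_effective_encoding(const char *s, size_t len,
                                      toy_enc declared)
{
    return toy_is_ascii(s, len) ? TOY_NATIVE : declared;
}
```

Under this rule two strings with the same @acronym{ASCII} bytes compare
equal whatever encoding was requested for them, so they can share a
single cache entry.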

Most of the character manipulation functions now preserve UTF-8
encodings: there are some notes as to which at the top of file
@file{src/main/character.c} and in file
@file{src/library/base/man/Encoding.Rd}.

Graphics devices are offered the possibility of handling UTF-8-encoded
strings without re-encoding to the native character set, by setting
@code{hasTextUTF8} to be @samp{TRUE} and supplying functions
@code{textUTF8} and @code{strWidthUTF8} that expect UTF-8-encoded
inputs. Normally the symbol font is encoded in Adobe Symbol encoding,
but that can be re-encoded to UTF-8 by setting @code{wantSymbolUTF8} to
@samp{TRUE}. The Windows port of cairographics has a rather peculiar
assumption: it wants the symbol font to be encoded in UTF-8 as if it
were encoded in Latin-1 rather than Adobe Symbol: this is selected by
@code{wantSymbolUTF8 = NA_LOGICAL}.

Windows has no UTF-8 locales, but rather expects to work with
UCS-2@footnote{or UTF-16 if support for surrogates is enabled in the OS,
which it used not to be when encoding support was added to @R{}.}
strings. @R{} (being written in standard C) would not work internally
with UCS-2 without extensive changes. The @file{Rgui}
console@footnote{but not the GraphApp toolkit.} uses UCS-2 internally,
but communicates with the @R{} engine in the native encoding. To allow
UTF-8 strings to be printed in UTF-8 in @file{Rgui.exe}, an escape
convention is used (see header file @file{rgui_UTF8.h}) by
@code{cat}, @code{print} and autoprinting.

`Unicode' (UCS-2LE) files are common in the Windows world, and
@code{readLines} and @code{scan} will read them into UTF-8 strings on
Windows if the encoding is declared explicitly on an unopened
connection passed to those functions.

@node The CHARSXP cache, Warnings and errors, Encodings for CHARSXPs, R Internal Structures
@section The CHARSXP cache

@findex mkChar
There is a global cache for @code{CHARSXP}s created by @code{mkChar} ---
the cache ensures that most @code{CHARSXP}s with the same contents share
storage (`contents' including any declared encoding). Not all
@code{CHARSXP}s are part of the cache -- notably @samp{NA_STRING} is
not. @code{CHARSXP}s reloaded from the @code{save} formats of @R{} prior
to 0.99.0 are not cached (since the code used is frozen and very few
examples still exist).

@findex mkCharLenCE
The cache records the encoding of the string as well as the bytes: all
requests to create a @code{CHARSXP} should be @emph{via} a call to
@code{mkCharLenCE}. Any encoding given in a @code{mkCharLenCE} call will
be ignored if the string's bytes are all @acronym{ASCII} characters.


@node Warnings and errors, S4 objects, The CHARSXP cache, R Internal Structures
@section Warnings and errors

@findex warning
@findex warningcall
@findex error
@findex errorcall

Each of @code{warning} and @code{stop} has two C-level equivalents,
@code{warning}, @code{warningcall}, @code{error} and @code{errorcall}.
The relationship between the pairs is similar: @code{warning} tries to
fathom out a suitable call, and then calls @code{warningcall} with that
call as the first argument if it succeeds, and with
@code{call = R_NilValue} if it does not. When @code{warningcall} is
called, it includes the deparsed call in its printout unless
@code{call = R_NilValue}.

@code{warning} and @code{error} look at the context stack. If the
topmost context is not of type @code{CTXT_BUILTIN}, it is used to
provide the call, otherwise the next context provides the call.
This means that when these functions are called from a primitive or
@code{.Internal}, the imputed call will not be to the
primitive/@code{.Internal} but to the function calling the
primitive/@code{.Internal}.  This is exactly what one wants for a
@code{.Internal}, as this will give the call to the closure wrapper.
(Further, for a @code{.Internal}, the call is the argument to
@code{.Internal}, and so may not correspond to any @R{} function.)
However, it is unlikely to be what is needed for a primitive.

The upshot is that @code{warningcall} and @code{errorcall} should
normally be used for code called from a primitive, and @code{warning}
and @code{error} should be used for code called from a @code{.Internal}
(and necessarily from @code{.Call}, @code{.C} and so on, where the call
is not passed down).  However, there are two complications.  One is that
code might be called from either a primitive or a @code{.Internal}, in
which case @code{warningcall} is probably more appropriate.  The other
involves replacement functions, where the call was once of the form
@example
> length(x) <- y ~ x
Error in "length<-"(`*tmp*`, value = y ~ x) : invalid value
@end example

@noindent
which is unpalatable to the end user.  For replacement functions there
will be a suitable context at the top of the stack, so @code{warning}
should be used.  (The results for @code{.Internal} replacement functions
such as @code{substr<-} are not ideal.)


@node S4 objects, Memory allocators, Warnings and errors, R Internal Structures
@section S4 objects

[This section is currently a preliminary draft and should not be taken
as definitive.  The description assumes that @env{R_NO_METHODS_TABLES}
has not been set.]
1408 1409@menu 1410* Representation of S4 objects:: 1411* S4 classes:: 1412* S4 methods:: 1413* Mechanics of S4 dispatch:: 1414@end menu 1415 1416@node Representation of S4 objects, S4 classes, S4 objects, S4 objects 1417@subsection Representation of S4 objects 1418 1419S4 objects can be of any @code{SEXPTYPE}. They are either an object of 1420a simple type (such as an atomic vector or function) with S4 class 1421information or of type @code{S4SXP}. In all cases, the `S4 bit' (bit 4 1422of the `general purpose' field) is set, and can be tested by the 1423macro/function @code{IS_S4_OBJECT}. 1424 1425S4 objects are created via @code{new()}@footnote{This can also create 1426non-S4 objects, as in @code{new("integer")}.} and thence via the C 1427function @code{R_do_new_object}. This duplicates the prototype of the 1428class, adds a class attribute and sets the S4 bit. All S4 class 1429attributes should be character vectors of length one with an attribute 1430giving (as a character string) the name of the package (or 1431@code{.GlobalEnv}) containing the class definition. Since S4 objects 1432have a class attribute, the @code{OBJECT} bit is set. 1433 1434It is currently unclear what should happen if the class attribute is 1435removed from an S4 object, or if this should be allowed. 1436 1437@node S4 classes, S4 methods, Representation of S4 objects, S4 objects 1438@subsection S4 classes 1439 1440S4 classes are stored as @R{} objects in the environment in which they 1441are created, with names @code{.__C__@var{classname}}: as such they are 1442not listed by default by @code{ls}. 1443 1444The objects are S4 objects of class @code{"classRepresentation"} which 1445is defined in the @pkg{methods} package. 1446 1447Since these are just objects, they are subject to the normal scoping 1448rules and can be imported and exported from namespaces like other 1449objects. 
The directives @code{importClassesFrom} and
@code{exportClasses} are merely convenient ways to refer to class
objects without needing to know their internal `metaname' (although
@code{exportClasses} does a little sanity checking via @code{isClass}).

@node S4 methods, Mechanics of S4 dispatch, S4 classes, S4 objects
@subsection S4 methods

Details of the methods are stored in environments (typically hidden in the
respective namespace) with a non-syntactic name of the form
@code{.__T__@var{generic}:@var{package}} containing objects of class
@code{MethodDefinition} for all methods defined in the current environment
for the named generic derived from a specific package (which might be
@code{.GlobalEnv}).  This is sometimes referred to as a `methods table'.

For example,
@example
 length(nM <- asNamespace("Matrix") )                     # 941 for Matrix 1.2-6
 length(meth <- grep("^[.]__T__", names(nM), value=TRUE)) # 107 generics with methods
 length(meth.Ops <- nM$`.__T__Ops:base`) # 71 methods for the 'Ops' (group)generic
 head(sort(names(meth.Ops))) ## "abIndex#abIndex" ... "ANY#ddiMatrix" "ANY#ldiMatrix" "ANY#Matrix"
@end example

During an @R{} session there is an environment associated with each
non-primitive generic containing objects @code{.AllMTable},
@code{.Generic}, @code{.Methods}, @code{.MTable}, @code{.SigArgs} and
@code{.SigLength}.  @code{.MTable} and @code{.AllMTable} are merged
methods tables containing all the methods defined directly and via
inheritance respectively.  @code{.Methods} is a merged methods list.

Exporting methods from a namespace is more complicated than exporting a
class.  Note first that you do not export a method, but rather the
directive @code{exportMethods} will export all the methods defined in
the namespace for a specified generic: the code also adds to the list
of generics any that are exported directly.
For generics which are 1484listed via @code{exportMethods} or exported themselves, the 1485corresponding environment is exported and so 1486will appear (as hidden object) in the package environment. 1487 1488Methods for primitives which are internally S4 generic (see below) are 1489always exported, whether mentioned in the @file{NAMESPACE} file or not. 1490 1491Methods can be imported either via the directive 1492@code{importMethodsFrom} or via importing a namespace by @code{import}. 1493Also, if a generic is imported via @code{importFrom}, its methods are 1494also imported. In all cases the generic will be imported if it is in 1495the namespace, so @code{importMethodsFrom} is most appropriate for 1496methods defined on generics in other packages. Since methods for a 1497generic could be imported from several different packages, the methods 1498tables are merged. 1499 1500When a package is attached 1501@code{methods:::cacheMetaData} is called to update the internal tables: 1502only the visible methods will be cached. 1503 1504 1505@node Mechanics of S4 dispatch, , S4 methods, S4 objects 1506@subsection Mechanics of S4 dispatch 1507 1508This subsection does not discuss how S4 methods are chosen: see 1509@uref{https://developer.@/r-project.org/howMethodsWork.pdf}. 1510 1511For all but primitive functions, setting a method on an existing 1512function that is not itself S4 generic creates a new object in the 1513current environment which is a call to @code{standardGeneric} with the 1514old definition as the default method. Such S4 generics can also be 1515created @emph{via} a call to @code{setGeneric}@footnote{although this is 1516not recommended as it is less future-proof.} and are standard closures 1517in the @R{} language, with environment the environment within which they 1518are created. 
With the advent of namespaces this is somewhat 1519problematic: if @code{myfn} was previously in a package with a name 1520space there will be two functions called @code{myfn} on the search 1521paths, and which will be called depends on which search path is in use. 1522This is starkest for functions in the base namespace, where the 1523original will be found ahead of the newly created function from any 1524other package. 1525 1526Primitive functions are treated quite differently, for efficiency 1527reasons: this results in different semantics. @code{setGeneric} is 1528disallowed for primitive functions. The @pkg{methods} namespace 1529contains a list @code{.BasicFunsList} named by primitive functions: 1530the entries are either @code{FALSE} or a standard S4 generic showing 1531the effective definition. When @code{setMethod} (or 1532@code{setReplaceMethod}) is called, it either fails (if the list entry 1533is @code{FALSE}) or a method is set on the effective generic given in 1534the list. 1535 1536Actual dispatch of S4 methods for almost all primitives piggy-backs on 1537the S3 dispatch mechanism, so S4 methods can only be dispatched for 1538primitives which are internally S3 generic. When a primitive that is 1539internally S3 generic is called with a first argument which is an S4 1540object and S4 dispatch is on (that is, the @pkg{methods} namespace is 1541loaded), @code{DispatchOrEval} calls @code{R_possible_dispatch} (defined 1542in file @file{src/main/objects.c}). (Members of the S3 group generics, 1543which includes all the generic operators, are treated slightly 1544differently: the first two arguments are checked and 1545@code{DispatchGroup} is called.) @code{R_possible_dispatch} first 1546checks an internal table to see if any S4 methods are set for that 1547generic (and S4 dispatch is currently enabled for that generic), and if 1548so proceeds to S4 dispatch using methods stored in another internal 1549table. 
All primitives are in the base namespace, and this mechanism
means that S4 methods can be set for (some) primitives and will always
be used, in contrast to setting methods on non-primitives.

The exception is @code{%*%}, which is S4 generic but not S3 generic as
its C code contains a direct call to @code{R_possible_dispatch}.

The primitive @code{as.double} is special, as @code{as.numeric} and
@code{as.real} are copies of it.  The @pkg{methods} package code partly
refers to generics by name and partly by function, and maps
@code{as.double} and @code{as.real} to @code{as.numeric} (since that is
the name used by packages exporting methods for it).

Some elements of the language are implemented as primitives, for example
@code{@{}.  This includes the subset and subassignment `functions' and
they are S4 generic, again piggybacking on S3 dispatch.

@code{.BasicFunsList} is generated when @pkg{methods} is installed, by
computing all primitives, initially disallowing methods on all and then
setting generics for members of @code{.GenericArgsEnv}, the S4 group
generics and a short exceptions list in file @file{BasicFunsList.R}: this
currently contains the subsetting and subassignment operators and an
override for @code{c}.

@node Memory allocators, Internal use of global and base environments, S4 objects, R Internal Structures
@section Memory allocators

@R{}'s memory allocation is almost all done via routines in file
@file{src/main/memory.c}.  It is important to keep track of where memory
is allocated, as the Windows port (by default) makes use of a memory
allocator that differs from @code{malloc} etc.@: as provided by MinGW.
Specifically, there are entry points @code{Rm_malloc}, @code{Rm_free},
@code{Rm_calloc} and @code{Rm_realloc} provided by file
@file{src/gnuwin32/malloc.c}.  This was done for two reasons.
The 1583primary motivation was performance: the allocator provided by MSVCRT 1584@emph{via} MinGW was far too slow at handling the many small allocations 1585that the allocation system for @code{SEXPREC}s uses. As a side benefit, 1586we can set a limit on the amount of allocated memory: this is useful as 1587whereas Windows does provide virtual memory it is relatively far slower 1588than many other @R{} platforms and so limiting @R{}'s use of swapping is 1589highly advantageous. The high-performance allocator is only called from 1590@file{src/main/memory.c}, @file{src/main/regex.c}, @file{src/extra/pcre} 1591and @file{src/extra/xdr}: note that this means that it is not used in 1592packages. 1593 1594The rest of @R{} should where possible make use of the allocators made 1595available by file @file{src/main/memory.c}, which are also the methods 1596recommended in 1597@ifset UseExternalXrefs 1598@ref{Memory allocation, , Memory allocation, R-exts, Writing R Extensions} 1599@end ifset 1600@ifclear UseExternalXrefs 1601`Writing R Extensions' 1602@end ifclear 1603@findex R_alloc 1604@findex Calloc 1605@findex Realloc 1606@findex Free 1607for use in @R{} packages, namely the use of @code{R_alloc}, 1608@code{Calloc}, @code{Realloc} and @code{Free}. Memory allocated by 1609@code{R_alloc} is freed by the garbage collector once the `watermark' 1610has been reset by calling 1611@findex vmaxset 1612@code{vmaxset}. This is done automatically by the wrapper code calling 1613primitives and @code{.Internal} functions (and also by the wrapper code 1614to @code{.Call} and @code{.External}), but 1615@findex vmaxget 1616@code{vmaxget} and @code{vmaxset} can be used to reset the watermark 1617from within internal code if the memory is only required for a short 1618time. 1619 1620@findex alloca 1621All of the methods of memory allocation mentioned so far are relatively 1622expensive. 
All @R{} platforms support @code{alloca}, and in almost all 1623cases@footnote{but apparently not on Windows.} this is managed by the 1624compiler, allocates memory on the C stack and is very efficient. 1625 1626There are two disadvantages in using @code{alloca}. First, it is 1627fragile and care is needed to avoid writing (or even reading) outside 1628the bounds of the allocation block returned. Second, it increases the 1629danger of overflowing the C stack. It is suggested that it is only 1630used for smallish allocations (up to tens of thousands of bytes), and 1631that 1632 1633@findex R_CheckStack 1634@example 1635 R_CheckStack(); 1636@end example 1637 1638@noindent 1639is called immediately after the allocation (as @R{}'s stack checking 1640mechanism will warn far enough from the stack limit to allow for modest 1641use of alloca). (@code{do_makeunique} in file @file{src/main/unique.c} 1642provides an example of both points.) 1643 1644There is an alternative check, 1645@findex R_CheckStack2 1646@example 1647 R_CheckStack2(size_t extra); 1648@end example 1649 1650@noindent 1651to be called immediately @emph{before} trying an allocation of 1652@code{extra} bytes. 1653 1654An alternative strategy has been used for various functions which 1655require intermediate blocks of storage of varying but usually small 1656size, and this has been consolidated into the routines in the header 1657file @file{src/main/RBufferUtils.h}. This uses a structure which 1658contains a buffer, the current size and the default size. A call to 1659@findex R_AllocStringBuffer 1660@example 1661 R_AllocStringBuffer(size_t blen, R_StringBuffer *buf); 1662@end example 1663 1664@noindent 1665sets @code{buf->data} to a memory area of at least @code{blen+1} bytes. 1666At least the default size is used, which means that for small 1667allocations the same buffer can be reused. 
A call to 1668@findex R_FreeStringBufferL 1669@findex R_FreeStringBuffer 1670@code{R_FreeStringBufferL} releases memory if more than the default has 1671been allocated whereas a call to @code{R_FreeStringBuffer} frees any 1672memory allocated. 1673 1674The @code{R_StringBuffer} structure needs to be initialized, for example by 1675 1676@example 1677static R_StringBuffer ex_buff = @{NULL, 0, MAXELTSIZE@}; 1678@end example 1679 1680@noindent 1681which uses a default size of @code{MAXELTSIZE = 8192} bytes. Most 1682current uses have a static @code{R_StringBuffer} structure, which 1683allows the (default-sized) buffer to be shared between calls to e.g.@: 1684@code{grep} and even between functions: this will need to be changed if 1685@R{} ever allows concurrent evaluation threads. So the idiom is 1686 1687@example 1688static R_StringBuffer ex_buff = @{NULL, 0, MAXELTSIZE@}; 1689... 1690 char *buf; 1691 for(i = 0; i < n; i++) @{ 1692 compute len 1693 buf = R_AllocStringBuffer(len, &ex_buff); 1694 use buf 1695 @} 1696 /* free allocation if larger than the default, but leave 1697 default allocated for future use */ 1698 R_FreeStringBufferL(&ex_buff); 1699@end example 1700 1701 1702@menu 1703* Internals of R_alloc:: 1704@end menu 1705 1706@node Internals of R_alloc, , Memory allocators, Memory allocators 1707@subsection Internals of R_alloc 1708 1709The memory used by @code{R_alloc} is allocated as @R{} vectors, of type 1710@code{RAWSXP}. Thus the allocation is in units of 8 bytes, and is 1711rounded up. A request for zero bytes currently returns @code{NULL} (but 1712this should not be relied on). For historical reasons, in all other 1713cases 1 byte is added before rounding up so the allocation is always 17141--8 bytes more than was asked for: again this should not be relied on. 1715 1716The vectors allocated are protected via the setting of @code{R_VStack}, 1717as the garbage collector marks everything that can be reached from that 1718location. 
When a vector is @code{R_alloc}ated, its @code{ATTRIB}
pointer is set to the current @code{R_VStack}, and @code{R_VStack} is
set to the latest allocation.  Thus @code{R_VStack} is a singly-linked
chain of the vectors currently allocated via @code{R_alloc}.  Function
@code{vmaxset} resets @code{R_VStack}, and should be passed a
value that has previously been obtained @emph{via} @code{vmaxget}:
allocations made after that value was obtained will no longer be
protected and hence become available for garbage collection.

@node Internal use of global and base environments, Modules, Memory allocators, R Internal Structures
@section Internal use of global and base environments

This section notes known use by the system of these environments: the
intention is to minimize or eliminate such uses.

@menu
* Base environment::
* Global environment::
@end menu

@node Base environment, Global environment, Internal use of global and base environments, Internal use of global and base environments
@subsection Base environment

@cindex base environment
@cindex environment, base
@findex .Device
@findex .Devices
The graphics devices system maintains two variables @code{.Device} and
@code{.Devices} in the base environment: both are always set.  The
variable @code{.Devices} gives a list of character vectors of the names
of open devices, and @code{.Device} is the element corresponding to the
currently active device.  The null device will always be open.

@findex .Options
There appears to be a variable @code{.Options}, a pairlist giving the
current options settings.  But in fact this is just a symbol with a
value assigned, and so shows up as a base variable.

@findex .Last.value
Similarly, the evaluator creates a symbol @code{.Last.value} which
appears as a variable in the base environment.
1759 1760@findex .Traceback 1761@findex last.warning 1762Errors can give rise to objects @code{.Traceback} and 1763@code{last.warning} in the base environment. 1764 1765@node Global environment, , Base environment, Internal use of global and base environments 1766@subsection Global environment 1767 1768@cindex global environment 1769@cindex environment, global 1770@findex .Random.seed 1771The seed for the random number generator is stored in object 1772@code{.Random.seed} in the global environment. 1773 1774@findex dump.frames 1775Some error handlers may give rise to objects in the global environment: 1776for example @code{dump.frames} by default produces @code{last.dump}. 1777 1778@findex .SavedPlots 1779The @code{windows()} device makes use of a variable @code{.SavedPlots} 1780to store display lists of saved plots for later display. This is 1781regarded as a variable created by the user. 1782 1783 1784@node Modules, Visibility, Internal use of global and base environments, R Internal Structures 1785@section Modules 1786 1787@cindex modules 1788@R{} makes use of a number of shared objects/DLLs stored in the 1789@file{modules} directory. These are parts of the code which have been 1790chosen to be loaded `on demand' rather than linked as dynamic libraries 1791or incorporated into the main executable/dynamic library. 1792 1793For the remaining modules the motivation has been the amount of (often 1794optional) code they will bring in @emph{via} libraries to which they are 1795linked. 1796 1797@table @asis 1798 1799@item @code{internet} 1800The internal HTTP and FTP clients and socket support, which link to 1801system-specific support libraries. This may load @code{libcurl} and on 1802Windows will load @file{wininet.dll} and @file{ws2_32.dll}. 1803 1804@item @code{lapack} 1805The code which makes use of the LAPACK library, and is linked to 1806@file{libRlapack} or an external LAPACK library. 1807 1808@item @code{X11} 1809(Unix-alikes only.) 
The @code{X11()}, @code{jpeg()}, @code{png()} and
@code{tiff()} devices.  These are optional, and link to some or all of
the @code{X11}, @code{pango}, @code{cairo}, @code{jpeg}, @code{libpng}
and @code{libtiff} libraries.
@end table

@node Visibility, Lazy loading, Modules, R Internal Structures
@section Visibility
@cindex visibility

@menu
* Hiding C entry points::
* Variables in Windows DLLs::
@end menu

@node Hiding C entry points, Variables in Windows DLLs, Visibility, Visibility
@subsection Hiding C entry points

We make use of the visibility mechanisms discussed in
@ifset UseExternalXrefs
@ref{Controlling visibility, , Controlling visibility, R-exts, Writing R Extensions}.
@end ifset
@ifclear UseExternalXrefs
section `Controlling Visibility' in `Writing R Extensions'.
@end ifclear
C entry points not needed outside the main @R{} executable/dynamic
library (and in particular not needed by any package or module) should
be prefixed by @code{attribute_hidden}.
@findex attribute_hidden
Minimizing the visibility of symbols in the @R{} dynamic library will
speed up linking to it (which packages will do) and reduce the
possibility of linking to the wrong entry points of the same name.  In
addition, on some platforms reducing the number of entry points allows
more efficient versions of PIC to be used: somewhat over half the entry
points are hidden.  A convenient way to hide variables (as distinct from
functions) is to declare them @code{extern0} in header file @file{Defn.h}.

The visibility mechanism used is only available with some compilers and
platforms, and in particular not on Windows, where an alternative
mechanism is used.  Entry points will not be made available in
@file{R.dll} if they are listed in the file
@file{src/gnuwin32/Rdll.hide}.
@findex Rdll.hide
Entries in that file start with a space and must be strictly in
alphabetic order in the C locale (use @command{sort} on the file to
ensure this if you change it).  It is possible to hide Fortran as well
as C entry points via this file: the former are lower-cased and have an
underscore as suffix, and the suffixed name should be included in the
file.  Some entry points exist only on Windows or need to be visible
only on Windows, and some notes on these are provided in file
@file{src/gnuwin32/Maintainers.notes}.

Because of the advantages of reducing the number of visible entry
points, they should be declared @code{attribute_hidden} where possible.
Note that this only has an effect on a shared-R-library build, and so
care is needed not to hide entry points that are legitimately used by
packages.  So it is best if the decision on visibility is made when a
new entry point is created, including the decision if it should be
included in header file @file{Rinternals.h}.  A list of the visible
entry points on a shared-R-library build on a reasonably standard
Unix-alike can be made by something like

@example
nm -g libR.so | grep ' [BCDT] ' | cut -b20-
@end example

@node Variables in Windows DLLs, , Hiding C entry points, Visibility
@subsection Variables in Windows DLLs

Windows is unique in that it conventionally treats importing variables
differently from functions: variables that are imported from a DLL need
to be specified by a prefix (often @samp{_imp_}) when being linked to
(`imported') but not when being linked from (`exported').  The details
depend on the compiler system, and have changed for MinGW during the
lifetime of that port.  They are in the main hidden behind some macros
defined in header file @file{R_ext/libextern.h}.
1885 1886A (non-function) variable in the main @R{} sources that needs to be 1887referred to outside @file{R.dll} (in a package, module or another DLL 1888such as @file{Rgraphapp.dll}) should be declared with prefix 1889@code{LibExtern}. The main use is in @file{Rinternals.h}, but it needs 1890to be considered for any public header and also @file{Defn.h}. 1891 1892It would nowadays be possible to make use of the `auto-import' feature 1893of the MinGW port of @command{ld} to fix up imports from DLLs (and if 1894@R{} is built for the Cygwin platform this is what happens). However, 1895this was not possible when the MinGW build of @R{} was first constructed 1896in ca 1998, allows less control of visibility and would not work for 1897other Windows compiler suites. 1898 1899It is only possible to check if this has been handled correctly by 1900compiling the @R{} sources on Windows. 1901 1902@node Lazy loading, , Visibility, R Internal Structures 1903@section Lazy loading 1904 1905Lazy loading is always used for code in packages but is optional 1906(selected by the package maintainer) for datasets in packages. When a 1907package/namespace which uses it is loaded, the package/namespace 1908environment is populated with promises for all the named objects: when 1909these promises are evaluated they load the actual code from a database. 1910 1911There are separate databases for code and data, stored in the @file{R} 1912and @file{data} subdirectories. The database consists of two files, 1913@file{@var{name}.rdb} and @file{@var{name}.rdx}. The @file{.rdb} file 1914is a concatenation of serialized objects, and the @file{.rdx} file 1915contains an index. The objects are stored in (usually) a 1916@command{gzip}-compressed format with a 4-byte header giving the 1917uncompressed serialized length (in XDR, that is big-endian, byte order) 1918and read by a call to the primitive @code{lazyLoadDBfetch}. 
(Note that 1919this makes lazy-loading unsuitable for really large objects: the 1920unserialized length of an @R{} object can exceed 4GB.) 1921 1922The index or `map' file @file{@var{name}.rdx} is a compressed serialized 1923@R{} object to be read by @code{readRDS}. It is a list with three 1924elements @code{variables}, @code{references} and @code{compressed}. The 1925first two are named lists of integer vectors of length 2 giving the 1926offset and length of the serialized object in the @file{@var{name}.rdb} 1927file. Element @code{variables} has an entry for each named object: 1928@code{references} serializes a temporary environment used when named 1929environments are added to the database. @code{compressed} is a logical 1930indicating if the serialized objects were compressed: compression is 1931always used nowadays. We later added the values @code{compressed = 2} 1932and @code{3} for @command{bzip2} and @command{xz} compression (with the 1933possibility of future expansion to other methods): these formats add a 1934fifth byte to the header for the type of compression, and store 1935serialized objects uncompressed if compression expands them. 1936 1937Source references are treated specially for performance reasons: bindings 1938@code{lines} and @code{parseData} from @code{srcfile} environments are 1939loaded lazily. This uses a mechanism that allows loading selected bindings 1940from an environment lazily. The key for such environment is a list with two 1941elements: @code{eagerKey} gives the length-two integer key for the bindings 1942loaded eagerly and @code{lazyKeys} gives a vector of length-two integer 1943keys, one for each lazily loaded binding. 1944 1945The loader for a lazy-load database of code or data is function 1946@code{lazyLoad} in the @pkg{base} package, but note that there is a 1947separate copy to load @pkg{base} itself in file 1948@file{R_HOME/base/R/base}. 
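
The record layout described in this section (a 4-byte big-endian length,
with a fifth byte giving the compression type when @code{compressed} is
@code{2} or @code{3}) can be sketched in standalone C as follows.  This
is an illustration only, not the code in @R{}: the names
@code{demo_record_header} and @code{demo_parse_header} are hypothetical,
the fifth byte is returned without interpretation, and only the values
@code{1}, @code{2} and @code{3} of @code{compressed} are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the header in front of one record in a .rdb file. */
typedef struct {
    uint32_t      raw_length;      /* uncompressed serialized length */
    int           has_type;        /* 1 if a fifth header byte is present */
    unsigned char type;            /* fifth byte, left uninterpreted here */
    size_t        payload_offset;  /* where the (compressed) data starts */
} demo_record_header;

/* Parse the header of a record of rec_len bytes.  'compressed' is the
   value from the index: 1 means a 4-byte header, 2 or 3 a 5-byte header.
   Returns 0 on success, -1 if the record is too short or the value of
   'compressed' is not one this sketch handles. */
int demo_parse_header(const unsigned char *rec, size_t rec_len,
                      int compressed, demo_record_header *h)
{
    assert(rec != NULL && h != NULL);
    if (compressed < 1 || compressed > 3) return -1;
    size_t hdr = (compressed == 1) ? 4 : 5;
    if (rec_len < hdr) return -1;

    /* XDR byte order is big-endian: most significant byte first */
    h->raw_length = ((uint32_t) rec[0] << 24) | ((uint32_t) rec[1] << 16) |
                    ((uint32_t) rec[2] << 8)  |  (uint32_t) rec[3];
    h->has_type = (hdr == 5);
    h->type = h->has_type ? rec[4] : 0;
    h->payload_offset = hdr;
    return 0;
}
```

Because the length field is only 32 bits wide, the largest uncompressed
length it can record is 4GB minus one byte, which is the limit noted
above for really large objects.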
1949 1950Lazy-load databases are created by the code in 1951@file{src/library/tools/R/makeLazyLoad.R}: the main tool is the 1952unexported function @code{makeLazyLoadDB} and the insertion of database 1953entries is done by calls to @code{.Call("R_lazyLoadDBinsertValue", 1954...)}. 1955 1956Lazy-load databases of less than 10MB are cached in memory at first use: 1957this was found necessary when using file systems with high latency 1958(removable devices and network-mounted file systems on Windows). 1959 1960Lazy-load databases are loaded into the exports for a package, but not 1961into the namespace environment itself. Thus they are visible when the 1962package is @emph{attached}, and also @emph{via} the @code{::} operator. 1963This was a deliberate design decision, as packages mostly make datasets 1964available for use by the end user (or other packages), and they should 1965not be found preferentially from functions in the package, surprising 1966users who expected the normal search path to be used. (There is an 1967alternative mechanism, @file{sysdata.rda}, for `system datasets' that 1968are intended primarily to be used within the package.) 1969 1970The same database mechanism is used to store parsed @file{Rd} files. 1971One or all of the parsed objects is fetched by a call to 1972@code{tools:::fetchRdDB}. 1973 1974@node .Internal vs .Primitive, Internationalization in the R sources, R Internal Structures, Top 1975@chapter @code{.Internal} vs @code{.Primitive} 1976 1977@findex .Internal 1978@findex .Primitive 1979C code compiled into @R{} at build time can be called directly in what 1980are termed @emph{primitives} or via the @code{.Internal} interface, 1981which is very similar to the @code{.External} interface except in 1982syntax. More precisely, @R{} maintains a table of @R{} function names and 1983corresponding C functions to call, which by convention all start with 1984@samp{do_} and return a @code{SEXP}. 
This table (@code{R_FunTab} in 1985file @file{src/main/names.c}) also specifies how many arguments to a 1986function are required or allowed, whether or not the arguments are to be 1987evaluated before calling, and whether the function is `internal' in 1988the sense that it must be accessed via the @code{.Internal} interface, 1989or directly accessible in which case it is printed in @R{} as 1990@code{.Primitive}. 1991 1992Functions using @code{.Internal()} wrapped in a closure are in general 1993preferred as this ensures standard handling of named and default 1994arguments. For example, @code{grep} is defined as 1995 1996@example 1997@group 1998grep <- 1999function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE, 2000 fixed = FALSE, useBytes = FALSE, invert = FALSE) 2001@{ 2002 if (!is.character(x)) x <- structure(as.character(x), names = names(x)) 2003 .Internal(grep(as.character(pattern), x, ignore.case, value, 2004 perl, fixed, useBytes, invert)) 2005@} 2006 2007@end group 2008@end example 2009@noindent 2010and the use of @code{as.character} allows methods to be dispatched (for 2011example, for factors). 2012 2013However, for reasons of convenience and also efficiency (as there is 2014some overhead in using the @code{.Internal} interface wrapped in a 2015function closure), the primitive functions are exceptions that can be 2016accessed directly. And of course, primitive functions are needed for 2017basic operations---for example @code{.Internal} is itself a primitive. 2018Note that primitive functions make no use of @R{} code, and hence are 2019very different from the usual interpreted functions. In particular, 2020@code{formals} and @code{body} return @code{NULL} for such objects, and 2021argument matching can be handled differently. For some primitives 2022(including @code{call}, @code{switch}, @code{.C} and @code{.subset}) 2023positional matching is important to avoid partial matching of the first 2024argument. 
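
The idea of a table mapping function names to C entry points, together
with the number of arguments, an evaluate-arguments flag and an
`internal' flag, can be sketched as below.  This is a deliberately
simplified standalone model and not the layout of @code{R_FunTab}: all
names beginning @code{demo_} are hypothetical, and the C functions
return a @code{double} rather than a @code{SEXP}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef double (*demo_cfun)(const double *args, int nargs);

/* One row of the hypothetical table. */
typedef struct {
    const char *name;      /* name as seen by the caller */
    demo_cfun   cfun;      /* C function, named do_... by convention */
    int         arity;     /* required number of arguments, -1 = any */
    int         eval_args; /* 1: arguments are evaluated before the call
                              (carried as data only in this sketch) */
    int         internal;  /* 1: reachable only via the internal gate */
} demo_fun_entry;

static double do_demo_sum(const double *args, int nargs)
{
    double s = 0.0;
    for (int i = 0; i < nargs; i++) s += args[i];
    return s;
}

static double do_demo_neg(const double *args, int nargs)
{
    (void) nargs;
    return -args[0];
}

static const demo_fun_entry demo_fun_tab[] = {
    {"sum", do_demo_sum, -1, 1, 0},  /* directly callable, any arity */
    {"neg", do_demo_neg,  1, 1, 1},  /* internal, exactly one argument */
    {NULL,  NULL,         0, 0, 0}   /* sentinel */
};

/* Linear search of the table by name; NULL if not found. */
const demo_fun_entry *demo_lookup(const char *name)
{
    for (const demo_fun_entry *e = demo_fun_tab; e->name != NULL; e++)
        if (strcmp(e->name, name) == 0) return e;
    return NULL;
}

/* Returns 0 on success, -1 for an unknown name, -2 for a wrong number
   of arguments, -3 if an internal entry is called directly. */
int demo_call(const char *name, int via_internal,
              const double *args, int nargs, double *result)
{
    const demo_fun_entry *e = demo_lookup(name);
    assert(result != NULL);
    if (e == NULL) return -1;
    if (e->arity >= 0 && e->arity != nargs) return -2;
    if (e->internal && !via_internal) return -3;
    *result = e->cfun(args, nargs);
    return 0;
}
```

In this model a call such as @code{demo_call("neg", 0, ...)} is refused
because the entry is flagged internal, mirroring the distinction between
functions that must go through @code{.Internal} and those that are
directly accessible.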

The list of primitive functions is subject to change; currently, it
includes the following.

@enumerate

@item
``Special functions'' which really are @emph{language} elements, but
implemented as primitive functions:

@example
@group
@{       (         if     for      while  repeat  break  next
return  function  quote  switch
@end group
@end example

@item
Language elements and basic @emph{operator}s (i.e., functions usually
@emph{not} called as @code{foo(a, b, ...)}) for subsetting, assignment,
arithmetic, comparison and logic:

@example
@group
               [    [[    $    @@
<-   <<-  =    [<-  [[<-  $<-  @@<-

+    -    *    /     ^    %%   %*%  %/%
<    <=   ==   !=    >=   >
|    ||   &    &&    !
@end group
@end example

@noindent
When the arithmetic, comparison and logical operators are called as
functions, any argument names are discarded so positional matching is used.

@item
``Low level'' 0-- and 1--argument functions which belong to one of the
following groups of functions:

@enumerate a
@item
Basic mathematical functions with a single argument, i.e.,

@example
@group
abs     sign    sqrt
floor   ceiling
@end group

@group
exp     expm1
log2    log10   log1p
cos     sin     tan
acos    asin    atan
cosh    sinh    tanh
acosh   asinh   atanh
cospi   sinpi   tanpi
@end group

@group
gamma   lgamma  digamma trigamma
@end group

@group
cumsum  cumprod cummax  cummin
@end group

@group
Im  Re  Arg  Conj  Mod
@end group
@end example

@code{log} is a primitive function of one or two arguments with named
argument matching.

@code{trunc} is a difficult case: it is a primitive that can have one
or more arguments: the default method handled in the primitive has
only one.

@item
Functions rarely used outside of ``programming'' (i.e., mostly used
inside other functions), such as

@example
@group
nargs          missing        on.exit        interactive
as.call        as.character   as.complex     as.double
as.environment as.integer     as.logical     as.raw
is.array       is.atomic      is.call        is.character
is.complex     is.double      is.environment is.expression
is.finite      is.function    is.infinite    is.integer
is.language    is.list        is.logical     is.matrix
is.na          is.name        is.nan         is.null
is.numeric     is.object      is.pairlist    is.raw
is.real        is.recursive   is.single      is.symbol
baseenv        emptyenv       globalenv      pos.to.env
unclass        invisible      seq_along      seq_len
@end group
@end example

@item
The programming and session management utilities

@example
@group
browser  proc.time  gc.time  tracemem  retracemem  untracemem
@end group
@end example

@end enumerate

@item
The following basic replacement and extractor functions

@example
@group
length      length<-
class       class<-
oldClass    oldClass<-
attr        attr<-
attributes  attributes<-
names       names<-
dim         dim<-
dimnames    dimnames<-
            environment<-
            levels<-
            storage.mode<-
@end group
@end example

@findex NAMED
@noindent
Note that optimizing @code{NAMED = 1} is only effective within a
primitive (as the closure wrapper of a @code{.Internal} will set
@code{NAMED = NAMEDMAX} when the promise to the argument is evaluated) and
hence replacement functions should where possible be primitive to avoid
copying (at least in their default methods).
[The @code{NAMED} mechanism has been replaced by reference counting.]

@item
The following functions are primitive for efficiency reasons:

@example
@group
:          ~          c           list
call       expression substitute
UseMethod  standardGeneric
.C         .Fortran   .Call       .External
round      signif     rep         seq.int
@end group
@end example

@noindent
as well as the following internal-use-only functions

@example
@group
.Primitive      .Internal
.Call.graphics  .External.graphics
.subset         .subset2
.primTrace      .primUntrace
lazyLoadDBfetch
@end group
@end example

@end enumerate


The multi-argument primitives
@example
@group
call   switch
.C     .Fortran   .Call       .External
@end group
@end example

@noindent
intentionally use positional matching, and need to do so to avoid
partial matching to their first argument.  They do check that the first
argument is unnamed or, for the first two, partially matches the formal
argument name.  On the other hand,

@example
@group
attr    attr<-     browser     retracemem substitute  UseMethod
log     round      signif      rep        seq.int
@end group
@end example

@noindent
manage their own argument matching and do work in the standard way.

All the one-argument primitives check that if they are called with a
named argument that this (partially) matches the name given in the
documentation: this is also done for replacement functions with one
argument plus @code{value}.

The net effect is that argument matching for primitives intended for
end-user use @emph{as functions} is done in the same way as for
interpreted functions except for the six exceptions where positional
matching is required.
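
The matching rules above can be illustrated at the @R{} prompt.  The
following is a minimal sketch using only documented base functions; the
particular argument values and the function name @code{f} passed to
@code{call} are hypothetical.

```r
# log and round manage their own matching in the standard way, so
# named arguments may be supplied in any order.
log_val   <- log(base = 2, x = 8)
round_val <- round(digits = 1, x = 2.34)

# A one-argument primitive accepts its documented argument name.
idx <- seq_len(length.out = 3)

# call() matches its first argument positionally, so 'n' below is
# passed through as an argument of f rather than partially matching
# the formal 'name'.
cl <- deparse(call("f", n = 1))       # "f(n = 1)"

# switch() likewise takes its first argument by position.
chosen <- switch("b", a = "first", b = "second")   # "second"
```

Under standard partial matching the @code{n = 1} in the @code{call}
example would have been taken as the function name, which is the
situation positional matching is there to avoid.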

@menu
* Special primitives::
* Special internals::
* Prototypes for primitives::
* Adding a primitive::
@end menu

@node Special primitives, Special internals, .Internal vs .Primitive, .Internal vs .Primitive
@section Special primitives

A small number of primitives are @emph{specials} rather than
@emph{builtins}, that is they are entered with unevaluated arguments.
This is clearly necessary for the language constructs and the assignment
operators, as well as for @code{&&} and @code{||} which conditionally
evaluate their second argument, and @code{~}, @code{.Internal},
@code{call}, @code{expression}, @code{missing}, @code{on.exit},
@code{quote} and @code{substitute} which do not evaluate some of their
arguments.

@code{rep} and @code{seq.int} are special as they evaluate some of their
arguments conditional on which are non-missing.

@code{log}, @code{round} and @code{signif} are special to allow default
values to be given to missing arguments.

The subsetting, subassignment and @code{@@} operators are all special.
(For both extraction and replacement forms, @code{$} and @code{@@}
take a symbol argument, and @code{[} and @code{[[} allow missing
arguments.)

@code{UseMethod} is special to avoid the additional contexts added to
calls to builtins.

@node Special internals, Prototypes for primitives, Special primitives, .Internal vs .Primitive
@section Special internals

There are also special @code{.Internal} functions: @code{NextMethod},
@code{Recall}, @code{withVisible}, @code{cbind}, @code{rbind} (to allow
for the @code{deparse.level} argument), @code{eapply}, @code{lapply} and
@code{vapply}.
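
Whether a primitive is a special or a builtin is visible from
@code{typeof}.  The following is a minimal sketch using only documented
base functions; the functions inspected were chosen for illustration and
the variable names are hypothetical.

```r
# Specials receive their arguments unevaluated; builtins receive values.
special_types <- c(typeof(quote), typeof(`&&`), typeof(missing))
builtin_types <- c(typeof(sum), typeof(`+`), typeof(length))

# Because && is special, its second argument is evaluated only if needed,
# so the call to stop() below is never run.
short_circuit <- FALSE && stop("not evaluated")

# cbind is reached through a closure around .Internal, so it is not
# itself a primitive.
cbind_is_primitive <- is.primitive(cbind)
```

Here @code{special_types} holds @code{"special"} three times,
@code{builtin_types} holds @code{"builtin"} three times, and
@code{short_circuit} is @code{FALSE}.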

@node Prototypes for primitives, Adding a primitive, Special internals, .Internal vs .Primitive
@section Prototypes for primitives

Prototypes are available for the primitive functions and operators, and
these are used for printing, @code{args} and package checking (e.g.@: by
@code{tools::checkS3methods} and by package @CRANpkg{codetools}).  There are
two environments in the @pkg{base} package (and namespace),
@samp{.GenericArgsEnv} for those primitives which are internal S3
generics, and @samp{.ArgsEnv} for the rest.  Those environments contain
closures with the same names as the primitives, formal arguments derived
(manually) from the help pages, a body which is a suitable call to
@code{UseMethod} or @code{NULL} and environment the base namespace.

The C code for @code{print.default} and @code{args} uses the closures in
these environments in preference to the definitions in base (as
primitives).

The QC function @code{undoc} checks that all the functions prototyped in
these environments are currently primitive, and that the primitives not
included are better thought of as language elements (at the time of
writing

@example
$  $<-  &&  (  :  @@  @@<-  [  [[  [[<-  [<-  @{  ||  ~  <-  <<-  =
break  for function  if  next  repeat  return  while
@end example

@noindent
).  One could argue about @code{~}, but it is known to the parser and has
semantics quite unlike a normal function.  And @code{:} is documented
with different argument names in its two meanings.

The QC functions @code{codoc} and @code{checkS3methods} also make use of
these environments (effectively placing them in front of base in the
search path), and hence the formals of the functions they contain are
checked against the help pages by @code{codoc}.  However, there are two
problems with the generic primitives.
The first is that many of the
operators are part of the S3 group generic @code{Ops} and that defines
their arguments to be @code{e1} and @code{e2}: although it would be very
unusual, an operator could be called as e.g.@: @code{"+"(e1=a, e2=b)}
and if method dispatch occurred to a closure, there would be an argument
name mismatch.  So the definitions in environment @code{.GenericArgsEnv}
have to use argument names @code{e1} and @code{e2} even though the
traditional documentation is in terms of @code{x} and @code{y}:
@code{codoc} makes the appropriate adjustment via
@code{tools:::.make_S3_primitive_generic_env}.  The second discrepancy
is with the @code{Math} group generics, where the group generic is
defined with argument list @code{(x, ...)}, but most of the members only
allow one argument when used as the default method (and @code{round} and
@code{signif} allow two as default methods): again fix-ups are used.

Those primitives which are in @code{.GenericArgsEnv} are checked (via
@file{tests/primitives.R}) to be generic @emph{via} defining methods for
them, and a check is made that the remaining primitives are probably not
generic, by setting a method and checking it is not dispatched to (but
this can fail for other reasons).  However, there is no certain way to
know that other @code{.Internal} or primitive functions are not
internally generic except by reading the source code.

@node Adding a primitive, , Prototypes for primitives, .Internal vs .Primitive
@section Adding a primitive

[For R-core use: reverse this procedure to remove a primitive.  Most
commonly this is done by changing a @code{.Internal} to a primitive or
@emph{vice versa}.]

Primitives are listed in the table @code{R_FunTab} in
@file{src/main/names.c}: primitives have @samp{Y = 0} in the @samp{eval}
field.
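
As an aid to reading the table, the following is a minimal sketch of how
the @samp{eval} field can be decoded.  It assumes the three-digit
@samp{XYZ} encoding described in the comments at the head of
@file{src/main/names.c} (@samp{X} controls visibility, @samp{Y = 1}
marks a @code{.Internal}, @samp{Z = 1} means arguments are evaluated
before the call); the helper @code{decode_eval} is hypothetical and
should be checked against the current sources.

```r
# Hypothetical helper: split an 'eval' code into its three digits.
decode_eval <- function(eval) list(
    visible  = c("on", "off", "set by C code")[eval %/% 100 + 1],
    internal = (eval %/% 10) %% 10 == 1,   # Y = 1: needs .Internal()
    builtin  = eval %% 10 == 1             # Z = 1: arguments evaluated
)

prim_builtin  <- decode_eval(1)     # primitive, arguments evaluated
internal_fun  <- decode_eval(11)    # .Internal, arguments evaluated
prim_special  <- decode_eval(200)   # primitive, arguments not evaluated
```

For example @code{decode_eval(11)$internal} is @code{TRUE}, whereas
@code{decode_eval(1)$internal} is @code{FALSE}.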

There needs to be an @samp{\alias} entry in a help file in the @pkg{base}
package, and the primitive needs to be added to one of the lists at the
start of this section.

Some primitives are regarded as language elements (the current ones are
listed above).  These need to be added to two lists of exceptions,
@code{langElts} in @code{undoc()} (in file
@file{src/library/tools/R/QC.R}) and @code{lang_elements} in
@file{tests/primitives.R}.

All other primitives are regarded as functions and should be listed in
one of the environments defined in @file{src/library/base/R/zzz.R},
either @code{.ArgsEnv} or @code{.GenericArgsEnv}: internal generics also
need to be listed in the character vector @code{.S3PrimitiveGenerics}.
Note too the discussion about argument matching above: if you add a
primitive function with more than one argument by converting a
@code{.Internal} you need to add argument matching to the C code, and
for those with a single argument, add argument-name checking.

Do ensure that @command{make check-devel} has been run: that tests most
of these requirements.

@node Internationalization in the R sources, Package Structure, .Internal vs .Primitive, Top
@chapter Internationalization in the R sources

The process of marking messages (errors, warnings etc) for translation
in an @R{} package is described in
@ifset UseExternalXrefs
@ref{Internationalization, , Internationalization, R-exts, Writing R Extensions},
@end ifset
@ifclear UseExternalXrefs
`Writing R Extensions',
@end ifclear
and the standard packages included with @R{} have (with an exception in
@pkg{grDevices} for the menus of the @code{windows()} device) been
internationalized in the same way as other packages.

@menu
* R code::
* Main C code::
* Windows-GUI-specific code::
* macOS GUI::
* Updating::
@end menu

@node R code, Main C code, Internationalization in the R sources, Internationalization in the R sources
@section R code

Internationalization for @R{} code is done in exactly the same way as
for extension packages.  As all standard packages which have @R{} code
also have a namespace, it is never necessary to specify @code{domain},
but for efficiency calls to @code{message}, @code{warning} and
@code{stop} should include @code{domain = NA} when the message is
constructed @emph{via} @code{gettextf}, @code{gettext} or
@code{ngettext}.

For each package, the extracted messages and translation sources are
stored under package directory @file{po} in the source package, and
compiled translations under @file{inst/po} for installation to package
directory @file{po} in the installed package.  This also applies to C
code in packages.

@node Main C code, Windows-GUI-specific code, R code, Internationalization in the R sources
@section Main C code

The main C code (e.g.@: that in files @file{src/*/*.c} and in
the modules) is where @R{} is closest to the sort of application for
which @samp{gettext} was written.  Messages in the main C code are in
domain @code{R} and stored in the top-level directory @file{po} with
compiled translations under @file{share/locale}.

The list of files covered by the @R{} domain is specified in file
@file{po/POTFILES.in}.

The normal way to mark messages for translation is via @code{_("msg")}
just as for packages.  However, sometimes one needs to mark passages for
translation without wanting them translated at the time, for example
when declaring string constants.
This is the purpose of the @code{N_}
macro, for example

@example
@{ ERROR_ARGTYPE,           N_("invalid argument type")@},
@end example

@noindent
from file @file{src/main/errors.c}.

The @code{P_} macro

@example
#ifdef ENABLE_NLS
#define P_(StringS, StringP, N) ngettext (StringS, StringP, N)
#else
#define P_(StringS, StringP, N) (N > 1 ? StringP: StringS)
#endif
@end example

@noindent
may be used
as a wrapper for @code{ngettext}: however in some cases the preferred
approach has been to conditionalize (on @code{ENABLE_NLS}) code using
@code{ngettext}.

The macro @code{_("msg")} can safely be used in directory
@file{src/appl}; the header for standalone @samp{nmath} skips possible
translation.  (This does not apply to @code{N_} or @code{P_}.)


@node Windows-GUI-specific code, macOS GUI, Main C code, Internationalization in the R sources
@section Windows-GUI-specific code

Messages for the Windows GUI are in a separate domain @samp{RGui}.  This
was done for two reasons:

@itemize
@item
The translators for the Windows version of @R{} might be separate from
those for the rest of @R{} (familiarity with the GUI helps), and

@item
Messages for Windows are most naturally handled in the native charset
for the language, and in the case of CJK languages the charset is
Windows-specific.  (It transpires that as the @code{iconv} we ported
works well under Windows, this is less important than anticipated.)
@end itemize

Messages for the @samp{RGui} domain are marked by @code{G_("msg")}, a
macro that is defined in header file @file{src/gnuwin32/win-nls.h}.
The
list of files that are considered is hardcoded in the
@code{RGui.pot-update} target of file @file{po/Makefile.in.in}: note
that this includes @file{devWindows.c} as the menus on the
@code{windows} device are considered to be part of the GUI.  (There is
also @code{GN_("msg")}, the analogue of @code{N_("msg")}.)

The template and message catalogs for the @samp{RGui} domain are in the
top-level @file{po} directory.


@node macOS GUI, Updating, Windows-GUI-specific code, Internationalization in the R sources
@section macOS GUI

This is handled separately: see
@uref{https://developer.r-project.org/Translations30.html}.


@node Updating, , macOS GUI, Internationalization in the R sources
@section Updating

See file @file{po/README} for how to update the message templates and catalogs.

@node Package Structure, Files, Internationalization in the R sources, Top
@chapter Structure of an Installed Package

@menu
* Metadata::
* Help::
@end menu

The structure of a @emph{source} package is described in @ref{Creating
R packages, , Creating R packages, R-exts, Writing R Extensions}: this
chapter is concerned with the structure of @emph{installed} packages.

An installed package has a top-level file @file{DESCRIPTION}, a copy of
the file of that name in the package sources with a @samp{Built} field
appended, and file @file{INDEX}, usually describing the objects on which
help is available, a file @file{NAMESPACE} if the package has a
namespace, optional files such as @file{CITATION}, @file{LICENCE} and
@file{NEWS}, and any other files copied in from @file{inst}.  It will
have directories @file{Meta}, @file{help} and @file{html} (even if the
package has no help pages), almost always has a directory @file{R} and
often has a directory @file{libs} to contain compiled code.
Other
directories with known meaning to @R{} are @file{data}, @file{demo},
@file{doc} and @file{po}.

Function @code{library} looks for a namespace and if one is found
passes control to @code{loadNamespace}.  Then @code{library} or
@code{loadNamespace} looks for file @file{R/@var{pkgname}}, warns if it
is not found and otherwise sources the code (using @code{sys.source})
into the package's environment, then lazy-loads a database
@file{R/sysdata} if present.  So how @R{} code gets loaded depends on
the contents of @file{R/@var{pkgname}}: a standard template to load
lazy-load databases is provided in @file{share/R/nspackloader.R}.

Compiled code is usually loaded when the package's namespace is loaded
by a @code{useDynlib} directive in a @file{NAMESPACE} file or by the
package's @code{.onLoad} function.  Conventionally compiled code is
loaded by a call to @code{library.dynam} and this looks in directory
@file{libs} (and in an appropriate sub-directory if sub-architectures
are in use) for a shared object (Unix-alike) or DLL (Windows).

Subdirectory @file{data} serves two purposes.  In a package using
lazy-loading of data, it contains a lazy-load database @file{Rdata},
plus a file @file{Rdata.rds} which contains a named character vector used
by @code{data()} in the (unusual) event that it is used for such a
package.  Otherwise it is a copy of the @file{data} directory in the
sources, with saved images re-compressed if @command{R CMD INSTALL
--resave-data} was used.

Subdirectory @file{demo} supports the @code{demo} function, and is
copied from the sources.

Subdirectory @file{po} contains (in subdirectories) compiled message
catalogs.
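
The layout just described can be checked for any installed package.  The
following is a minimal sketch using only documented functions; the
choice of package @pkg{stats} and the variable names are illustrative
assumptions.

```r
# Locate the installation directory of a standard package.
pkg_dir <- system.file(package = "stats")
top_level <- list.files(pkg_dir)

# Top-level files and directories named in the text.
has_files <- file.exists(file.path(pkg_dir, c("DESCRIPTION", "NAMESPACE")))
has_dirs  <- dir.exists(file.path(pkg_dir, c("Meta", "R")))

# The 'Built' field is appended to DESCRIPTION at installation time.
built <- read.dcf(file.path(pkg_dir, "DESCRIPTION"), fields = "Built")
```

Printing @code{top_level} shows which of the optional directories (for
example @file{libs}, @file{data} or @file{demo}) a given package has.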

@node Metadata, Help, Package Structure, Package Structure
@section Metadata

Directory @file{Meta} contains several files in @code{.rds} format, that
is serialized @R{} objects written by @code{saveRDS}.  All packages
have files @file{Rd.rds}, @file{hsearch.rds}, @file{links.rds},
@file{features.rds}, and
@file{package.rds}.  Packages with namespaces have a file
@file{nsInfo.rds}, and those with data, demos or vignettes have
@file{data.rds}, @file{demo.rds} or @file{vignette.rds} files.

The structure of these files (and their existence and names) is private
to @R{}, so the description here is for those trying to follow the @R{}
sources: there should be no reference to these files in non-base
packages.

File @file{package.rds} is a dump of information extracted from the
@file{DESCRIPTION} file.  It is a list of several components.  The
first, @samp{DESCRIPTION}, is a character vector, the @file{DESCRIPTION}
file as read by @code{read.dcf}.  Further elements @samp{Depends},
@samp{Suggests}, @samp{Imports}, @samp{Rdepends} and @samp{Rdepends2}
record the @samp{Depends}, @samp{Suggests} and @samp{Imports} fields.
These are all lists, and can be empty.  The first three have an entry
for each package named, each entry being a list of length 1 or 3, with
element @samp{name} (the package name) and optional elements @samp{op}
(a character string) and @samp{version} (an object of class
@samp{"package_version"}).  Element @samp{Rdepends} is used for the
first version dependency on @R{}, and @samp{Rdepends2} is a list of zero
or more @R{} version dependencies---each is a three-element list of the
form described for packages.  Element @samp{Rdepends} is no longer used,
but it is still potentially needed so @R{} < 2.7.0 can detect that the
package was not installed for it.

File @file{nsInfo.rds} records a list, a parsed version of the
@file{NAMESPACE} file.

File @file{Rd.rds} records a data frame with one row for each help file.
The columns are @samp{File} (the file name with extension), @samp{Name}
(the @samp{\name} section), @samp{Type} (from the optional
@samp{\docType} section), @samp{Title}, @samp{Encoding}, @samp{Aliases},
@samp{Concepts} and @samp{Keywords}.  All columns are character vectors
apart from @samp{Aliases}, which is a list of character vectors.

File @file{hsearch.rds} records the information to be used by
@samp{help.search}.  This is a list of four unnamed elements which are
character matrices for help files, aliases, keywords and concepts.  All
the matrices have columns @samp{ID} and @samp{Package} which are used to
tie the aliases, keywords and concepts (the remaining column of the last
three elements) to a particular help file.  The first element has
further columns @samp{LibPath} (stored as @code{""} and filled in when
the file is loaded), @samp{name}, @samp{title}, @samp{topic} (the first
alias, used when presenting the results as
@samp{@var{pkgname}::@var{topic}}) and @samp{Encoding}.

File @file{links.rds} records a named character vector, the names being
aliases and the values character strings of the form
@example
"../../@var{pkgname}/html/@var{filename}.html"
@end example

File @file{data.rds} records a two-column character matrix with columns
of dataset names and titles from the corresponding help file.  File
@file{demo.rds} has the same structure for package demos.
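
For those following the sources, the metadata files can be read back
with @code{readRDS}.  The following is a minimal sketch for study
purposes only (as noted above, these files are private to @R{} and
should not be relied upon in packages); package @pkg{stats} and the
variable names are illustrative assumptions.

```r
meta_dir <- system.file("Meta", package = "stats")

# package.rds: a list whose DESCRIPTION component holds the fields of
# the DESCRIPTION file as a named character vector.
pkg_info <- readRDS(file.path(meta_dir, "package.rds"))
pkg_name <- pkg_info$DESCRIPTION[["Package"]]
components <- names(pkg_info)

# Rd.rds: one row per help file.
rd_info <- readRDS(file.path(meta_dir, "Rd.rds"))
rd_columns <- names(rd_info)
```

Inspecting @code{components} and @code{rd_columns} shows the elements
and columns described in the text for the version of @R{} in use.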

File @file{vignette.rds} records a data frame with one row for each
`vignette' (@file{.[RS]nw} file in @file{inst/doc}) and with columns
@samp{File} (the full file path in the sources), @samp{Title},
@samp{PDF} (the pathless file name of the installed PDF version, if
present), @samp{Depends}, @samp{Keywords} and @samp{R} (the pathless
file name of the installed @R{} code, if present).


@node Help, , Metadata, Package Structure
@section Help

All installed packages, whether they had any @file{.Rd} files or not,
have @file{help} and @file{html} directories.  The latter normally only
contains the single file @file{00Index.html}, the package index which
has hyperlinks to the help topics (if any).

Directory @file{help} contains files @file{AnIndex}, @file{paths.rds}
and @file{@var{pkgname}.rd[bx]}.  The latter two files are a lazy-load
database of parsed @file{.Rd} files, accessed by
@code{tools:::fetchRdDB}.  File @file{paths.rds} is a saved character
vector of the original path names of the @file{.Rd} files, used when
updating the database.

File @file{AnIndex} is a two-column tab-delimited file: the first column
contains the aliases defined in the help files and the second the
basename (without the @file{.Rd} or @file{.rd} extension) of the file
containing that alias.  It is read by @code{utils:::index.search} to
search for files matching a topic (alias), and read by @code{scan} in
@code{utils:::matchAvailableTopics}, part of the completion system.

File @file{aliases.rds} is the same information as @file{AnIndex} as a
named character vector (names the topics, values the file basename), for
faster access.
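
The two forms of the alias index can be compared directly.  The
following is a minimal sketch for study purposes, assuming the files are
laid out as described above; package @pkg{stats}, topic @code{lm} and
the column names @code{alias} and @code{file} are illustrative choices.

```r
help_dir <- system.file("help", package = "stats")

# aliases.rds: names are topics, values are file basenames.
aliases <- readRDS(file.path(help_dir, "aliases.rds"))
lm_file <- aliases[["lm"]]

# AnIndex: the same mapping as a two-column tab-delimited file.
an_index <- read.delim(file.path(help_dir, "AnIndex"), header = FALSE,
                       quote = "", col.names = c("alias", "file"),
                       stringsAsFactors = FALSE)
lm_rows <- an_index$file[an_index$alias == "lm"]
```

Both @code{lm_file} and @code{lm_rows} give the basename of the help
file that documents the topic.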

@node Files, Graphics Devices, Package Structure, Top
@chapter Files

@R{} provides many functions to work with files and directories: many of
these have been added relatively recently to facilitate scripting in
@R{} and in particular the replacement of Perl scripts by @R{} scripts
in the management of @R{} itself.

These functions are implemented by standard C/POSIX library calls,
except on Windows.  That means that filenames must be encoded in the
current locale as the OS provides no other means to access the file
system: increasingly filenames are stored in UTF-8 and the OS will
translate filenames to UTF-8 in other locales.  So using a UTF-8 locale
gives transparent access to the whole file system.

Windows is another story.  There the internal view of filenames is in
UTF-16LE (so-called `Unicode'), and standard C library calls can only
access files whose names can be expressed in the current codepage.  To
circumvent that restriction, there is a parallel set of Windows-specific
calls which take wide-character arguments for filepaths.  Much of the
file-handling in @R{} has been moved over to using these functions, so
filenames can be manipulated in @R{} as UTF-8 encoded character strings,
converted to wide characters (which on Windows are UTF-16LE) and passed
to the OS.  The utilities @code{RC_fopen} and @code{filenameToWchar}
help this process.  Currently @code{file.copy} to a directory,
@code{list.files}, @code{list.dirs} and @code{path.expand} work only
with filepaths encoded in the current codepage.

All these functions do tilde expansion, in the same way as
@code{path.expand}, with the deliberate exception of @code{Sys.glob}.

File names may be case sensitive or not: the latter is the norm on
Windows and macOS, the former on other Unix-alikes.
Note that this
is a property of both the OS and the file system: it is often possible
to map names to upper or lower case when mounting the file system.  This
can affect the matching of patterns in @code{list.files} and
@code{Sys.glob}.

File names commonly contain spaces on Windows and macOS but not
elsewhere.  As file names are handled as character strings by @R{},
spaces are not usually a concern unless file names are passed to other
processes, e.g.@: by a @code{system} call.

Windows has another couple of peculiarities.  Whereas a POSIX file
system has a single root directory (and other physical file systems are
mounted onto logical directories under that root), Windows has separate
roots for each physical or logical file system (`volume'), organized
under @emph{drives} (with file paths starting @code{D:} for an
@acronym{ASCII} letter, case-insensitively) and @emph{network shares}
(with paths like @code{\\netname\topdir\myfiles\a file}).  There is a
current drive, and path names without a drive part are relative to the
current drive.  Further, each drive has a current directory, and
relative paths are relative to that current directory, on a particular
drive if one is specified.  So @file{D:dir\file} and @file{D:} are valid
path specifications (the last being the current directory on drive
@file{D:}).
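
Two of the points made earlier in this chapter, tilde expansion and the
handling of spaces when a file name is passed to another process, can be
sketched as follows.  This is a minimal illustration using documented
functions only; the file name @file{a file.txt} is hypothetical and need
not exist.

```r
# Tilde expansion: file functions treat "~" as path.expand() would.
home <- path.expand("~")
same_answer <- identical(file.exists("~"), file.exists(home))

# basename() works on the expanded path; a space is just another character.
leaf <- basename(file.path("~", "a file.txt"))

# Quote a name containing a space before handing it to a shell,
# e.g. when building a command line for system().
quoted <- shQuote("a file.txt")
```

Within @R{} the space in @code{leaf} needs no special treatment; the
quoting matters only once the name leaves @R{}.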

@c basename        Wchar       na
@c dir.create      Wchar       ~
@c dirname         Wchar       ~
@c getwd
@c file.access     Wchar       ~
@c file.append     RC_fopen
@c file.copy       no          ~ (+ file.append)
@c file.create     RC_fopen
@c file.edit       UTF-8 in R code
@c file.exists     Wchar       ~
@c file.info       Wchar       ~
@c file.link       8-bit       ~
@c file.remove     Wchar       ~
@c file.rename     Wchar       ~
@c file.show       UTF-8 in R code
@c file.symlink    not         ~
@c file_test
@c list.dirs       no          ~
@c list.files      no          ~
@c normalizePath   Wchar       ~
@c path.expand     no
@c setwd           Wchar       ~
@c Sys.chmod       Wchar       ~
@c Sys.glob        Wchar       not
@c Sys.readlink    not         ~
@c Sys.umask
@c unlink          Wchar       ~


@node Graphics Devices, GUI consoles, Files, Top
@chapter Graphics

@R{}'s graphics internals were re-designed to enable multiple graphics
systems to be installed on top of the graphics `engine' -- currently
there are two such systems, one supporting `base' graphics (based on
that in S and whose @R{} code@footnote{The C code is in files
@file{base.c}, @file{graphics.c}, @file{par.c}, @file{plot.c} and
@file{plot3d.c} in directory @file{src/main}.} is in package
@pkg{graphics}) and one implemented in package @pkg{grid}.

Some notes on the historical changes can be found at
@uref{https://www.stat.auckland.ac.nz/~paul/R/basegraph.html} and
@uref{https://www.stat.auckland.ac.nz/~paul/R/graphicsChanges.html}.

At the lowest level is a graphics device, which manages a plotting
surface (a screen window or a representation to be written to a file).
This implements a set of graphics primitives, to `draw'

@itemize
@item a circle, optionally filled
@item a rectangle, optionally filled
@item a line
@item a set of connected lines
@item a polygon, optionally filled
@item a path, optionally filled using a winding rule
@item text
@item a raster image (optional)
@item and to set a clipping rectangle
@end itemize

@noindent
as well as requests for information such as

@itemize
@item the width of a string if plotted
@item the metrics (width, ascent, descent) of a single character
@item the current size of the plotting surface
@end itemize

@noindent
and requests/opportunities to take action such as

@itemize
@item start a new `page', possibly after responding to a request to ask
the user for confirmation.
@item return the position of the device pointer (if any).
@item when a device becomes the current device or stops being the current
device (this is usually used to change the window title on a screen
device).
@item when drawing starts or finishes (e.g.@: used to flush graphics to
the screen when drawing stops).
@item wait for an event, for example a mouse click or keypress.
@item an `onexit' action, to clean up if plotting is interrupted (by an
error or by the user).
@item capture the current contents of the device as a raster image.
@item close the device.
@end itemize

The device also sets a number of variables, mainly Boolean flags
indicating its capabilities.  Devices work entirely in `device units'
which are up to its developer: they can be in pixels, big points (1/72
inch), twips, @dots{}, and can differ@footnote{although that needs to be
handled carefully, as for example the @code{circle} callback is given a
radius (and that should be interpreted as in the x units).} in the
@samp{x} and @samp{y} directions.
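
Some of the information requests and capability flags listed above are
reachable from @R{} code.  The following is a minimal sketch using
documented functions from packages @pkg{grDevices} and @pkg{graphics};
it assumes a @code{pdf} device opened with a @code{NULL} file (so that
nothing is written), and the variable names are hypothetical.

```r
pdf(NULL)                          # a file device that writes no output

caps <- dev.capabilities()         # capability flags reported by the device
cap_names <- names(caps)

size_in <- dev.size("in")          # current size of the plotting surface

plot.new()                         # start a page so text can be measured
w <- strwidth("width of this string", units = "inches")

invisible(dev.off())               # close the device
```

The values are computed by the device and engine, not by @R{} code, so
they can differ between devices.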

@c think of the engine as colors.c, devices.c, engine.c, plotmath.c, vfonts.c
The next layer up is the graphics `engine' that is the main interface to
the device (although the graphics subsystems do talk directly to
devices). This is responsible for clipping lines, rectangles and
polygons, converting the @code{pch} values @code{0...26} to sets of
lines/circles, centring (and otherwise adjusting) text, rendering
mathematical expressions (`plotmath') and mapping colour descriptions
such as names to the internal representation.

@c graphics.c looks at device dimensions, locator, metricinfo
@c par.c looks at various device pars
@c plot3d.c looks at useRotatedTextInContour
@c grid looks at size, clipping, locator, ipr

Another function of the engine is to manage display lists and snapshots.
Some but not all instances of graphics devices maintain display lists, a
`list' of operations that have been performed on the device to produce
the current plot (since the device was opened or the plot was last
cleared, e.g.@: by @code{plot.new}). Screen devices generally maintain
a display list to handle repaint and resize events whereas file-based
formats do not---display lists are also used to implement
@code{dev.copy()} and friends. The display list is a pairlist of
@code{.Internal} (base graphics) or @code{.Call.graphics} (grid
graphics) calls, which means that the C code implementing a graphics
operation will be re-called when the display list is replayed: apart
from the part which records the operation if successful.

Snapshots of the current graphics state are taken by
@code{GEcreateSnapshot} and replayed later in the session by
@code{GEplaySnapshot}. These are used by @code{recordPlot()},
@code{replayPlot()} and the GUI menus of the @code{windows()} device.
The `state' includes the display list.


The top layer comprises the graphics subsystems. Although there has been
provision for 24 subsystems since about 2001, currently still only two
exist, `base' and `grid'. The base subsystem is registered with the
engine when @R{} is initialized, and unregistered (via
@code{KillAllDevices}) when an @R{} session is shut down. The grid
subsystem is registered in its @code{.onLoad} function and unregistered
in the @code{.onUnload} function. The graphics subsystem may also have
`state' information saved in a snapshot (currently base does and grid
does not).

Package @pkg{grDevices} was originally created to contain the basic
graphics devices (although @code{X11} is in a separate load-on-demand
module because of the volume of external libraries it brings in). Since
then it has been used for other functionality that was thought desirable
for use with @pkg{grid}, and hence has been transferred from package
@pkg{graphics} to @pkg{grDevices}. This is principally concerned with
the handling of colours and recording and replaying plots.

@menu
* Graphics devices::
* Colours::
* Base graphics::
* Grid graphics::
@end menu

@node Graphics devices, Colours, Graphics Devices, Graphics Devices
@section Graphics Devices

@R{} ships with several graphics devices, and there is support for
third-party packages to provide additional devices---several packages
now do. This section describes the device internals from the viewpoint
of a would-be writer of a graphics device.

@menu
* Device structures::
* Device capabilities::
* Handling text::
* Conventions::
* 'Mode'::
* Graphics events::
* Specific devices::
@end menu

@node Device structures, Device capabilities, Graphics devices, Graphics devices
@subsection Device structures

There are two types used internally which are pointers to structures
related to graphics devices.

The @code{DevDesc} type is a structure defined in the header file
@file{R_ext/GraphicsDevice.h} (which is included by
@file{R_ext/GraphicsEngine.h}). This describes the physical
characteristics of a device, the capabilities of the device driver and
contains a set of callback functions that will be used by the graphics
engine to obtain information about the device and initiate actions
(e.g.@: a new page, plotting a line or some text). Type @code{pDevDesc}
is a pointer to this type.

The following callbacks can be omitted (or set to the null pointer,
their default value) when appropriate default behaviour will be taken by
the graphics engine: @code{activate}, @code{cap}, @code{deactivate},
@code{locator}, @code{holdflush} (API version 9), @code{mode},
@code{newFrameConfirm}, @code{path}, @code{raster} and @code{size}.

The relationship of device units to physical dimensions is set by the
element @code{ipr} of the @code{DevDesc} structure: a @samp{double}
array of length 2.
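
As a rough illustration of how @code{ipr} relates device units to
physical size, the following sketch assumes that @code{ipr[0]} and
@code{ipr[1]} hold inches per device unit in the x and y directions
(which is how the header comments appear to describe them). The helper
names are hypothetical and are not part of the @R{} API.

```c
/* Sketch only: hypothetical helpers, not part of the R graphics API.
   Assumes ipr[0] and ipr[1] are inches per device unit in x and y. */

/* Convert a horizontal length in inches to device units. */
double toy_inches_to_dev_x(double inches, const double ipr[2])
{
    return inches / ipr[0];
}

/* Convert a vertical length in inches to device units. */
double toy_inches_to_dev_y(double inches, const double ipr[2])
{
    return inches / ipr[1];
}

/* Convert a size in big points (1/72 inch) to vertical device units. */
double toy_bigpts_to_dev_y(double pts, const double ipr[2])
{
    return (pts / 72.0) / ipr[1];
}
```

Under these assumptions, a device working in big points would have both
elements equal to 1/72, so a width of 7 inches is about 504 device units
and a 12-point font size is about 12 device units.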


The @code{GEDevDesc} type is a structure defined in
@file{R_ext/GraphicsEngine.h} (with comments in the file) as

@example
typedef struct _GEDevDesc GEDevDesc;
struct _GEDevDesc @{
    pDevDesc dev;
    Rboolean displayListOn;
    SEXP displayList;
    SEXP DLlastElt;
    SEXP savedSnapshot;
    Rboolean dirty;
    Rboolean recordGraphics;
    GESystemDesc *gesd[MAX_GRAPHICS_SYSTEMS];
    Rboolean ask;
@};
@end example

@noindent
So this is essentially a device structure plus information about the
device maintained by the graphics engine and normally@footnote{It is
possible for the device to find the @code{GEDevDesc} which points to its
@code{DevDesc}, and this is done often enough that there is a
convenience function @code{desc2GEDesc} to do so.} visible to the engine
and not to the device. Type @code{pGEDevDesc} is a pointer to this
type.

The graphics engine maintains an array of devices, as pointers to
@code{GEDevDesc} structures. The array is of size 64 but the first
element is always occupied by the @code{"null device"} and the final
element is kept as NULL as a sentinel.@footnote{Calling
@code{R_CheckDeviceAvailable()} ensures there is a free slot or throws
an error.} This array is reflected in the @R{} variable
@samp{.Devices}. Once a device is killed its element becomes available
for reallocation (and its name will appear as @code{""} in
@samp{.Devices}). Exactly one of the devices is `active': this is
the null device if no other device has been opened and not killed.
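
The slot bookkeeping described above can be pictured with a small toy
model. This is only a sketch: the names are hypothetical, the table
stores device names rather than pointers to @code{GEDevDesc} structures,
and it is not the engine's actual code.

```c
#include <stddef.h>

/* Toy model of the engine's device table; the names are hypothetical and
   the real array holds pointers to GEDevDesc structures, not strings. */
#define TOY_MAX_DEVICES 64

/* Slot 0 is the null device; all other slots start out NULL (free). */
const char *toy_devices[TOY_MAX_DEVICES] = { "null device" };

/* Return the first free slot, or -1 if every usable slot is taken.
   The last slot is never handed out: it stays NULL as a sentinel. */
int toy_free_slot(void)
{
    for (int i = 1; i < TOY_MAX_DEVICES - 1; i++)
        if (toy_devices[i] == NULL)
            return i;
    return -1;
}

/* Record a device under 'name' and return its slot, or -1 if full. */
int toy_add_device(const char *name)
{
    int slot = toy_free_slot();
    if (slot > 0)
        toy_devices[slot] = name;
    return slot;
}

/* Kill a device: its slot becomes available for reallocation. */
void toy_kill_device(int slot)
{
    if (slot > 0 && slot < TOY_MAX_DEVICES - 1)
        toy_devices[slot] = NULL;
}
```

In this model, killing the device in slot 1 and then adding another
device reuses slot 1, which mirrors the reallocation behaviour described
in the text.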

Each instance of a graphics device needs to set up a @code{GEDevDesc}
structure by code very similar to

@example
    pGEDevDesc gdd;

    R_GE_checkVersionOrDie(R_GE_version);
    R_CheckDeviceAvailable();
    BEGIN_SUSPEND_INTERRUPTS @{
        pDevDesc dev;
        /* Allocate and initialize the device driver data */
        if (!(dev = (pDevDesc) calloc(1, sizeof(DevDesc))))
            return 0; /* or error() */
        /* set up device driver or free 'dev' and error() */
        gdd = GEcreateDevDesc(dev);
        GEaddDevice2(gdd, "dev_name");
    @} END_SUSPEND_INTERRUPTS;
@end example

The @code{DevDesc} structure contains a @code{void *} pointer
@samp{deviceSpecific} which is used to store data specific to the
device. Setting up the device driver includes initializing all the
non-zero elements of the @code{DevDesc} structure.

Note that the device structure is zeroed when allocated: this provides
some protection against future expansion of the structure since the
graphics engine can add elements that need to be non-NULL/non-zero to be
`on' (and the structure ends with 64 reserved bytes which will be zeroed
and allow for future expansion).

Rather more protection is provided by the version number of the
engine/device API, @code{R_GE_version} defined in
@file{R_ext/GraphicsEngine.h} together with access functions

@example
int R_GE_getVersion(void);
void R_GE_checkVersionOrDie(int version);
@end example

@noindent
If a graphics device calls @code{R_GE_checkVersionOrDie(R_GE_version)}
it can ensure it will only be used in versions of @R{} which provide the
API it was designed for and compiled against.

@node Device capabilities, Handling text, Device structures, Graphics devices
@subsection Device capabilities

The following `capabilities' can be defined for the device's
@code{DevDesc} structure.

@itemize
@item @code{canChangeGamma} --
@code{Rboolean}: can the display gamma be adjusted? This is now
ignored, as gamma support has been removed.
@item @code{canHadj} --
@code{integer}: can the device do horizontal adjustment of text
@emph{via} the @code{text} callback, and if so, how precisely? 0 = no
adjustment, 1 = @{0, 0.5, 1@} (left, centre, right justification) or 2 =
continuously variable (in [0,1]) between left and right justification.
@item @code{canGenMouseDown} --
@code{Rboolean}: can the device handle mouse down events? This
flag and the next three are not currently used by R, but are maintained
for back compatibility.
@item @code{canGenMouseMove} --
@code{Rboolean}: ditto for mouse move events.
@item @code{canGenMouseUp} --
@code{Rboolean}: ditto for mouse up events.
@item @code{canGenKeybd} --
@code{Rboolean}: ditto for keyboard events.
@item @code{hasTextUTF8} --
@code{Rboolean}: should non-symbol text be sent (in UTF-8) to the
@code{textUTF8} and @code{strWidthUTF8} callbacks, and sent as Unicode
points (negative values) to the @code{metricInfo} callback?
@item @code{wantSymbolUTF8} --
@code{Rboolean}: should symbol text be handled in UTF-8 in the same way
as other text? Requires @code{hasTextUTF8 = TRUE}.
@item @code{haveTransparency}:
does the device support semi-transparent colours?
@item @code{haveTransparentBg}:
can the background be fully or semi-transparent?
@item @code{haveRaster}:
is there support for rendering raster images?
@item @code{haveCapture}:
is there support for @code{grid::grid.cap}?
@item @code{haveLocator}:
is there an interactive locator?
@end itemize

The last three can often be deduced to be false from the presence of
@code{NULL} entries instead of the corresponding functions.
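
The deduction mentioned for the last three capabilities might look
something like the following sketch. The enumeration, its values and all
names here are assumptions made for illustration (they are not taken
from the @R{} headers); the only idea borrowed from the text is that a
@code{NULL} callback lets a caller conclude that the capability is
absent.

```c
#include <stddef.h>

/* Sketch only: hypothetical names and encoding, not the R API. */
enum toy_capability { TOY_UNKNOWN = 0, TOY_NO = 1, TOY_YES = 2 };

/* Generic callback type standing in for raster, cap or locator. */
typedef void (*toy_callback)(void);

/* An example callback that does nothing, standing in for a real one. */
void toy_stub_callback(void)
{
}

/* If the device declared the capability, trust the declaration.
   Otherwise a NULL callback implies 'no', while a non-NULL callback
   leaves the answer unknown. */
enum toy_capability toy_resolve(enum toy_capability declared,
                                toy_callback cb)
{
    if (declared != TOY_UNKNOWN)
        return declared;
    return cb == NULL ? TOY_NO : TOY_UNKNOWN;
}
```

Treating zero as `unknown' fits a structure that is zeroed on
allocation, since a device that never sets the flag then makes no claim
either way.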

@node Handling text, Conventions, Device capabilities, Graphics devices
@subsection Handling text

Handling text is probably the hardest task for a graphics device, and
the design allows for the device to optionally indicate that it has
additional capabilities. (If the device does not, these will if
possible be handled in the graphics engine.)

The three callbacks for handling text that must be in all graphics
devices are @code{text}, @code{strWidth} and @code{metricInfo} with
declarations

@example
void text(double x, double y, const char *str, double rot, double hadj,
          pGEcontext gc, pDevDesc dd);

double strWidth(const char *str, pGEcontext gc, pDevDesc dd);

void metricInfo(int c, pGEcontext gc,
                double* ascent, double* descent, double* width,
                pDevDesc dd);
@end example

@noindent
The @samp{gc} parameter provides the graphics context, most importantly
the current font and fontsize, and @samp{dd} is a pointer to the active
device's structure.

The @code{text} callback should plot @samp{str} at @samp{(x,
y)}@footnote{in device coordinates} with an anti-clockwise rotation of
@samp{rot} degrees. (For @samp{hadj} see below.) The interpretation
for horizontal text is that the baseline is at @code{y} and the start is
at @code{x}, so any left bearing for the first character will start at
@code{x}.

The @code{strWidth} callback computes the width of the string which it
would occupy if plotted horizontally in the current font. (Width here
is expected to include both (preferably) or neither of left and right
bearings.)

The @code{metricInfo} callback computes the size of a single
character: @code{ascent} is the distance it extends above the baseline
and @code{descent} how far it extends below the baseline.
@code{width} is the amount by which the cursor should be advanced when
the character is placed. For @code{ascent} and @code{descent} this is
intended to be the bounding box of the `ink' put down by the glyph and
not the box which might be used when assembling a line of conventional
text (it needs to be for e.g.@: @code{hat(beta)} to work correctly).
However, the @code{width} is used in plotmath to advance to the next
character, and so needs to include left and right bearings.

The @emph{interpretation} of @samp{c} depends on the locale. In a
single-byte locale values @code{32...255} indicate the corresponding
character in the locale (if present). For the symbol font (as used by
@samp{graphics::par(font=5)}, @samp{grid::gpar(fontface=5)} and by
`plotmath'), values @code{32...126, 161...239, 241...254} indicate
glyphs in the Adobe Symbol encoding. In a multibyte locale, @code{c}
represents a Unicode point (except in the symbol font). So the function
needs to include code like

@example
    Rboolean Unicode = mbcslocale && (gc->fontface != 5);
    if (c < 0) @{ Unicode = TRUE; c = -c; @}
    if(Unicode) UniCharMetric(c, ...); else CharMetric(c, ...);
@end example

@noindent
In addition, if device capability @code{hasTextUTF8} (see below) is
true, Unicode points will be passed as negative values: the code snippet
above shows how to handle this. (This applies to the symbol font only
if device capability @code{wantSymbolUTF8} is true.)

If possible, the graphics device should handle clipping of text. It
indicates this by the structure element @code{canClip} which if true
will result in calls to the callback @code{clip} to set the clipping
region. If this is not done, the engine will clip very crudely (by
omitting any text that does not appear to be wholly inside the clipping
region).
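
The crude fallback clipping can be pictured as an all-or-nothing test on
a bounding box. The sketch below is an illustration under that
assumption only: the type and function names are hypothetical, the box
is taken to be axis-aligned, and it is not the engine's actual code.

```c
/* Sketch only: hypothetical types, not the engine's actual code. */
typedef struct {
    double left, right, bottom, top;
} toy_rect;

static double toy_min(double a, double b) { return a < b ? a : b; }
static double toy_max(double a, double b) { return a > b ? a : b; }

/* Return 1 if 'box' lies wholly inside 'clip', else 0.  Edges may be
   given in either order, since device y coordinates can run downwards. */
int toy_wholly_inside(toy_rect box, toy_rect clip)
{
    return toy_min(box.left, box.right) >= toy_min(clip.left, clip.right)
        && toy_max(box.left, box.right) <= toy_max(clip.left, clip.right)
        && toy_min(box.bottom, box.top) >= toy_min(clip.bottom, clip.top)
        && toy_max(box.bottom, box.top) <= toy_max(clip.bottom, clip.top);
}
```

With such a test, a string whose box crosses the edge of the clipping
region is dropped entirely rather than partially drawn, which is why a
device that can clip text itself should set @code{canClip}.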

The device structure has an integer element @code{canHadj}, which
indicates if the device can do horizontal alignment of text. If this is
one, argument @samp{hadj} to @code{text} will be passed as one of
@code{0, 0.5, 1} to indicate left-, centre- and right-alignment at the
indicated position. If it is two, continuous values in the range
@code{[0, 1]} are assumed to be supported.

If capability @code{hasTextUTF8} is true, it has two consequences.
First, there are callbacks @code{textUTF8} and @code{strWidthUTF8} that
should behave identically to @code{text} and @code{strWidth} except that
@samp{str} is assumed to be in UTF-8 rather than the current locale's
encoding. The graphics engine will call these for all text except in
the symbol font. Second, Unicode points will be passed to the
@code{metricInfo} callback as negative integers. If your device would
prefer to have UTF-8-encoded symbols, define @code{wantSymbolUTF8} as
well as @code{hasTextUTF8}. In that case text in the symbol font is
sent to @code{textUTF8} and @code{strWidthUTF8}.

Some devices can produce high-quality rotated text, but those based on
bitmaps often cannot. Those which can should set
@code{useRotatedTextInContour} to be true from graphics API version 4.

Several other elements relate to the precise placement of text by the
graphics engine:

@example
double xCharOffset;
double yCharOffset;
double yLineBias;
double cra[2];
@end example

@noindent
These are more than a little mysterious. Element @code{cra} provides an
indication of the character size, @code{par("cra")} in base graphics, in
device units. The mystery is what is meant by `character size': which
character, which font at which size? Some help can be obtained by
looking at what this is used for. The first element, `width', is not
used by @R{} except to set the graphical parameters.
The second,
`height', is used to set the line spacing, that is the relationship
between @code{par("mai")} and @code{par("mar")} and so on. It is
suggested that a good choice is

@example
dd->cra[0] = 0.9 * fnsize;
dd->cra[1] = 1.2 * fnsize;
@end example

@noindent
where @samp{fnsize} is the `size' of the standard font (@code{cex=1})
on the device, in device units. So for a 12-point font (the usual
default for graphics devices), @samp{fnsize} should be 12 points in
device units.

The remaining elements are yet more mysterious. The @code{postscript()}
device says

@example
    /* Character Addressing Offsets */
    /* These offsets should center a single */
    /* plotting character over the plotting point. */
    /* Pure guesswork and eyeballing ... */

    dd->xCharOffset = 0.4900;
    dd->yCharOffset = 0.3333;
    dd->yLineBias = 0.2;
@end example

@noindent
It seems that @code{xCharOffset} is not currently used, and
@code{yCharOffset} is used by the base graphics system to set vertical
alignment in @code{text()} when @code{pos} is specified, and in
@code{identify()}. It is occasionally used by the graphics engine when
attempting exact centring of text, such as character string values of
@code{pch} in @code{points()} or @code{grid.points()}---however, it is
only used when precise character metric information is not available or
for multi-line strings.

@code{yLineBias} is used in the base graphics system in @code{axis()} and
@code{mtext()} to provide a default for their @samp{padj} argument.

@node Conventions, 'Mode', Handling text, Graphics devices
@subsection Conventions

The aim is to make the (default) output from graphics devices as similar
as possible. Generally people follow the model of the @code{postscript}
and @code{pdf} devices (which share most of their internal code).

The following conventions have become established:

@itemize

@item
The default size of a device should be 7 inches square.

@item
There should be a @samp{pointsize} argument which defaults to 12, and it
should give the pointsize in big points (1/72 inch). How exactly this
is interpreted is font-specific, but it should use a font which works
with lines packed 1/6 inch apart, and looks good with lines 1/5 inch
apart (that is with 2pt leading).

@item
The default font family should be a sans serif font, e.g.@: Helvetica or
similar (e.g.@: Arial on Windows).

@item
@code{lwd = 1} should correspond to a line width of 1/96 inch. This
will be a problem with pixel-based devices, and generally there is a
minimum line width of 1 pixel (although this may not be appropriate
where anti-aliasing of lines is used, and @code{cairo} prefers a minimum
of 2 pixels).

@item
Even very small circles should be visible, e.g.@: by using a minimum
radius of 1 pixel or replacing very small circles by a single filled
pixel.

@item
How RGB colour values will be interpreted should be documented, and
preferably be sRGB.

@item
The help page should describe its policy on these conventions.

@end itemize

These conventions are less clear-cut for bitmap devices, especially
where the bitmap format does not have a design resolution.

The interpretation of the line texture (@code{par("lty")}) is described
in the header @file{GraphicsEngine.h} and in the help for @code{par}: note that the
`scale' of the pattern should be proportional to the line width (at
least for widths above the default).
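
To make the remark about line texture concrete, here is a minimal
sketch of how a device might unpack a pattern and scale it by line
width. It assumes the texture is packed as up to eight 4-bit segment
lengths, lowest nibble first, with a zero nibble ending the pattern,
which is how the header is understood to describe it; the function
names are hypothetical.

```c
/* Sketch only: hypothetical helpers, assuming the line texture is packed
   as up to eight 4-bit segment lengths, lowest nibble first, with a
   zero nibble ending the pattern. */

/* Number of dash/gap segments encoded in 'lty' (0 for an empty pattern). */
int toy_lty_count(unsigned int lty)
{
    int n = 0;
    while (n < 8 && (lty & 15u) != 0) {
        lty >>= 4;
        n++;
    }
    return n;
}

/* Length of segment 'i' in nominal pattern units, scaled by 'lwd' for
   widths above the default of 1; returns 0 if 'i' is out of range. */
double toy_lty_segment(unsigned int lty, double lwd, int i)
{
    double scale = lwd > 1.0 ? lwd : 1.0;
    if (i < 0 || i >= toy_lty_count(lty))
        return 0.0;
    return scale * (double) ((lty >> (4 * i)) & 15u);
}
```

For example, under these assumptions the pattern @code{0x44} has two
segments of length 4, and at @code{lwd = 2} each is stretched to 8,
whereas widths below the default leave the pattern unscaled.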


@node 'Mode', Graphics events, Conventions, Graphics devices
@subsection `Mode'

One of the device callbacks is a function @code{mode}, documented in
the header as

@example
     * device_Mode is called whenever the graphics engine
     * starts drawing (mode=1) or stops drawing (mode=0)
     * GMode (in graphics.c) also says that
     * mode = 2 (graphical input on) exists.
     * The device is not required to do anything
@end example

@noindent
@code{mode = 2} has only recently been documented at device level.
It could be used to change the graphics cursor, but devices currently do
that in the @code{locator} callback. (In base graphics the mode is set
for the duration of a @code{locator} call, but if @code{type != "n"} is
switched back for each point whilst annotation is being done.)

Many devices do indeed do nothing on this call, but some screen devices
ensure that drawing is flushed to the screen when called with @code{mode
= 0}. It is tempting to use it for some sort of buffering, but note
that `drawing' is interpreted at quite a low level and a typical single
figure will stop and start drawing many times. The buffering introduced
in the @code{X11()} device makes use of @code{mode = 0} to indicate
activity: it updates the screen after @emph{ca} 100ms of inactivity.

This callback need not be supplied if it does nothing.

@node Graphics events, Specific devices, 'Mode', Graphics devices
@subsection Graphics events

Graphics devices may be designed to handle user interaction: not all are.

Users may use @code{grDevices::setGraphicsEventEnv} to set the
@code{eventEnv} environment in the device driver to hold event
handlers. When the user calls @code{grDevices::getGraphicsEvent}, R will
take three steps.
First, it sets the device driver member
@code{gettingEvent} to @code{true} for each device with a
non-@code{NULL} @code{eventEnv} entry, and calls @code{initEvent(dd,
true)} if the callback is defined. It then enters an event loop. Each
time through the loop R will process events once, then check whether any
device has set the @code{result} member of @code{eventEnv} to a
non-@code{NULL} value, and will save the first such value found to be
returned. C functions @code{doMouseEvent} and @code{doKeybd} are
provided to call the R event handlers @code{onMouseDown},
@code{onMouseMove}, @code{onMouseUp}, and @code{onKeybd} and set
@code{eventEnv$result} during this step. Finally, @code{initEvent} is
called again with @code{init=false} to inform the devices that the
loop is done, and the result is returned to the user.

@node Specific devices, , Graphics events, Graphics devices
@subsection Specific devices

Specific devices are mostly documented by comments in their sources,
although for devices of many years' standing those comments can be in
need of updating. This subsection is a repository of notes on design
decisions.

@menu
* X11()::
* windows()::
@end menu

@node X11(), windows(), Specific devices, Specific devices
@subsubsection X11()

The @code{X11(type="Xlib")} device dates back to the mid-1990s and was
written then in @code{Xlib}, the most basic X11 toolkit. It has since
optionally made use of a few features from other toolkits: @code{libXt}
is used to read X11 resources, and @code{libXmu} is used in the handling
of clipboard selections.

Using basic @code{Xlib} code makes drawing fast, but is limiting. There
is no support of translucent colours (that came in the @code{Xrender}
toolkit of 2000) nor for rotated text (which @R{} implements by
rendering text to a bitmap and rotating the latter).

The hinting for the X11 window asks for backing store to be used, and
some window managers may use it to handle repaints, but it seems that
most repainting is done by replaying the display list (and here the fast
drawing is very helpful).

There are perennial problems with finding fonts. Many users fail to
realize that fonts are a function of the X server and not of the machine
that @R{} is running on. After many difficulties, @R{} tries first to
find the nearest size match in the sizes provided for Adobe fonts in the
standard 75dpi and 100dpi X11 font packages---even that will fail to
work when users of near-100dpi screens have only the 75dpi set
installed. The 75dpi set allows sizes down to 6 points on a 100dpi
screen, but some users do try to use smaller sizes and even 6 and 8
point bitmapped fonts do not look good.

Introduction of UTF-8 locales has caused another wave of difficulties.
X11 has very few genuine UTF-8 fonts, and produces composite fontsets
for the @code{iso10646-1} encoding. Unfortunately these seem to have
low coverage apart from a few monospaced fonts in a few sizes (which are
not suitable for graph annotation), and where glyphs are missing what is
plotted is often quite unsatisfactory.

The current approach is to make use of more modern toolkits, namely
@code{cairo} for rendering and @code{Pango} for font
management---because these are associated with @code{Gtk+2} they are
widely available. Cairo supports translucent colours and alpha-blending
(@emph{via} @code{Xrender}), and anti-aliasing for the display of lines
and text. Pango's font management is based on @code{fontconfig} and
somewhat mysterious, but it seems mainly to use Type 1 and TrueType
fonts on the machine running @R{} and send grayscale bitmaps to cairo.


@node windows(), , X11(), Specific devices
@subsubsection windows()

The @code{windows()} device is a family of devices: it supports plotting
to Windows (enhanced) metafiles, @code{BMP}, @code{JPEG}, @code{PNG} and
@code{TIFF} files as well as to Windows printers.

In most of these cases the primary plotting is to a bitmap: this is used
for the (default) buffering of the screen device, which also enables the
current plot to be saved to BMP, JPEG, PNG or TIFF (it is the internal
bitmap which is copied to the file in the appropriate format).

The device units are pixels (logical ones on a metafile device).

The code was originally written by Guido Masarotto with extensive use of
macros, which can make it hard to disentangle.

For a screen device, @code{xd->gawin} is the canvas of the screen, and
@code{xd->bm} is the off-screen bitmap. So macro @code{DRAW} arranges
to plot to @code{xd->bm}, and if buffering is off, also to
@code{xd->gawin}. For all other devices, @code{xd->gawin} is the canvas,
a bitmap for the @code{jpeg()} and @code{png()} devices, and an internal
representation of a Windows metafile for the @code{win.metafile()} and
@code{win.print()} devices. Since `plotting' is done by Windows GDI calls
to the appropriate canvas, its precise nature is hidden by the GDI
system.

Buffering on the screen device is achieved by running a timer, which
when it fires copies the internal bitmap to the screen. This is set to
fire every 500ms (by default) and is reset to 100ms after plotting
activity.

Repaint events are handled by copying the internal bitmap to the screen
canvas (and then reinitializing the timer), unless there has been a resize.
Resizes are handled by replaying the display list: this might not be
necessary if a fixed canvas with scrollbars is being used, but that is
the least popular of the three forms of resizing.

Text on the device has moved to `Unicode' (UCS-2) in recent years.
UTF-8 is requested (@code{hasTextUTF8 = TRUE}) for standard text, and
converted to UCS-2 in the plotting functions in file
@file{src/extra/graphapp/gdraw.c}. However, GDI has no support for
Unicode symbol fonts, and symbols are handled in Adobe Symbol encoding.

Support for translucent colours (with alpha channel between 0
and 255) was introduced on the screen device and bitmap
devices.@footnote{It is technically possible to use alpha-blending on
metafile devices such as printers, but it seems few drivers have support
for this.} This is done by drawing on a further internal bitmap,
@code{xd->bm2}, in the opaque version of the colour then alpha-blending
that bitmap to @code{xd->bm}. The alpha-blending routine is in a
separate DLL, @file{msimg32.dll}, which is loaded on first use. As
small a rectangular region as reasonably possible is alpha-blended (this
is rectangle @code{r} in the code), but things like mitre joins make
estimation of a tight bounding box too much work for lines and polygonal
boundaries. Translucent-coloured lines are not common, and the
performance seems acceptable.

The support for a transparent background in @code{png()} predates full
alpha-channel support in @code{libpng} (let alone in PNG viewers), so
makes use of the limited transparency support in earlier versions of
PNG. Where 24-bit colour is used, this is done by marking a single
colour to be rendered as transparent.
@R{} chose @samp{#fdfefd}, and
uses this as the background colour (in @code{GA_NewPage}) if the
specified background colour is transparent (and all non-opaque
background colours are treated as transparent). So this works by
marking that colour in the PNG file, and viewers without transparency
support see a slightly-off-white background, as if there were a
near-white canvas. Where a palette is used in the PNG file (if less
than 256 colours were used) then this colour is recorded with full
transparency and the remaining colours as opaque. If 32-bit colour were
available then we could add a full alpha channel, but this is dependent
on the graphics hardware and undocumented properties of GDI.


@node Colours, Base graphics, Graphics devices, Graphics Devices
@section Colours

Devices receive colours as a @code{typedef} @code{rcolor} (an
@code{unsigned int}) defined in the header
@file{R_ext/GraphicsEngine.h}. The 4 bytes are @emph{R}, @emph{G},
@emph{B} and @emph{alpha} from least to most significant. So each of RGB
has 256 levels of luminosity from 0 to 255. The alpha byte represents
opacity, so value 255 is fully opaque and 0 fully transparent: many but
not all devices handle semi-transparent colours.

Colours can be created in C via the macro @code{R_RGBA}, and a set of
macros is defined in @file{R_ext/GraphicsDevice.h} to extract the
various components.

Colours in the base graphics system were originally adopted from S (and
before that the GRZ library from Bell Labs), with the concept of a
(variable-sized) palette of colours referenced by numbers
@samp{1...@var{N}} plus @samp{0} (the background colour of the current
device).
@R{} introduced the idea of referring to colours by character
strings, either in the forms @samp{#RRGGBB} or @samp{#RRGGBBAA}
(representing the bytes in hex) as given by function @code{rgb()} or via
names: the 657 known names are given in the character vector
@code{colors} and in a table in file @file{colors.c} in package
@pkg{grDevices}. Note that semi-transparent colours are not
`premultiplied', so 50% transparent white is @samp{#ffffff80}.

Integer or character @code{NA} colours are mapped internally to
transparent white, as is the character string @code{"NA"}.

Negative colour numbers are an error. Colours greater than
@samp{@var{N}} are wrapped around, so that for example with the default
palette of size 8, colour @samp{10} is colour @samp{2} in the palette.

Integer colours have been used more widely than the base graphics
sub-system, as they are supported by package @pkg{grid} and hence by
@CRANpkg{lattice} and @CRANpkg{ggplot2}. (They are also used by package
@CRANpkg{rgl}.) @pkg{grid} did re-define colour @samp{0} to be
transparent white, but @CRANpkg{rgl} used @code{col2rgb} and hence the
background colour of base graphics.

Note that positive integer colours refer to the current palette and
colour @samp{0} to the current device (and a device is opened if needs
be). These are mapped to type @code{rcolor} at the time of use: this
matters when re-playing the display list, e.g.@: when a device is
resized or @code{dev.copy} is used. The palette should be thought of as
per-session: it is stored in package @pkg{grDevices}.

The convention is that devices use the colorspace `sRGB'. This is an
industry standard: it is used by Web browsers and JPEGs from all but
high-end digital cameras. The interpretation is a matter for graphics
devices and for code that manipulates colours, but not for the graphics
engine or subsystems.
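
The byte layout of @code{rcolor} and the palette wrap-around described
earlier in this section can be sketched as follows. The @code{TOY_}
macros and the function are hypothetical stand-ins written for this
illustration; real device code would use the macros supplied by the
@R{} headers rather than these.

```c
/* Sketch only: the TOY_ names are hypothetical stand-ins that mimic the
   byte layout described in the text (R in the least significant byte,
   alpha in the most significant). */
typedef unsigned int toy_rcolor;

#define TOY_RGBA(r, g, b, a) \
    ((toy_rcolor)(r) | ((toy_rcolor)(g) << 8) | \
     ((toy_rcolor)(b) << 16) | ((toy_rcolor)(a) << 24))
#define TOY_RED(col)   ((col) & 255u)
#define TOY_GREEN(col) (((col) >> 8) & 255u)
#define TOY_BLUE(col)  (((col) >> 16) & 255u)
#define TOY_ALPHA(col) (((col) >> 24) & 255u)

/* Map a positive colour number onto a palette of size n, wrapping
   around for numbers greater than n. */
int toy_palette_slot(int colour, int n)
{
    return (colour - 1) % n + 1;
}
```

With this layout the string @samp{#ffffff80} corresponds to the value
@code{0x80FFFFFF}, transparent white is @code{0x00FFFFFF}, and colour
@samp{10} with a palette of size 8 maps to palette entry @samp{2}.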

@R{} uses a painting model similar to PostScript and PDF. This means
that where shapes (circles, rectangles and polygons) can both be filled
and have a stroked border, the fill should be painted first and then the
border (or otherwise only half the border will be visible). Where both
the fill and the border are semi-transparent there is some room for
interpretation of the intention. Most devices first paint the fill and
then the border, alpha-blending at each step. However, PDF does some
automatic grouping of objects, and @emph{when the fill and the border
have the same alpha}, they are painted onto the same layer and then
alpha-blended in one step. (See p. 569 of the PDF Reference Sixth
Edition, version 1.7. Unfortunately, although this is what the PDF
standard says should happen, it is not correctly implemented by some
viewers.)

The mapping from colour numbers to type @code{rcolor} is primarily done
by function @code{RGBpar3}: this is exported from the @R{} binary but
linked to code in package @pkg{grDevices}. The first argument is a
@code{SEXP} pointing to a character, integer or double vector, and the
second is the @code{rcolor} value for colour @code{0} (or @code{"0"}).
C entry point @code{RGBpar} is a wrapper that takes @code{0} to be
transparent white: it is often used to set colour defaults for devices.
The @R{}-level wrapper is @code{col2rgb}.

There is also @code{R_GE_str2col} which takes a C string and converts to
type @code{rcolor}: @code{"0"} is converted to transparent white.

There is an @R{}-level conversion of colours to @samp{#RRGGBBAA} by
@code{image.default(useRaster = TRUE)}.

The other colour-conversion entry point in the API is @code{name2col}
which takes a colour name (a C string) and returns a value of type
@code{rcolor}.
This handles @code{"NA"}, @code{"transparent"} and the
657 colours known to the @R{} function @code{colors()}.

@node Base graphics, Grid graphics, Colours, Graphics Devices
@section Base graphics

The base graphics system was migrated to package @pkg{graphics} in @R{}
3.0.0: it was previously implemented in files in @file{src/main}.

For historical reasons it is largely implemented in two layers.
Files @file{plot.c}, @file{plot3d.c} and @file{par.c} contain the code
for the around 30 @code{.External} calls that implement the basic
graphics operations. This code then calls functions in file
@file{graphics.c} with names starting with @code{G} and declared in
header @file{Rgraphics.h}, which in turn call the graphics engine (whose
functions almost all have names starting with @code{GE}).

A large part of the infrastructure of the base graphics subsystem is
the graphics parameters (as set/read by @code{par()}). These are stored
in a @code{GPar} structure declared in the private header
@file{Graphics.h}. This structure has two variables (@code{state} and
@code{valid}) tracking the state of the base subsystem on the device,
and many variables recording the graphics parameters and functions of
them.

The base system state is contained in a @code{baseSystemState} structure
defined in @file{R_ext/GraphicsBase.h}. This contains three @code{GPar}
structures and a Boolean variable used to record if @code{plot.new()}
(or @code{persp}) has been used successfully on the device.

The three copies of the @code{GPar} structure are used to store the
current parameters (accessed via @code{gpptr}), the `device copy'
(accessed via @code{dpptr}) and space for a saved copy of the `device
copy' parameters.
The current parameters are, clearly, those currently
in use and are copied from the `device copy' whenever @code{plot.new()}
is called (whether or not that advances to the next `page'). The saved
copy keeps the state when the device was last completely cleared (e.g.@:
when @code{plot.new()} was called with @code{par(new=TRUE)}), and is
used to replay the display list.

The separation is not completely clean: the `device copy' is altered if
a plot with log scale(s) is set up via @code{plot.window()}.

There is yet another copy of most of the graphics parameters in
@code{static} variables in @file{graphics.c} which are used to preserve
the current parameters across the processing of inline parameters in
high-level graphics calls (handled by @code{ProcessInlinePars}).

Snapshots of the base subsystem record the `saved device copy' of the
@code{GPar} structure.

@menu
* Arguments and parameters::
@end menu

@node Arguments and parameters, , Base graphics, Base graphics
@subsection Arguments and parameters

There is an unfortunate confusion between some of the graphical
parameters (as set by @code{par}) and arguments to base graphic
functions of the same name. This description may help set the record
straight.

Most of the high-level plotting functions accept graphical parameters as
additional arguments, which are then often passed to lower-level
functions if not already named arguments (which is the main source of
confusion).

Graphical parameter @code{bg} is the background colour of the plot.
Argument @code{bg} refers to the fill colour for the filled symbols
@code{21} to @code{25}. It is an argument to the function
@code{plot.xy}, but normally passed by the default method of
@code{points}, often from a @code{plot} method.

Graphics parameters @code{cex}, @code{col}, @code{lty}, @code{lwd} and
@code{pch} also appear as arguments of @code{plot.xy} and so are often
passed as arguments from higher-level plot functions such as
@code{lines}, @code{points} and @code{plot} methods. They also appear as
arguments of @code{legend}; @code{col}, @code{lty} and @code{lwd} are
arguments of @code{arrows} and @code{segments}. When used as arguments
they can be vectors, recycled to control the various lines, points and
segments. When set as graphical parameters they set the default
rendering: in addition @code{par(cex=)} sets the overall character
expansion which subsequent calls (as arguments or inline graphical
parameters) multiply.

The handling of missing values differs in the two classes of uses.
Generally these are errors when used in @code{par} but cause the
corresponding element of the plot to be omitted when used as an element
of a vector argument. Originally the interpretation of arguments was
mainly left to the device, but nowadays some of this is pre-empted in
the graphics engine (but for example the handling of @code{lwd = 0}
remains device-specific, with some interpreting it as a `thinnest
possible' line).

@node Grid graphics, , Base graphics, Graphics Devices
@section Grid graphics

[At least pointers to documentation.]

@node GUI consoles, Tools, Graphics Devices, Top
@chapter GUI consoles

The standard @R{} front-ends are programs which run in a terminal, but
there are several ways to provide a GUI console.

This can be done by a package which is loaded from terminal-based @R{}
and launches a console as part of its startup code or by the user
running a specific function: package @CRANpkg{Rcmdr} is a well-known
example with a Tk-based GUI.

There used to be a Gtk-based console invoked by @command{R --gui=GNOME}:
this relied on special-casing in the front-end shell script to launch a
different executable. There still is @command{R --gui=Tk}, which starts
terminal-based @R{} and runs @code{tcltk::tkStartGui()} as part of the
modified startup sequence.

However, the main way to run a GUI console is to launch a separate
program which runs embedded @R{}: this is done by @command{Rgui.exe} on
Windows and @command{R.app} on macOS. The first is an integral part
of @R{} and the code for the console is currently in @file{R.dll}.

@menu
* R.app::
@end menu

@node R.app, , GUI consoles, GUI consoles
@section R.app

@command{R.app} is a macOS application which provides a console. Its
sources are a separate project@footnote{an Xcode project, in SVN at
@uref{https://svn.r-project.org/R-packages/trunk/Mac-GUI/}.}, and its binaries
link to an @R{} installation which it runs as a dynamic library
@file{libR.dylib}. The standard @acronym{CRAN} distribution of @R{} for
macOS bundles the GUI and @R{} itself, but installing the GUI is optional
and either component can be updated separately.

@command{R.app} relies on @file{libR.dylib} being in a specific place,
and hence on @R{} having been built and installed as a macOS
`framework'. Specifically, it uses
@file{/Library/Frameworks/R.framework/R}. This is a symbolic link, as
frameworks can contain multiple versions of @R{}. It eventually
resolves to
@file{/Library/Frameworks/R.framework/Versions/Current/Resources/lib/libR.dylib},
which is (in the @acronym{CRAN} distribution) a `fat' binary containing
multiple sub-architectures.

macOS applications are directory trees: each @command{R.app} contains
a front-end written in Objective-C for one sub-architecture: in the
standard distribution there are separate applications for 32- and 64-bit
Intel architectures.

Originally the @R{} sources contained quite a lot of code used only by
the macOS GUI, but this was migrated to the @command{R.app} sources.

@command{R.app} starts @R{} as an embedded application with a
command-line which includes @option{--gui=aqua} (see below). It uses
most of the interface pointers defined in the header
@file{Rinterface.h}, plus a private interface pointer in file
@file{src/main/sysutils.c}. It adds an environment
it names @code{tools:RGUI} to the second position in the search path.
This contains a number of utility functions used to support the menu
items, for example @code{package.manager()}, plus functions @code{q()}
and @code{quit()} which mask those in package @pkg{base}---the custom
versions save the history in a way specific to @code{R.app}.

There is a @command{configure} option @option{--with-aqua} for @R{}
which customizes the way @R{} is built: this is distinct from the
@option{--enable-R-framework} option which causes @command{make install}
to install @R{} as the framework needed for use with @code{R.app}. (The
option @option{--with-aqua} is the default on macOS.) It sets the
macro @code{HAVE_AQUA} in @file{config.h} and the make variable
@code{BUILD_AQUA_TRUE}. These have several consequences:

@itemize
@item
The @code{quartz()} device is built (other than as a stub) in package
@pkg{grDevices}: this needs an Objective-C compiler. Then
@code{quartz()} can be used with terminal @R{} provided the latter has
access to the macOS screen.

@item
File @file{src/unix/aqua.c} is compiled.
This now only contains an
interface pointer for the @code{quartz()} device(s).

@item
@code{capabilities("aqua")} is set to @code{TRUE}.

@item
The default path for a personal library directory is set as
@file{~/Library/R/arch/x.y/library}.
@c This is done in @file{etc/Renviron}.

@item
There is support for setting a `busy' indicator whilst waiting for
@code{system()} to return.

@item
@code{R_ProcessEvents} is inhibited in a forked child from package
@pkg{parallel}. The associated callback in @code{R.app} does things
which should not be done in a child, and forking forks the whole process
including the console.

@item
There is support for starting the embedded @R{} with the option
@option{--gui=aqua}: when this is done the global C variable
@code{useaqua} is set to a true value. This has consequences:

@itemize
@item
The @R{} session is asserted to be interactive @emph{via} @code{R_Interactive}.

@item
@code{.Platform$GUI} is set to @code{"AQUA"}. That has consequences:
@itemize
@item
The environment variable @env{DISPLAY} is set to @samp{:0} if not
already set.

@item
@file{/usr/local/bin} is appended to @env{PATH} since that is where
@command{gfortran} is installed.

@item
The default @HTML{} browser is switched to the one in @command{R.app}.

@item
Various widgets are switched to the versions provided in
@command{R.app}: these include graphical menus, the data editor (but not
the data viewer used by @code{View()}) and the workspace browser invoked
by @code{browseEnv()}.

@item
The @pkg{grDevices} package when loaded knows that it is being run
under @command{R.app} and so informs any @code{quartz} devices that a
Quartz event loop is already running.
@end itemize

@item
The use of the OS's @code{system} function (including by @code{system()}
and @code{system2()}, and to launch editors and pagers) is replaced by a
version in @code{R.app} (which by default just calls the OS's
@code{system} with various signal handlers reset).

@end itemize

@item
If either @R{} was started by @option{--gui=aqua} or @R{} is running in
a terminal which is not of type @samp{dumb}, the standard output to
files @file{stdout} and @file{stderr} is directed through the C function
@code{Rstd_WriteConsoleEx}. This uses ANSI terminal escapes to render
lines sent to @code{stderr} as bold on @code{stdout}.

@item
For historical reasons the startup option @code{-psn} is allowed but
ignored. (It seems that in 2003, @samp{r27492}, this was added by Finder.)

@end itemize



@node Tools, R coding standards, GUI consoles, Top
@chapter Tools

The behavior of @command{R CMD check} can be controlled through a
variety of command line arguments and environment variables.

There is an internal @option{--install=@var{value}} command line
argument not shown by @command{R CMD check --help}, with possible values

@table @code
@item check:@var{file}
Assume that installation was already performed with stdout/stderr to
@var{file}, the contents of which need to be checked (without repeating
the installation). This is useful for checks applied by repository
maintainers: it reduces the check time by the installation time given
that the package has already been installed. In this case, one also
needs to specify @emph{where} the package was installed to using command
line option @option{--library}.
@item fake
Fake installation, and turn off the run-time tests.
@item skip
Skip installation, e.g., when testing recommended packages bundled with
R.
@item no
The same as @option{--no-install}: turns off installation and the tests
which require the package to be installed.
@end table

The following environment variables can be used to customize the
operation of @command{check}: a convenient place to set these is the
check environment file (default, @file{~/.R/check.Renviron}).

@vtable @code
@item _R_CHECK_ALL_NON_ISO_C_
If true, do not ignore compiler (typically GCC) warnings about non ISO C
code in @emph{system} headers. Note that this may also show additional
ISO C++ warnings.
Default: false.
@item _R_CHECK_FORCE_SUGGESTS_
If true, give an error if suggested packages are not available.
Default: true (but false for CRAN submission checks).
@item _R_CHECK_RD_CONTENTS_
If true, check @file{Rd} files for auto-generated content which needs
editing, and missing argument documentation.
Default: true.
@item _R_CHECK_RD_LINE_WIDTHS_
If true, check @file{Rd} line widths in usage and examples sections.
Default: false (but true for CRAN submission checks).
@item _R_CHECK_RD_STYLE_
If true, check whether @file{Rd} usage entries for S3 methods use the full
function name rather than the appropriate @code{\method} markup.
Default: true.
@item _R_CHECK_RD_XREFS_
If true, check the cross-references in @file{.Rd} files.
Default: true.
@item _R_CHECK_SUBDIRS_NOCASE_
If true, check the case of directories such as @file{R} and @file{man}.
Default: true.
@item _R_CHECK_SUBDIRS_STRICT_
Initial setting for @option{--check-subdirs}.
Default: @samp{default} (which checks only tarballs, and checks in the
@file{src} only if there is no @file{configure} file).
@item _R_CHECK_USE_CODETOOLS_
If true, make use of the @CRANpkg{codetools} package, which provides a
detailed analysis of visibility of objects (but may give false
positives).
Default: true (if recommended packages are installed).
@item _R_CHECK_USE_INSTALL_LOG_
If true, record the output from installing a package as part of its
check to a log file (@file{00install.out} by default), even when running
interactively.
Default: true.
@item _R_CHECK_VIGNETTES_NLINES_
Maximum number of lines to show from the bottom of the output when
reporting errors in running or re-building vignettes. (Value @code{0}
means all lines will be shown.)
Default: 10 for running, 25 for re-building.
@item _R_CHECK_CODOC_S4_METHODS_
Control whether @code{codoc()} testing is also performed on S4 methods.
Default: true.
@item _R_CHECK_DOT_INTERNAL_
Control whether the package code is scanned for @code{.Internal} calls,
which should only be used by base (and occasionally by recommended) packages.
Default: true.
@item _R_CHECK_EXECUTABLES_
Control checking for executable (binary) files.
Default: true.
@item _R_CHECK_EXECUTABLES_EXCLUSIONS_
Control whether checking for executable (binary) files ignores files
listed in the package's @file{BinaryFiles} file.
Default: true (but false for CRAN submission checks).
However, most likely this package-level override mechanism will be
removed eventually.
@item _R_CHECK_PERMISSIONS_
Control whether permissions of files should be checked.
Default: true iff @code{.Platform$OS.type == "unix"}.
@item _R_CHECK_FF_CALLS_
Allows turning off @code{checkFF()} testing. If set to
@samp{registration}, checks the registration information (number of
arguments, correct choice of @code{.C/.Fortran/.Call/.External}) for
such calls provided the package is installed.
Default: true.
@item _R_CHECK_FF_DUP_
Controls @code{checkFF(check_DUP)}.
Default: true (and forced to be true for CRAN submission checks).
@item _R_CHECK_LICENSE_
Control whether/how license checks are performed.
A possible value is
@samp{maybe} (warn in case of problems, but not about standardizable
non-standard license specs).
Default: true.
@item _R_CHECK_RD_EXAMPLES_T_AND_F_
Control whether @code{check_T_and_F()} also looks for ``bad'' (global)
@samp{T}/@samp{F} uses in examples.
Off by default because this can result in false positives.
@item _R_CHECK_RD_CHECKRD_MINLEVEL_
Controls the minimum level for reporting warnings from @code{checkRd}.
Default: -1.
@item _R_CHECK_XREFS_REPOSITORIES_
If set to a non-empty value, a space-separated list of repositories to
use to determine known packages. Default: empty, when the CRAN
and Bioconductor repositories known to @R{} are used.
@item _R_CHECK_SRC_MINUS_W_IMPLICIT_
Control whether installation output is checked for compilation warnings
about implicit function declarations (as spotted by GCC with command
line option @option{-Wimplicit-function-declaration}, which is implied
by @option{-Wall}).
Default: false.
@item _R_CHECK_SRC_MINUS_W_UNUSED_
Control whether installation output is checked for compilation warnings
about unused code constituents (as spotted by GCC with command line
option @option{-Wunused}, which is implied by @option{-Wall}).
Default: true.
@item _R_CHECK_WALL_FORTRAN_
Control whether gfortran 4.0 or later @option{-Wall} warnings are used in
the analysis of installation output.
Default: false, even though the warnings are justifiable.
@item _R_CHECK_ASCII_CODE_
If true, check @R{} code for non-ascii characters.
Default: true.
@item _R_CHECK_ASCII_DATA_
If true, check data for non-ascii characters. @emph{En route}, checks
that all the datasets can be loaded and that their components can be
accessed.
Default: true.
@item _R_CHECK_COMPACT_DATA_
If true, check data for ascii and uncompressed saves, and also check if
using @command{bzip2} or @command{xz} compression would be significantly
better.
Default: true.
@item _R_CHECK_SKIP_ARCH_
Comma-separated list of architectures that will be omitted from
checking in a multi-arch setup.
Default: none.
@item _R_CHECK_SKIP_TESTS_ARCH_
Comma-separated list of architectures that will be omitted from
running tests in a multi-arch setup.
Default: none.
@item _R_CHECK_SKIP_EXAMPLES_ARCH_
Comma-separated list of architectures that will be omitted from
running examples in a multi-arch setup.
Default: none.
@item _R_CHECK_VC_DIRS_
Should the unpacked package directory be checked for version-control
directories (@file{CVS}, @file{.svn} @dots{})?
Default: true for tarballs.
@item _R_CHECK_PKG_SIZES_
Should @command{du} be used to find the installed sizes of packages?
@command{R CMD check} does check for the availability of @command{du},
but this option allows the check to be overruled if an unsuitable
command is found (including one that does not respect the @option{-k}
flag to report in units of 1Kb, or reports in a different format -- the
GNU, macOS and Solaris @command{du} commands have been tested).
Default: true if @command{du} is found.
@item _R_CHECK_PKG_SIZES_THRESHOLD_
Threshold used for @env{_R_CHECK_PKG_SIZES_} (in Mb).
Default: 5
@item _R_CHECK_DOC_SIZES_
Should @command{qpdf} be used to check the installed sizes of PDFs?
Default: true if @command{qpdf} is found.
@item _R_CHECK_DOC_SIZES2_
Should @command{gs} be used to check the installed sizes of PDFs? This
is slower than (and in addition to) the previous check, but does detect
figures with excessive detail (often hidden by over-plotting) or bitmap
figures with too high a resolution.
Requires that @env{R_GSCMD} is set
to a valid program, or @command{gs} (or on Windows,
@command{gswin32.exe} or @command{gswin64c.exe}) is on the path.
Default: false (but true for CRAN submission checks).
@item _R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_
By default the output from running the @R{} code in the vignettes is
kept only if there is an error. This also applies to the
@file{build_vignettes.log} log from the re-building of vignettes.
Default: false.
@item _R_CHECK_CLEAN_VIGN_TEST_
Should the @file{vign_test} directory be removed if the test is
successful?
Default: true.
@item _R_CHECK_REPLACING_IMPORTS_
Should warnings about replacing imports be reported? These sometimes come
from auto-generated @file{NAMESPACE} files in other packages, but most
often from importing the whole of a namespace rather than using
@code{importFrom}.
Default: true.
@item _R_CHECK_UNSAFE_CALLS_
Check for calls that appear to tamper with (or allow tampering with)
already loaded code not from the current package: such calls may well
contravene CRAN policies.
Default: true.
@item _R_CHECK_TIMINGS_
Optionally report timings for installation, examples, tests and
running/re-building vignettes as part of the check log. The format is
@samp{[as/bs]} for the total CPU time (including child processes)
@samp{a} and elapsed time @samp{b}, except on Windows, when it is
@samp{[bs]}. In most cases timings are only given for @samp{OK} checks.
Times with an elapsed component over 10 mins are reported in minutes
(with abbreviation @samp{m}). The value is the smallest numerical value
in elapsed seconds that should be reported: non-numerical values
indicate that no report is required, a value of @samp{0} that a report
is always required.
Default: @code{""}. (@code{10} for CRAN checks.)

@item _R_CHECK_EXAMPLE_TIMING_THRESHOLD_
If timings are being recorded, set the threshold in seconds for
reporting long-running examples (either user+system CPU time or elapsed
time). Default: @code{"5"}.

@item _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_
For checks with timings enabled, report examples where the ratio of CPU
time to elapsed time exceeds this threshold (and the CPU time is at
least one second). This can help detect the simultaneous use of
multiple CPU cores.
Default: @code{NA}.

@item _R_CHECK_TEST_TIMING_CPU_TO_ELAPSED_THRESHOLD_
Report for running an individual test if the ratio of CPU time to
elapsed time exceeds this threshold (and the CPU time is at least one
second). Not supported on Windows.
Default: @code{NA}.

@item _R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_
Report if, when running/re-building vignettes (individually or in
aggregate), the ratio of CPU time to elapsed time exceeds this threshold
(and the CPU time is at least one second). Not supported on
Windows.
Default: @code{NA}.

@item _R_CHECK_INSTALL_DEPENDS_
If set to a true value and a test installation is to be done, this is
done with @code{.libPaths()} containing just a temporary library
directory and @code{.Library}. The temporary library is populated by
symbolic links@footnote{under Windows, junction points, or copies if
environment variable @env{R_WIN_NO_JUNCTIONS} has a non-empty value.}
to the installed copies of all the Depends/Imports/LinkingTo packages
which are not in @code{.Library}. Default: false (but true for CRAN
submission checks).

Note that this is actually implemented in @command{R CMD INSTALL}, so it
is available to those who first install recording to a log, then call
@command{R CMD check}.

@item _R_CHECK_DEPENDS_ONLY_
@itemx _R_CHECK_SUGGESTS_ONLY_
If set to a true value, running examples, tests and vignettes is done
with @code{.libPaths()} containing just a temporary library directory
and @code{.Library}. The temporary library is populated by symbolic
links@footnote{see the previous footnote.} to the installed copies of
all the Depends/Imports and (for the second only) Suggests packages
which are not in @code{.Library}. (As exceptions, packages in a
@samp{VignetteBuilder} field and test-suite managers in @samp{Suggests}
are always made available.)
Default: false (but
@env{_R_CHECK_SUGGESTS_ONLY_} is true for CRAN submission checks: some
of the regular checks use true
@c Solaris and Windows
and some use false).

@item _R_CHECK_DEPENDS_ONLY_DATA_
Apply @env{_R_CHECK_DEPENDS_ONLY_} only to the check of loading from
the @file{data} directory, so checks if any dataset depends on
packages which are in Suggests or undeclared. Default: false (but
true for CRAN submission checks).

@item _R_CHECK_NO_RECOMMENDED_
If set to a true value, augment the previous checks to make recommended
packages unavailable unless declared.
Default: false (but true for CRAN submission checks).

This may give false positives on code which uses
@code{grDevices::densCols} and @code{stats:::asSparse} as these invoke
@CRANpkg{KernSmooth} and @CRANpkg{Matrix} respectively.

@item _R_CHECK_CODETOOLS_PROFILE_
A string with comma-separated @code{@var{name}=@var{value}} pairs (with
@var{value} a logical constant) giving additional arguments for the
@CRANpkg{codetools} functions used for analyzing package code. E.g.,
use @code{_R_CHECK_CODETOOLS_PROFILE_="suppressLocalUnused=FALSE"} to
turn off suppressing warnings about unused local variables.
Default: no
additional arguments, corresponding to using @code{skipWith = TRUE},
@code{suppressPartialMatchArgs = FALSE} and @code{suppressLocalUnused =
TRUE}.

@item _R_CHECK_CRAN_INCOMING_
Check whether package is suitable for publication on CRAN.
Default: false, except for CRAN submission checks.

@item _R_CHECK_CRAN_INCOMING_REMOTE_
Include checks that require remote access among the above.
Default: same as @code{_R_CHECK_CRAN_INCOMING_}

@item _R_CHECK_XREFS_USE_ALIASES_FROM_CRAN_
When checking anchored Rd xrefs, use Rd aliases from the CRAN package
web areas in addition to those in the packages installed locally.
Default: false.

@item _R_SHLIB_BUILD_OBJECTS_SYMBOL_TABLES_
Make the checks of compiled code more accurate by recording the symbol
tables for objects (@file{.o} files) at installation in a file
@file{symbols.rds}. (Only currently supported on Linux, Solaris, macOS,
Windows and FreeBSD.)
Default: true.

@item _R_CHECK_CODE_ASSIGN_TO_GLOBALENV_
Should the package code be checked for assignments to the global
environment?
Default: false (but true for CRAN submission checks).

@item _R_CHECK_CODE_ATTACH_
Should the package code be checked for calls to @code{attach()}?
Default: false (but true for CRAN submission checks).

@item _R_CHECK_CODE_DATA_INTO_GLOBALENV_
Should the package code be checked for calls to @code{data()} which load
into the global environment?
Default: false (but true for CRAN submission checks).

@item _R_CHECK_DOT_FIRSTLIB_
Should the package code be checked for the presence of the obsolete function
@code{.First.lib()}?
Default: false (but true for CRAN submission checks).

@item _R_CHECK_DEPRECATED_DEFUNCT_
Should the package code be checked for the presence of recently deprecated
or defunct functions (including completely removed functions)?
Also for
platform-specific graphics devices.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_SCREEN_DEVICE_
If set to @samp{warn}, give a warning if examples etc open a screen
device. If set to @samp{stop}, give an error.
Default: empty (but @samp{stop} for CRAN submission checks).

@item _R_CHECK_WINDOWS_DEVICE_
If set to @samp{stop}, give an error if a Windows-only device is used in
examples etc. This is only useful on Windows: the devices do not exist
elsewhere.
Default: empty (but @samp{stop} for CRAN submission checks on Windows).

@item _R_CHECK_TOPLEVEL_FILES_
Report on top-level files in the package sources that are not described
in `Writing R Extensions' nor are commonly understood (like
@file{ChangeLog}). Variations on standard names (e.g.@:
@file{COPYRIGHT}) are also reported.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_GCT_N_
Should the @option{--use-gct} use @code{gctorture2(@var{n})} rather than
@code{gctorture(TRUE)}? Use a positive integer to enable this.
Default: @code{0}.

@item _R_CHECK_LIMIT_CORES_
If set, check the usage of too many cores in package @pkg{parallel}. If
set to @samp{warn} gives a warning, to @samp{false} or @samp{FALSE} the
check is skipped, and any other non-empty value gives an error when more
than 2 children are spawned.
Default: unset (but @samp{TRUE} for CRAN submission checks).

@item _R_CHECK_CODE_USAGE_VIA_NAMESPACES_
If set, check code usage (via @CRANpkg{codetools}) directly on the
package namespace without loading and attaching the package and its
suggests and enhances.
Default: true (and true for CRAN submission checks).

@item _R_CHECK_CODE_USAGE_WITH_ONLY_BASE_ATTACHED_
If set, check code usage (via @CRANpkg{codetools}) with only the base
package attached.
Default: true.

@item _R_CHECK_EXIT_ON_FIRST_ERROR_
If set to a true value, the check will exit on the first error.
Default: false.

@item _R_CHECK_S3_METHODS_NOT_REGISTERED_
If set to a true value, report (apparent) S3 methods exported but not
registered.
Default: true.

@item _R_CHECK_OVERWRITE_REGISTERED_S3_METHODS_
If set to a true value, report already registered S3 methods in
base/recommended packages which are overwritten when this package's
namespace is loaded.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_TESTS_NLINES_
Number of trailing lines of test output to reproduce in the log. If
@code{0} all lines except the @R{} preamble are reproduced.
Default: 13.

@item _R_CHECK_NATIVE_ROUTINE_REGISTRATION_
If set to a true value, report if the entry points to register native
routines and to suppress dynamic search are not found in a package's
DLL. (@strong{NB:} this requires system command @command{nm} to be on the
@env{PATH}. On Windows, @command{objdump.exe} is first searched for in
the compiler toolchain specified via @code{Makeconf} (can be customized by
environment variable @env{BINPREF}). If not found there, it must be on the
@env{PATH}. On Unix this would be normal when using a package with compiled
code (which are the only ones this checks), but Windows users should check.)
Default: false (but true for CRAN submission checks).

@item _R_CHECK_NO_STOP_ON_TEST_ERROR_
If set to a true value, do not stop running tests after first error (as
if command line option @option{--no-stop-on-test-error} had been given).
Default: false (but true for CRAN submission checks).

@item _R_CHECK_PRAGMAS_
Run additional checks on the pragmas in C/C++ source code and headers.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_COMPILATION_FLAGS_
If the package is installed and has C/C++/Fortran code, check the
install log for non-portable flags (for example those added to
@file{src/Makevars} during configuration). Currently @option{-W} flags
are reported, except @option{-Wall}, @option{-Wextra} and
@option{-Weverything}, and flags which appear to be attempts to suppress
warnings are highlighted.
See
@ifset UseExternalXrefs
@ref{Writing portable packages, , Writing portable packages, R-exts, Writing R Extensions}
@end ifset
@ifclear UseExternalXrefs
`Writing R Extensions'
@end ifclear
for the rationale of this check (and why even @option{-Werror} is
unsafe).

Environment variable @env{_R_CHECK_COMPILATION_FLAGS_KNOWN_} can be set
to a space-separated set of flags which come from the @R{} build used
for testing (flags such as @option{-Wall} and @option{-Wextra} are
already known). For example, for the CRAN build of @R{} >= 4.0.0 on macOS
one could use
@example
_R_CHECK_COMPILATION_FLAGS_KNOWN_="-mmacosx-version-min=10.13"
@end example
@noindent
Default: false (but true for CRAN submission checks).

@item _R_CHECK_R_DEPENDS_
Check that any dependence on R is not on a recent patch-level version
such as @code{R (>= 3.3.3)} since blocking installation of a package
will also block its reverse dependencies. Possible values are
@samp{"note"}, @samp{"warn"} and logical values (where currently true
values are equivalent to @samp{"note"}).
Default: false (but @samp{"warn"} for @option{--as-cran}).

@item _R_CHECK_SERIALIZATION_
Check that serialized @R{} objects in the package sources were
serialized with version 2 and there is no dependence on @samp{R >=
3.5.0}. (Version 3 is in use as from @R{} 3.5.0 but should only be used
when necessary.)
Default: false (but true for CRAN submission checks).

@item _R_CHECK_R_ON_PATH_
This checks if the package attempts to use @command{R} or
@command{Rscript} from the path rather than that under test.
It does so by putting scripts at the head of the path which print a
message and fail.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_
If set to a true value, also check the R code in common unit test
subdirectories of @file{tests} for undeclared package dependencies.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_SHLIB_OPENMP_FLAGS_
Check correct and portable use of @code{SHLIB_OPENMP_*FLAGS} in
@file{src/Makevars} (and similar).
Default: false (but true for CRAN submission checks).

@item _R_CHECK_CONNECTIONS_LEFT_OPEN_
When checking examples, check for each example if connections are left
open: if any are found, this is reported with a fatal error. NB:
`connections' includes most use of files and any parallel clusters which
have not been stopped by @code{stopCluster()}.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_FUTURE_FILE_TIMESTAMPS_
Check if any of the input files has a timestamp in the future (and to do
so, checks that the system clock is correct to within 5 minutes).
Default: false (but true for CRAN submission checks).
@c _R_CHECK_SYSTEM_CLOCK_ can be used to disable the clock check, for
@c use on a check farm.

@item _R_CHECK_LENGTH_1_CONDITION_
Optionally check if the condition in @code{if} and @code{while} statements
has length greater than one. For a true value (@samp{T}, @samp{True},
@samp{TRUE} or @samp{true}), give an error. For a false value (@samp{F},
@samp{False}, @samp{FALSE} or @samp{false}) or when unset, print a warning.
Any other non-true non-empty value needs to be a list of commands separated
by commas: @samp{abort} causes R to terminate unconditionally instead of
signalling an error, @samp{verbose} prints a very detailed diagnostic message,
@samp{package:pkg} restricts the check to if/while statements executing in
the namespace of package @samp{pkg}, @samp{package:_R_CHECK_PACKAGE_NAME_}
restricts the check to if/while statements executing in the package that is
currently being checked by @code{R CMD check}, @samp{warn} causes R to
report a warning instead of signalling an error.
Default: unset (warning is reported, but
@samp{package:_R_CHECK_PACKAGE_NAME_,[abort,]verbose} for the CRAN submission checks).

@item _R_CHECK_LENGTH_1_LOGIC2_
Optionally check if an argument of the binary operators @code{&&} and
@code{||} has length greater than one, checked only if it is used. The
format is the same as for @samp{_R_CHECK_LENGTH_1_CONDITION_}.
Default: unset (nothing is reported, but
@samp{package:_R_CHECK_PACKAGE_NAME_,[abort,]verbose} for the CRAN
submission checks).

@item _R_CHECK_BUILD_VIGNETTES_SEPARATELY_
Prior to @R{} 3.6.0, re-building the vignette outputs was done in a
single @R{} session which allowed accidental reliance of one vignette on
another (for example, in the loading of packages). The current default
is to use a separate session for each vignette; this option allows
testing the older behaviour.
Default: true.

@item _R_CHECK_SYSTEM_CLOCK_
As part of the `checking for future file timestamps' enabled by
@option{--as-cran}, check the system clock against an external clock to
catch errors such as the wrong day or even year. Not necessary on
systems doing repeated checks.
Default: true (but false for CRAN checking).

@item _R_CHECK_AUTOCONF_
For packages with a @file{configure} file generated by GNU
@command{autoconf} and either @file{configure.ac} or
@file{configure.in}, check that @command{autoreconf} can, if available,
be run in a copy of the sources (this will detect missing source files
and report @command{autoconf} warnings).
Default: false (but true for CRAN submission checks).

@item _R_CHECK_DATALIST_
Check whether file @file{data/datalist} is out-of-date.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_THINGS_IN_CHECK_DIR_
Check and report at the end of the check run if files have been left in
the check directory.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_THINGS_IN_TEMP_DIR_
Check and report at the end of the check run if files would have been
left in the temporary directory (usually @file{/tmp} on a Unix-alike).
It does this by setting the environment variable @env{TMPDIR} to a
subdirectory of the @R{} session directory for the @code{check} process:
if any files or directories are left there they are removed. Since some
of these might be out of the user's control, environment variable
@env{_R_CHECK_THINGS_IN_TEMP_DIR_EXCLUDE_} can specify an (extended
regex) pattern of file names not to be reported -- CRAN uses
@samp{^ompi.} for directories left behind by OpenMPI. There are rare
instances where @env{TMPDIR} is not respected and so files are left in
@file{/tmp} (and not reported): one example is
@file{/tmp/boost_interprocess} on some OSes.
@c macOS is one.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_BASHISMS_
Check the top-level scripts @file{configure} (unless generated by
@command{autoconf}) and @file{cleanup} for non-Bourne-shell code, using the
Perl script @command{checkbashisms} if available. This includes
reporting scripts using the non-portable @code{#! /bin/bash}.
(Script @command{checkbashisms} is available in most Linux distributions
in a package named either @samp{devscripts} or @samp{devscripts-checkbashisms}
and from @uref{https://sourceforge.net/projects/checkbaskisms/files}.)
Default: false (but true for CRAN submission checks except on Windows).

@item _R_CHECK_ORPHANED_
Check if dependencies are orphaned packages. As from @R{} 4.1.0 this
checks strict dependencies recursively, so will report any orphaned
packages which are needed to attach the package by @code{library()} as
well as any orphaned packages which are suggested.
Default: false (but true for CRAN submission checks).

@item _R_CHECK_EXCESSIVE_IMPORTS_
A positive integer. If set, give a NOTE if the number of imports from
non-base packages exceeds this threshold. Large numbers of imports
make a package vulnerable to any of them becoming unavailable.
Default: unset (but 20 for CRAN submission checks).

@item _R_CHECK_DONTTEST_EXAMPLES_
If true and examples are found with @code{\donttest} sections, the
tests are run in one pass with these commented out and then in a
second pass including the @code{\donttest} sections (for the main
architecture only). Only for the first pass are the results compared
to any @file{.Rout.save} file and timings analysed. Overridden by
@option{--run-donttest}.
Default: false unless @option{--as-cran} is specified (which can be
overridden by setting @samp{_R_CHECK_DONTTEST_EXAMPLES_=false}).

@item _R_CHECK_XREFS_PKGS_ARE_DECLARED_
Check if packages used in `anchored' cross-references in @file{.Rd}
files (those of the form @code{\link[@var{pkg}]@{@var{foo}@}} and
@code{\link[@var{pkg:bar}]@{@var{foo}@}}) are declared in the
@file{DESCRIPTION} file and so these links can be checked.
Default: false.

@item _R_CHECK_XREFS_MIND_SUSPECT_ANCHORS_
Check if package-anchored Rd cross-references are to @emph{files} (and
not aliases).
Default: false.

@item _R_CHECK_BOGUS_RETURN_
If true and @env{_R_CHECK_USE_CODETOOLS_} is also true, functions are
scanned for use of @code{return} rather than @code{return()}.
Default: false (but true for CRAN submission checks).
@end vtable

CRAN's submission checks use something like

@example
_R_CHECK_CRAN_INCOMING_=TRUE
_R_CHECK_CRAN_INCOMING_REMOTE_=TRUE
_R_CHECK_VC_DIRS_=TRUE
_R_CHECK_TIMINGS_=10
_R_CHECK_INSTALL_DEPENDS_=TRUE
_R_CHECK_SUGGESTS_ONLY_=TRUE
_R_CHECK_NO_RECOMMENDED_=TRUE
_R_CHECK_EXECUTABLES_EXCLUSIONS_=FALSE
_R_CHECK_DOC_SIZES2_=TRUE
_R_CHECK_CODE_ASSIGN_TO_GLOBALENV_=TRUE
_R_CHECK_CODE_ATTACH_=TRUE
_R_CHECK_CODE_DATA_INTO_GLOBALENV_=TRUE
_R_CHECK_CODE_USAGE_VIA_NAMESPACES_=TRUE
_R_CHECK_DOT_FIRSTLIB_=TRUE
_R_CHECK_DEPRECATED_DEFUNCT_=TRUE
_R_CHECK_REPLACING_IMPORTS_=TRUE
_R_CHECK_SCREEN_DEVICE_=stop
_R_CHECK_TOPLEVEL_FILES_=TRUE
_R_CHECK_S3_METHODS_NOT_REGISTERED_=TRUE
_R_CHECK_OVERWRITE_REGISTERED_S3_METHODS_=TRUE
_R_CHECK_PRAGMAS_=TRUE
_R_CHECK_COMPILATION_FLAGS_=TRUE
_R_CHECK_R_DEPENDS_=warn
_R_CHECK_SERIALIZATION_=TRUE
_R_CHECK_R_ON_PATH_=TRUE
_R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_=TRUE
_R_CHECK_SHLIB_OPENMP_FLAGS_=TRUE
_R_CHECK_CONNECTIONS_LEFT_OPEN_=TRUE
_R_CHECK_FUTURE_FILE_TIMESTAMPS_=TRUE
_R_CHECK_LENGTH_1_CONDITION_=package:_R_CHECK_PACKAGE_NAME_,abort,verbose
_R_CHECK_LENGTH_1_LOGIC2_=package:_R_CHECK_PACKAGE_NAME_,abort,verbose
_R_CHECK_AUTOCONF_=true
_R_CHECK_DATALIST_=true
_R_CHECK_THINGS_IN_CHECK_DIR_=true
_R_CHECK_THINGS_IN_TEMP_DIR_=true
_R_CHECK_BASHISMS_=true
_R_CLASS_MATRIX_ARRAY_=true
_R_CHECK_ORPHANED_=true
_R_CHECK_BOGUS_RETURN_=true
@end example

@noindent
These are turned on by @command{R CMD check --as-cran}: the incoming
checks also use
@example
_R_CHECK_FORCE_SUGGESTS_=FALSE
@end example

@noindent
since some packages do suggest other packages not available on CRAN or
other commonly-used repositories.

Several environment variables can be used to set `timeouts': limits for
the elapsed time taken by the sub-processes used for parts of the
checks. A value of @code{0} indicates no limit, and is the default.
Character strings ending in @samp{s}, @samp{m} or @samp{h} indicate a
number of seconds, minutes or hours respectively: other values are
interpreted as a whole number of seconds (with invalid inputs being
treated as no limit).
@vtable @code
@item _R_CHECK_ELAPSED_TIMEOUT_
The default timeout for sub-processes not otherwise mentioned, and the
default value for all except @env{_R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_}.
(This is also used by @code{tools::check_packages_in_dir}.)

@item _R_CHECK_INSTALL_ELAPSED_TIMEOUT_
Limit for when @command{R CMD INSTALL} is run by @command{check}.

@item _R_CHECK_EXAMPLES_ELAPSED_TIMEOUT_
Limit for running all the examples for one sub-architecture.

@item _R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_
Limit for running one test for one sub-architecture. Default
@env{_R_CHECK_TESTS_ELAPSED_TIMEOUT_}.

@item _R_CHECK_TESTS_ELAPSED_TIMEOUT_
Limit for running all the tests for one sub-architecture (and the
default limit for running one test).

@item _R_CHECK_ONE_VIGNETTE_ELAPSED_TIMEOUT_
Limit for running the @R{} code in one vignette, including for
re-building each vignette separately.

@item _R_CHECK_BUILD_VIGNETTES_ELAPSED_TIMEOUT_
Limit for re-building all vignettes.

@item _R_CHECK_PKGMAN_ELAPSED_TIMEOUT_
Limit for each attempt at building the PDF package manual.
@end vtable

Stricter checks can also be enabled by setting
@env{R_CHECK_CONSTANTS} to @code{5}. This checks that
nothing@footnote{The usual culprits are calls to compiled code
@emph{via} @code{.Call} or @code{.External} which alter their
arguments.} changes the values of `constants'@footnote{things which the
byte compiler assumes do not change, e.g.@: function bodies.} in @R{}
code. This is best used in conjunction with setting
@env{R_JIT_STRATEGY} to @code{3}, which checks code on first use (by
default most code is only checked after byte-compilation on second use).
Unfortunately these checks slow down checking of examples, tests and
vignettes, typically two-fold but in the worst cases at least a
hundred-fold.

The following environment variables can be used to customize the
operation of @command{INSTALL}.

@vtable @code
@item _R_INSTALL_LIBS_ONLY_FORCE_DEPENDS_IMPORTS_
If true, give an error if installing only package libraries via
@option{--libs-only} and some package imported or depended on is not
available.
Default: true (false only for special applications, which analyze native
code of packages).
@end vtable

@node R coding standards, Testing R code, Tools, Top
@chapter R coding standards

@cindex coding standards
@R{} is meant to run on a wide variety of platforms, including Linux and
most variants of Unix as well as Windows and macOS.
Therefore, when extending @R{} by either adding to the @R{} base
distribution or by providing an add-on package, one should not rely on
features specific to only a few supported platforms, if this can be
avoided. In particular, although most @R{} developers use @acronym{GNU}
tools, they should not employ the @acronym{GNU} extensions to standard
tools. Whereas some other software packages explicitly rely on e.g.@:
@acronym{GNU} make or the @acronym{GNU} C++ compiler, @R{} does not.
Nevertheless, @R{} is a @acronym{GNU} project, and the spirit of the
@cite{@acronym{GNU} Coding Standards} should be followed if possible.

The following tools can ``safely be assumed'' for @R{} extensions.

@itemize @bullet
@item
An ISO C99 C compiler. Note that extensions such as @acronym{POSIX}
1003.1 must be tested for, typically using Autoconf unless you are sure
they are supported on all mainstream @R{} platforms (including Windows
and macOS).

@item
A fixed-form Fortran compiler.

@item
A simple @command{make}, considering the features of @command{make} in
4.2 @acronym{BSD} systems as a baseline.
@findex make

@acronym{GNU} or other extensions, including pattern rules using
@samp{%}, the automatic variable @samp{$^}, the @samp{+=} syntax to
append to the value of a variable, the (``safe'') inclusion of makefiles
with no error, conditional execution, and many more, must not be used
(see Chapter ``Features'' in the @cite{@acronym{GNU} Make Manual} for
more information). On the other hand, building @R{} in a separate
directory (not containing the sources) should work provided that
@command{make} supports the @code{VPATH} mechanism.

Windows-specific makefiles can assume @acronym{GNU} @command{make} 3.79
or later, as no other @command{make} is viable on that platform.

@item
A Bourne shell and the ``traditional'' Unix programming tools, including
@command{grep}, @command{sed}, and @command{awk}.

There are @acronym{POSIX} standards for these tools, but these may not
be fully supported. Baseline features could be determined from a book
such as @cite{The UNIX Programming Environment} by Brian W. Kernighan &
Rob Pike. Note in particular that @samp{|} in a regexp is an extended
regexp, and is not supported by all versions of @command{grep} or
@command{sed}. The Open Group Base Specifications, Issue 7, which are
technically identical to IEEE Std 1003.1 (POSIX), 2008,
are available at
@uref{https://pubs.opengroup.org/onlinepubs/9699919799/mindex.html}.
@end itemize

Under Windows, most users will not have these tools installed, and you
should not require their presence for the operation of your package.
However, users who install your package from source will have them, as
they can be assumed to have followed the instructions in ``the Windows
toolset'' appendix of the ``R Installation and Administration'' manual
to obtain them. Redirection cannot be assumed to be available via
@command{system} as this does not use a standard shell (let alone a
Bourne shell).

@noindent
In addition, the following tools are needed for certain tasks.

@itemize @bullet
@item
Perl version 5 is only needed for the maintainer-only script
@file{tools/help2man.pl}.
@findex Perl

@item
Makeinfo version 4.7 or later is needed to build the Info files for the
@R{} manuals written in the @acronym{GNU} Texinfo system.
@findex makeinfo
@end itemize

It is also important that code is written in a way that allows others to
understand it. This is particularly helpful for fixing problems, and
includes using self-descriptive variable names, commenting the code, and
also formatting it properly.
The @R{} Core Team recommends using a
basic indentation of 4 for @R{} and C (and most likely also Perl) code,
and 2 for documentation in Rd format. Emacs (21 or later) users can
implement this indentation style by putting the following in one of
their startup files, and using customization to set the
@code{c-default-style} to @code{"bsd"} and @code{c-basic-offset} to
@code{4}.
@findex emacs

@smallexample
@group
;;; ESS
(add-hook 'ess-mode-hook
          (lambda ()
            (ess-set-style 'C++ 'quiet)
            ;; Because
            ;;                                 DEF GNU BSD K&R C++
            ;; ess-indent-level                  2   2   8   5   4
            ;; ess-continued-statement-offset    2   2   8   5   4
            ;; ess-brace-offset                  0   0  -8  -5  -4
            ;; ess-arg-function-offset           2   4   0   0   0
            ;; ess-expression-offset             4   2   8   5   4
            ;; ess-else-offset                   0   0   0   0   0
            ;; ess-close-brace-offset            0   0   0   0   0
            (add-hook 'local-write-file-hooks
                      (lambda ()
                        (ess-nuke-trailing-whitespace)))))
(setq ess-nuke-trailing-whitespace-p 'ask)
;; or even
;; (setq ess-nuke-trailing-whitespace-p t)
@end group
@group
;;; Perl
(add-hook 'perl-mode-hook
          (lambda () (setq perl-indent-level 4)))
@end group
@end smallexample

@noindent
(The `GNU' styles for Emacs' C and R modes use a basic indentation of 2,
which has been determined not to display the structure clearly enough
when using narrow fonts.)

@node Testing R code, Use of TeX dialects, R coding standards, Top
@chapter Testing R code

When you (as @R{} developer) add new functions to the R base (all the
packages distributed with @R{}), be careful to check if @kbd{make
test-Specific} or particularly, @kbd{cd tests; make no-segfault.Rout}
still works (without interactive user intervention, and on a standalone
computer).
If the new function, for example, accesses the Internet, or
requires @acronym{GUI} interaction, please add its name to the ``stop
list'' in @file{tests/no-segfault.Rin}.

[To be revised: use @command{make check-devel}, check the write barrier
if you change internal structures.]

@node Use of TeX dialects, Current and future directions, Testing R code, Top
@chapter Use of TeX dialects

Various dialects of TeX are used for different purposes in @R{}. The
policy is that manuals be written in @samp{texinfo}, and for convenience
the main and Windows FAQs are also. This has the advantage that it is
easy to produce @HTML{} and plain text versions as well as typeset manuals.

@LaTeX{} is not used directly, but rather as an intermediate format for
typeset help documents and for vignettes.

Care needs to be taken about the assumptions made about the @R{} user's
system: it may not have either @samp{texinfo} or a TeX system
installed. We have attempted to abstract out the cross-platform
differences, and almost all the setting of typeset documents is done by
@code{tools::texi2dvi}. This is used for offline printing of help
documents, preparing vignettes and for package manuals via @command{R
CMD Rd2pdf}. It is not currently used for the @R{} manuals created in
directory @file{doc/manual}.

@code{tools::texi2dvi} makes use of a system command @command{texi2dvi}
where available. On a Unix-alike this is usually part of
@samp{texinfo}, whereas on Windows if it exists at all it would be an
executable, part of MiKTeX. If none is available, the @R{} code runs
a sequence of @command{(pdf)latex}, @command{bibtex} and
@command{makeindex} commands.

This process has been rather vulnerable to the versions of the external
software used: particular issues have been @command{texi2dvi} and
@file{texinfo.tex} updates, mismatches between the two@footnote{Linux
distributions tend to unbundle @file{texinfo.tex} from @samp{texinfo}.},
versions of the @LaTeX{} package @samp{hyperref} and quirks in index
production. The licenses used for @LaTeX{} and latterly @samp{texinfo}
prohibit us from including `known good' versions in the @R{}
distribution.

On a Unix-alike @command{configure} looks for the executables for TeX and
friends and if found records the absolute paths in the system
@file{Renviron} file. This used to record @samp{false} if no command
was found, but it nowadays records the name for looking up on the path
at run time. The latter can be important for binary distributions: one
does not want to be tied to, for example, TeX Live 2007.


@node Current and future directions, Function and variable index, Use of TeX dialects, Top
@chapter Current and future directions

This chapter is for notes about possible in-progress and future changes
to @R{}: there is no commitment to release such changes, let alone to a
timescale.

@menu
* Long vectors::
* 64-bit types::
* Large matrices::
@end menu

@node Long vectors, 64-bit types, Current and future directions, Current and future directions
@section Long vectors

Vectors in @R{} 2.x.y were limited to a length of 2^31 - 1 elements
(about 2 billion), as the length is stored in the @code{SEXPREC} as a C
@code{int}, and that type is used extensively to record lengths and
element numbers, including in packages.

Note that longer vectors are effectively impossible under 32-bit
platforms because of their address limit, so this section applies only
on 64-bit platforms.
The internals are unchanged on a 32-bit build of
@R{}.

A single object with 2^31 or more elements will take up at least 8GB of
memory if integer or logical and 16GB if numeric or character, so
routine use of such objects is still some way off.

There is now some support for long vectors. This applies to raw,
logical, integer, numeric and character vectors, and lists and
expression vectors. (Elements of character vectors (@code{CHARSXP}s)
remain limited to 2^31 - 1 bytes.) Some considerations:


@itemize

@item
This has been implemented by recording the length (and true length) as
@code{-1} and recording the actual length as a 64-bit field at the
beginning of the header. Because a fair amount of code in @R{} uses a
signed type for the length, the `long length' is recorded using the
signed C99 type @code{ptrdiff_t}, which is typedef-ed to
@code{R_xlen_t}.

@item
These can in theory have 63-bit lengths, but note that current 64-bit
OSes do not even theoretically offer 64-bit address spaces and there is
currently a 52-bit limit (which exceeds the theoretical limit of current
OSes and ensures that such lengths can be stored exactly in doubles).

@item
The serialization format has been changed to accommodate longer lengths,
but vectors of lengths up to 2^31-1 are stored in the same way as
before. Longer vectors have their length field set to @code{-1} and
followed by two 32-bit fields giving the upper and lower 32 bits of the
actual length. There is currently a sanity check which limits lengths
to 2^48 on unserialization.

@item
The type @code{R_xlen_t} is made available to packages in C header
@file{Rinternals.h}: this should be fine in C code since C99 is
required. People do try to use @R{} internals in C++, but C++98
compilers are not required to support these types.

@item
Indexing can be done via the use of doubles. The internal indexing code
used to work with positive integer indices (and negative, logical and
matrix indices were all converted to positive integers): it now works
with either @code{INTSXP} or @code{REALSXP} indices.

@item
The @R{} function @code{length} returns a double value if the length
exceeds 2^31-1. Code calling @code{as.integer(length(x))} before passing
to @code{.C}/@code{.Fortran} should check for an @code{NA} result.

@end itemize

@node 64-bit types, Large matrices, Long vectors, Current and future directions
@section 64-bit types

There is also some desire to be able to store larger integers in @R{},
although the possibility of storing these as @code{double} is often
overlooked (and e.g.@: file pointers as returned by @code{seek} are
already stored as @code{double}).

Different routes have been proposed:

@itemize

@item
Add a new type to @R{} and use that for lengths and indices---most likely
this would be a 64-bit signed type, say @code{longint}. @R{}'s usual
implicit coercion rules would ensure that supplying an @code{integer}
vector for indexing or @code{length<-} would work.

@item
A more radical alternative is to change the existing @code{integer} type
to be 64-bit on 64-bit platforms (which was the approach taken by S-PLUS
for DEC/Compaq Alpha systems). Or even on all platforms.

@item
Allow either @code{integer} or @code{double} values for lengths and
indices, and return @code{double} only when necessary.

@end itemize

The third has the advantages of minimal disruption to existing code and
not increasing memory requirements.
In the first and third scenarios
both @R{}'s own code and user code would have to be adapted for lengths
that were not of type @code{integer}, and in the third, code branches for
long vectors would be tested rarely.

Most users of the @code{.C} and @code{.Fortran} interfaces use
@code{as.integer} for lengths and element numbers, but a few omit these
in the knowledge that these were of type @code{integer}. It may be
reasonable to assume that these are never intended to be used with long
vectors.

The remaining interfaces will need to cope with the changed
@code{VECTOR_SEXPREC} types. It seems likely that in most cases lengths
are accessed by the @code{length} and @code{LENGTH}
functions@footnote{but @code{LENGTH} is a macro under some internal
uses.}. The current approach is to keep these returning 32-bit lengths and
introduce `long' versions @code{xlength} and @code{XLENGTH} which return
@code{R_xlen_t} values.


See also @uref{https://homepage.cs.uiowa.edu/~luke/talks/useR10.pdf}.

@node Large matrices, , 64-bit types, Current and future directions
@section Large matrices

Matrices are stored as vectors and so were also limited to 2^31-1
elements. Now longer vectors are allowed on 64-bit platforms, matrices
with more elements are supported provided that each of the dimensions is
no more than 2^31-1. However, not all applications can be supported.

The main problem is linear algebra done by Fortran code compiled
with 32-bit @code{INTEGER}. Although not guaranteed, it seems that all
the compilers currently used with @R{} on a 64-bit platform allow
matrices each of whose dimensions is less than 2^31 but with more than
2^31 elements, and index them correctly, and a substantial part of the
support software (such as @acronym{BLAS} and @acronym{LAPACK}) also
works.
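
As an illustration of why the total element count matters even when
each dimension is small, the following minimal C sketch computes the
column-major offset of a matrix element in a 64-bit type and reports
whether that offset would still fit in a 32-bit signed index. The names
@code{xlen_t}, @code{offset_wide} and @code{offset_fits_int32} are
hypothetical and not part of @R{}'s API: @code{xlen_t} merely stands in
for @code{R_xlen_t} and is declared as @code{int64_t} here so that the
sketch behaves the same on any platform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for R_xlen_t (ptrdiff_t on a 64-bit build of R). */
typedef int64_t xlen_t;

/* Offset of element (i, j), both 0-based, in a column-major matrix with
   nrow rows.  Each operand is widened before the multiplication so the
   arithmetic itself is done in 64 bits. */
static xlen_t offset_wide(int i, int j, int nrow)
{
    assert(i >= 0 && j >= 0 && nrow > 0 && i < nrow);
    return (xlen_t) i + (xlen_t) j * (xlen_t) nrow;
}

/* Returns 1 if the offset of (i, j) could be held in a 32-bit signed
   integer, 0 if a single 32-bit index would overflow. */
static int offset_fits_int32(int i, int j, int nrow)
{
    return offset_wide(i, j, nrow) <= (xlen_t) INT32_MAX;
}
```

For a 50000 by 50000 matrix the last element has offset 2499999999,
which exceeds 2^31 - 1, so code that forms a single 32-bit index would
overflow there although both dimensions are far below the limit.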

There are exceptions: for example some complex @acronym{LAPACK}
auxiliary routines do use a single @code{INTEGER} index and hence
overflow silently and segfault or give incorrect results. One example
is @code{svd()} on a complex matrix.

Since this is implementation-dependent, it is possible that optimized
@acronym{BLAS} and @acronym{LAPACK} may have further restrictions,
although none have yet been encountered. For matrix algebra on large
matrices one almost certainly wants a machine with a lot of RAM (100s of
gigabytes), many cores and a multi-threaded @acronym{BLAS}.



@node Function and variable index, Concept index, Current and future directions, Top
@unnumbered Function and variable index

@printindex vr

@node Concept index, , Function and variable index, Top
@unnumbered Concept index

@printindex cp

@bye

@c Local Variables: ***
@c mode: TeXinfo ***
@c End: ***