1\input texinfo
2@c %**start of header
3@setfilename R-ints.info
4@settitle R Internals
5@setchapternewpage on
6@c %**end of header
7
8@c @documentencoding ISO-8859-1
9
10@syncodeindex fn vr
11
12@dircategory Programming
13@direntry
14* R Internals: (R-ints).      R Internals.
15@end direntry
16
17@finalout
18
19@include R-defs.texi
20@include version.texi
21
22@copying
23This manual is for R, version @value{VERSION}.
24
25@Rcopyright{1999}
26
27@quotation
28@permission{}
29@end quotation
30@end copying
31
32@titlepage
33@title R Internals
34@subtitle Version @value{VERSION}
35@author R Core Team
36@page
37@vskip 0pt plus 1filll
38@insertcopying
39@end titlepage
40
41@ifplaintext
42@insertcopying
43@end ifplaintext
44
45@c @ifnothtml
46@contents
47@c @end ifnothtml
48
49@ifnottex
50@node Top, R Internal Structures, (dir), (dir)
51@top R Internals
52
53This is a guide to the internal structures of @R{} and coding standards for
54the core team working on @R{} itself.
55
56@insertcopying
57
58@end ifnottex
59
60@menu
61* R Internal Structures::
62* .Internal vs .Primitive::
63* Internationalization in the R sources::
64* Package Structure::
65* Files::
66* Graphics Devices::
67* GUI consoles::
68* Tools::
69* R coding standards::
70* Testing R code::
71* Use of TeX dialects::
72* Current and future directions::
73* Function and variable index::
74* Concept index::
75@end menu
76@c  Could have (autogenerated) @detailmenu here ..
77
78@node R Internal Structures, .Internal vs .Primitive, Top, Top
79@chapter R Internal Structures
80
81This chapter is the beginnings of documentation about @R{} internal
82structures.  It is written for the core team and others studying the
83code in the @file{src/main} directory.
84
85It is a work-in-progress and should be checked against the current
86version of the source code.  Versions for @R{} 2.x.y contain historical
87comments about when features were introduced: this version is for the
883.x.y series.
89
90@menu
91* SEXPs::
92* Environments and variable lookup::
93* Attributes::
94* Contexts::
95* Argument evaluation::
96* Autoprinting::
97* The write barrier::
98* Serialization Formats::
99* Encodings for CHARSXPs::
100* The CHARSXP cache::
101* Warnings and errors::
102* S4 objects::
103* Memory allocators::
104* Internal use of global and base environments::
105* Modules::
106* Visibility::
107* Lazy loading::
108@end menu
109
110@node SEXPs, Environments and variable lookup, R Internal Structures, R Internal Structures
111@section SEXPs
112
113@cindex SEXP
@cindex SEXPREC
115What @R{} users think of as @emph{variables} or @emph{objects} are
116symbols which are bound to a value.  The value can be thought of as
either a @code{SEXP} (a pointer), or the structure it points to, a
@code{SEXPREC} (and there are alternative forms used for vectors, namely
@code{VECSEXP} pointing to @code{VECTOR_SEXPREC} structures).
120So the basic building blocks of @R{} objects are often called
121@emph{nodes}, meaning @code{SEXPREC}s or @code{VECTOR_SEXPREC}s.
122
123Note that the internal structure of the @code{SEXPREC} is not made
124available to @R{} Extensions: rather @code{SEXP} is an opaque pointer,
125and the internals can only be accessed by the functions provided.
126
127@cindex node
Both types of node structure begin with a 64-bit
@code{sxpinfo} header followed by three pointers (to the attributes and the
previous and next node in a doubly-linked list), and then some further
131fields.  On a 32-bit platform a node@footnote{strictly, a @code{SEXPREC}
132node; @code{VECTOR_SEXPREC} nodes are slightly smaller but followed by
133data in the node.} occupies 32 bytes: on a 64-bit platform typically 56
134bytes (depending on alignment constraints).
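
The byte counts quoted above follow from simple arithmetic, sketched
below in C.  This is an illustration only and not code from the @R{}
sources: the name @code{sexprec_bytes} is hypothetical, the three-word
data union is the one described later in this section, and alignment
padding is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes in the sxpinfo header: 64 bits regardless of platform. */
#define HEADER_BYTES 8

/*
 * Approximate size of a SEXPREC node for a given pointer size:
 * the header, three pointers (attributes, previous node, next node)
 * and a three-word data union.  Alignment padding is ignored.
 */
static size_t sexprec_bytes(size_t pointer_bytes)
{
    assert(pointer_bytes > 0);
    size_t link_pointers = 3 * pointer_bytes;   /* attrib, prev, next */
    size_t data_union    = 3 * pointer_bytes;   /* three words of data */
    return HEADER_BYTES + link_pointers + data_union;
}
```

With 4-byte pointers this gives the 32 bytes quoted for 32-bit
platforms, and with 8-byte pointers the 56 bytes quoted for 64-bit
platforms.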
135
136The first five bits of the @code{sxpinfo} header specify one of up to 32
137@code{SEXPTYPE}s.
138
139@menu
140* SEXPTYPEs::
141* Rest of header::
142* The 'data'::
143* Allocation classes::
144@end menu
145
146@node SEXPTYPEs, Rest of header, SEXPs, SEXPs
147@subsection SEXPTYPEs
148
149@cindex SEXPTYPE
150Currently @code{SEXPTYPE}s 0:10 and 13:25 are in use.  Values 11 and 12
151were used for internal factors and ordered factors and have since been
152withdrawn.  Note that the @code{SEXPTYPE} numbers are stored in
153@code{save}d objects and that the ordering of the types is used, so the
154gap cannot easily be reused.
155
156@cindex SEXPTYPE table
157@quotation
158@multitable {no} {SPECIALSXPXXX} {S4 classes not of simple type}
159@headitem no @tab  SEXPTYPE@tab  Description
160@item @code{0}   @tab @code{NILSXP}      @tab @code{NULL}
161@item @code{1}   @tab @code{SYMSXP}      @tab symbols
162@item @code{2}   @tab @code{LISTSXP}     @tab pairlists
163@item @code{3}   @tab @code{CLOSXP}      @tab closures
164@item @code{4}   @tab @code{ENVSXP}      @tab environments
165@item @code{5}   @tab @code{PROMSXP}     @tab promises
166@item @code{6}   @tab @code{LANGSXP}     @tab language objects
167@item @code{7}   @tab @code{SPECIALSXP}  @tab special functions
168@item @code{8}   @tab @code{BUILTINSXP}  @tab builtin functions
169@item @code{9}   @tab @code{CHARSXP}     @tab internal character strings
170@item @code{10}   @tab @code{LGLSXP}     @tab logical vectors
171@item @code{13}   @tab @code{INTSXP}     @tab integer vectors
172@item @code{14}   @tab @code{REALSXP}    @tab numeric vectors
173@item @code{15}   @tab @code{CPLXSXP}    @tab complex vectors
174@item @code{16}   @tab @code{STRSXP}     @tab character vectors
175@item @code{17}   @tab @code{DOTSXP}     @tab dot-dot-dot object
176@item @code{18}   @tab @code{ANYSXP}     @tab make ``any'' args work
177@item @code{19}   @tab @code{VECSXP}     @tab list (generic vector)
178@item @code{20}   @tab @code{EXPRSXP}    @tab expression vector
179@item @code{21}   @tab @code{BCODESXP}   @tab byte code
180@item @code{22}   @tab @code{EXTPTRSXP}  @tab external pointer
181@item @code{23}   @tab @code{WEAKREFSXP} @tab weak reference
182@item @code{24}   @tab @code{RAWSXP}     @tab raw vector
183@item @code{25}   @tab @code{S4SXP}      @tab S4 classes not of simple type
184@end multitable
185@end quotation
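
The table can be mirrored by a small lookup array, which makes the gap
at codes 11 and 12 explicit.  The sketch below is hypothetical helper
code (the names @code{sexptype_names} and @code{sexptype_name} do not
come from the @R{} sources) and only restates the table above.

```c
#include <stddef.h>

/* Names indexed by SEXPTYPE number; NULL marks the withdrawn codes 11 and 12. */
static const char *const sexptype_names[26] = {
    "NILSXP", "SYMSXP", "LISTSXP", "CLOSXP", "ENVSXP", "PROMSXP",
    "LANGSXP", "SPECIALSXP", "BUILTINSXP", "CHARSXP", "LGLSXP",
    NULL, NULL,
    "INTSXP", "REALSXP", "CPLXSXP", "STRSXP", "DOTSXP", "ANYSXP",
    "VECSXP", "EXPRSXP", "BCODESXP", "EXTPTRSXP", "WEAKREFSXP",
    "RAWSXP", "S4SXP"
};

/* Return the name for a type code, or NULL if the code is not in use. */
static const char *sexptype_name(int type)
{
    if (type < 0 || type > 25)
        return NULL;
    return sexptype_names[type];
}
```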
186
187@cindex atomic vector type
188Many of these will be familiar from @R{} level: the atomic vector types
are @code{LGLSXP}, @code{INTSXP}, @code{REALSXP}, @code{CPLXSXP},
190@code{STRSXP} and @code{RAWSXP}.  Lists are @code{VECSXP} and names
191(also known as symbols) are @code{SYMSXP}.  Pairlists (@code{LISTSXP},
192the name going back to the origins of @R{} as a Scheme-like language)
193are rarely seen at @R{} level, but are for example used for argument
194lists.  Character vectors are effectively lists all of whose elements
195are @code{CHARSXP}, a type that is rarely visible at @R{} level.
196
197@cindex language object
198@cindex argument list
199Language objects (@code{LANGSXP}) are calls (including formulae and so
200on).  Internally they are pairlists with first element a
201reference@footnote{a pointer to a function or a symbol to look up the
202function by name, or a language object to be evaluated to give a
203function.} to the function to be called with remaining elements the
204actual arguments for the call (and with the tags if present giving the
205specified argument names).  Although this is not enforced, many places
206in the code assume that the pairlist is of length one or more, often
207without checking.
208
209@cindex expression
210Expressions are of type @code{EXPRSXP}: they are a vector of (usually
211language) objects most often seen as the result of @code{parse()}.
212
213@cindex function
214The functions are of types @code{CLOSXP}, @code{SPECIALSXP} and
215@code{BUILTINSXP}: where @code{SEXPTYPE}s are stored in an integer
216these are sometimes lumped into a pseudo-type @code{FUNSXP} with code
21799.  Functions defined via @code{function} are of type @code{CLOSXP} and
218have formals, body and environment.
219
220@cindex S4 type
221The @code{SEXPTYPE} @code{S4SXP} is for S4 objects which do not consist
222solely of a simple type such as an atomic vector or function.
223
224
225@node Rest of header, The 'data', SEXPTYPEs, SEXPs
226@subsection Rest of header
227
228Note that the size and structure of the header changed in @R{} 3.5.0:
229see earlier editions of this manual for the previous layout.
230
231The @code{sxpinfo} header is defined as a 64-bit C structure by
232
233@example
234#define NAMED_BITS 16
235struct sxpinfo_struct @{
236    SEXPTYPE type      :  5;  /* @r{discussed above} */
    unsigned int scalar:  1;  /* @r{is this a numeric vector of length 1?} */
238    unsigned int obj   :  1;  /* @r{is this an object with a class attribute?} */
239    unsigned int alt   :  1;  /* @r{is this an @code{ALTREP} object?} */
240    unsigned int gp    : 16;  /* @r{general purpose, see below} */
241    unsigned int mark  :  1;  /* @r{mark object as `in use' in GC} */
242    unsigned int debug :  1;
243    unsigned int trace :  1;
244    unsigned int spare :  1;  /* @r{debug once and with reference counting} */
245    unsigned int gcgen :  1;  /* @r{generation for GC} */
246    unsigned int gccls :  3;  /* @r{class of node for GC} */
247    unsigned int named : NAMED_BITS; /* @r{used to control copying} */
248    unsigned int extra : 32 - NAMED_BITS;
249@}; /*		    Tot: 64 */
250@end example
251
252@findex debug bit
253The @code{debug} bit is used for closures and environments.  For
254closures it is set by @code{debug()} and unset by @code{undebug()}, and
255indicates that evaluations of the function should be run under the
256browser.  For environments it indicates whether the browsing is in
257single-step mode.
258
259@findex trace bit
260The @code{trace} bit is used for functions for @code{trace()} and for
261other objects when tracing duplications (see @code{tracemem}).
262
263@findex spare bit
264The @code{spare} bit is used for closures to mark them for one-time
265debugging.
266
267@findex named bits
268@findex NAMED
269@findex SET_NAMED
270@cindex copying semantics
271The @code{named} field is set and accessed by the @code{SET_NAMED} and
@code{NAMED} macros, and takes values @code{0}, @code{1} and @code{2}, or
273possibly higher if @code{NAMEDMAX} is set to a higher value.
274@R{} has a `call by value' illusion, so an assignment like
275@example
276b <- a
277@end example
279
280@noindent
281appears to make a copy of @code{a} and refer to it as @code{b}.
282However, if neither @code{a} nor @code{b} are subsequently altered there
283is no need to copy.  What really happens is that a new symbol @code{b}
284is bound to the same value as @code{a} and the @code{named} field on the
285value object is set (in this case to @code{2}).  When an object is about
286to be altered, the @code{named} field is consulted.  A value of @code{2}
287or more means that the object must be duplicated before being changed.  (Note
288that this does not say that it is necessary to duplicate, only that it
289should be duplicated whether necessary or not.)  A value of @code{0}
290means that it is known that no other @code{SEXP} shares data with this
291object, and so it may safely be altered.  A value of @code{1} is used
292for situations like
293
294@example
295dim(a) <- c(7, 2)
296@end example
297
298@noindent
299where in principle two copies of @code{a} exist for the duration of the
300computation as (in principle)
301
302@example
303a <- `dim<-`(a, c(7, 2))
304@end example
305
306@noindent
307but for no longer, and so some primitive functions can be optimized to
308avoid a copy in this case.  [This mechanism is scheduled to be replaced
309in @R{} 4.0.0.]
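
The rule `duplicate before altering when @code{named} is 2 or more' can
be sketched with a toy object.  Everything here is hypothetical
(@code{toy_value}, @code{toy_bind} and @code{toy_prepare_for_write} are
not @R{} API, and real nodes are managed by the garbage collector rather
than @code{malloc}); the point is only the decision made before a write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy value object: an int vector with a NAMED-style counter. */
typedef struct {
    int named;      /* 0, 1 or 2, as in the text */
    size_t length;
    int *data;
} toy_value;

static toy_value *toy_new(size_t length)
{
    toy_value *v = malloc(sizeof *v);
    assert(v != NULL);
    v->named = 0;
    v->length = length;
    v->data = calloc(length ? length : 1, sizeof *v->data);
    assert(v->data != NULL);
    return v;
}

/* Binding another symbol to the value: no copy, just mark it as shared. */
static toy_value *toy_bind(toy_value *v)
{
    v->named = 2;
    return v;
}

/* Return an object that is safe to modify: duplicate when named >= 2. */
static toy_value *toy_prepare_for_write(toy_value *v)
{
    if (v->named < 2)
        return v;                   /* not shared: alter in place */
    toy_value *copy = toy_new(v->length);
    memcpy(copy->data, v->data, v->length * sizeof *v->data);
    return copy;                    /* the copy starts with named == 0 */
}
```

After @code{toy_bind}, a write through either symbol first obtains a
private copy, so the other symbol still sees the old data, which is the
`call by value' illusion described above.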
310
311The @code{gp} bits are by definition `general purpose'.  We label these
from 0 to 15.  Bits 0--6 and bits 14--15 have been used as described below
313(mainly from detective work on the sources).
314
315@findex gp bits
316@findex LEVELS
317@findex SETLEVELS
318The bits can be accessed and set by the @code{LEVELS} and
319@code{SETLEVELS} macros, which names appear to date back to the internal
320factor and ordered types and are now used in only a few places in the
321code.  The @code{gp} field is serialized/unserialized for the
322@code{SEXPTYPE}s other than @code{NILSXP}, @code{SYMSXP} and
323@code{ENVSXP}.
324
325Bits 14 and 15 of @code{gp} are used for `fancy bindings'.  Bit 14 is
326used to lock a binding or an environment, and bit 15 is used to indicate
327an active binding.  (For the definition of an `active binding' see the
328header comments in file @file{src/main/envir.c}.)  Bit 15 is used for an
329environment to indicate if it participates in the global cache.
330
@findex ARGUSED
332@findex SET_ARGUSED
333The macros @code{ARGUSED} and @code{SET_ARGUSED} are used when matching
334actual and formal function arguments, and take the values 0, 1 and 2.
335
336@findex MISSING
337@findex SET_MISSING
338The macros @code{MISSING} and @code{SET_MISSING} are used for pairlists
339of arguments.  Four bits are reserved, but only two are used (and
340exactly what for is not explained).  It seems that bit 0 is used by
341@code{matchArgs_NR} to mark missingness on the returned argument list, and
342bit 1 is used to mark the use of a default value for an argument copied
343to the evaluation frame of a closure.
344
345@findex DDVAL
346@findex SET_DDVAL
347@cindex ... argument
348Bit 0 is used by macros @code{DDVAL} and @code{SET_DDVAL}.  This
349indicates that a @code{SYMSXP} is one of the symbols @code{..n} which
350are implicitly created when @code{...} is processed, and so indicates
351that it may need to be looked up in a @code{DOTSXP}.
352
353@findex PRSEEN
354@cindex promise
355Bit 0 is used for @code{PRSEEN}, a flag to indicate if a promise has
356already been seen during the evaluation of the promise (and so to avoid
357recursive loops).
358
359Bit 0 is used for @code{HASHASH}, on the @code{PRINTNAME} of the
360@code{TAG} of the frame of an environment. (This bit is not serialized
361for @code{CHARSXP} objects.)
362
363Bits 0 and 1 are used for weak references (to indicate `ready to
364finalize', `finalize on exit').
365
366Bit 0 is used by the condition handling system (on a @code{VECSXP}) to
367indicate a calling handler.
368
369Bit 4 is turned on to mark S4 objects.
370
371Bits 1, 2, 3, 5 and 6 are used for a @code{CHARSXP} to denote its
372encoding.  Bit 1 indicates that the @code{CHARSXP} should be treated as
373a set of bytes, not necessarily representing a character in any known
374encoding.  Bits 2, 3 and 6 are used to indicate that it is known to be
375in Latin-1, UTF-8 or @acronym{ASCII} respectively.
376
377Bit 5 for a @code{CHARSXP} indicates that it is hashed by its address,
378that is @code{NA_STRING} or is in the @code{CHARSXP} cache (this is not
379serialized).  Only exceptionally is a @code{CHARSXP} not hashed, and
380this should never happen in end-user code.
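
The bit assignments listed above can be written as masks.  The sketch
below uses hypothetical names prefixed with @code{SKETCH_} rather than
the identifiers in the @R{} sources, and the order in which the encoding
bits are tested is an assumption made for the illustration.

```c
#include <stdbool.h>

/* Masks for the gp field, numbered from bit 0 as in the text. */
#define GP_BIT(n)            (1u << (n))
#define SKETCH_BYTES_MASK    GP_BIT(1)   /* CHARSXP: a set of bytes */
#define SKETCH_LATIN1_MASK   GP_BIT(2)   /* CHARSXP: Latin-1 */
#define SKETCH_UTF8_MASK     GP_BIT(3)   /* CHARSXP: UTF-8 */
#define SKETCH_S4_MASK       GP_BIT(4)   /* S4 object */
#define SKETCH_CACHED_MASK   GP_BIT(5)   /* CHARSXP: hashed by address */
#define SKETCH_ASCII_MASK    GP_BIT(6)   /* CHARSXP: ASCII */
#define SKETCH_LOCKED_MASK   GP_BIT(14)  /* locked binding or environment */
#define SKETCH_ACTIVE_MASK   GP_BIT(15)  /* active binding */

/* Describe the encoding recorded in the gp bits of a CHARSXP. */
static const char *charsxp_encoding(unsigned int gp)
{
    if (gp & SKETCH_BYTES_MASK)  return "bytes";
    if (gp & SKETCH_LATIN1_MASK) return "latin1";
    if (gp & SKETCH_UTF8_MASK)   return "UTF-8";
    if (gp & SKETCH_ASCII_MASK)  return "ASCII";
    return "unknown";            /* no encoding bit is set */
}

/* Test bit 14, which locks a binding or an environment. */
static bool binding_is_locked(unsigned int gp)
{
    return (gp & SKETCH_LOCKED_MASK) != 0;
}
```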
381
382@node The 'data', Allocation classes, Rest of header, SEXPs
383@subsection The `data'
384
385A @code{SEXPREC} is a C structure containing the 64-bit header as
386described above, three pointers (to the attributes, previous and next
387node) and the node data, a union
388
389@example
390union @{
391    struct primsxp_struct primsxp;
392    struct symsxp_struct symsxp;
393    struct listsxp_struct listsxp;
394    struct envsxp_struct envsxp;
395    struct closxp_struct closxp;
396    struct promsxp_struct promsxp;
397@} u;
398@end example
399
400@noindent
401All of these alternatives apart from the first (an @code{int}) are three
402pointers, so the union occupies three words.
403
404@cindex vector type
405The vector types are @code{RAWSXP}, @code{CHARSXP}, @code{LGLSXP},
406@code{INTSXP}, @code{REALSXP}, @code{CPLXSXP}, @code{STRSXP},
407@code{VECSXP}, @code{EXPRSXP} and @code{WEAKREFSXP}.  Remember that such
408types are a @code{VECTOR_SEXPREC}, which again consists of the header
409and the same three pointers, but followed by two integers giving the
length and `true length'@footnote{The only current uses are for hash
tables of environments (@code{VECSXP}s), where @code{length} is the size
of the table and @code{truelength} is the number of primary slots in use,
for the reference hash tables in serialization (@code{VECSXP}s), and for
`growable' vectors (atomic vectors, @code{VECSXP}s and @code{EXPRSXP}s,
which are created by slightly over-committing when enlarging a vector
during subassignment, so that some number of the following enlargements
during subassignment can be performed in place), where @code{truelength}
is the number of slots in use.} of the vector, and then followed by the data
419(aligned as required: on most 32-bit systems with a 24-byte
420@code{VECTOR_SEXPREC} node the data can follow immediately after the node).
421The data are a block of memory of the appropriate length to store `true
422length' elements (rounded up to a multiple of 8 bytes, with the 8-byte
blocks being the `Vcells' referred to in the documentation for @code{gc()}).
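
The rounding to 8-byte blocks can be sketched as follows.  The helper
@code{vcells_needed} is a hypothetical name, not a function in the @R{}
sources, and arithmetic overflow for very large vectors is deliberately
not handled.

```c
#include <assert.h>
#include <stddef.h>

#define VCELL_BYTES 8

/* Number of 8-byte Vcells needed to hold n elements of elt_bytes each. */
static size_t vcells_needed(size_t n, size_t elt_bytes)
{
    assert(elt_bytes > 0);
    size_t bytes = n * elt_bytes;   /* overflow is not checked in this sketch */
    return (bytes + VCELL_BYTES - 1) / VCELL_BYTES;
}
```

For example, three 4-byte @code{int}s occupy 12 bytes and so need two
Vcells, whereas a single @code{double} needs exactly one.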
424
425The `data' for the various types are given in the table below.  A lot of
426this is interpretation, i.e.@: the types are not checked.
427
428@table @code
429@item NILSXP
430There is only one object of type @code{NILSXP}, @code{R_NilValue}, with
431no data.
432
433@item SYMSXP
434Pointers to three nodes, the name, value and internal, accessed by
435@code{PRINTNAME} (a @code{CHARSXP}), @code{SYMVALUE} and
436@code{INTERNAL}.  (If the symbol's value is a @code{.Internal} function,
437the last is a pointer to the appropriate @code{SEXPREC}.)  Many symbols
438have @code{SYMVALUE} @code{R_UnboundValue}.
439
440@item LISTSXP
441Pointers to the CAR, CDR (usually a @code{LISTSXP} or @code{NULL}) and
442TAG (a @code{SYMSXP} or @code{NULL}).
443
444@item CLOSXP
445Pointers to the formals (a pairlist), the body and the environment.
446
447@item ENVSXP
448Pointers to the frame, enclosing environment and hash table (@code{NULL} or a
449@code{VECSXP}).  A frame is a tagged pairlist with tag the symbol and
450CAR the bound value.
451
452@item PROMSXP
453Pointers to the value, expression and environment (in which to evaluate
the expression).  Once a promise has been evaluated, the environment is
455set to @code{NULL}.
456
457@item LANGSXP
458A special type of @code{LISTSXP} used for function calls.  (The CAR
459references the function (perhaps via a symbol or language object), and
460the CDR the argument list with tags for named arguments.)  @R{}-level
461documentation references to `expressions' / `language objects' are
462mainly @code{LANGSXP}s, but can be symbols (@code{SYMSXP}s) or
463expression vectors (@code{EXPRSXP}s).
464
465@item SPECIALSXP
466@itemx BUILTINSXP
467An integer giving the offset into the table of
468primitives/@code{.Internal}s.
469
470@item CHARSXP
471@code{length}, @code{truelength} followed by a block of bytes (allowing
472for the @code{nul} terminator).
473
474@item LGLSXP
475@itemx INTSXP
476@code{length}, @code{truelength} followed by a block of C @code{int}s
477(which are 32 bits on all @R{} platforms).
478
479@item REALSXP
480@code{length}, @code{truelength} followed by a block of C @code{double}s.
481
482@item CPLXSXP
483@code{length}, @code{truelength} followed by a block of C99 @code{double
484complex}s.
485
486@item STRSXP
487@code{length}, @code{truelength} followed by a block of pointers
488(@code{SEXP}s pointing to @code{CHARSXP}s).
489
490@item DOTSXP
491A special type of @code{LISTSXP} for the value bound to a @code{...}
492symbol: a pairlist of promises.
493
494@item ANYSXP
495This is used as a place holder for any type: there are no actual objects
496of this type.
497
498@item VECSXP
499@itemx EXPRSXP
500@code{length}, @code{truelength} followed by a block of pointers.  These
501are internally identical (and identical to @code{STRSXP}) but differ in
502the interpretations placed on the elements.
503
504@item BCODESXP
505For the `byte-code' objects generated by the compiler.
506
507@item EXTPTRSXP
508Has three pointers, to the pointer, the protection value (an @R{} object
509which if alive protects this object) and a tag (a @code{SYMSXP}?).
510
511@item WEAKREFSXP
512A @code{WEAKREFSXP} is a special @code{VECSXP} of length 4, with
513elements @samp{key}, @samp{value}, @samp{finalizer} and @samp{next}.
514The @samp{key} is @code{NULL}, an environment or an external pointer,
515and the @samp{finalizer} is a function or @code{NULL}.
516
517@item RAWSXP
518@code{length}, @code{truelength} followed by a block of bytes.
519
520@item S4SXP
521two unused pointers and a tag.
522@end table
523
524@node Allocation classes,  , The 'data', SEXPs
525@subsection Allocation classes
526
527@cindex allocation classes
528As we have seen, the field @code{gccls} in the header is three bits to
label up to 8 classes of nodes.  Non-vector nodes are of class 0,
`small' vector nodes are of classes 1 to 5, vector nodes allocated by a
custom allocator are of class 6, and `large' vector nodes are of class 7.  The
532`small' vector nodes are able to store vector data of up to 8, 16, 32,
53364 and 128 bytes: larger vectors are @code{malloc}-ed individually
534whereas the `small' nodes are allocated from pages of about 2000
535bytes. Vector nodes allocated using custom allocators (via
536@code{allocVector3}) are not counted in the gc memory usage statistics
537since their memory semantics is not under R's control and may be
538non-standard (e.g., memory could be partially shared across nodes).
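
The choice of class by data size can be sketched as below.  This is a
hypothetical helper (@code{vector_node_class} is not a function in the
@R{} sources) that encodes only the size limits quoted above; class 6,
which depends on the use of a custom allocator rather than on size, is
left out.

```c
#include <stddef.h>

#define LARGE_NODE_CLASS 7

/* Upper data-size limits, in bytes, for the `small' classes 1 to 5. */
static const size_t small_class_limit[5] = { 8, 16, 32, 64, 128 };

/* Pick the allocation class for a vector holding data_bytes of data. */
static int vector_node_class(size_t data_bytes)
{
    for (int i = 0; i < 5; i++) {
        if (data_bytes <= small_class_limit[i])
            return i + 1;
    }
    return LARGE_NODE_CLASS;   /* larger vectors are malloc-ed individually */
}
```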
539
540
541@node Environments and variable lookup, Attributes, SEXPs, R Internal Structures
542@section Environments and variable lookup
543
544@cindex environment
545@cindex variable lookup
546What users think of as `variables' are symbols which are bound to
547objects in `environments'.  The word `environment' is used ambiguously
548in @R{} to mean @emph{either} the frame of an @code{ENVSXP} (a pairlist
549of symbol-value pairs) @emph{or} an @code{ENVSXP}, a frame plus an
550enclosure.
551
552@cindex user databases
553There are additional places that `variables' can be looked up, called
554`user databases' in comments in the code.  These seem undocumented in
555the @R{} sources, but apparently refer to the @pkg{RObjectTable} package
556at @uref{http://www.omegahat.net/RObjectTables/}.
557
558@cindex base environment
559@cindex environment, base
560The base environment is special.  There is an @code{ENVSXP} environment
561with enclosure the empty environment @code{R_EmptyEnv}, but the frame of
562that environment is not used.  Rather its bindings are part of the
563global symbol table, being those symbols in the global symbol table
564whose values are not @code{R_UnboundValue}.  When @R{} is started the
565internal functions are installed (by C code) in the symbol table, with
566primitive functions having values and @code{.Internal} functions having
567what would be their values in the field accessed by the @code{INTERNAL}
568macro.  Then @code{.Platform} and @code{.Machine} are computed and the
569base package is loaded into the base environment followed by the system
570profile.
571
572The frames of environments (and the symbol table) are normally hashed
573for faster access (including insertion and deletion).
574
575By default @R{} maintains a (hashed) global cache of `variables' (that
576is symbols and their bindings) which have been found, and this refers
577only to environments which have been marked to participate, which
578consists of the global environment (aka the user workspace), the base
579environment plus environments@footnote{Remember that attaching a list or
580a saved image actually creates and populates an environment and attaches
581that.} which have been @code{attach}ed.  When an environment is either
582@code{attach}ed or @code{detach}ed, the names of its symbols are flushed
583from the cache.  The cache is used whenever searching for variables from
584the global environment (possibly as part of a recursive search).
585
586@menu
587* Search paths::
588* Namespaces::
589* Hash table::
590@end menu
591
592@node Search paths, Namespaces, Environments and variable lookup, Environments and variable lookup
593@subsection Search paths
594
595@cindex search path
596@Sl{} has the notion of a `search path': the lookup for a `variable'
leads (possibly through a series of frames) to the `session frame', the
598`working directory' and then along the search path.  The search path is
599a series of databases (as returned by @code{search()}) which contain the
600system functions (but not necessarily at the end of the path, as by
601default the equivalent of packages are added at the end).
602
603@R{} has a variant on the @Sl{} model.  There is a search path (also
604returned by @code{search()}) which consists of the global environment
605(aka user workspace) followed by environments which have been attached
606and finally the base environment.  Note that unlike @Sl{} it is not
607possible to attach environments before the workspace nor after the base
608environment.
609
610However, the notion of variable lookup is more general in @R{}, hence
611the plural in the title of this subsection.  Since environments have
612enclosures, from any environment there is a search path found by looking
613in the frame, then the frame of its enclosure and so on.  Since loops
614are not allowed, this process will eventually terminate: it can
615terminate at either the base environment or the empty environment.  (It
616can be conceptually simpler to think of the search always terminating at
617the empty environment, but with an optimization to stop at the base
618environment.)  So the `search path' describes the chain of environments
619which is traversed once the search reaches the global environment.
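
The lookup through enclosures can be sketched with a toy environment
type.  The names @code{toy_env}, @code{toy_binding} and
@code{toy_lookup} are hypothetical, frames are plain arrays rather than
pairlists or hash tables, and a @code{NULL} enclosure stands in for the
empty environment.

```c
#include <stddef.h>
#include <string.h>

/* One symbol-value pair in a frame. */
typedef struct {
    const char *name;
    int value;
} toy_binding;

/* Toy environment: a frame plus an enclosure (NULL plays the empty environment). */
typedef struct toy_env {
    const toy_binding *frame;
    size_t n_bindings;
    const struct toy_env *enclosure;
} toy_env;

/* Search the frame, then the frame of the enclosure, and so on. */
static const int *toy_lookup(const toy_env *env, const char *name)
{
    for (; env != NULL; env = env->enclosure) {
        for (size_t i = 0; i < env->n_bindings; i++) {
            if (strcmp(env->frame[i].name, name) == 0)
                return &env->frame[i].value;
        }
    }
    return NULL;   /* reached the empty environment: not found */
}
```

Because each environment has exactly one enclosure and loops are not
allowed, the outer loop always terminates.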
620
621@node Namespaces, Hash table, Search paths, Environments and variable lookup
622@subsection Namespaces
623
624@cindex namespace
625Namespaces are environments associated with packages (and once again
626the base package is special and will be considered separately).  A
627package @code{@var{pkg}} defines two environments
628@code{namespace:@var{pkg}} and @code{package:@var{pkg}}: it is
629@code{package:@var{pkg}} that can be @code{attach}ed and form part of
630the search path.
631
632The objects defined by the @R{} code in the package are symbols with
633bindings in the @code{namespace:@var{pkg}} environment.  The
634@code{package:@var{pkg}} environment is populated by selected symbols
635from the @code{namespace:@var{pkg}} environment (the exports).  The
enclosure of the namespace environment is an environment populated with the
637explicit imports from other namespaces, and the enclosure of
638@emph{that} environment is the base namespace.  (So the illusion of the
639imports being in the namespace environment is created via the
640environment tree.)  The enclosure of the base namespace is the global
641environment, so the search from a package namespace goes via the
642(explicit and implicit) imports to the standard `search path'.
643
644@cindex base namespace
645@cindex namespace, base
646@findex R_BaseNamespace
647The base namespace environment @code{R_BaseNamespace} is another
648@code{ENVSXP} that is special-cased.  It is effectively the same thing
649as the base environment @code{R_BaseEnv} @emph{except} that its
650enclosure is the global environment rather than the empty environment:
651the internal code diverts lookups in its frame to the global symbol
652table.
653
654@node Hash table,  , Namespaces, Environments and variable lookup
655@subsection Hash table
656
657Environments in @R{} usually have a hash table, and nowadays that is the
658default in @code{new.env()}.  It is stored as a @code{VECSXP} where
659@code{length} is used for the allocated size of the table and
660@code{truelength} is the number of primary slots in use---the pointer to
661the @code{VECSXP} is part of the header of a @code{SEXP} of type
662@code{ENVSXP}, and this points to @code{R_NilValue} if the environment
663is not hashed.
664
665For the pros and cons of hashing, see a basic text on Computer Science.
666
667The code to implement hashed environments is in @file{src/main/envir.c}.
668Unless set otherwise (e.g.@: by the @code{size} argument of
669@code{new.env()}) the initial table size is @code{29}.  The table will
670be resized by a factor of 1.2 once the load factor (the proportion of
671primary slots in use) reaches 85%.
672
673The hash chains are stored as pairlist elements of the @code{VECSXP}:
674items are inserted at the front of the pairlist.  Hashing is principally
675designed for fast searching of environments, which are from time to time
676added to but rarely deleted from, so items are not actually deleted but
677have their value set to @code{R_UnboundValue}.
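
The resizing rule described above can be sketched in integer
arithmetic.  The function names are hypothetical and this is not the
code in @file{src/main/envir.c}; in particular, truncating the grown
size to an integer is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define INITIAL_TABLE_SIZE 29

/* True once the proportion of primary slots in use reaches 85%. */
static bool table_needs_resize(int slots_in_use, int table_size)
{
    assert(table_size > 0 && slots_in_use >= 0);
    return slots_in_use * 100 >= table_size * 85;
}

/* Grow by a factor of 1.2, truncating to an integer (an assumption). */
static int grown_table_size(int table_size)
{
    assert(table_size > 0);
    return table_size * 12 / 10;
}
```

With the default size of 29, the threshold is reached when 25 primary
slots are in use, and under the truncation assumption the table would
grow to 34 slots.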
678
679
680@node Attributes, Contexts, Environments and variable lookup, R Internal Structures
681@section Attributes
682
683@cindex attributes
684@findex ATTRIB
685@findex SET_ATTRIB
686@findex DUPLICATE_ATTRIB
687As we have seen, every @code{SEXPREC} has a pointer to the attributes of
688the node (default @code{R_NilValue}).  The attributes can be
689accessed/set by the macros/functions @code{ATTRIB} and
690@code{SET_ATTRIB}, but such direct access is normally only used to check
691if the attributes are @code{NULL} or to reset them.  Otherwise access
692goes through the functions @code{getAttrib} and @code{setAttrib} which
693impose restrictions on the attributes.  One thing to watch is that if
694you copy attributes from one object to another you may (un)set the
695@code{"class"} attribute and so need to copy the object and S4 bits as
696well.  There is a macro/function @code{DUPLICATE_ATTRIB} to automate
697this.
698
699Note that the `attributes' of a @code{CHARSXP} are used as part of the
management of the @code{CHARSXP} cache: of course @code{CHARSXP}s are
701not user-visible but C-level code might look at their attributes.
702
703The code assumes that the attributes of a node are either
704@code{R_NilValue} or a pairlist of non-zero length (and this is checked
705by @code{SET_ATTRIB}).  The attributes are named (via tags on the
706pairlist).  The replacement function @code{attributes<-} ensures that
707@code{"dim"} precedes @code{"dimnames"} in the pairlist.  Attribute
@code{"dim"} is one of several that are treated specially: the values are
709checked, and any @code{"names"} and @code{"dimnames"} attributes are
710removed.  Similarly, you cannot set @code{"dimnames"} without having set
711@code{"dim"}, and the value assigned must be a list of the correct
712length and with elements of the correct lengths (and all zero-length
713elements are replaced by @code{NULL}).
714
715The other attributes which are given special treatment are
716@code{"names"}, @code{"class"}, @code{"tsp"}, @code{"comment"} and
717@code{"row.names"}.  For pairlist-like objects the names are not stored
718as an attribute but (as symbols) as the tags: however the @R{} interface
719makes them look like conventional attributes, and for one-dimensional
720arrays they are stored as the first element of the @code{"dimnames"}
attribute.  The C code ensures that the @code{"tsp"} attribute is a
722@code{REALSXP}, the frequency is positive and the implied length agrees
723with the number of rows of the object being assigned to.  Classes and
724comments are restricted to character vectors, and assigning a
725zero-length comment or class removes the attribute.  Setting or removing
726a @code{"class"} attribute sets the object bit appropriately.  Integer
727row names are converted to and from the internal compact representation.
728
729@cindex copying semantics
730Care needs to be taken when adding attributes to objects of the types
731with non-standard copying semantics.  There is only one object of type
732@code{NILSXP}, @code{R_NilValue}, and that should never have attributes
733(and this is enforced in @code{installAttrib}).  For environments,
734external pointers and weak references, the attributes should be relevant
735to all uses of the object: it is for example reasonable to have a name
736for an environment, and also a @code{"path"} attribute for those
737environments populated from @R{} code in a package.
738
739@cindex attributes, preserving
740@cindex preserving attributes
741When should attributes be preserved under operations on an object?
742Becker, Chambers & Wilks (1988, pp. 144--6) give some guidance.  Scalar
743functions (those which operate element-by-element on a vector and whose
744output is similar to the input) should preserve attributes (except
745perhaps class, and if they do preserve class they need to preserve the
746@code{OBJECT} and S4 bits).  Binary operations normally call
747@findex copyMostAttrib
748@code{copyMostAttrib} to copy most attributes from the longer
749argument (and if they are of the same length from both, preferring the
750values on the first).  Here `most' means all except the @code{names},
751@code{dim} and @code{dimnames} which are set appropriately by the code
752for the operator.
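
For two arguments of the same length, the preference for the first
argument can be seen in a small sketch such as the following (the
attribute names @code{note} and @code{extra} are made up for
illustration):

@example
a <- structure(1:3, note = "from a")
b <- structure(4:6, note = "from b", extra = "only on b")
r <- a + b
attr(r, "note")     # "from a": the first argument is preferred
attr(r, "extra")    # "only on b": copied from the second argument
@end example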
753
754Subsetting (other than by an empty index) generally drops all attributes
755except @code{names}, @code{dim} and @code{dimnames} which are reset as
756appropriate.  On the other hand, subassignment generally preserves such
757attributes even if the length is changed.  Coercion drops all
758attributes. For example:
759
760@example
761> x <- structure(1:8, names=letters[1:8], comm="a comment")
762> x[]
763a b c d e f g h
7641 2 3 4 5 6 7 8
765attr(,"comm")
766[1] "a comment"
767> x[1:3]
768a b c
7691 2 3
770> x[3] <- 3
771> x
772a b c d e f g h
7731 2 3 4 5 6 7 8
774attr(,"comm")
775[1] "a comment"
776> x[9] <- 9
777> x
778a b c d e f g h
7791 2 3 4 5 6 7 8 9
780attr(,"comm")
781[1] "a comment"
782@end example
783
784
785@node Contexts, Argument evaluation, Attributes, R Internal Structures
786@section Contexts
787
788@cindex context
789@emph{Contexts} are the internal mechanism used to keep track of where a
790computation has got to (and from where), so that control-flow constructs
791can work and reasonable information can be produced on error conditions
792(such as @emph{via} traceback), and otherwise (the @code{sys.@var{xxx}}
793functions).
794
795Execution contexts are a stack of C @code{structs}:
796
797@example
798typedef struct RCNTXT @{
799    struct RCNTXT *nextcontext; /* @r{The next context up the chain} */
800    int callflag;               /* @r{The context `type'} */
801    JMP_BUF cjmpbuf;            /* @r{C stack and register information} */
802    int cstacktop;              /* @r{Top of the pointer protection stack} */
803    int evaldepth;              /* @r{Evaluation depth at inception} */
804    SEXP promargs;              /* @r{Promises supplied to closure} */
805    SEXP callfun;               /* @r{The closure called} */
806    SEXP sysparent;             /* @r{Environment the closure was called from} */
807    SEXP call;                  /* @r{The call that effected this context} */
808    SEXP cloenv;                /* @r{The environment} */
809    SEXP conexit;               /* @r{Interpreted @code{on.exit} code} */
810    void (*cend)(void *);       /* @r{C @code{on.exit} thunk} */
811    void *cenddata;             /* @r{Data for C @code{on.exit} thunk} */
812    char *vmax;                 /* @r{Top of the @code{R_alloc} stack} */
813    int intsusp;                /* @r{Interrupts are suspended} */
814    SEXP handlerstack;          /* @r{Condition handler stack} */
815    SEXP restartstack;          /* @r{Stack of available restarts} */
816    struct RPRSTACK *prstack;   /* @r{Stack of pending promises} */
817@} RCNTXT, *context;
818@end example
819
820@noindent
821plus additional fields for the byte-code compiler.  The `types'
822are from
823
824@example
825enum @{
826    CTXT_TOPLEVEL = 0,  /* @r{toplevel context} */
827    CTXT_NEXT     = 1,  /* @r{target for @code{next}} */
828    CTXT_BREAK    = 2,  /* @r{target for @code{break}} */
829    CTXT_LOOP     = 3,  /* @r{@code{break} or @code{next} target} */
830    CTXT_FUNCTION = 4,  /* @r{function closure} */
831    CTXT_CCODE    = 8,  /* @r{other functions that need error cleanup} */
832    CTXT_RETURN   = 12, /* @r{@code{return()} from a closure} */
833    CTXT_BROWSER  = 16, /* @r{return target on exit from browser} */
834    CTXT_GENERIC  = 20, /* @r{rather, running an S3 method} */
835    CTXT_RESTART  = 32, /* @r{a call to @code{restart} was made from a closure} */
836    CTXT_BUILTIN  = 64  /* @r{builtin internal function} */
837@};
838@end example
839
840@noindent
841where the @code{CTXT_FUNCTION} bit is on wherever function closures are
842involved.
843
844Contexts are created by a call to @code{begincontext} and ended by a
845call to @code{endcontext}: code can search up the stack for a
846particular type of context via @code{findcontext} (and jump there) or
847jump to a specific context via @code{R_JumpToContext}.
848@code{R_ToplevelContext} is the `idle' state (normally the command
849prompt), and @code{R_GlobalContext} is the top of the stack.
850
851Note that whilst calls to closures set a context, internal functions never
852do and primitive builtins only set it when profiling or when they are
853interfaces to foreign functions.
854
The byte-code compiler generates a map of instructions to source references
and expressions at compile time, which allows information on error
conditions to be produced.  As an optimization, the byte-code interpreter then does
858not set a context in some cases, such as in simple loops or when inlining
859simple builtins or wrappers for internal functions.
860
861@findex UseMethod
862@cindex method dispatch
Dispatching from an S3 generic (via @code{UseMethod} or its internal
864equivalent) or calling @code{NextMethod} sets the context type to
865@code{CTXT_GENERIC}.  This is used to set the @code{sysparent} of the
866method call to that of the @code{generic}, so the method appears to have
867been called in place of the generic rather than from the generic.
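
One visible consequence is that @code{parent.frame()} evaluated inside a
method gives the frame from which the generic was called.  A minimal
sketch, with hypothetical function names:

@example
gen <- function(x) UseMethod("gen")
gen.default <- function(x) parent.frame()
caller <- function() identical(gen(1), environment())
caller()    # TRUE: the method sees the caller of the generic
@end example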
868
869The @R{} @code{sys.frame} and @code{sys.call} functions work by counting
870calls to closures (type @code{CTXT_FUNCTION}) from either end of the
871context stack.
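
The related function @code{sys.nframe} reports how many closure calls
are on the stack at the point of the call, so each extra closure call
increases the count by one.  In this sketch @code{depth} and
@code{wrapper} are illustrative names, and the difference is taken so
that the result does not depend on how the code itself is run.

@example
depth <- function() sys.nframe()
wrapper <- function() depth()
d1 <- depth()
d2 <- wrapper()
d2 - d1     # 1: one more closure call on the context stack
@end example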
872
873Note that the @code{sysparent} element of the structure is not the same
874thing as @code{sys.parent()}.  Element @code{sysparent} is primarily
875used in managing changes of the function being evaluated, i.e.@: by
876@code{Recall} and method dispatch.
877
878@code{CTXT_CCODE} contexts are currently used in @code{cat()},
879@code{load()}, @code{scan()} and @code{write.table()} (to close the
880connection on error), by @code{PROTECT}, serialization (to recover from
881errors, e.g.@: free buffers) and within the error handling code (to
882raise the C stack limit and reset some variables).
883
884
885@node Argument evaluation, Autoprinting, Contexts, R Internal Structures
886@section Argument evaluation
887
888@cindex argument evaluation
889As we have seen, functions in @R{} come in three types, closures
890(@code{SEXPTYPE} @code{CLOSXP}), specials (@code{SPECIALSXP}) and
891builtins (@code{BUILTINSXP}).  In this section we consider when (and if)
892the actual arguments of function calls are evaluated.  The rules are
893different for the internal (special/builtin) and @R{}-level functions
894(closures).
895
896For a call to a closure, the actual and formal arguments are matched and
897a matched call (another @code{LANGSXP}) is constructed.  This process
898first replaces the actual argument list by a list of promises to the
899values supplied.  It then constructs a new environment which contains
900the names of the formal parameters matched to actual or default values:
901all the matched values are promises, the defaults as promises to be
902evaluated in the environment just created.  That environment is then
903used for the evaluation of the body of the function, and promises will
904be forced (and hence actual or default arguments evaluated) when they
905are encountered.
906@findex NAMED
907(Evaluating a promise sets @code{NAMED = NAMEDMAX} on its value, so if the
908argument was a symbol its binding is regarded as having multiple
909references during the evaluation of the closure call.)
910[The @code{NAMED} mechanism has been replaced by reference counting.]
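
Both properties, that promises are forced only when encountered and that
defaults are evaluated in the environment of the call, can be seen in a
short sketch (function names are illustrative):

@example
f <- function(a, b) if (a > 0) "b was never forced" else b
f(1, stop("not evaluated"))    # no error: the promise for b is unused

g <- function(x, y = 2 * x) y  # the default refers to another formal
g(3)                           # 6
@end example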
911
912If the closure is an S3 generic (that is, contains a call to
913@code{UseMethod}) the evaluation process is the same until the
914@code{UseMethod} call is encountered.  At that point the argument on
915which to do dispatch (normally the first) will be evaluated if it has
916not been already.  If a method has been found which is a closure, a new
917evaluation environment is created for it containing the matched
918arguments of the method plus any new variables defined so far during the
919evaluation of the body of the generic.  (Note that this means changes to
920the values of the formal arguments in the body of the generic are
921discarded when calling the method, but @emph{actual} argument promises
922which have been forced retain the values found when they were forced.
923On the other hand, missing arguments have values which are promises to
924use the default supplied by the method and not by the generic.)  If the
925method found is a primitive it is called with the matched argument list
926of promises (possibly already forced) used for the generic.
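
The statement that changes to formal arguments made in the body of the
generic are discarded can be checked with a sketch like this one, where
@code{show_arg} is a hypothetical generic:

@example
show_arg <- function(x) @{
    x <- "changed in the generic"   # local change, made before dispatch
    UseMethod("show_arg")
@}
show_arg.default <- function(x) x
show_arg(1)    # 1: the method sees the original argument
@end example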
927
928@cindex builtin function
929@cindex special function
930@cindex primitive function
931@cindex .Internal function
932The essential difference@footnote{There is currently one other
933difference: when profiling builtin functions are counted as function
934calls but specials are not.} between special and builtin functions is
935that the arguments of specials are not evaluated before the C code is
936called, and those of builtins are.  Note that being a special/builtin is
937separate from being primitive or @code{.Internal}: @code{quote} is a
938special primitive, @code{+} is a builtin primitive, @code{cbind} is a
939special @code{.Internal} and @code{grep} is a builtin @code{.Internal}.
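
At the @R{} level the distinction shows up in @code{typeof}.  Note that
functions implemented via @code{.Internal}, such as @code{cbind}, are
themselves closures which wrap the internal call.

@example
typeof(quote)          # "special"
typeof(get("+"))       # "builtin"
typeof(cbind)          # "closure"
is.primitive(quote)    # TRUE
is.primitive(cbind)    # FALSE
@end example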
940
941@cindex generic, internal
942@findex DispatchOrEval
943Many of the internal functions are internal generics, which for specials
944means that they do not evaluate their arguments on call, but the C code
945starts with a call to @code{DispatchOrEval}.  The latter evaluates the
946first argument, and looks for a method based on its class.  (If S4
947dispatch is on, S4 methods are looked for first, even for S3 classes.)
948If it finds a method, it dispatches to that method with a call based on
949promises to evaluate the remaining arguments.  If no method is found,
950the remaining arguments are evaluated before return to the internal
951generic.
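
As an illustration of internal dispatch, @code{length} (a builtin rather
than a special) is an internal generic, so a method defined for a class
is found when the first argument has that class.  The class
@code{"stack"} and its layout are made up for this sketch.

@example
length.stack <- function(x) length(unclass(x)$items)
s <- structure(list(items = list(1, 2, 3)), class = "stack")
length(s)             # 3: dispatched to length.stack
length(unclass(s))    # 1: no class, so the internal default is used
@end example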
952
@cindex generic, group
@findex DispatchGroup
The other way that internal functions can be generic is to be group
generic.  Most such functions are builtins (so immediately evaluate all
their arguments), and all contain a call to the C function
@code{DispatchGroup}.  There are some peculiarities over the number of
959arguments for the @code{"Math"} group generic, with some members
960allowing only one argument, some having two (with a default for the
961second) and @code{trunc} allows one or more but the default method only
962accepts one.
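
A group method is selected for every member of the group, and the
variable @code{.Generic} records which member was called.  A minimal
sketch, using a hypothetical class @code{"temperature"}:

@example
Math.temperature <- function(x, ...)
    paste("Math group member:", .Generic)
t1 <- structure(21.5, class = "temperature")
abs(t1)     # "Math group member: abs"
sqrt(t1)    # "Math group member: sqrt"
@end example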
963
964@menu
965* Missingness::
966* Dot-dot-dot arguments::
967@end menu
968
969@node Missingness, Dot-dot-dot arguments, Argument evaluation, Argument evaluation
970@subsection Missingness
971
972@cindex missingness
973Actual arguments to (non-internal) @R{} functions can be fewer than are
974required to match the formal arguments of the function.  Having
975unmatched formal arguments will not matter if the argument is never used
976(by lazy evaluation),  but when the argument is evaluated, either its
977default value is evaluated (within the evaluation environment of the
978function) or an error is thrown with a message along the lines of
979
980@example
981argument "foobar" is missing, with no default
982@end example
983
984@findex MISSING
985@findex R_MissingArg
986Internally missingness is handled by two mechanisms. The object
987@code{R_MissingArg} is used to indicate that a formal argument has no
988(default) value.  When matching the actual arguments to the formal
989arguments, a new argument list is constructed from the formals all of
990whose values are @code{R_MissingArg} with the first @code{MISSING} bit
991set.  Then whenever a formal argument is matched to an actual argument,
992the corresponding member of the new argument list has its value set to
993that of the matched actual argument, and if that is not
994@code{R_MissingArg} the missing bit is unset.
995
996This new argument list is used to form the evaluation frame for the
997function, and if named arguments are subsequently given a new value
998(before they are evaluated) the missing bit is cleared.
999
1000Missingness of arguments can be interrogated via the @code{missing()}
1001function.  An argument is clearly missing if its missing bit is set or
1002if the value is @code{R_MissingArg}.  However, missingness can be passed
1003on from function to function, for using a formal argument as an actual
1004argument in a function call does not count as evaluation.  So
1005@code{missing()} has to examine the value (a promise) of a
not-yet-evaluated formal argument to see if it might be missing, which
1007might involve investigating a promise and so on @dots{}.
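
These rules can be exercised from @R{} code.  In the sketch below all
function names are illustrative.

@example
inner <- function(b) missing(b)
pass_on <- function(a) inner(a)
pass_on()     # TRUE: missingness is passed on through the promise
pass_on(1)    # FALSE

h <- function(x) @{
    x <- 1       # assigning a value clears the missing bit
    missing(x)
@}
h()           # FALSE

f <- function(foobar) foobar
msg <- tryCatch(f(), error = function(e) conditionMessage(e))
msg           # argument "foobar" is missing, with no default
@end example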
1008
1009Special primitives also need to handle missing arguments, and in some
cases (e.g.@: @code{log}) that is why they are special and not
1011builtin.  This is usually done by testing if an argument's value is
1012@code{R_MissingArg}.
1013
1014@node Dot-dot-dot arguments,  , Missingness, Argument evaluation
1015@subsection Dot-dot-dot arguments
1016
1017@cindex ... argument
1018Dot-dot-dot arguments are convenient when writing functions, but
1019complicate the internal code for argument evaluation.
1020
1021The formals of a function with a @code{...} argument represent that as a
1022single argument like any other argument, with tag the symbol
1023@code{R_DotsSymbol}.  When the actual arguments are matched to the
1024formals, the value of the @code{...} argument is of @code{SEXPTYPE}
1025@code{DOTSXP}, a pairlist of promises (as used for matched arguments)
1026but distinguished by the @code{SEXPTYPE}.
1027
1028Recall that the evaluation frame for a function initially contains the
1029@code{@var{name}=@var{value}} pairs from the matched call, and hence
1030this will be true for @code{...} as well.  The value of @code{...} is a
1031(special) pairlist whose elements are referred to by the special symbols
1032@code{..1}, @code{..2}, @dots{} which have the @code{DDVAL} bit set:
1033when one of these is encountered it is looked up (via @code{ddfindVar})
1034in the value of the @code{...} symbol in the evaluation frame.
1035
1036Values of arguments matched to a @code{...} argument can be missing.
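
Because the elements of @code{...} are promises, they are forced only
when referred to, for example through @code{..1} or @code{..2}.  A short
sketch with illustrative function names:

@example
second <- function(...) ..2
second("a", "b", "c")             # "b"

first <- function(...) ..1
first(1, stop("never forced"))    # 1: the second promise is unused
@end example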
1037
1038Special primitives may need to handle @code{...} arguments: see for
1039example the internal code of @code{switch} in file
1040@file{src/main/builtin.c}.
1041
1042@node Autoprinting, The write barrier, Argument evaluation, R Internal Structures
1043@section Autoprinting
1044
1045@cindex autoprinting
1046@findex R_Visible
1047
1048Whether the returned value of a top-level @R{} expression is printed is
1049controlled by the global boolean variable @code{R_Visible}.  This is set
1050(to true or false) on entry to all primitive and internal functions
1051based on the @code{eval} column of the table in file
1052@file{src/main/names.c}: the appropriate setting can be extracted by the
1053macro @code{PRIMPRINT}.
1054@findex PRIMPRINT
1055
1056@findex invisible
1057The @R{} primitive function @code{invisible} makes use of this
1058mechanism: it just sets @code{R_Visible = FALSE} before entry and
1059returns its argument.
1060
1061For most functions the intention will be that the setting of
1062@code{R_Visible} when they are entered is the setting used when they
1063return, but there need to be exceptions.  The @R{} functions
1064@code{identify}, @code{options}, @code{system} and @code{writeBin}
1065determine whether the result should be visible from the arguments or
1066user action.  Other functions themselves dispatch functions which may
1067change the visibility flag: examples@footnote{the other current example
1068is left brace, which is implemented as a primitive.} are
1069@code{.Internal}, @code{do.call}, @code{eval}, @code{withVisible},
1070@code{if}, @code{NextMethod}, @code{Recall}, @code{recordGraphics},
1071@code{standardGeneric}, @code{switch} and @code{UseMethod}.
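
The effect of the flag can be inspected with @code{withVisible}, which
returns the value together with the visibility recorded for it.  In this
sketch @code{quiet} is a hypothetical function.

@example
withVisible(3)$visible               # TRUE
withVisible(invisible(3))$visible    # FALSE
withVisible(x <- 5)$visible          # FALSE: assignment is invisible
quiet <- function() invisible(7)
withVisible(quiet())$value           # 7
withVisible(quiet())$visible         # FALSE
@end example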
1072
1073`Special' primitive and internal functions evaluate their arguments
1074internally @emph{after} @code{R_Visible} has been set, and evaluation of
1075the arguments (e.g.@: an assignment as in PR#9263) can change the value
1076of the flag.
1077
1078The @code{R_Visible} flag can also get altered during the evaluation of
1079a function, with comments in the code about @code{warning},
1080@code{writeChar} and graphics functions calling @code{GText} (PR#7397).
1081(Since the C-level function @code{eval} sets @code{R_Visible}, this
1082could apply to any function calling it.  Since it is called when
1083evaluating promises, even object lookup can change @code{R_Visible}.)
1084Internal and primitive functions force the documented setting of
1085@code{R_Visible} on return, unless the C code is allowed to change it
1086(the exceptions above are indicated by @code{PRIMPRINT} having value 2).
1087
1088The actual autoprinting is done by @code{PrintValueEnv} in file
1089@file{print.c}.  If the object to be printed has the S4 bit set and S4
1090methods dispatch is on, @code{show} is called to print the object.
1091Otherwise, if the object bit is set (so the object has a
1092@code{"class"} attribute), @code{print} is called to dispatch methods:
1093for objects without a class the internal code of @code{print.default}
1094is called.
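
For an object with a @code{"class"} attribute this means that a
user-defined @code{print} method is what autoprinting runs.  A minimal
sketch, assuming a made-up class @code{"ticket"}:

@example
print.ticket <- function(x, ...) @{
    cat("<ticket", unclass(x), ">\n")
    invisible(x)
@}
tk <- structure(42, class = "ticket")
tk                               # at top level: uses print.ticket
out <- capture.output(print(tk)) # the same method, called explicitly
@end example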
1095
1096
1097@node The write barrier, Serialization Formats, Autoprinting, R Internal Structures
1098@section The write barrier and the garbage collector
1099
1100@cindex write barrier
1101@cindex garbage collector
1102@R{} has long had a generational garbage collector, and bit @code{gcgen}
1103in the @code{sxpinfo} header is used in the implementation of this.
1104This is used in conjunction with the @code{mark} bit to identify two
1105previous generations.
1106
1107There are three levels of collections.  Level 0 collects only the
1108youngest generation, level 1 collects the two youngest generations and
1109level 2 collects all generations.  After 20 level-0 collections the next
1110collection is at level 1, and after 5 level-1 collections at level 2.
1111Further, if a level-@var{n} collection fails to provide 20% free space
1112(for each of nodes and the vector heap), the next collection will be at
1113level @var{n+1}.  (The @R{}-level function @code{gc()} performs a
1114level-2 collection.)
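
The two areas mentioned above, nodes and the vector heap, correspond to
the two rows of the matrix returned by @code{gc()}:

@example
g <- gc()      # requests a full collection and reports usage
rownames(g)    # "Ncells" "Vcells"
@end example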
1115
1116A generational collector needs to efficiently `age' the objects,
1117especially list-like objects (including @code{STRSXP}s).  This is done
1118by ensuring that the elements of a list are regarded as at least as old
1119as the list @emph{when they are assigned}.  This is handled by the
1120functions @code{SET_VECTOR_ELT} and @code{SET_STRING_ELT}, which is why
1121they are functions and not macros.  Ensuring the integrity of such
1122operations is termed the @dfn{write barrier} and is done by making the
1123@code{SEXP} opaque and only providing access via functions (which cannot
1124be used as lvalues in assignments in C).
1125
1126All code in @R{} extensions is by default behind the write barrier.  The
only way to obtain direct access to the internals of the @code{SEXPREC}s
is to define @samp{USE_RINTERNALS} before including header file
@file{Rinternals.h}; the macro is normally defined in @file{Defn.h}.  To
1130enable a check on the way that the access is used, @R{} can be compiled
1131with flag @option{--enable-strict-barrier} which ensures that header
1132@file{Defn.h} does not define @samp{USE_RINTERNALS} and hence that
1133@code{SEXP} is opaque in most of @R{} itself.  (There are some necessary
1134exceptions: foremost in file @file{memory.c} where the accessor
1135functions are defined and also in file @file{size.c} which needs access
1136to the sizes of the internal structures.)
1137
1138For background papers see
1139@uref{https://homepage.stat.uiowa.edu/~luke/R/barrier.html} and
1140@uref{https://homepage.stat.uiowa.edu/~luke/R/gengcnotes.html}.
1141
1142@node Serialization Formats, Encodings for CHARSXPs, The write barrier, R Internal Structures
1143@section Serialization Formats
1144
1145@cindex serialization
1146Serialized versions of @R{} objects are used by @code{load}/@code{save}
1147and also at a slightly lower level by @code{saveRDS}/@code{readRDS} (and
1148their earlier `internal' dot-name versions) and
1149@code{serialize}/@code{unserialize}.  These differ in what they
1150serialize to (a file, a connection, a raw vector) and whether they are
1151intended to serialize a single object or a collection of objects
1152(typically the workspace).  @code{save} writes a header at the beginning
1153of the file (a single LF-terminated line) which the lower-level versions
1154do not.
1155
1156@code{save} and @code{saveRDS} allow various forms of compression, and
1157@command{gzip} compression is the default (except for @acronym{ASCII}
1158saves).  Compression is applied to the whole file stream, including the
1159headers, so serialized files can be uncompressed or re-compressed by
1160external programs.  Both @code{load} and @code{readRDS} can read
1161@command{gzip}, @command{bzip2} and @command{xz} forms of compression
1162when reading from a file, and @command{gzip} compression when reading
1163from a connection.
1164
1165@R{} has used the same serialization format called `version 2' from @R{}
11661.4.0 in December 2001 until @R{} 3.5.3 in March 2019.  It has been expanded
1167in back-compatible ways since its inception, for example to support
1168additional @code{SEXPTYPE}s.  Earlier formats are still supported via
1169@code{load} and @code{save} but such formats are not described here.  The
current default serialization format is called `version 3', and was
introduced in @R{} 3.5.0.
1172
1173@code{save} works by writing a single-line header (typically
1174@code{RDX2\n} for a binary save: the only other current value is
@code{RDA2\n} for @code{save(ascii=TRUE)}), then creating a tagged
1176pairlist of the objects to be saved and serializing that single object.
1177@code{load} reads the header line, unserializes a single object (a
1178pairlist or a vector list) and assigns the elements of the object in the
1179specified environment.  The header line serves two purposes in @R{}: it
1180identifies the serialization format so @code{load} can switch to the
1181appropriate reader code, and the newline @code{\n} allows the detection of files
1182which have been subjected to a non-binary transfer which re-mapped line
1183endings.  It can also be thought of as a `magic number' in the sense
1184used by the @command{file} program (although @R{} save files are not yet
1185by default known to that program).
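
The header line can be read back directly from an uncompressed save
file.  This sketch requests the version-2 format explicitly so that the
header is the one described above; the file and object names are
arbitrary.

@example
tmp <- tempfile(fileext = ".rda")
x <- 1:3
save(x, file = tmp, compress = FALSE, version = 2)
hdr <- rawToChar(readBin(tmp, "raw", n = 5L))
hdr            # "RDX2\n"
unlink(tmp)
@end example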
1186
1187Serialization in @R{} needs to take into account that objects may
1188contain references to environments, which then have enclosing
environments and so on.  (Environments recognized as package or namespace
environments are saved by name.)  There are `reference objects'
1191which are not duplicated on copy and should remain shared on
1192unserialization.  These are weak references, external pointers and
1193environments other than those associated with packages, namespaces and
1194the global environment.  These are handled via a hash table, and
1195references after the first are written out as a reference marker indexed
1196by the table entry.
1197
1198Version-2 serialization first writes a header indicating the format
1199(normally @samp{X\n} for an XDR format binary save, but @samp{A\n},
1200ASCII, and @samp{B\n}, native word-order binary, can also occur) and
1201then three integers giving the version of the format and two @R{}
1202versions (packed by the @code{R_Version} macro from @file{Rversion.h}).
1203(Unserialization interprets the two versions as the version of @R{}
1204which wrote the file followed by the minimal version of @R{} needed to
1205read the format.)  Serialization then writes out the object recursively
1206using function @code{WriteItem} in file @file{src/main/serialize.c}.
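
The header can be decoded from the raw vector returned by
@code{serialize}.  In this sketch the helper @code{unpack} is
hypothetical; it reverses the packing done by @code{R_Version}, which
computes @code{65536 * major + 256 * minor + patch}.

@example
bytes <- serialize(1L, NULL, version = 2)
rawToChar(bytes[1:2])    # "X\n": XDR format
v <- readBin(bytes[3:14], "integer", n = 3L, size = 4L, endian = "big")
v[1]                     # 2: the version of the format
unpack <- function(p) c(p %/% 65536L, (p %/% 256L) %% 256L, p %% 256L)
unpack(v[2])             # version of R which wrote the data
unpack(v[3])             # minimal version of R needed to read it
@end example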
1207
1208Some objects are written as if they were @code{SEXPTYPE}s: such
1209pseudo-@code{SEXPTYPE}s cover @code{R_NilValue}, @code{R_EmptyEnv},
1210@code{R_BaseEnv}, @code{R_GlobalEnv}, @code{R_UnboundValue},
1211@code{R_MissingArg} and @code{R_BaseNamespace}.
1212
1213For all @code{SEXPTYPE}s except @code{NILSXP}, @code{SYMSXP} and
1214@code{ENVSXP} serialization starts with an integer with the
1215@code{SEXPTYPE} in bits 0:7@footnote{only bits 0:4 are currently used
1216for @code{SEXPTYPE}s but values 241:255 are used for
1217pseudo-@code{SEXPTYPE}s.} followed by the object bit, two bits
1218indicating if there are any attributes and if there is a tag (for the
1219pairlist types), an unused bit and then the @code{gp}
1220field@footnote{Currently the only relevant bits are 0:1, 4, 14:15.} in
1221bits 12:27.  Pairlist-like objects write their attributes (if any), tag
1222(if any), CAR and then CDR (using tail recursion): other objects write
1223their attributes after themselves.  Atomic vector objects write their
1224length followed by the data: generic vector-list objects write their
1225length followed by a call to @code{WriteItem} for each element.  The
1226code for @code{CHARSXP}s special-cases @code{NA_STRING} and writes it as
1227length @code{-1} with no data.  Lengths no more than @code{2^31 - 1} are
1228written in that way and larger lengths (which only occur on 64-bit
1229systems) as @code{-1} followed by the upper and lower 32-bits as integers
1230(regarded as unsigned).
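
The leading integer of an item can be picked apart with bitwise
operations.  The sketch below serializes an integer vector carrying one
attribute and examines the flags that follow the 14-byte header; the
attribute name @code{note} is arbitrary.

@example
bytes <- serialize(structure(1L, note = "x"), NULL, version = 2)
flags <- readBin(bytes[15:18], "integer", size = 4L, endian = "big")
bitwAnd(flags, 255L)           # 13: the SEXPTYPE of an INTSXP
bitwAnd(flags, 256L) != 0L     # FALSE: object bit (bit 8)
bitwAnd(flags, 512L) != 0L     # TRUE:  has attributes (bit 9)
bitwAnd(flags, 1024L) != 0L    # FALSE: has a tag (bit 10)
@end example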
1231
1232Environments are treated in several ways: as we have seen, some are
1233written as specific pseudo-@code{SEXPTYPE}s.  Package and namespace
1234environments are written with pseudo-@code{SEXPTYPE}s followed by the
1235name.  `Normal' environments are written out as @code{ENVSXP}s with an
1236integer indicating if the environment is locked followed by the
1237enclosure, frame, `tag' (the hash table) and attributes.
1238
In the `XDR' format integers and doubles are written in big-endian order:
1240however the format is not fully XDR (as defined in RFC 1832) as byte
1241quantities (such as the contents of @code{CHARSXP} and @code{RAWSXP}
1242types) are written as-is and not padded to a multiple of four bytes.
1243
1244The `ASCII' format writes 7-bit characters.  Integers are formatted with
1245@code{%d} (except that @code{NA_integer_} is written as @code{NA}),
1246doubles formatted with @code{%.16g} (plus @code{NA}, @code{Inf} and
1247@code{-Inf}) and bytes with @code{%02x}.  Strings are written using
1248standard escapes (e.g.@: @code{\t} and @code{\013}) for non-printing and
1249non-@acronym{ASCII} bytes.
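
Since the `ASCII' format writes one field per line, its output is easy
to inspect.  A minimal sketch (the variable names are illustrative):

@example
txt <- rawToChar(serialize(c(1L, NA), NULL, ascii = TRUE, version = 2))
fields <- strsplit(txt, "\n", fixed = TRUE)[[1]]
fields[1:2]        # "A" "2": format marker and format version
tail(fields, 2)    # "1" "NA": the two integer values
@end example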
1250
Version-3 serialization extends version-2 by support for custom
serialization of @code{ALTREP} framework objects.  It also stores the
current native encoding at serialization time, so that unflagged strings can
be converted if unserialized in @R{} running under a different native encoding.
1255
1256@node Encodings for CHARSXPs, The CHARSXP cache, Serialization Formats, R Internal Structures
1257@section Encodings for CHARSXPs
1258
1259Character data in @R{} are stored in the sexptype @code{CHARSXP}.
1260
1261There is support for encodings other than that of the current locale, in
1262particular UTF-8 and the multi-byte encodings used on Windows for CJK
1263languages. A limited means to indicate the encoding of a @code{CHARSXP}
1264is @emph{via} two of the `general purpose' bits which are used to declare
1265the encoding to be either Latin-1 or UTF-8.  (Note that it is possible
1266for a character vector to contain elements in different encodings.)
1267Both printing and plotting notice the declaration and convert the string
1268to the current locale (possibly using @code{<xx>} to display in
1269hexadecimal bytes that are not valid in the current locale).  Many (but
1270not all) of the character manipulation functions will either preserve
1271the declaration or re-encode the character string.
1272
1273Strings that refer to the OS such as file names need to be passed
1274through a wide-character interface on some OSes (e.g.@: Windows).
1275
1276When are character strings declared to be of known encoding?  One way is
1277to do so directly via @code{Encoding}.  The parser declares the encoding
1278if this is known, either via the @code{encoding} argument to
1279@code{parse} or from the locale within which parsing is being done at
1280the @R{} command line.  (Other ways are recorded on the help page for
1281@code{Encoding}.)
1282
1283It is not necessary to declare the encoding of @acronym{ASCII} strings
1284as they will work in any locale.  @acronym{ASCII} strings should never
1285have a marked encoding, as any encoding will be ignored when entering
1286such strings into the @code{CHARSXP} cache.
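
Both points can be checked with @code{Encoding}: a declaration is
recorded for a string containing a non-@acronym{ASCII} byte, but not for
an @acronym{ASCII} string.

@example
x <- "fa\xE7ile"
Encoding(x) <- "latin1"
Encoding(x)               # "latin1"

a <- "abc"
Encoding(a) <- "UTF-8"
Encoding(a)               # "unknown": ASCII strings are never marked
@end example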
1287
1288The rationale behind considering only UTF-8 and Latin-1 was that most
1289systems are capable of producing UTF-8 strings and this is the nearest
1290we have to a universal format.  For those that do not (for example those
1291lacking a powerful enough @code{iconv}), it is likely that they work in
1292Latin-1, the old @R{} assumption. Then the parser can return a
1293UTF-8-encoded string if it encounters a @samp{\uxxxx} escape for a
1294Unicode point that cannot be represented in the current charset.  (This
1295needs MBCS support, and was only enabled@footnote{See define
1296@code{USE_UTF8_IF_POSSIBLE} in file @file{src/main/gram.c}.} on
1297Windows.)  This is enabled for all platforms, and a @samp{\uxxxx} or
1298@samp{\Uxxxxxxxx} escape ensures that the parsed string will be marked
1299as UTF-8.
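
For example, a string written with a @samp{\u} escape is marked as
UTF-8 by the parser:

@example
u <- "\u00e7"
Encoding(u)     # "UTF-8"
utf8ToInt(u)    # 231, the Unicode point U+00E7
@end example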
1300
1301Most of the character manipulation functions now preserve UTF-8
1302encodings: there are some notes as to which at the top of file
1303@file{src/main/character.c} and in file
1304@file{src/library/base/man/Encoding.Rd}.
1305
Graphics devices are offered the possibility of handling UTF-8-encoded
1307strings without re-encoding to the native character set, by setting
1308@code{hasTextUTF8} to be @samp{TRUE} and supplying functions
1309@code{textUTF8} and @code{strWidthUTF8} that expect UTF-8-encoded
1310inputs.  Normally the symbol font is encoded in Adobe Symbol encoding,
1311but that can be re-encoded to UTF-8 by setting @code{wantSymbolUTF8} to
@samp{TRUE}.  The Windows port of cairographics has a rather peculiar
1313assumption: it wants the symbol font to be encoded in UTF-8 as if it
1314were encoded in Latin-1 rather than Adobe Symbol: this is selected by
1315@code{wantSymbolUTF8 = NA_LOGICAL}.
1316
1317Windows has no UTF-8 locales, but rather expects to work with
1318UCS-2@footnote{or UTF-16 if support for surrogates is enabled in the OS,
1319which it used not to be when encoding support was added to @R{}.}
1320strings.  @R{} (being written in standard C) would not work internally
1321with UCS-2 without extensive changes.  The @file{Rgui}
1322console@footnote{but not the GraphApp toolkit.} uses UCS-2 internally,
1323but communicates with the @R{} engine in the native encoding.  To allow
1324UTF-8 strings to be printed in UTF-8 in @file{Rgui.exe}, an escape
1325convention is used (see header file @file{rgui_UTF8.h}) by
1326@code{cat}, @code{print} and autoprinting.
1327
1328`Unicode' (UCS-2LE) files are common in the Windows world, and
1329@code{readLines} and @code{scan} will read them into UTF-8 strings on
1330Windows if the encoding is declared explicitly on an unopened
1331connection passed to those functions.
1332
1333@node The CHARSXP cache, Warnings and errors, Encodings for CHARSXPs, R Internal Structures
1334@section The CHARSXP cache
1335
1336@findex mkChar
1337There is a global cache for @code{CHARSXP}s created by @code{mkChar} ---
1338the cache ensures that most @code{CHARSXP}s with the same contents share
1339storage (`contents' including any declared encoding).  Not all
@code{CHARSXP}s are part of the cache: notably @samp{NA_STRING} is
not.  @code{CHARSXP}s reloaded from the @code{save} formats of @R{} prior
1342to 0.99.0 are not cached (since the code used is frozen and very few
1343examples still exist).
1344
1345@findex mkCharLenCE
1346The cache records the encoding of the string as well as the bytes: all
1347requests to create a @code{CHARSXP} should be @emph{via} a call to
@code{mkCharLenCE}.  Any encoding given in a @code{mkCharLenCE} call will
1349be ignored if the string's bytes are all @acronym{ASCII} characters.
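
The behaviour described above can be modelled by a small self-contained
C sketch.  It is only a toy under stated assumptions: the names
(@code{toy_mkCharLenCE}, @code{toy_char}, the @code{TOY_} encoding
marks) are hypothetical, and a simple linked list stands in for whatever
data structure the real cache uses.

@example
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for encoding marks; the values are arbitrary. */
enum toy_enc @{ TOY_NATIVE = 0, TOY_UTF8 = 1, TOY_LATIN1 = 2 @};

struct toy_char @{
    struct toy_char *next;      /* next entry in the cache chain */
    enum toy_enc enc;           /* declared encoding */
    size_t len;                 /* number of bytes */
    char *bytes;                /* the contents, nul-terminated */
@};

static struct toy_char *toy_cache = NULL;

static int toy_all_ascii(const char *s, size_t len)
@{
    size_t i;
    for (i = 0; i < len; i++)
        if ((unsigned char) s[i] > 127) return 0;
    return 1;
@}

/* Return the cached entry for (bytes, encoding), creating it if
   needed.  Returns NULL only if memory cannot be allocated. */
const struct toy_char *toy_mkCharLenCE(const char *s, size_t len,
                                       enum toy_enc enc)
@{
    struct toy_char *p;
    assert(s != NULL);
    if (toy_all_ascii(s, len))
        enc = TOY_NATIVE;       /* encoding is ignored for ASCII */
    for (p = toy_cache; p != NULL; p = p->next)
        if (p->enc == enc && p->len == len &&
            memcmp(p->bytes, s, len) == 0)
            return p;           /* same contents: share storage */
    p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    p->bytes = malloc(len + 1);
    if (p->bytes == NULL) @{ free(p); return NULL; @}
    memcpy(p->bytes, s, len);
    p->bytes[len] = '\0';
    p->len = len;
    p->enc = enc;
    p->next = toy_cache;
    toy_cache = p;
    return p;
@}
@end example

@noindent
In this model two requests for @code{"abc"} with different declared
encodings return the same entry, whereas the same non-@acronym{ASCII}
bytes declared as UTF-8 and as Latin-1 give two distinct entries.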
1350
1351
1352@node Warnings and errors, S4 objects, The CHARSXP cache, R Internal Structures
1353@section Warnings and errors
1354
1355@findex warning
1356@findex warningcall
1357@findex error
1358@findex errorcall
1359
Each of @code{warning} and @code{stop} has two C-level equivalents:
@code{warning} and @code{warningcall} for the former, @code{error} and
@code{errorcall} for the latter.
1362The relationship between the pairs is similar: @code{warning} tries to
1363fathom out a suitable call, and then calls @code{warningcall} with that
1364call as the first argument if it succeeds, and with @code{call =
1365R_NilValue} if it does not.  When @code{warningcall} is called, it
1366includes the deparsed call in its printout unless @code{call =
1367R_NilValue}.
1368
1369@code{warning} and @code{error} look at the context stack.  If the
1370topmost context is not of type @code{CTXT_BUILTIN}, it is used to
1371provide the call, otherwise the next context provides the call.
1372This means that when these functions are called from a primitive or
1373@code{.Internal}, the imputed call will not be to
1374primitive/@code{.Internal} but to the function calling the
primitive/@code{.Internal}.  This is exactly what one wants for a
1376@code{.Internal}, as this will give the call to the closure wrapper.
1377(Further, for a @code{.Internal}, the call is the argument to
1378@code{.Internal}, and so may not correspond to any @R{} function.)
1379However, it is unlikely to be what is needed for a primitive.
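
The rule for choosing the call can be sketched in a few lines of
self-contained C.  This is a toy model only: the structure
@code{toy_context} and the function @code{toy_imputed_call} are
hypothetical and much simpler than the real context stack.

@example
#include <assert.h>
#include <stddef.h>

enum toy_ctxt_type @{ TOY_CTXT_FUNCTION, TOY_CTXT_BUILTIN @};

struct toy_context @{
    enum toy_ctxt_type type;
    const char *call;                 /* deparsed call, for illustration */
    struct toy_context *nextcontext;  /* next context down the stack */
@};

/* Return the call that warning/error would report for the stack
   whose topmost context is top, or NULL if there is none. */
const char *toy_imputed_call(const struct toy_context *top)
@{
    assert(top != NULL);
    if (top->type != TOY_CTXT_BUILTIN)
        return top->call;             /* topmost context provides it */
    /* topmost context is a builtin: use the next one, if any */
    return top->nextcontext != NULL ? top->nextcontext->call : NULL;
@}
@end example

@noindent
With a builtin context for @code{length(x)} sitting on top of a closure
context for @code{f(x)}, the sketch reports @code{f(x)}, which is the
behaviour described above for code called from a primitive.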
1380
The upshot is that @code{warningcall} and @code{errorcall} should
1382normally be used for code called from a primitive, and @code{warning}
1383and @code{error} should be used for code called from a @code{.Internal}
1384(and necessarily from @code{.Call}, @code{.C} and so on, where the call
1385is not passed down).  However, there are two complications.  One is that
1386code might be called from either a primitive or a @code{.Internal}, in
1387which case probably @code{warningcall} is more appropriate.  The other
1388involves replacement functions, where the call was once of the form
1389@example
1390> length(x) <- y ~ x
1391Error in "length<-"(`*tmp*`, value = y ~ x) : invalid value
1392@end example
1393
1394@noindent
1395which is unpalatable to the end user.  For replacement functions there
1396will be a suitable context at the top of the stack, so @code{warning}
1397should be used.  (The results for @code{.Internal} replacement functions
1398such as @code{substr<-} are not ideal.)
1399
1400
1401
1402@node S4 objects, Memory allocators, Warnings and errors, R Internal Structures
1403@section S4 objects
1404
1405[This section is currently a preliminary draft and should not be taken
1406as definitive.  The description assumes that @env{R_NO_METHODS_TABLES}
1407has not been set.]
1408
1409@menu
1410* Representation of S4 objects::
1411* S4 classes::
1412* S4 methods::
1413* Mechanics of S4 dispatch::
1414@end menu
1415
1416@node Representation of S4 objects, S4 classes, S4 objects, S4 objects
1417@subsection Representation of S4 objects
1418
1419S4 objects can be of any @code{SEXPTYPE}.  They are either an object of
1420a simple type (such as an atomic vector or function) with S4 class
1421information or of type @code{S4SXP}.  In all cases, the `S4 bit' (bit 4
1422of the `general purpose' field) is set, and can be tested by the
1423macro/function @code{IS_S4_OBJECT}.
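
A minimal sketch of such a bit test follows.  It is self-contained and
uses hypothetical names (@code{toy_sxpinfo},
@code{TOY_S4_OBJECT_MASK} and so on); only the position of the bit is
taken from the description above.

@example
#include <assert.h>

/* Bit 4 of the general purpose field, as described above. */
#define TOY_S4_OBJECT_MASK ((unsigned short) (1 << 4))

struct toy_sxpinfo @{
    unsigned short gp;    /* stand-in for the `general purpose' field */
@};

int toy_is_s4_object(const struct toy_sxpinfo *info)
@{
    assert(info != NULL);
    return (info->gp & TOY_S4_OBJECT_MASK) != 0;
@}

void toy_set_s4_object(struct toy_sxpinfo *info)
@{
    assert(info != NULL);
    info->gp = (unsigned short) (info->gp | TOY_S4_OBJECT_MASK);
@}

void toy_unset_s4_object(struct toy_sxpinfo *info)
@{
    assert(info != NULL);
    info->gp = (unsigned short) (info->gp & ~TOY_S4_OBJECT_MASK);
@}
@end example

@noindent
Setting and clearing the bit leaves the other bits of the field
untouched, which matters because the field is shared with other uses.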
1424
1425S4 objects are created via @code{new()}@footnote{This can also create
1426non-S4 objects, as in @code{new("integer")}.} and thence via the C
1427function @code{R_do_new_object}.  This duplicates the prototype of the
1428class, adds a class attribute and sets the S4 bit.  All S4 class
1429attributes should be character vectors of length one with an attribute
1430giving (as a character string) the name of the package (or
1431@code{.GlobalEnv}) containing the class definition.  Since S4 objects
1432have a class attribute, the @code{OBJECT} bit is set.
1433
1434It is currently unclear what should happen if the class attribute is
1435removed from an S4 object, or if this should be allowed.
1436
1437@node S4 classes, S4 methods, Representation of S4 objects, S4 objects
1438@subsection S4 classes
1439
1440S4 classes are stored as @R{} objects in the environment in which they
1441are created, with names @code{.__C__@var{classname}}: as such they are
1442not listed by default by @code{ls}.
1443
1444The objects are S4 objects of class @code{"classRepresentation"} which
1445is defined in the @pkg{methods} package.
1446
1447Since these are just objects, they are subject to the normal scoping
1448rules and can be imported and exported from namespaces like other
1449objects.  The directives @code{importClassesFrom} and
1450@code{exportClasses} are merely convenient ways to refer to class
1451objects without needing to know their internal `metaname' (although
1452@code{exportClasses} does a little sanity checking via @code{isClass}).
1453
1454@node S4 methods, Mechanics of S4 dispatch, S4 classes, S4 objects
1455@subsection S4 methods
1456
1457Details of the methods are stored in environments (typically hidden in the
1458respective namespace) with a non-syntactic name of the form
1459@code{.__T__@var{generic}:@var{package}} containing objects of class
1460@code{MethodDefinition} for all methods defined in the current environment
1461for the named generic derived from a specific package (which might be @code{.GlobalEnv}).
1462This is sometimes referred to as a `methods table'.
1463
1464For example,
1465@example
1466 length(nM <- asNamespace("Matrix") )                    # 941 for Matrix 1.2-6
1467 length(meth <- grep("^[.]__T__", names(nM), value=TRUE))# 107 generics with methods
1468 length(meth.Ops <- nM$`.__T__Ops:base`) # 71 methods for the 'Ops' (group)generic
1469 head(sort(names(meth.Ops))) ## "abIndex#abIndex" ... "ANY#ddiMatrix" "ANY#ldiMatrix" "ANY#Matrix"
1470@end example
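
The naming scheme for these hidden objects is simple enough to sketch in
C.  The helpers below are hypothetical (they are not part of @R{} or of
the @pkg{methods} package) and merely assemble the names of a methods
table and of a class object from their components.

@example
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build ".__T__<generic>:<package>" in buf.  Returns 1 on success
   and 0 if a component is empty or the result does not fit. */
int toy_methods_table_name(char *buf, size_t size,
                           const char *generic, const char *package)
@{
    int n;
    assert(buf != NULL && generic != NULL && package != NULL);
    if (strlen(generic) == 0 || strlen(package) == 0) return 0;
    n = snprintf(buf, size, ".__T__%s:%s", generic, package);
    return n >= 0 && (size_t) n < size;
@}

/* Build ".__C__<classname>" in buf, with the same return convention. */
int toy_class_metaname(char *buf, size_t size, const char *classname)
@{
    int n;
    assert(buf != NULL && classname != NULL);
    if (strlen(classname) == 0) return 0;
    n = snprintf(buf, size, ".__C__%s", classname);
    return n >= 0 && (size_t) n < size;
@}
@end example

@noindent
Called with @code{"Ops"} and @code{"base"}, the first helper produces
@code{.__T__Ops:base}, the name used in the example above.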
1471
1472During an @R{} session there is an environment associated with each
1473non-primitive generic containing objects @code{.AllMTable},
1474@code{.Generic}, @code{.Methods}, @code{.MTable}, @code{.SigArgs} and
@code{.SigLength}.  @code{.MTable} and @code{.AllMTable} are merged
1476methods tables containing all the methods defined directly and via
1477inheritance respectively.  @code{.Methods} is a merged methods list.
1478
1479Exporting methods from a namespace is more complicated than exporting a
1480class.  Note first that you do not export a method, but rather the
1481directive @code{exportMethods} will export all the methods defined in
1482the namespace for a specified generic: the code also adds to the list
1483of generics any that are exported directly.  For generics which are
1484listed via @code{exportMethods} or exported themselves, the
1485corresponding environment is exported and so
will appear (as a hidden object) in the package environment.
1487
1488Methods for primitives which are internally S4 generic (see below) are
1489always exported, whether mentioned in the @file{NAMESPACE} file or not.
1490
1491Methods can be imported either via the directive
1492@code{importMethodsFrom} or via importing a namespace by @code{import}.
1493Also, if a generic is imported via @code{importFrom}, its methods are
1494also imported.  In all cases the generic will be imported if it is in
1495the namespace, so @code{importMethodsFrom} is most appropriate for
1496methods defined on generics in other packages.  Since methods for a
1497generic could be imported from several different packages, the methods
1498tables are merged.
1499
1500When a package is attached
1501@code{methods:::cacheMetaData} is called to update the internal tables:
1502only the visible methods will be cached.
1503
1504
1505@node Mechanics of S4 dispatch,  , S4 methods, S4 objects
1506@subsection Mechanics of S4 dispatch
1507
1508This subsection does not discuss how S4 methods are chosen: see
1509@uref{https://developer.@/r-project.org/howMethodsWork.pdf}.
1510
1511For all but primitive functions, setting a method on an existing
1512function that is not itself S4 generic creates a new object in the
1513current environment which is a call to @code{standardGeneric} with the
1514old definition as the default method.  Such S4 generics can also be
1515created @emph{via} a call to @code{setGeneric}@footnote{although this is
1516not recommended as it is less future-proof.} and are standard closures
1517in the @R{} language, with environment the environment within which they
1518are created.  With the advent of namespaces this is somewhat
problematic: if @code{myfn} was previously in a package with a
namespace there will be two functions called @code{myfn} on the search
1521paths, and which will be called depends on which search path is in use.
1522This is starkest for functions in the base namespace, where the
1523original will be found ahead of the newly created function from any
1524other package.
1525
1526Primitive functions are treated quite differently, for efficiency
1527reasons: this results in different semantics.  @code{setGeneric} is
1528disallowed for primitive functions.  The @pkg{methods} namespace
1529contains a list @code{.BasicFunsList} named by primitive functions:
1530the entries are either @code{FALSE} or a standard S4 generic showing
1531the effective definition.  When @code{setMethod} (or
1532@code{setReplaceMethod}) is called, it either fails (if the list entry
1533is @code{FALSE}) or a method is set on the effective generic given in
1534the list.
1535
1536Actual dispatch of S4 methods for almost all primitives piggy-backs on
1537the S3 dispatch mechanism, so S4 methods can only be dispatched for
1538primitives which are internally S3 generic.  When a primitive that is
1539internally S3 generic is called with a first argument which is an S4
1540object and S4 dispatch is on (that is, the @pkg{methods} namespace is
1541loaded), @code{DispatchOrEval} calls @code{R_possible_dispatch} (defined
1542in file @file{src/main/objects.c}).  (Members of the S3 group generics,
1543which includes all the generic operators, are treated slightly
1544differently: the first two arguments are checked and
1545@code{DispatchGroup} is called.)  @code{R_possible_dispatch} first
1546checks an internal table to see if any S4 methods are set for that
1547generic (and S4 dispatch is currently enabled for that generic), and if
1548so proceeds to S4 dispatch using methods stored in another internal
1549table.  All primitives are in the base namespace, and this mechanism
1550means that S4 methods can be set for (some) primitives and will always
1551be used, in contrast to setting methods on non-primitives.
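
The sequence of checks can be summarized by the following toy model in
C.  Everything in it is an assumption made for illustration: the state
names, the table layout and the function @code{toy_possible_dispatch}
are hypothetical, and the real code in file @file{src/main/objects.c}
splits these checks between several functions.

@example
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-primitive state. */
enum toy_prim_state @{ TOY_NO_METHODS, TOY_METHODS_SET, TOY_SUPPRESSED @};

typedef const char *(*toy_method)(void);

struct toy_prim_entry @{
    enum toy_prim_state state;
    toy_method s4_method;     /* consulted only when methods are set */
@};

/* An example method, standing in for real S4 dispatch. */
const char *toy_example_method(void)
@{
    return "value from S4 method";
@}

/* Returns the value computed by the S4 method, or NULL meaning
   `no S4 dispatch: carry on with S3 dispatch or the default'. */
const char *toy_possible_dispatch(const struct toy_prim_entry *table,
                                  size_t n, size_t offset,
                                  int s4_dispatch_on, int arg_is_s4)
@{
    assert(table != NULL);
    if (!s4_dispatch_on || !arg_is_s4) return NULL;
    if (offset >= n) return NULL;             /* unknown primitive */
    if (table[offset].state != TOY_METHODS_SET) return NULL;
    if (table[offset].s4_method == NULL) return NULL;
    return table[offset].s4_method();
@}
@end example

@noindent
The point of the model is the order of the tests: the cheap checks (is
S4 dispatch on, is the argument an S4 object, are any methods set for
this primitive) all come before any method lookup is attempted.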
1552
1553The exception is @code{%*%}, which is S4 generic but not S3 generic as
1554its C code contains a direct call to @code{R_possible_dispatch}.
1555
1556The primitive @code{as.double} is special, as @code{as.numeric} and
1557@code{as.real} are copies of it.  The @pkg{methods} package code partly
1558refers to generics by name and partly by function, and maps
1559@code{as.double} and @code{as.real} to @code{as.numeric} (since that is
1560the name used by packages exporting methods for it).
1561
1562Some elements of the language are implemented as primitives, for example
1563@code{@}}.  This includes the subset and subassignment `functions' and
1564they are S4 generic, again piggybacking on S3 dispatch.
1565
1566@code{.BasicFunsList} is generated when @pkg{methods} is installed, by
1567computing all primitives, initially disallowing methods on all and then
1568setting generics for members of @code{.GenericArgsEnv}, the S4 group
1569generics and a short exceptions list in file @file{BasicFunsList.R}: this
1570currently contains the subsetting and subassignment operators and an
1571override for @code{c}.
1572
1573@node Memory allocators, Internal use of global and base environments, S4 objects, R Internal Structures
1574@section Memory allocators
1575
1576@R{}'s memory allocation is almost all done via routines in file
1577@file{src/main/memory.c}.  It is important to keep track of where memory
1578is allocated, as the Windows port (by default) makes use of a memory
1579allocator that differs from @code{malloc} etc as provided by MinGW.
Specifically, there are entry points @code{Rm_malloc}, @code{Rm_calloc},
@code{Rm_realloc} and @code{Rm_free} provided by file
1582@file{src/gnuwin32/malloc.c}.  This was done for two reasons.  The
1583primary motivation was performance: the allocator provided by MSVCRT
1584@emph{via} MinGW was far too slow at handling the many small allocations
1585that the allocation system for @code{SEXPREC}s uses.  As a side benefit,
we can set a limit on the amount of allocated memory: this is useful
because, although Windows does provide virtual memory, it is far slower
there than on many other @R{} platforms, and so limiting @R{}'s use of
swapping is highly advantageous.  The high-performance allocator is only
called from
1590@file{src/main/memory.c}, @file{src/main/regex.c}, @file{src/extra/pcre}
1591and @file{src/extra/xdr}: note that this means that it is not used in
1592packages.
1593
1594The rest of @R{} should where possible make use of the allocators made
1595available by file @file{src/main/memory.c}, which are also the methods
1596recommended in
1597@ifset UseExternalXrefs
1598@ref{Memory allocation, , Memory allocation, R-exts, Writing R Extensions}
1599@end ifset
1600@ifclear UseExternalXrefs
1601`Writing R Extensions'
1602@end ifclear
1603@findex R_alloc
1604@findex Calloc
1605@findex Realloc
1606@findex Free
1607for use in @R{} packages, namely the use of @code{R_alloc},
1608@code{Calloc}, @code{Realloc} and @code{Free}.  Memory allocated by
1609@code{R_alloc} is freed by the garbage collector once the `watermark'
1610has been reset by calling
1611@findex vmaxset
1612@code{vmaxset}.  This is done automatically by the wrapper code calling
1613primitives and @code{.Internal} functions (and also by the wrapper code
1614to @code{.Call} and @code{.External}), but
1615@findex vmaxget
1616@code{vmaxget} and @code{vmaxset} can be used to reset the watermark
1617from within internal code if the memory is only required for a short
1618time.
1619
1620@findex alloca
1621All of the methods of memory allocation mentioned so far are relatively
1622expensive.  All @R{} platforms support @code{alloca}, and in almost all
1623cases@footnote{but apparently not on Windows.} this is managed by the
1624compiler, allocates memory on the C stack and is very efficient.
1625
1626There are two disadvantages in using @code{alloca}.  First, it is
1627fragile and care is needed to avoid writing (or even reading) outside
1628the bounds of the allocation block returned.  Second, it increases the
1629danger of overflowing the C stack.   It is suggested that it is only
1630used for smallish allocations (up to tens of thousands of bytes), and
1631that
1632
1633@findex R_CheckStack
1634@example
1635    R_CheckStack();
1636@end example
1637
1638@noindent
1639is called immediately after the allocation (as @R{}'s stack checking
1640mechanism will warn far enough from the stack limit to allow for modest
use of @code{alloca}).  (@code{do_makeunique} in file @file{src/main/unique.c}
1642provides an example of both points.)
1643
1644There is an alternative check,
1645@findex R_CheckStack2
1646@example
1647    R_CheckStack2(size_t extra);
1648@end example
1649
1650@noindent
1651to be called immediately @emph{before} trying an allocation of
1652@code{extra} bytes.
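
The difference between the two checks can be illustrated by a toy model
in self-contained C.  The structure @code{toy_stack_info}, both function
names and the safety margin of 95% of the limit are assumptions made for
this sketch and are not read off the @R{} sources.

@example
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for the C stack. */
struct toy_stack_info @{
    size_t used;     /* bytes of C stack currently in use */
    size_t limit;    /* bytes available; 0 means `unknown, no check' */
@};

/* Margin assumed for the sketch: 95% of the limit. */
static size_t toy_margin(const struct toy_stack_info *s)
@{
    return s->limit - s->limit / 20;
@}

/* Check made after an allocation: 1 if usage is within the margin. */
int toy_check_stack(const struct toy_stack_info *s)
@{
    assert(s != NULL);
    if (s->limit == 0) return 1;
    return s->used <= toy_margin(s);
@}

/* Check made before allocating extra bytes: 1 if they would fit. */
int toy_check_stack2(const struct toy_stack_info *s, size_t extra)
@{
    size_t margin;
    assert(s != NULL);
    if (s->limit == 0) return 1;
    margin = toy_margin(s);
    if (extra > margin) return 0;    /* also avoids wrap-around below */
    return s->used <= margin - extra;
@}
@end example

@noindent
The second form is the safer one for larger requests, since it can
refuse an allocation that would itself overrun the stack, whereas a
check made afterwards only helps if the allocation succeeded.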
1653
1654An alternative strategy has been used for various functions which
1655require intermediate blocks of storage of varying but usually small
1656size, and this has been consolidated into the routines in the header
1657file @file{src/main/RBufferUtils.h}.  This uses a structure which
1658contains a buffer, the current size and the default size. A call to
1659@findex R_AllocStringBuffer
1660@example
1661    R_AllocStringBuffer(size_t blen, R_StringBuffer *buf);
1662@end example
1663
1664@noindent
1665sets @code{buf->data} to a memory area of at least @code{blen+1} bytes.
1666At least the default size is used, which means that for small
1667allocations the same buffer can be reused.  A call to
1668@findex R_FreeStringBufferL
1669@findex R_FreeStringBuffer
1670@code{R_FreeStringBufferL} releases memory if more than the default has
1671been allocated whereas a call to @code{R_FreeStringBuffer} frees any
1672memory allocated.
1673
1674The @code{R_StringBuffer} structure needs to be initialized, for example by
1675
1676@example
1677static R_StringBuffer ex_buff = @{NULL, 0, MAXELTSIZE@};
1678@end example
1679
1680@noindent
1681which uses a default size of @code{MAXELTSIZE = 8192} bytes.  Most
1682current uses have a static @code{R_StringBuffer} structure, which
1683allows the (default-sized) buffer to be shared between calls to e.g.@:
1684@code{grep} and even between functions: this will need to be changed if
1685@R{} ever allows concurrent evaluation threads.  So the idiom is
1686
1687@example
1688static R_StringBuffer ex_buff = @{NULL, 0, MAXELTSIZE@};
1689...
1690    char *buf;
1691    for(i = 0; i < n; i++) @{
        /* compute len, the length needed for this element */
        buf = R_AllocStringBuffer(len, &ex_buff);
        /* use buf */
    @}
    /*  free allocation if larger than the default, but leave
        default allocated for future use */
    R_FreeStringBufferL(&ex_buff);
1699@end example
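
To make the semantics concrete, here is a self-contained toy
re-implementation of the same idea in plain C.  It is a sketch only: the
type @code{toy_strbuf} and the three functions are hypothetical names,
and the real routines may differ in detail (for example in how
allocation failure is reported).

@example
#include <assert.h>
#include <stdlib.h>

typedef struct @{
    char *data;            /* the buffer, or NULL if none allocated */
    size_t bufsize;        /* current size in bytes */
    size_t defaultSize;    /* minimum size and growth granularity */
@} toy_strbuf;

/* Ensure buf->data has room for at least blen+1 bytes.  Returns the
   buffer, or NULL if memory could not be obtained. */
void *toy_alloc_strbuf(size_t blen, toy_strbuf *buf)
@{
    size_t need, size;
    char *p;
    assert(buf != NULL && buf->defaultSize > 0);
    if (blen < buf->bufsize) return buf->data;   /* big enough already */
    need = blen + 1;
    /* round up to a multiple of the default size */
    size = (need + buf->defaultSize - 1) / buf->defaultSize
           * buf->defaultSize;
    p = realloc(buf->data, size);   /* realloc(NULL, n) acts as malloc */
    if (p == NULL) return NULL;
    buf->data = p;
    buf->bufsize = size;
    return buf->data;
@}

/* Release the memory only if more than the default is allocated. */
void toy_free_strbuf_large(toy_strbuf *buf)
@{
    assert(buf != NULL);
    if (buf->bufsize > buf->defaultSize) @{
        free(buf->data);
        buf->data = NULL;
        buf->bufsize = 0;
    @}
@}

/* Release any memory allocated. */
void toy_free_strbuf(toy_strbuf *buf)
@{
    assert(buf != NULL);
    free(buf->data);                /* free(NULL) is a no-op */
    buf->data = NULL;
    buf->bufsize = 0;
@}
@end example

@noindent
With a default size of 16, a request for 5 bytes and a later request for
10 bytes are served from the same 16-byte block, and a request for 40
bytes grows the block to 48 bytes, which
@code{toy_free_strbuf_large} then releases.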
1700
1701
1702@menu
1703* Internals of R_alloc::
1704@end menu
1705
1706@node Internals of R_alloc,  , Memory allocators, Memory allocators
1707@subsection Internals of R_alloc
1708
1709The memory used by @code{R_alloc} is allocated as @R{} vectors, of type
1710@code{RAWSXP}.  Thus the allocation is in units of 8 bytes, and is
1711rounded up.  A request for zero bytes currently returns @code{NULL} (but
1712this should not be relied on).  For historical reasons, in all other
1713cases 1 byte is added before rounding up so the allocation is always
17141--8 bytes more than was asked for: again this should not be relied on.
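
The arithmetic just described can be written down as a short C function.
The name @code{toy_ralloc_size} is hypothetical, and the function only
restates the rule given above: it does not allocate anything.

@example
#include <assert.h>
#include <stddef.h>

/* Bytes reserved for a request of n bytes under the rule above:
   nothing for n == 0, otherwise n + 1 rounded up to a multiple of 8. */
size_t toy_ralloc_size(size_t n)
@{
    size_t padded;
    if (n == 0) return 0;
    padded = n + 1;
    assert(padded > n);             /* guard against wrap-around */
    return (padded + 7) / 8 * 8;
@}
@end example

@noindent
So a request for 7 bytes reserves 8 (one byte extra), whereas a request
for 8 bytes reserves 16 (eight bytes extra), the two extremes of the
range quoted above.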
1715
1716The vectors allocated are protected via the setting of @code{R_VStack},
1717as the garbage collector marks everything that can be reached from that
1718location.  When a vector is @code{R_alloc}ated, its @code{ATTRIB}
1719pointer is set to the current @code{R_VStack}, and @code{R_VStack} is
1720set to the latest allocation.  Thus @code{R_VStack} is a single-linked
1721chain of the vectors currently allocated via @code{R_alloc}.  Function
@code{vmaxset} resets the location @code{R_VStack}, and should be set to a
value that has previously been obtained @emph{via} @code{vmaxget}:
allocations made after the value was obtained will no longer be protected
and hence will be available for garbage collection.
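
The chain and its watermark can be modelled in a few lines of
self-contained C.  This is a toy under stated assumptions: all names
carry a @code{toy_} prefix to mark them as hypothetical, the payload of
the vectors is omitted, and no garbage collector is modelled, so the
sketch only tracks which allocations remain reachable (and never frees
them).

@example
#include <assert.h>
#include <stdlib.h>

struct toy_vec @{
    struct toy_vec *attrib;    /* link to the previous allocation */
    /* the payload would follow in a real vector */
@};

static struct toy_vec *toy_VStack = NULL;

void *toy_vmaxget(void)
@{
    return toy_VStack;
@}

void toy_vmaxset(void *ovmax)
@{
    toy_VStack = ovmax;
@}

/* Allocate a vector and push it on the chain; NULL if out of memory. */
struct toy_vec *toy_R_alloc(void)
@{
    struct toy_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->attrib = toy_VStack;    /* remember the previous top */
    toy_VStack = v;            /* the new vector becomes the top */
    return v;
@}

/* Is v reachable from the current watermark, i.e. still protected? */
int toy_is_protected(const struct toy_vec *v)
@{
    const struct toy_vec *p;
    assert(v != NULL);
    for (p = toy_VStack; p != NULL; p = p->attrib)
        if (p == v) return 1;
    return 0;
@}
@end example

@noindent
Recording the watermark, allocating, and then resetting the watermark
leaves earlier allocations reachable but drops the later ones from the
chain, which is what makes them eligible for collection.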
1726
1727@node Internal use of global and base environments, Modules, Memory allocators, R Internal Structures
1728@section Internal use of global and base environments
1729
1730This section notes known use by the system of these environments: the
1731intention is to minimize or eliminate such uses.
1732
1733@menu
1734* Base environment::
1735* Global environment::
1736@end menu
1737
1738@node Base environment, Global environment, Internal use of global and base environments, Internal use of global and base environments
1739@subsection Base environment
1740
1741@cindex base environment
1742@cindex environment, base
1743@findex .Device
1744@findex .Devices
1745The graphics devices system maintains two variables @code{.Device} and
1746@code{.Devices} in the base environment: both are always set.  The
1747variable @code{.Devices} gives a list of character vectors of the names
1748of open devices, and @code{.Device} is the element corresponding to the
1749currently active device.  The null device will always be open.
1750
1751@findex .Options
1752There appears to be a variable @code{.Options}, a pairlist giving the
1753current options settings.  But in fact this is just a symbol with a
1754value assigned, and so shows up as a base variable.
1755
1756@findex .Last.value
1757Similarly, the evaluator creates a symbol @code{.Last.value} which
1758appears as a variable in the base environment.
1759
1760@findex .Traceback
1761@findex last.warning
1762Errors can give rise to objects @code{.Traceback} and
1763@code{last.warning} in the base environment.
1764
1765@node Global environment,  , Base environment, Internal use of global and base environments
1766@subsection Global environment
1767
1768@cindex global environment
1769@cindex environment, global
1770@findex .Random.seed
1771The seed for the random number generator is stored in object
1772@code{.Random.seed} in the global environment.
1773
1774@findex dump.frames
1775Some error handlers may give rise to objects in the global environment:
1776for example @code{dump.frames} by default produces @code{last.dump}.
1777
1778@findex .SavedPlots
1779The @code{windows()} device makes use of a variable @code{.SavedPlots}
1780to store display lists of saved plots for later display.  This is
1781regarded as a variable created by the user.
1782
1783
1784@node Modules, Visibility, Internal use of global and base environments, R Internal Structures
1785@section Modules
1786
1787@cindex modules
1788@R{} makes use of a number of shared objects/DLLs stored in the
1789@file{modules} directory.  These are parts of the code which have been
1790chosen to be loaded `on demand' rather than linked as dynamic libraries
1791or incorporated into the main executable/dynamic library.
1792
For these modules the motivation has been the amount of (often
1794optional) code they will bring in @emph{via} libraries to which they are
1795linked.
1796
1797@table @asis
1798
1799@item @code{internet}
1800The internal HTTP and FTP clients and socket support, which link to
1801system-specific support libraries.  This may load @code{libcurl} and on
1802Windows will load @file{wininet.dll} and @file{ws2_32.dll}.
1803
1804@item @code{lapack}
1805The code which makes use of the LAPACK library, and is linked to
1806@file{libRlapack} or an external LAPACK library.
1807
1808@item @code{X11}
1809(Unix-alikes only.)  The @code{X11()}, @code{jpeg()}, @code{png()} and
@code{tiff()} devices.  These are optional, and link to some or all of
1811the @code{X11}, @code{pango}, @code{cairo}, @code{jpeg}, @code{libpng}
1812and @code{libtiff} libraries.
1813@end table
1814
1815@node Visibility, Lazy loading, Modules, R Internal Structures
1816@section Visibility
1817@cindex visibility
1818
1819@menu
1820* Hiding C entry points::
1821* Variables in Windows DLLs::
1822@end menu
1823
1824@node Hiding C entry points, Variables in Windows DLLs, Visibility, Visibility
1825@subsection Hiding C entry points
1826
We make use of the visibility mechanisms discussed in
@ifset UseExternalXrefs
@ref{Controlling visibility, , Controlling visibility, R-exts, Writing R Extensions}.
@end ifset
@ifclear UseExternalXrefs
section `Controlling Visibility' in `Writing R Extensions'.
@end ifclear
1834C entry points not needed outside the main @R{} executable/dynamic
1835library (and in particular in no package nor module) should be prefixed
1836by @code{attribute_hidden}.
1837@findex attribute_hidden
1838Minimizing the visibility of symbols in the @R{} dynamic library will
1839speed up linking to it (which packages will do) and reduce the
1840possibility of linking to the wrong entry points of the same name.  In
1841addition, on some platforms reducing the number of entry points allows
1842more efficient versions of PIC to be used: somewhat over half the entry
1843points are hidden.  A convenient way to hide variables (as distinct from
1844functions) is to declare them @code{extern0} in header file @file{Defn.h}.
1845
1846The visibility mechanism used is only available with some compilers and
1847platforms, and in particular not on Windows, where an alternative
1848mechanism is used.  Entry points will not be made available in
1849@file{R.dll} if they are listed in the file
1850@file{src/gnuwin32/Rdll.hide}.
1851@findex Rdll.hide
1852Entries in that file start with a space and must be strictly in
1853alphabetic order in the C locale (use @command{sort} on the file to
1854ensure this if you change it).  It is possible to hide Fortran as well
1855as C entry points via this file: the former are lower-cased and have an
1856underline as suffix, and the suffixed name should be included in the
1857file.  Some entry points exist only on Windows or need to be visible
1858only on Windows, and some notes on these are provided in file
@file{src/gnuwin32/Maintainers.notes}.
1860
1861Because of the advantages of reducing the number of visible entry
1862points, they should be declared @code{attribute_hidden} where possible.
1863Note that this only has an effect on a shared-R-library build, and so
1864care is needed not to hide entry points that are legitimately used by
1865packages.  So it is best if the decision on visibility is made when a
1866new entry point is created, including the decision if it should be
1867included in header file @file{Rinternals.h}.  A list of the visible
entry points on a shared-R-library build on a reasonably standard
1869Unix-alike can be made by something like
1870
1871@example
1872nm -g libR.so | grep ' [BCDT] ' | cut -b20-
1873@end example
1874
1875@node Variables in Windows DLLs,  , Hiding C entry points, Visibility
1876@subsection Variables in Windows DLLs
1877
1878Windows is unique in that it conventionally treats importing variables
1879differently from functions: variables that are imported from a DLL need
to be specified by a prefix (often @samp{_imp_}) when being linked to
1881(`imported') but not when being linked from (`exported').  The details
1882depend on the compiler system, and have changed for MinGW during the
1883lifetime of that port.  They are in the main hidden behind some macros
1884defined in header file @file{R_ext/libextern.h}.
1885
1886A (non-function) variable in the main @R{} sources that needs to be
1887referred to outside @file{R.dll} (in a package, module or another DLL
1888such as @file{Rgraphapp.dll}) should be declared with prefix
1889@code{LibExtern}.  The main use is in @file{Rinternals.h}, but it needs
1890to be considered for any public header and also @file{Defn.h}.
1891
1892It would nowadays be possible to make use of the `auto-import' feature
1893of the MinGW port of @command{ld} to fix up imports from DLLs (and if
1894@R{} is built for the Cygwin platform this is what happens).  However,
this was not possible when the MinGW build of @R{} was first constructed
(circa 1998); it also allows less control of visibility and would not
work for other Windows compiler suites.
1898
1899It is only possible to check if this has been handled correctly by
1900compiling the @R{} sources on Windows.
1901
1902@node Lazy loading,  , Visibility, R Internal Structures
1903@section Lazy loading
1904
1905Lazy loading is always used for code in packages but is optional
1906(selected by the package maintainer) for datasets in packages.  When a
1907package/namespace which uses it is loaded, the package/namespace
1908environment is populated with promises for all the named objects: when
1909these promises are evaluated they load the actual code from a database.
1910
1911There are separate databases for code and data, stored in the @file{R}
1912and @file{data} subdirectories.  The database consists of two files,
1913@file{@var{name}.rdb} and @file{@var{name}.rdx}.  The @file{.rdb} file
1914is a concatenation of serialized objects, and the @file{.rdx} file
1915contains an index.  The objects are stored in (usually) a
1916@command{gzip}-compressed format with a 4-byte header giving the
1917uncompressed serialized length (in XDR, that is big-endian, byte order)
1918and read by a call to the primitive @code{lazyLoadDBfetch}.  (Note that
1919this makes lazy-loading unsuitable for really large objects: the
1920unserialized length of an @R{} object can exceed 4GB.)
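
Decoding such a header is straightforward, as the following
self-contained C sketch shows.  The function name
@code{toy_rdb_uncompressed_length} is hypothetical; the sketch reads
only the 4-byte length and does not attempt any decompression.

@example
#include <assert.h>
#include <stddef.h>

/* Decode the 4-byte big-endian length at the start of a record of
   nbytes bytes.  Returns 1 and stores the length in *len_out on
   success, or 0 if the record is too short to hold a header. */
int toy_rdb_uncompressed_length(const unsigned char *rec, size_t nbytes,
                                unsigned long *len_out)
@{
    assert(rec != NULL && len_out != NULL);
    if (nbytes < 4) return 0;
    *len_out = ((unsigned long) rec[0] << 24)
             | ((unsigned long) rec[1] << 16)
             | ((unsigned long) rec[2] << 8)
             |  (unsigned long) rec[3];
    return 1;
@}
@end example

@noindent
Because only four bytes are available, the largest length that can be
recorded is 2^32 - 1, which is the origin of the size limitation noted
above.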
1921
1922The index or `map' file @file{@var{name}.rdx} is a compressed serialized
1923@R{} object to be read by @code{readRDS}.  It is a list with three
1924elements @code{variables}, @code{references} and @code{compressed}.  The
1925first two are named lists of integer vectors of length 2 giving the
1926offset and length of the serialized object in the @file{@var{name}.rdb}
1927file.  Element @code{variables} has an entry for each named object:
1928@code{references} serializes a temporary environment used when named
1929environments are added to the database.  @code{compressed} is a logical
1930indicating if the serialized objects were compressed: compression is
1931always used nowadays. We later added the values @code{compressed = 2}
1932and @code{3} for @command{bzip2} and @command{xz} compression (with the
1933possibility of future expansion to other methods): these formats add a
1934fifth byte to the header for the type of compression, and store
1935serialized objects uncompressed if compression expands them.
1936
1937Source references are treated specially for performance reasons: bindings
1938@code{lines} and @code{parseData} from @code{srcfile} environments are
1939loaded lazily.  This uses a mechanism that allows loading selected bindings
from an environment lazily.  The key for such an environment is a list with two
1941elements: @code{eagerKey} gives the length-two integer key for the bindings
1942loaded eagerly and @code{lazyKeys} gives a vector of length-two integer
1943keys, one for each lazily loaded binding.
1944
1945The loader for a lazy-load database of code or data is function
1946@code{lazyLoad} in the @pkg{base} package, but note that there is a
1947separate copy to load @pkg{base} itself in file
@file{R_HOME/library/base/R/base}.
1949
1950Lazy-load databases are created by the code in
1951@file{src/library/tools/R/makeLazyLoad.R}: the main tool is the
1952unexported function @code{makeLazyLoadDB} and the insertion of database
1953entries is done by calls to @code{.Call("R_lazyLoadDBinsertValue",
1954...)}.
1955
1956Lazy-load databases of less than 10MB are cached in memory at first use:
1957this was found necessary when using file systems with high latency
1958(removable devices and network-mounted file systems on Windows).
1959
1960Lazy-load databases are loaded into the exports for a package, but not
1961into the namespace environment itself.  Thus they are visible when the
1962package is @emph{attached}, and also @emph{via} the @code{::} operator.
1963This was a deliberate design decision, as packages mostly make datasets
1964available for use by the end user (or other packages), and they should
1965not be found preferentially from functions in the package, surprising
1966users who expected the normal search path to be used.  (There is an
1967alternative mechanism, @file{sysdata.rda}, for `system datasets' that
1968are intended primarily to be used within the package.)
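
The visibility rules above can be checked from the @R{} prompt.  The
following is a minimal sketch using package @pkg{datasets}; it assumes
that the lazy-loaded data live in the @code{lazydata} component of the
namespace information, which is an implementation detail not described
above and which may change.

@example
@group
ns <- asNamespace("datasets")
## not a binding in the namespace environment itself ...
exists("cars", envir = ns, inherits = FALSE)
## ... but present in the (assumed) lazy-data environment
lazydata <- getNamespaceInfo(ns, "lazydata")
exists("cars", envir = lazydata, inherits = FALSE)
## and reachable via the :: operator
head(datasets::cars, 3)
@end group
@end example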
1969
1970The same database mechanism is used to store parsed @file{Rd} files.
1971One or all of the parsed objects is fetched by a call to
1972@code{tools:::fetchRdDB}.
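
As a hedged illustration, the parsed help page for a single topic might
be retrieved as follows.  @code{tools:::fetchRdDB} is unexported and
private to @R{}, so this is for exploration only; the package
@pkg{stats} and the key @code{"lm"} are chosen just for the example.

@example
@group
## file base of the Rd database, i.e. the path without .rdb/.rdx
filebase <- file.path(system.file("help", package = "stats"), "stats")
rd <- tools:::fetchRdDB(filebase, "lm")
class(rd)
@end group
@end example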
1973
1974@node .Internal vs .Primitive, Internationalization in the R sources, R Internal Structures, Top
1975@chapter @code{.Internal} vs @code{.Primitive}
1976
1977@findex .Internal
1978@findex .Primitive
1979C code compiled into @R{} at build time can be called directly in what
1980are termed @emph{primitives} or via the @code{.Internal} interface,
1981which is very similar to the @code{.External} interface except in
1982syntax.  More precisely, @R{} maintains a table of @R{} function names and
1983corresponding C functions to call, which by convention all start with
1984@samp{do_} and return a @code{SEXP}.  This table (@code{R_FunTab} in
1985file @file{src/main/names.c}) also specifies how many arguments to a
1986function are required or allowed, whether or not the arguments are to be
1987evaluated before calling, and whether the function is `internal' in
1988the sense that it must be accessed via the @code{.Internal} interface,
1989or directly accessible in which case it is printed in @R{} as
1990@code{.Primitive}.
1991
1992Functions using @code{.Internal()} wrapped in a closure are in general
1993preferred as this ensures standard handling of named and default
1994arguments.  For example, @code{grep} is defined as
1995
1996@example
1997@group
1998grep <-
1999function (pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
2000         fixed = FALSE, useBytes = FALSE, invert = FALSE)
2001@{
2002    if (!is.character(x)) x <- structure(as.character(x), names = names(x))
2003    .Internal(grep(as.character(pattern), x, ignore.case, value,
2004                   perl, fixed, useBytes, invert))
2005@}
2006
2007@end group
2008@end example
2009@noindent
2010and the use of @code{as.character} allows methods to be dispatched (for
2011example, for factors).
2012
2013However, for reasons of convenience and also efficiency (as there is
2014some overhead in using the @code{.Internal} interface wrapped in a
2015function closure), the primitive functions are exceptions that can be
2016accessed directly.  And of course, primitive functions are needed for
2017basic operations---for example @code{.Internal} is itself a primitive.
2018Note that primitive functions make no use of @R{} code, and hence are
2019very different from the usual interpreted functions.  In particular,
2020@code{formals} and @code{body} return @code{NULL} for such objects, and
2021argument matching can be handled differently.  For some primitives
2022(including @code{call}, @code{switch}, @code{.C} and @code{.subset})
2023positional matching is important to avoid partial matching of the first
2024argument.
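
These differences are easy to observe interactively.  The following
minimal sketch contrasts the primitive @code{sum} with the closure
@code{grep}; the choice of functions is only illustrative.

@example
@group
is.primitive(sum)     # TRUE: implemented entirely in C
is.primitive(grep)    # FALSE: a closure wrapping .Internal()
formals(sum)          # NULL, as there is no R-level argument list
body(sum)             # NULL, as there is no R-level body
names(formals(grep))  # closures have ordinary formal arguments
@end group
@end example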
2025
2026The list of primitive functions is subject to change; currently, it
2027includes the following.
2028
2029@enumerate
2030
2031@item
2032``Special functions'' which really are @emph{language} elements, but
2033implemented as primitive functions:
2034
2035@example
2036@group
2037@{       (         if     for      while  repeat  break  next
2038return  function  quote  switch
2039@end group
2040@end example
2041
2042@item
2043Language elements and basic @emph{operator}s (i.e., functions usually
2044@emph{not} called as @code{foo(a, b, ...)}) for subsetting, assignment,
2045arithmetic, comparison and logic:
2046
2047@example
2048@group
2049               [    [[    $    @@
2050<-   <<-  =    [<-  [[<-  $<-  @@<-
2051
2052+    -    *    /     ^    %%   %*%  %/%
2053<    <=   ==   !=    >=   >
2054|    ||   &    &&    !
2055@end group
2056@end example
2057
2058@noindent
2059When the arithmetic, comparison and logical operators are called as
2060functions, any argument names are discarded so positional matching is used.
2061
2062@item
2063``Low level'' 0-- and 1--argument functions which belong to one of the
2064following groups of functions:
2065
2066@enumerate a
2067@item
2068Basic mathematical functions with a single argument, i.e.,
2069
2070@example
2071@group
2072abs     sign    sqrt
2073floor   ceiling
2074@end group
2075
2076@group
2077exp     expm1
2078log2    log10   log1p
2079cos     sin     tan
2080acos    asin    atan
2081cosh    sinh    tanh
2082acosh   asinh   atanh
2083cospi   sinpi   tanpi
2084@end group
2085
2086@group
2087gamma   lgamma  digamma trigamma
2088@end group
2089
2090@group
2091cumsum  cumprod cummax  cummin
2092@end group
2093
2094@group
2095Im  Re  Arg  Conj  Mod
2096@end group
2097@end example
2098
2099@code{log} is a primitive function of one or two arguments with named
2100argument matching.
2101
2102@code{trunc} is a difficult case: it is a primitive that can have one
2103or more arguments: the default method handled in the primitive has
2104only one.
2105
2106@item
2107Functions rarely used outside of ``programming'' (i.e., mostly used
2108inside other functions), such as
2109
2110@example
2111@group
2112nargs          missing        on.exit        interactive
2113as.call        as.character   as.complex     as.double
2114as.environment as.integer     as.logical     as.raw
2115is.array       is.atomic      is.call        is.character
2116is.complex     is.double      is.environment is.expression
2117is.finite      is.function    is.infinite    is.integer
2118is.language    is.list        is.logical     is.matrix
2119is.na          is.name        is.nan         is.null
2120is.numeric     is.object      is.pairlist    is.raw
2121is.real        is.recursive   is.single      is.symbol
2122baseenv        emptyenv       globalenv      pos.to.env
2123unclass        invisible      seq_along      seq_len
2124@end group
2125@end example
2126
2127@item
2128The programming and session management utilities
2129
2130@example
2131@group
2132browser  proc.time  gc.time tracemem retracemem untracemem
2133@end group
2134@end example
2135
2136@end enumerate
2137
2138@item
2139The following basic replacement and extractor functions
2140
2141@example
2142@group
2143length      length<-
2144class       class<-
2145oldClass    oldClass<-
2146attr        attr<-
2147attributes  attributes<-
2148names       names<-
2149dim         dim<-
2150dimnames    dimnames<-
2151            environment<-
2152            levels<-
2153            storage.mode<-
2154@end group
2155@end example
2156
2157@findex NAMED
2158@noindent
2159Note that optimizing @code{NAMED = 1} is only effective within a
2160primitive (as the closure wrapper of a @code{.Internal} will set
2161@code{NAMED = NAMEDMAX} when the promise to the argument is evaluated) and
2162hence replacement functions should where possible be primitive to avoid
2163copying (at least in their default methods).
2164[The @code{NAMED} mechanism has been replaced by reference counting.]
2165
2166@item
2167The following functions are primitive for efficiency reasons:
2168
2169@example
2170@group
2171:           ~           c           list
2172call        expression  substitute
2173UseMethod   standardGeneric
2174.C          .Fortran   .Call        .External
2175round       signif      rep         seq.int
2176@end group
2177@end example
2178
2179@noindent
2180as well as the following internal-use-only functions
2181
2182@example
2183@group
2184.Primitive      .Internal
2185.Call.graphics  .External.graphics
2186.subset         .subset2
2187.primTrace      .primUntrace
2188lazyLoadDBfetch
2189@end group
2190@end example
2191
2192@end enumerate
2193
2194
2195The multi-argument primitives
2196@example
2197@group
2198call       switch
2199.C         .Fortran   .Call       .External
2200@end group
2201@end example
2202
2203@noindent
2204intentionally use positional matching, and need to do so to avoid
2205partial matching to their first argument.  They do check that the first
2206argument is unnamed or for the first two, partially matches the formal
2207argument name.  On the other hand,
2208
2209@example
2210@group
attr       attr<-     browser     retracemem substitute  UseMethod
2212log        round      signif      rep        seq.int
2213@end group
2214@end example
2215
2216@noindent
2217manage their own argument matching and do work in the standard way.
2218
All the one-argument primitives check that, if they are called with a
named argument, the name (partially) matches the name given in the
documentation: this is also done for replacement functions with one
argument plus @code{value}.
2223
2224The net effect is that argument matching for primitives intended for
2225end-user use @emph{as functions} is done in the same way as for
2226interpreted functions except for the six exceptions where positional
2227matching is required.
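
The matching rules above can be illustrated with a short sketch (the
particular calls are chosen only for illustration).  Primitives that
manage their own matching accept named arguments in any order, whereas
the arithmetic operators ignore argument names when called as functions.

@example
@group
## log and round do their own matching, in the standard way
log(base = 2, x = 8)
round(digits = 1, x = 3.14159)
## operators called as functions discard names: matching is positional,
## so this computes 1 - 10 and not 10 - 1
"-"(e2 = 1, e1 = 10)
@end group
@end example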
2228
2229@menu
2230* Special primitives::
2231* Special internals::
2232* Prototypes for primitives::
2233* Adding a primitive::
2234@end menu
2235
2236@node Special primitives, Special internals, .Internal vs .Primitive, .Internal vs .Primitive
2237@section Special primitives
2238
2239A small number of primitives are @emph{specials} rather than
2240@emph{builtins}, that is they are entered with unevaluated arguments.
2241This is clearly necessary for the language constructs and the assignment
2242operators, as well as for @code{&&} and @code{||} which conditionally
2243evaluate their second argument, and @code{~}, @code{.Internal},
2244@code{call}, @code{expression}, @code{missing}, @code{on.exit},
2245@code{quote} and @code{substitute} which do not evaluate some of their
2246arguments.
2247
2248@code{rep} and @code{seq.int} are special as they evaluate some of their
2249arguments conditional on which are non-missing.
2250
2251@code{log}, @code{round} and @code{signif} are special to allow default
2252values to be given to missing arguments.
2253
2254The subsetting, subassignment and @code{@@} operators are all special.
2255(For both extraction and replacement forms, @code{$} and @code{@@}
2256take a symbol argument, and @code{[} and @code{[[} allow missing
2257arguments.)
2258
2259@code{UseMethod} is special to avoid the additional contexts added to
2260calls to builtins.
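
Whether a primitive is a special or a builtin can be seen from
@code{typeof}, and the effect of entering a special with unevaluated
arguments is directly observable.  A minimal sketch:

@example
@group
typeof(quote)       # "special"
typeof(get("&&"))   # "special"
typeof(sum)         # "builtin"
## the second argument of && is never evaluated here, so no error occurs
FALSE && stop("not evaluated")
## quote returns its argument as an unevaluated call
quote(stop("not evaluated"))
@end group
@end example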
2261
2262@node Special internals, Prototypes for primitives, Special primitives, .Internal vs .Primitive
2263@section Special internals
2264
2265There are also special @code{.Internal} functions: @code{NextMethod},
2266@code{Recall}, @code{withVisible}, @code{cbind}, @code{rbind} (to allow
2267for the @code{deparse.level} argument), @code{eapply}, @code{lapply} and
2268@code{vapply}.
2269
2270@node Prototypes for primitives, Adding a primitive, Special internals, .Internal vs .Primitive
2271@section Prototypes for primitives
2272
2273Prototypes are available for the primitive functions and operators, and
2274these are used for printing, @code{args} and package checking (e.g.@: by
2275@code{tools::checkS3methods} and by package @CRANpkg{codetools}).  There are
2276two environments in the @pkg{base} package (and namespace),
2277@samp{.GenericArgsEnv} for those primitives which are internal S3
2278generics, and @samp{.ArgsEnv} for the rest.  Those environments contain
2279closures with the same names as the primitives, formal arguments derived
2280(manually) from the help pages, a body which is a suitable call to
2281@code{UseMethod} or @code{NULL} and environment the base namespace.
2282
2283The C code for @code{print.default} and @code{args} uses the closures in
2284these environments in preference to the definitions in base (as
2285primitives).
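
For illustration, the prototype closures can be inspected directly.
This sketch assumes that @code{seq_len} is prototyped in
@code{.ArgsEnv} and that @code{length} and @code{+} are prototyped in
@code{.GenericArgsEnv}, which is believed to be the case in current
versions of @R{}.

@example
@group
## a non-generic primitive: the body is NULL
get("seq_len", envir = .ArgsEnv)
## an internal S3 generic: the body is a call to UseMethod
get("length", envir = .GenericArgsEnv)
## members of the Ops group are prototyped with arguments e1 and e2
names(formals(get("+", envir = .GenericArgsEnv)))
## args() reports the prototype rather than the primitive itself
args(seq_len)
@end group
@end example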
2286
2287The QC function @code{undoc} checks that all the functions prototyped in
2288these environments are currently primitive, and that the primitives not
2289included are better thought of as language elements (at the time of
2290writing
2291
2292@example
2293$  $<-  &&  (  :  @@  @@<-  [  [[  [[<-  [<-  @{  ||  ~  <-  <<-  =
2294break  for function  if  next  repeat  return  while
2295@end example
2296
2297@noindent
2298).  One could argue about @code{~}, but it is known to the parser and has
2299semantics quite unlike a normal function.  And @code{:} is documented
2300with different argument names in its two meanings.
2301
2302The QC functions @code{codoc} and @code{checkS3methods} also make use of
2303these environments (effectively placing them in front of base in the
2304search path), and hence the formals of the functions they contain are
2305checked against the help pages by @code{codoc}.  However, there are two
2306problems with the generic primitives.  The first is that many of the
2307operators are part of the S3 group generic @code{Ops} and that defines
2308their arguments to be @code{e1} and @code{e2}: although it would be very
2309unusual, an operator could be called as e.g.@: @code{"+"(e1=a, e2=b)}
2310and if method dispatch occurred to a closure, there would be an argument
2311name mismatch.  So the definitions in environment @code{.GenericArgsEnv}
2312have to use argument names @code{e1} and @code{e2} even though the
2313traditional documentation is in terms of @code{x} and @code{y}:
2314@code{codoc} makes the appropriate adjustment via
2315@code{tools:::.make_S3_primitive_generic_env}.  The second discrepancy
2316is with the @code{Math} group generics, where the group generic is
2317defined with argument list @code{(x, ...)}, but most of the members only
2318allow one argument when used as the default method (and @code{round} and
2319@code{signif} allow two as default methods): again fix-ups are used.
2320
2321Those primitives which are in @code{.GenericArgsEnv} are checked (via
2322@file{tests/primitives.R}) to be generic @emph{via} defining methods for
2323them, and a check is made that the remaining primitives are probably not
2324generic, by setting a method and checking it is not dispatched to (but
this can fail for other reasons).  However, there is no certain way to
know whether other @code{.Internal} or primitive functions are
internally generic except by reading the source code.
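
The idea behind these checks can be sketched as follows.  The class
name @code{"myclass"} and the two methods are hypothetical and exist
only for this illustration; the actual tests are those in
@file{tests/primitives.R}.

@example
@group
obj <- structure(list(), class = "myclass")
## length is internally generic, so the method is dispatched to
length.myclass <- function(x) 99L
length(obj)
## is.null is not generic, so this `method' is ignored
is.null.myclass <- function(x) "dispatched"
is.null(obj)
@end group
@end example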
2328
2329@node Adding a primitive,  , Prototypes for primitives, .Internal vs .Primitive
2330@section Adding a primitive
2331
2332[For R-core use: reverse this procedure to remove a primitive.  Most
2333commonly this is done by changing a @code{.Internal} to a primitive or
2334@emph{vice versa}.]
2335
2336Primitives are listed in the table @code{R_FunTab} in
2337@file{src/main/names.c}: primitives have @samp{Y = 0} in the @samp{eval}
2338field.
2339
2340There needs to be an @samp{\alias} entry in a help file in the @pkg{base}
package, and the primitive needs to be added to one of the lists at the
start of this chapter.
2343
2344Some primitives are regarded as language elements (the current ones are
2345listed above).  These need to be added to two lists of exceptions,
2346@code{langElts} in @code{undoc()} (in file
2347@file{src/library/tools/R/QC.R}) and @code{lang_elements} in
2348@file{tests/primitives.R}.
2349
2350All other primitives are regarded as functions and should be listed in
2351one of the environments defined in @file{src/library/base/R/zzz.R},
2352either @code{.ArgsEnv} or @code{.GenericArgsEnv}: internal generics also
2353need to be listed in the character vector @code{.S3PrimitiveGenerics}.
2354Note too the discussion about argument matching above: if you add a
2355primitive function with more than one argument by converting a
2356@code{.Internal} you need to add argument matching to the C code, and
2357for those with a single argument, add argument-name checking.
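
A rough version of the consistency requirement can be sketched in
@R{}.  This is not the code used by @code{undoc()}, only an
illustration under the assumption that every primitive is visible in
the base environment: it lists the primitives that have no prototype
in either environment, which should be just the language elements.

@example
@group
base_names <- ls(baseenv(), all.names = TRUE)
is_prim <- vapply(base_names,
                  function(nm) is.primitive(get(nm, envir = baseenv())),
                  logical(1))
prims <- base_names[is_prim]
prototyped <- c(ls(.ArgsEnv, all.names = TRUE),
                ls(.GenericArgsEnv, all.names = TRUE))
## primitives without a prototype
unprototyped <- setdiff(prims, prototyped)
unprototyped
@end group
@end example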
2358
2359Do ensure that @command{make check-devel} has been run: that tests most
2360of these requirements.
2361
2362@node Internationalization in the R sources, Package Structure, .Internal vs .Primitive, Top
2363@chapter Internationalization in the R sources
2364
2365The process of marking messages (errors, warnings etc) for translation
2366in an @R{} package is described in
2367@ifset UseExternalXrefs
2368@ref{Internationalization, , Internationalization, R-exts, Writing R Extensions},
2369@end ifset
2370@ifclear UseExternalXrefs
2371`Writing R Extensions',
2372@end ifclear
2373and the standard packages included with @R{} have (with an exception in
2374@pkg{grDevices} for the menus of the @code{windows()} device) been
2375internationalized in the same way as other packages.
2376
2377@menu
2378* R code::
2379* Main C code::
2380* Windows-GUI-specific code::
2381* macOS GUI::
2382* Updating::
2383@end menu
2384
2385@node R code, Main C code, Internationalization in the R sources, Internationalization in the R sources
2386@section R code
2387
2388Internationalization for @R{} code is done in exactly the same way as
2389for extension packages.  As all standard packages which have @R{} code
2390also have a namespace, it is never necessary to specify @code{domain},
2391but for efficiency calls to @code{message}, @code{warning} and
2392@code{stop} should include @code{domain = NA} when the message is
2393constructed @emph{via} @code{gettextf}, @code{gettext} or
2394@code{ngettext}.
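
A minimal sketch of this convention follows; the functions
@code{check_positive} and @code{report} and their messages are
hypothetical.

@example
@group
## gettextf() has already translated the message, so domain = NA
## stops stop() from looking it up a second time
check_positive <- function(n)
    if (n <= 0)
        stop(gettextf("'%s' must be positive", "n"), domain = NA)

## plural forms are selected by ngettext(), again with domain = NA
report <- function(k)
    message(sprintf(ngettext(k, "%d file processed",
                             "%d files processed"), k),
            domain = NA)
@end group
@end example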
2395
2396For each package, the extracted messages and translation sources are
2397stored under package directory @file{po} in the source package, and
2398compiled translations under @file{inst/po} for installation to package
2399directory @file{po} in the installed package.  This also applies to C
2400code in packages.
2401
2402@node Main C code, Windows-GUI-specific code, R code, Internationalization in the R sources
2403@section Main C code
2404
2405The main C code (e.g.@: that in files @file{src/*/*.c} and in
2406the modules) is where @R{} is closest to the sort of application for
2407which @samp{gettext} was written.  Messages in the main C code are in
2408domain @code{R} and stored in the top-level directory @file{po} with
2409compiled translations under @file{share/locale}.
2410
2411The list of files covered by the @R{} domain is specified in file
2412@file{po/POTFILES.in}.
2413
2414The normal way to mark messages for translation is via @code{_("msg")}
2415just as for packages.  However, sometimes one needs to mark passages for
2416translation without wanting them translated at the time, for example
2417when declaring string constants.  This is the purpose of the @code{N_}
2418macro, for example
2419
2420@example
2421@{ ERROR_ARGTYPE,           N_("invalid argument type")@},
2422@end example
2423
2424@noindent
2425from file @file{src/main/errors.c}.
2426
2427The @code{P_} macro
2428
2429@example
2430#ifdef ENABLE_NLS
2431#define P_(StringS, StringP, N) ngettext (StringS, StringP, N)
2432#else
2433#define P_(StringS, StringP, N) (N > 1 ? StringP: StringS)
2434#endif
2435@end example
2436
2437@noindent
2438may be used
2439as a wrapper for @code{ngettext}: however in some cases the preferred
2440approach has been to conditionalize (on @code{ENABLE_NLS}) code using
2441@code{ngettext}.
2442
2443The macro @code{_("msg")} can safely be used in directory
2444@file{src/appl}; the header for standalone @samp{nmath} skips possible
translation.  (This does not apply to @code{N_} or @code{P_}.)
2446
2447
2448@node Windows-GUI-specific code, macOS GUI, Main C code, Internationalization in the R sources
2449@section Windows-GUI-specific code
2450
2451Messages for the Windows GUI are in a separate domain @samp{RGui}.  This
2452was done for two reasons:
2453
2454@itemize
2455@item
2456The translators for the Windows version of @R{} might be separate from
2457those for the rest of @R{} (familiarity with the GUI helps), and
2458
2459@item
2460Messages for Windows are most naturally handled in the native charset
2461for the language, and in the case of CJK languages the charset is
2462Windows-specific.  (It transpires that as the @code{iconv} we ported
2463works well under Windows, this is less important than anticipated.)
2464@end itemize
2465
2466Messages for the @samp{RGui} domain are marked by @code{G_("msg")}, a
2467macro that is defined in header file @file{src/gnuwin32/win-nls.h}.  The
2468list of files that are considered is hardcoded in the
2469@code{RGui.pot-update} target of file @file{po/Makefile.in.in}: note
2470that this includes @file{devWindows.c} as the menus on the
2471@code{windows} device are considered to be part of the GUI.  (There is
2472also @code{GN_("msg")}, the analogue of @code{N_("msg")}.)
2473
2474The template and message catalogs for the @samp{RGui} domain are in the
2475top-level @file{po} directory.
2476
2477
2478@node macOS GUI, Updating, Windows-GUI-specific code, Internationalization in the R sources
2479@section macOS GUI
2480
2481This is handled separately: see
2482@uref{https://developer.r-project.org/Translations30.html}.
2483
2484
2485@node Updating,  , macOS GUI, Internationalization in the R sources
2486@section Updating
2487
2488See file @file{po/README} for how to update the message templates and catalogs.
2489
2490@node Package Structure, Files, Internationalization in the R sources, Top
2491@chapter Structure of an Installed Package
2492
2493@menu
2494* Metadata::
2495* Help::
2496@end menu
2497
The structure of a @emph{source} package is described in @ref{Creating
2499R packages, , Creating R packages, R-exts, Writing R Extensions}: this
2500chapter is concerned with the structure of @emph{installed} packages.
2501
2502An installed package has a top-level file @file{DESCRIPTION}, a copy of
2503the file of that name in the package sources with a @samp{Built} field
2504appended, and file @file{INDEX}, usually describing the objects on which
help is available, a file @file{NAMESPACE} if the package has a
namespace, optional files such as @file{CITATION}, @file{LICENCE} and
2507@file{NEWS}, and any other files copied in from @file{inst}.  It will
2508have directories @file{Meta}, @file{help} and @file{html} (even if the
2509package has no help pages), almost always has a directory @file{R} and
2510often has a directory @file{libs} to contain compiled code.  Other
2511directories with known meaning to @R{} are @file{data}, @file{demo},
2512@file{doc} and @file{po}.
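
The layout can be inspected for any installed package.  A minimal
sketch, using package @pkg{stats} purely as an example:

@example
@group
pkgdir <- system.file(package = "stats")
## top-level files and directories of the installed package
list.files(pkgdir)
## the Built field is appended to DESCRIPTION at installation
read.dcf(file.path(pkgdir, "DESCRIPTION"), fields = "Built")
@end group
@end example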
2513
2514Function @code{library} looks for a namespace and if one is found
2515passes control to @code{loadNamespace}.  Then @code{library} or
2516@code{loadNamespace} looks for file @file{R/@var{pkgname}}, warns if it
2517is not found and otherwise sources the code (using @code{sys.source})
2518into the package's environment, then lazy-loads a database
2519@file{R/sysdata} if present.  So how @R{} code gets loaded depends on
the contents of @file{R/@var{pkgname}}: a standard template to load
lazy-load databases is provided in @file{share/R/nspackloader.R}.
2522
2523Compiled code is usually loaded when the package's namespace is loaded
2524by a @code{useDynlib} directive in a @file{NAMESPACE} file or by the
2525package's @code{.onLoad} function.  Conventionally compiled code is
2526loaded by a call to @code{library.dynam} and this looks in directory
2527@file{libs} (and in an appropriate sub-directory if sub-architectures
2528are in use) for a shared object (Unix-alike) or DLL (Windows).
2529
2530Subdirectory @file{data} serves two purposes. In a package using
2531lazy-loading of data, it contains a lazy-load database @file{Rdata},
plus a file @file{Rdata.rds} which contains a named character vector used
2533by @code{data()} in the (unusual) event that it is used for such a
2534package.  Otherwise it is a copy of the @file{data} directory in the
2535sources, with saved images re-compressed if @command{R CMD INSTALL
2536--resave-data} was used.
2537
2538Subdirectory @file{demo} supports the @code{demo} function, and is
2539copied from the sources.
2540
2541Subdirectory @file{po} contains (in subdirectories) compiled message
2542catalogs.
2543
2544@node Metadata, Help, Package Structure, Package Structure
2545@section Metadata
2546
2547Directory @file{Meta} contains several files in @code{.rds} format, that
2548is serialized @R{} objects written by @code{saveRDS}.  All packages
2549have files @file{Rd.rds}, @file{hsearch.rds}, @file{links.rds},
2550@file{features.rds}, and
2551@file{package.rds}.  Packages with namespaces have a file
2552@file{nsInfo.rds}, and those with data, demos or vignettes have
2553@file{data.rds}, @file{demo.rds} or @file{vignette.rds} files.
2554
2555The structure of these files (and their existence and names) is private
2556to @R{}, so the description here is for those trying to follow the @R{}
2557sources: there should be no reference to these files in non-base
2558packages.
2559
2560File @file{package.rds} is a dump of information extracted from the
2561@file{DESCRIPTION} file.  It is a list of several components.  The
2562first, @samp{DESCRIPTION}, is a character vector, the @file{DESCRIPTION}
2563file as read by @code{read.dcf}.  Further elements @samp{Depends},
2564@samp{Suggests}, @samp{Imports}, @samp{Rdepends} and @samp{Rdepends2}
2565record the @samp{Depends}, @samp{Suggests} and @samp{Imports} fields.
2566These are all lists, and can be empty.  The first three have an entry
for each package named, each entry being a list of length 1 or 3, with
element @samp{name} (the package name) and optional elements @samp{op}
2569(a character string) and @samp{version} (an object of class
2570@samp{"package_version"}).  Element @samp{Rdepends} is used for the
2571first version dependency on @R{}, and @samp{Rdepends2} is a list of zero
2572or more @R{} version dependencies---each is a three-element list of the
2573form described for packages.  Element @samp{Rdepends} is no longer used,
2574but it is still potentially needed so @R{} < 2.7.0 can detect that the
2575package was not installed for it.
2576
2577File @file{nsInfo.rds} records a list, a parsed version of the
2578@file{NAMESPACE} file.
2579
2580File @file{Rd.rds} records a data frame with one row for each help file.
2581The columns are @samp{File} (the file name with extension), @samp{Name}
2582(the @samp{\name} section), @samp{Type} (from the optional
2583@samp{\docType} section), @samp{Title}, @samp{Encoding}, @samp{Aliases},
2584@samp{Concepts} and @samp{Keywords}.  All columns are character vectors
2585apart from @samp{Aliases}, which is a list of character vectors.
2586
2587File @file{hsearch.rds} records the information to be used by
2588@samp{help.search}.  This is a list of four unnamed elements which are
2589character matrices for help files, aliases, keywords and concepts.  All
2590the matrices have columns @samp{ID} and @samp{Package} which are used to
2591tie the aliases, keywords and concepts (the remaining column of the last
2592three elements) to a particular help file.  The first element has
further columns @samp{LibPath} (stored as @code{""} and filled in when
the file is loaded), @samp{name}, @samp{title}, @samp{topic} (the first
2595alias, used when presenting the results as
2596@samp{@var{pkgname}::@var{topic}}) and @samp{Encoding}.
2597
2598File @file{links.rds} records a named character vector, the names being
2599aliases and the values character strings of the form
2600@example
2601"../../@var{pkgname}/html/@var{filename}.html"
2602@end example
2603
2604File @file{data.rds} records a two-column character matrix with columns
2605of dataset names and titles from the corresponding help file.  File
2606@file{demo.rds} has the same structure for package demos.
2607
2608File @file{vignette.rds} records a data frame with one row for each
2609`vignette' (@file{.[RS]nw} file in @file{inst/doc}) and with columns
2610@samp{File} (the full file path in the sources), @samp{Title},
2611@samp{PDF} (the pathless file name of the installed PDF version, if
2612present), @samp{Depends}, @samp{Keywords} and @samp{R} (the pathless
2613file name of the installed @R{} code, if present).
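
For those following the sources, the metadata files can be read back
with @code{readRDS}.  The following sketch is for exploration only
(as noted above, the files are private to @R{}) and uses package
@pkg{stats} as an arbitrary example.

@example
@group
pkg <- readRDS(system.file("Meta", "package.rds", package = "stats"))
names(pkg)
## the DESCRIPTION component is a named character vector
pkg$DESCRIPTION[["Version"]]

rd <- readRDS(system.file("Meta", "Rd.rds", package = "stats"))
names(rd)
## Aliases is a list of character vectors, one per help file
head(rd$Aliases, 2)
@end group
@end example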
2614
2615
2616@node Help,  , Metadata, Package Structure
2617@section Help
2618
2619All installed packages, whether they had any @file{.Rd} files or not,
2620have @file{help} and @file{html} directories. The latter normally only
2621contains the single file @file{00Index.html}, the package index which
2622has hyperlinks to the help topics (if any).
2623
2624Directory @file{help} contains files @file{AnIndex}, @file{paths.rds}
2625and @file{@var{pkgname}.rd[bx]}.  The latter two files are a lazy-load
2626database of parsed @file{.Rd} files, accessed by
2627@code{tools:::fetchRdDB}.  File @file{paths.rds} is a saved character
2628vector of the original path names of the @file{.Rd} files, used when
2629updating the database.
2630
2631File @file{AnIndex} is a two-column tab-delimited file: the first column
2632contains the aliases defined in the help files and the second the
2633basename (without the @file{.Rd} or @file{.rd} extension) of the file
2634containing that alias.  It is read by @code{utils:::index.search} to
2635search for files matching a topic (alias), and read by @code{scan} in
2636@code{utils:::matchAvailableTopics}, part of the completion system.
2637
2638File @file{aliases.rds} is the same information as @file{AnIndex} as a
2639named character vector (names the topics, values the file basename), for
2640faster access.
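
Both forms of the index can be examined directly.  This is a minimal
sketch, again using package @pkg{stats} as an arbitrary example; the
column names given to the data frame are chosen here and are not part
of the file.

@example
@group
helpdir <- system.file("help", package = "stats")
## AnIndex: alias in the first column, file basename in the second
idx <- read.delim(file.path(helpdir, "AnIndex"), header = FALSE,
                  quote = "", col.names = c("alias", "file"),
                  stringsAsFactors = FALSE)
head(idx, 3)
## aliases.rds: the same mapping as a named character vector
aliases <- readRDS(file.path(helpdir, "aliases.rds"))
aliases[["lm"]]
@end group
@end example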
2641
2642@node Files, Graphics Devices, Package Structure, Top
2643@chapter Files
2644
2645@R{} provides many functions to work with files and directories: many of
2646these have been added relatively recently to facilitate scripting in
2647@R{} and in particular the replacement of Perl scripts by @R{} scripts
2648in the management of @R{} itself.
2649
2650These functions are implemented by standard C/POSIX library calls,
2651except on Windows.  That means that filenames must be encoded in the
2652current locale as the OS provides no other means to access the file
2653system: increasingly filenames are stored in UTF-8 and the OS will
2654translate filenames to UTF-8 in other locales.  So using a UTF-8 locale
2655gives transparent access to the whole file system.
2656
2657Windows is another story.  There the internal view of filenames is in
2658UTF-16LE (so-called `Unicode'), and standard C library calls can only
2659access files whose names can be expressed in the current codepage.  To
2660circumvent that restriction, there is a parallel set of Windows-specific
2661calls which take wide-character arguments for filepaths.  Much of the
2662file-handling in @R{} has been moved over to using these functions, so
2663filenames can be manipulated in @R{} as UTF-8 encoded character strings,
2664converted to wide characters (which on Windows are UTF-16LE) and passed
2665to the OS.  The utilities @code{RC_fopen} and @code{filenameToWchar}
2666help this process.  Currently @code{file.copy} to a directory,
2667@code{list.files}, @code{list.dirs} and @code{path.expand} work only
2668with filepaths encoded in the current codepage.
2669
2670All these functions do tilde expansion, in the same way as
2671@code{path.expand}, with the deliberate exception of @code{Sys.glob}.
2672
2673File names may be case sensitive or not: the latter is the norm on
2674Windows and macOS, the former on other Unix-alikes.  Note that this
2675is a property of both the OS and the file system: it is often possible
2676to map names to upper or lower case when mounting the file system.  This
2677can affect the matching of patterns in @code{list.files} and
2678@code{Sys.glob}.
2679
2680File names commonly contain spaces on Windows and macOS but not
2681elsewhere.  As file names are handled as character strings by @R{},
spaces are not usually a concern unless file names are passed to other
processes, e.g.@: by a @code{system} call.
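
One common precaution is to quote such names with @code{shQuote}
before building a command line.  In the following sketch the command
name @code{some_command} is hypothetical and the command is only
constructed, not run.

@example
@group
f <- file.path(tempdir(), "a file.txt")
writeLines("hello", f)
## within R itself the space needs no special treatment
file.exists(f)
## quote the path before handing it to another process
cmd <- paste("some_command", shQuote(f))
cmd
@end group
@end example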
2684
2685Windows has another couple of peculiarities.  Whereas a POSIX file
2686system has a single root directory (and other physical file systems are
2687mounted onto logical directories under that root), Windows has separate
2688roots for each physical or logical file system (`volume'), organized
2689under @emph{drives} (with file paths starting @code{D:} for an
2690@acronym{ASCII} letter, case-insensitively) and @emph{network shares}
(with paths like @code{\\netname\topdir\myfiles\a file}).  There is a
2692current drive, and path names without a drive part are relative to the
2693current drive.  Further, each drive has a current directory, and
2694relative paths are relative to that current directory, on a particular
2695drive if one is specified.  So @file{D:dir\file} and @file{D:} are valid
2696path specifications (the last being the current directory on drive
2697@file{D:}).
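
As a rough illustration of these rules, the following self-contained C
sketch classifies a path string by its drive and root parts.  The
function @code{classify_win_path} and its enumeration are hypothetical
names invented for this example (they are not part of @R{}), and
treating @samp{/} as equivalent to @samp{\} is an assumption of the
sketch.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a Windows path, for illustration only. */
typedef enum {
    PATH_UNC,            /* \\netname\topdir\...  (network share)          */
    PATH_DRIVE_ABSOLUTE, /* D:\dir\file                                    */
    PATH_DRIVE_RELATIVE, /* D:dir\file or D:  (current directory on D:)    */
    PATH_ROOTED,         /* \dir\file  (root of the current drive)         */
    PATH_RELATIVE        /* dir\file   (current directory, current drive)  */
} win_path_kind;

/* Assumption: both separators are accepted. */
static int is_sep(char c) { return c == '\\' || c == '/'; }

/* Drive letters are ASCII letters, case-insensitively. */
static int is_drive_letter(char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

win_path_kind classify_win_path(const char *p)
{
    assert(p != NULL);
    if (is_sep(p[0]) && is_sep(p[1]))
        return PATH_UNC;
    if (is_drive_letter(p[0]) && p[1] == ':')
        return is_sep(p[2]) ? PATH_DRIVE_ABSOLUTE : PATH_DRIVE_RELATIVE;
    return is_sep(p[0]) ? PATH_ROOTED : PATH_RELATIVE;
}
```
@end verbatim

With this sketch @file{D:dir\file} and @file{D:} are both classified as
drive-relative, whereas @file{D:\dir\file} is drive-absolute.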
2698
2699@c basename        Wchar   na
2700@c dir.create      Wchar   ~
2701@c dirname         Wchar   ~
2702@c getwd
2703@c file.access     Wchar   ~
2704@c file.append     RC_fopen
2705@c file.copy       no      ~ (+ file.append)
2706@c file.create     RC_fopen
2707@c file.edit       UTF-8   in R code
2708@c file.exists     Wchar   ~
2709@c file.info       Wchar   ~
2710@c file.link       8-bit   ~
2711@c file.remove     Wchar   ~
2712@c file.rename     Wchar   ~
2713@c file.show       UTF-8   in R code
2714@c file.symlink    not     ~
2715@c file_test
2716@c list.dirs       no      ~
2717@c list.files      no      ~
2718@c normalizePath   Wchar   ~
2719@c path.expand     no
2720@c setwd           Wchar   ~
2721@c Sys.chmod       Wchar   ~
2722@c Sys.glob        Wchar   not
2723@c Sys.readlink    not     ~
2724@c Sys.umask
2725@c unlink          Wchar   ~
2726
2727
2728@node Graphics Devices, GUI consoles, Files, Top
2729@chapter Graphics
2730
2731@R{}'s graphics internals were re-designed to enable multiple graphics
systems to be installed on top of the graphics `engine' -- currently
2733there are two such systems, one supporting `base' graphics (based on
2734that in S and whose @R{} code@footnote{The C code is in files
2735@file{base.c}, @file{graphics.c}, @file{par.c}, @file{plot.c} and
2736@file{plot3d.c} in directory @file{src/main}.} is in package
2737@pkg{graphics}) and one implemented in package @pkg{grid}.
2738
2739Some notes on the historical changes can be found at
2740@uref{https://www.stat.auckland.ac.nz/~paul/R/basegraph.html} and
2741@uref{https://www.stat.auckland.ac.nz/~paul/R/graphicsChanges.html}.
2742
2743At the lowest level is a graphics device, which manages a plotting
2744surface (a screen window or a representation to be written to a file).
2745This implements a set of graphics primitives, to `draw'
2746
2747@itemize
2748@item a circle, optionally filled
2749@item a rectangle, optionally filled
2750@item a line
2751@item a set of connected lines
2752@item a polygon, optionally filled
@item a path, optionally filled using a winding rule
2754@item text
2755@item a raster image (optional)
2756@item and to set a clipping rectangle
2757@end itemize
2758
2759@noindent
2760as well as requests for information such as
2761
2762@itemize
2763@item the width of a string if plotted
2764@item the metrics (width, ascent, descent) of a single character
2765@item the current size of the plotting surface
2766@end itemize
2767
2768@noindent
2769and requests/opportunities to take action such as
2770
2771@itemize
2772@item start a new `page', possibly after responding to a request to ask
2773the user for confirmation.
2774@item return the position of the device pointer (if any).
@item when a device becomes the current device or stops being the current
2776device (this is usually used to change the window title on a screen
2777device).
2778@item when drawing starts or finishes (e.g.@: used to flush graphics to
2779the screen when drawing stops).
2780@item wait for an event, for example a mouse click or keypress.
2781@item an `onexit' action, to clean up if plotting is interrupted (by an
2782error or by the user).
2783@item capture the current contents of the device as a raster image.
2784@item close the device.
2785@end itemize
2786
2787The device also sets a number of variables, mainly Boolean flags
indicating its capabilities.  Each device works entirely in `device units'
which are up to its developer: these can be pixels, big points (1/72
2790inch), twips, @dots{}, and can differ@footnote{although that needs to be
2791handled carefully, as for example the @code{circle} callback is given a
2792radius (and that should be interpreted as in the x units).} in the
2793@samp{x} and @samp{y} directions.
2794
2795@c think of the engine as colors.c, devices.c, engine.c, plotmath.c, vfonts.c
2796The next layer up is the graphics `engine' that is the main interface to
2797the device (although the graphics subsystems do talk directly to
2798devices).  This is responsible for clipping lines, rectangles and
2799polygons, converting the @code{pch} values @code{0...26} to sets of
2800lines/circles, centring (and otherwise adjusting) text, rendering
2801mathematical expressions (`plotmath') and mapping colour descriptions
2802such as names to the internal representation.
2803
2804@c graphics.c looks at device dimensions, locator, metricinfo
2805@c par.c looks at various device pars
2806@c plot3d.c looks at useRotatedTextInContour
2807@c grid looks at size, clipping, locator, ipr
2808
2809Another function of the engine is to manage display lists and snapshots.
2810Some but not all instances of graphics devices maintain display lists, a
2811`list' of operations that have been performed on the device to produce
2812the current plot (since the device was opened or the plot was last
2813cleared, e.g.@: by @code{plot.new}).  Screen devices generally maintain
2814a display list to handle repaint and resize events whereas file-based
2815formats do not---display lists are also used to implement
2816@code{dev.copy()} and friends.  The display list is a pairlist of
2817@code{.Internal} (base graphics) or @code{.Call.graphics} (grid
2818graphics) calls, which means that the C code implementing a graphics
2819operation will be re-called when the display list is replayed: apart
2820from the part which records the operation if successful.
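
The record-and-replay idea can be sketched in plain C as follows.  This
is only an analogy: @R{} stores calls in a pairlist rather than function
pointers in a linked list, and all the names here
(@code{display_list}, @code{dl_record}, @code{dl_draw},
@code{dl_replay}, @code{add_to_total}) are made up for the example.

@verbatim
```c
#include <assert.h>
#include <stdlib.h>

/* One recorded drawing operation (hypothetical miniature). */
typedef void (*draw_fn)(void *device, double arg);

typedef struct dl_node {
    draw_fn op;
    double arg;
    struct dl_node *next;
} dl_node;

typedef struct {
    dl_node *head;
    dl_node *last;   /* last element, so that appending is O(1) */
    int recording;   /* is the display list switched on? */
} display_list;

/* Append an operation.  Returns 0 on success (including when recording
   is off, in which case nothing is stored) and -1 if allocation fails. */
int dl_record(display_list *dl, draw_fn op, double arg)
{
    assert(dl != NULL && op != NULL);
    if (!dl->recording) return 0;
    dl_node *n = malloc(sizeof *n);
    if (n == NULL) return -1;
    n->op = op; n->arg = arg; n->next = NULL;
    if (dl->last) dl->last->next = n; else dl->head = n;
    dl->last = n;
    return 0;
}

/* Normal drawing: perform the operation, then record it. */
int dl_draw(display_list *dl, void *device, draw_fn op, double arg)
{
    op(device, arg);
    return dl_record(dl, op, arg);
}

/* Replay: re-run every recorded operation but skip the recording step,
   so the list does not grow while it is being walked. */
void dl_replay(display_list *dl, void *device)
{
    int saved = dl->recording;
    dl->recording = 0;
    for (dl_node *n = dl->head; n != NULL; n = n->next)
        n->op(device, n->arg);
    dl->recording = saved;
}

/* Clearing the plot discards the list. */
void dl_clear(display_list *dl)
{
    dl_node *n = dl->head;
    while (n != NULL) { dl_node *next = n->next; free(n); n = next; }
    dl->head = dl->last = NULL;
}

/* Example operation: the "device" is just a running total. */
void add_to_total(void *device, double arg) { *(double *) device += arg; }
```
@end verbatim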
2821
2822Snapshots of the current graphics state are taken by
2823@code{GEcreateSnapshot} and replayed later in the session by
2824@code{GEplaySnapshot}.  These are used by @code{recordPlot()},
2825@code{replayPlot()} and the GUI menus of the @code{windows()} device.
2826The `state' includes the display list.
2827
2828
The top layer comprises the graphics subsystems.  Although there has
been provision for 24 subsystems since about 2001, only two currently
exist, `base' and `grid'.  The base subsystem is registered with the
engine when @R{} is
2833initialized, and unregistered (via @code{KillAllDevices}) when an @R{}
2834session is shut down.  The grid subsystem is registered in its
2835@code{.onLoad} function and unregistered in the @code{.onUnload}
2836function.  The graphics subsystem may also have `state' information
2837saved in a snapshot (currently base does and grid does not).
2838
2839Package @pkg{grDevices} was originally created to contain the basic
2840graphics devices (although @code{X11} is in a separate load-on-demand
2841module because of the volume of external libraries it brings in).  Since
2842then it has been used for other functionality that was thought desirable
2843for use with @pkg{grid}, and hence has been transferred from package
2844@pkg{graphics} to @pkg{grDevices}.  This is principally concerned with
2845the handling of colours and recording and replaying plots.
2846
2847@menu
2848* Graphics devices::
2849* Colours::
2850* Base graphics::
2851* Grid graphics::
2852@end menu
2853
2854@node Graphics devices, Colours, Graphics Devices, Graphics Devices
2855@section Graphics Devices
2856
2857@R{} ships with several graphics devices, and there is support for
2858third-party packages to provide additional devices---several packages
2859now do.  This section describes the device internals from the viewpoint
2860of a would-be writer of a graphics device.
2861
2862@menu
2863* Device structures::
2864* Device capabilities::
2865* Handling text::
2866* Conventions::
2867* 'Mode'::
2868* Graphics events::
2869* Specific devices::
2870@end menu
2871
2872@node Device structures, Device capabilities, Graphics devices, Graphics devices
2873@subsection Device structures
2874
2875There are two types used internally which are pointers to structures
2876related to graphics devices.
2877
2878The @code{DevDesc} type is a structure defined in the header file
2879@file{R_ext/GraphicsDevice.h} (which is included by
2880@file{R_ext/GraphicsEngine.h}).  This describes the physical
2881characteristics of a device, the capabilities of the device driver and
2882contains a set of callback functions that will be used by the graphics
2883engine to obtain information about the device and initiate actions
2884(e.g.@: a new page, plotting a line or some text).  Type @code{pDevDesc}
2885is a pointer to this type.
2886
2887The following callbacks can be omitted (or set to the null pointer,
2888their default value) when appropriate default behaviour will be taken by
2889the graphics engine: @code{activate}, @code{cap}, @code{deactivate},
2890@code{locator}, @code{holdflush} (API version 9), @code{mode},
2891@code{newFrameConfirm}, @code{path}, @code{raster} and @code{size}.
2892
2893The relationship of device units to physical dimensions is set by the
2894element @code{ipr} of the @code{DevDesc} structure: a @samp{double}
2895array of length 2.
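
A minimal sketch of how @code{ipr} might be filled in and used, assuming
(as the name `inches per raster' suggests) that each element is the size
in inches of one device unit in the @samp{x} and @samp{y} directions.
The type @code{unit_dev} is a hypothetical stand-in for the relevant
fields of @code{DevDesc}, and the helper functions are invented for the
example.

@verbatim
```c
#include <assert.h>

/* Hypothetical stand-in for a few DevDesc fields. */
typedef struct {
    double left, right, bottom, top; /* device extent, in device units */
    double ipr[2];                   /* inches per device unit: x, y   */
} unit_dev;

/* For a device whose units are pixels at the given resolution. */
void set_ipr_from_dpi(unit_dev *d, double xdpi, double ydpi)
{
    assert(d != 0 && xdpi > 0 && ydpi > 0);
    d->ipr[0] = 1.0 / xdpi;
    d->ipr[1] = 1.0 / ydpi;
}

static double absdiff(double a, double b) { return a > b ? a - b : b - a; }

/* Physical size of the plotting surface.  The absolute value allows for
   devices whose y coordinate increases downwards (top < bottom). */
double dev_width_inches(const unit_dev *d)
{
    return absdiff(d->right, d->left) * d->ipr[0];
}

double dev_height_inches(const unit_dev *d)
{
    return absdiff(d->top, d->bottom) * d->ipr[1];
}
```
@end verbatim

For example, a 672 by 672 pixel surface at 96 pixels per inch comes out
as (very nearly) 7 inches square.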
2896
2897
2898The @code{GEDevDesc} type is a structure defined in
2899@file{R_ext/GraphicsEngine.h} (with comments in the file) as
2900
2901@example
2902typedef struct _GEDevDesc GEDevDesc;
2903struct _GEDevDesc @{
2904    pDevDesc dev;
2905    Rboolean displayListOn;
2906    SEXP displayList;
2907    SEXP DLlastElt;
2908    SEXP savedSnapshot;
2909    Rboolean dirty;
2910    Rboolean recordGraphics;
2911    GESystemDesc *gesd[MAX_GRAPHICS_SYSTEMS];
2912    Rboolean ask;
2913@}
2914@end example
2915
2916@noindent
2917So this is essentially a device structure plus information about the
2918device maintained by the graphics engine and normally@footnote{It is
2919possible for the device to find the @code{GEDevDesc} which points to its
2920@code{DevDesc}, and this is done often enough that there is a
2921convenience function @code{desc2GEDesc} to do so.} visible to the engine
2922and not to the device.  Type @code{pGEDevDesc} is a pointer to this
2923type.
2924
2925The graphics engine maintains an array of devices, as pointers to
2926@code{GEDevDesc} structures.  The array is of size 64 but the first
2927element is always occupied by the @code{"null device"} and the final
2928element is kept as NULL as a sentinel.@footnote{Calling
2929@code{R_CheckDeviceAvailable()} ensures there is a free slot or throws
2930an error.}  This array is reflected in the @R{} variable
2931@samp{.Devices}.  Once a device is killed its element becomes available
2932for reallocation (and its name will appear as @code{""} in
2933@samp{.Devices}).  Exactly one of the devices is `active': this is the
null device if no other device has been opened and not killed.
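
The slot bookkeeping described above can be sketched as follows.  This
is an illustrative model, not the engine's code: @code{device_table},
@code{table_add} and @code{table_kill} are hypothetical names, and
making the first surviving device active after a kill is a
simplification chosen for the example.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEVICES 64   /* array size quoted in the text */

/* Hypothetical stand-in for the engine's array of device pointers. */
typedef struct {
    void *slot[MAX_DEVICES];
    int active;          /* index of the active device */
} device_table;

static int null_device;  /* its address marks the permanent null device */

void table_init(device_table *t)
{
    for (int i = 0; i < MAX_DEVICES; i++) t->slot[i] = NULL;
    t->slot[0] = &null_device;   /* first element: always the null device */
    t->active = 0;               /* final element stays NULL as a sentinel */
}

/* Put a new device in the first free slot and make it active.
   Returns the slot index, or -1 if no slot is free. */
int table_add(device_table *t, void *dev)
{
    assert(dev != NULL);
    for (int i = 1; i < MAX_DEVICES - 1; i++)
        if (t->slot[i] == NULL) {
            t->slot[i] = dev;
            t->active = i;
            return i;
        }
    return -1;
}

/* Kill a device: its slot becomes available for reallocation. */
void table_kill(device_table *t, int i)
{
    assert(i > 0 && i < MAX_DEVICES - 1);
    t->slot[i] = NULL;
    if (t->active == i) {
        t->active = 0;           /* fall back to the null device ... */
        for (int j = 1; j < MAX_DEVICES - 1; j++)
            if (t->slot[j] != NULL) { t->active = j; break; }
    }                            /* ... unless another device survives */
}
```
@end verbatim

In this model at most 62 real devices can be open at once, since the
first and last elements are reserved.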
2935
2936Each instance of a graphics device needs to set up a @code{GEDevDesc}
2937structure by code very similar to
2938
2939@example
2940    pGEDevDesc gdd;
2941
2942    R_GE_checkVersionOrDie(R_GE_version);
2943    R_CheckDeviceAvailable();
2944    BEGIN_SUSPEND_INTERRUPTS @{
2945        pDevDesc dev;
2946        /* Allocate and initialize the device driver data */
2947        if (!(dev = (pDevDesc) calloc(1, sizeof(DevDesc))))
2948            return 0; /* or error() */
2949        /* set up device driver or free 'dev' and error() */
2950        gdd = GEcreateDevDesc(dev);
2951        GEaddDevice2(gdd, "dev_name");
2952    @} END_SUSPEND_INTERRUPTS;
2953@end example
2954
2955The @code{DevDesc} structure contains a @code{void *} pointer
2956@samp{deviceSpecific} which is used to store data specific to the
2957device.  Setting up the device driver includes initializing all the
2958non-zero elements of the @code{DevDesc} structure.
2959
2960Note that the device structure is zeroed when allocated: this provides
2961some protection against future expansion of the structure since the
2962graphics engine can add elements that need to be non-NULL/non-zero to be
2963`on' (and the structure ends with 64 reserved bytes which will be zeroed
2964and allow for future expansion).
2965
2966Rather more protection is provided by the version number of the
2967engine/device API, @code{R_GE_version} defined in
2968@file{R_ext/GraphicsEngine.h} together with access functions
2969
2970@example
2971int R_GE_getVersion(void);
2972void R_GE_checkVersionOrDie(int version);
2973@end example
2974
2975@noindent
2976If a graphics device calls @code{R_GE_checkVersionOrDie(R_GE_version)}
2977it can ensure it will only be used in versions of @R{} which provide the
2978API it was designed for and compiled against.
2979
2980@node Device capabilities, Handling text, Device structures, Graphics devices
2981@subsection Device capabilities
2982
2983The following `capabilities' can be defined for the device's
2984@code{DevDesc} structure.
2985
2986@itemize
2987@item @code{canChangeGamma} --
2988@code{Rboolean}: can the display gamma be adjusted?  This is now
2989ignored, as gamma support has been removed.
2990@item @code{canHadj} --
2991@code{integer}: can the device do horizontal adjustment of text
2992@emph{via} the @code{text} callback, and if so, how precisely? 0 = no
2993adjustment, 1 = @{0, 0.5, 1@} (left, centre, right justification) or 2 =
2994continuously variable (in [0,1]) between left and right justification.
2995@item @code{canGenMouseDown} --
2996@code{Rboolean}: can the device handle mouse down events?  This
2997flag and the next three are not currently used by R, but are maintained
2998for back compatibility.
2999@item @code{canGenMouseMove} --
3000@code{Rboolean}: ditto for mouse move events.
3001@item @code{canGenMouseUp} --
3002@code{Rboolean}: ditto for mouse up events.
3003@item @code{canGenKeybd} --
3004@code{Rboolean}: ditto for keyboard events.
3005@item @code{hasTextUTF8} --
3006@code{Rboolean}: should non-symbol text be sent (in UTF-8) to the
3007@code{textUTF8} and @code{strWidthUTF8} callbacks, and sent as Unicode
3008points (negative values) to the @code{metricInfo} callback?
3009@item @code{wantSymbolUTF8} --
3010@code{Rboolean}: should symbol text be handled in UTF-8 in the same way
as other text?  Requires @code{hasTextUTF8 = TRUE}.
3012@item @code{haveTransparency}:
3013does the device support semi-transparent colours?
3014@item @code{haveTransparentBg}:
3015can the background be fully or semi-transparent?
3016@item @code{haveRaster}:
3017is there support for rendering raster images?
3018@item @code{haveCapture}:
3019is there support for @code{grid::grid.cap}?
3020@item @code{haveLocator}:
3021is there an interactive locator?
3022@end itemize
3023
3024The last three can often be deduced to be false from the presence of
3025@code{NULL} entries instead of the corresponding functions.
3026
3027@node Handling text, Conventions, Device capabilities, Graphics devices
3028@subsection Handling text
3029
3030Handling text is probably the hardest task for a graphics device, and
3031the design allows for the device to optionally indicate that it has
3032additional capabilities.  (If the device does not, these will if
3033possible be handled in the graphics engine.)
3034
3035The three callbacks for handling text that must be in all graphics
3036devices are @code{text}, @code{strWidth} and @code{metricInfo} with
3037declarations
3038
3039@example
void text(double x, double y, const char *str, double rot, double hadj,
          pGEcontext gc, pDevDesc dd);
3042
3043double strWidth(const char *str, pGEcontext gc, pDevDesc dd);
3044
3045void metricInfo(int c, pGEcontext gc,
3046               double* ascent, double* descent, double* width,
3047               pDevDesc dd);
3048@end example
3049
3050@noindent
3051The @samp{gc} parameter provides the graphics context, most importantly
3052the current font and fontsize, and @samp{dd} is a pointer to the active
3053device's structure.
3054
3055The @code{text} callback should plot @samp{str} at @samp{(x,
3056y)}@footnote{in device coordinates} with an anti-clockwise rotation of
3057@samp{rot} degrees.  (For @samp{hadj} see below.)  The interpretation
3058for horizontal text is that the baseline is at @code{y} and the start is
at @code{x}, so any left bearing for the first character will start at
3060@code{x}.
3061
The @code{strWidth} callback computes the width which the string
would occupy if plotted horizontally in the current font.  (Width here
3064is expected to include both (preferably) or neither of left and right
3065bearings.)
3066
3067The @code{metricInfo} callback computes the size of a single
3068character: @code{ascent} is the distance it extends above the baseline
3069and @code{descent} how far it extends below the baseline.
3070@code{width} is the amount by which the cursor should be advanced when
3071the character is placed.  For @code{ascent} and @code{descent} this is
3072intended to be the bounding box of the `ink' put down by the glyph and
3073not the box which might be used when assembling a line of conventional
3074text (it needs to be for e.g.@: @code{hat(beta)} to work correctly).
3075However, the @code{width} is used in plotmath to advance to the next
3076character, and so needs to include left and right bearings.
3077
3078The @emph{interpretation} of @samp{c} depends on the locale.  In a
3079single-byte locale values @code{32...255} indicate the corresponding
3080character in the locale (if present).  For the symbol font (as used by
@samp{graphics::par(font=5)}, @samp{grid::gpar(fontface=5)} and by
3082`plotmath'), values @code{32...126, 161...239, 241...254} indicate
3083glyphs in the Adobe Symbol encoding.  In a multibyte locale, @code{c}
3084represents a Unicode point (except in the symbol font).  So the function
3085needs to include code like
3086
3087@example
3088    Rboolean Unicode = mbcslocale && (gc->fontface != 5);
3089    if (c < 0) @{ Unicode = TRUE; c = -c; @}
3090    if(Unicode) UniCharMetric(c, ...); else CharMetric(c, ...);
3091@end example
3092
3093@noindent
3094In addition, if device capability @code{hasTextUTF8} (see below) is
3095true, Unicode points will be passed as negative values: the code snippet
3096above shows how to handle this.  (This applies to the symbol font only
3097if device capability @code{wantSymbolUTF8} is true.)
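
The decoding rule can be isolated into a small self-contained function,
sketched here under the same conventions as the snippet above.  The
names @code{metric_char} and @code{decode_metric_char} are hypothetical,
and the @samp{mbcslocale} flag is passed as an ordinary argument so that
the example does not depend on @R{}'s headers.

@verbatim
```c
#include <assert.h>
#include <limits.h>

#define SYMBOL_FONTFACE 5   /* font face selecting the symbol font */

typedef struct {
    int is_unicode;  /* nonzero if 'code' is a Unicode point */
    int code;        /* non-negative character code */
} metric_char;

/* Interpret the 'c' argument of a metricInfo-style callback. */
metric_char decode_metric_char(int c, int mbcslocale, int fontface)
{
    metric_char mc;
    assert(c > INT_MIN);    /* so that -c cannot overflow */
    /* multibyte locale: Unicode point, except in the symbol font */
    mc.is_unicode = mbcslocale && fontface != SYMBOL_FONTFACE;
    /* negative values are always Unicode points */
    if (c < 0) { mc.is_unicode = 1; c = -c; }
    mc.code = c;
    return mc;
}
```
@end verbatim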
3098
3099If possible, the graphics device should handle clipping of text.  It
3100indicates this by the structure element @code{canClip} which if true
3101will result in calls to the callback @code{clip} to set the clipping
3102region. If this is not done, the engine will clip very crudely (by
3103omitting any text that does not appear to be wholly inside the clipping
3104region).
3105
3106The device structure has an integer element @code{canHadj}, which
3107indicates if the device can do horizontal alignment of text.  If this is
one, argument @samp{hadj} to @code{text} will be passed as one of
@code{0, 0.5, 1} to indicate left-, centre- and right-alignment at the indicated
3110position.  If it is two, continuous values in the range @code{[0, 1]}
3111are assumed to be supported.
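
One way a caller could cope with the three levels of @code{canHadj} is
sketched below for unrotated text: pass the device the nearest
adjustment it supports and shift the reference point to make up the
difference.  This is an assumption about a reasonable strategy, not a
transcription of the engine's code, and @code{hadj_plan} and
@code{plan_hadj} are hypothetical names.

@verbatim
```c
#include <assert.h>

typedef struct {
    double x;     /* x position to pass to the text callback */
    double hadj;  /* hadj value to pass to the text callback */
} hadj_plan;

/* 'want' is the requested adjustment in [0, 1]; 'width' is the string
   width in device units.  Horizontal (unrotated) text only. */
hadj_plan plan_hadj(int canHadj, double x, double want, double width)
{
    hadj_plan p;
    assert(want >= 0.0 && want <= 1.0);
    if (canHadj == 2)            /* continuously variable */
        p.hadj = want;
    else if (canHadj == 1)       /* nearest of 0, 0.5, 1 */
        p.hadj = want < 0.25 ? 0.0 : (want < 0.75 ? 0.5 : 1.0);
    else                         /* no adjustment by the device */
        p.hadj = 0.0;
    /* move the reference point by the part the device cannot do */
    p.x = x - (want - p.hadj) * width;
    return p;
}
```
@end verbatim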
3112
If capability @code{hasTextUTF8} is true, this has two consequences.
3114First, there are callbacks @code{textUTF8} and @code{strWidthUTF8} that
3115should behave identically to @code{text} and @code{strWidth} except that
3116@samp{str} is assumed to be in UTF-8 rather than the current locale's
3117encoding.  The graphics engine will call these for all text except in
3118the symbol font.  Second, Unicode points will be passed to the
3119@code{metricInfo} callback as negative integers.  If your device would
3120prefer to have UTF-8-encoded symbols, define @code{wantSymbolUTF8} as
3121well as @code{hasTextUTF8}.  In that case text in the symbol font is
3122sent to @code{textUTF8} and @code{strWidthUTF8}.
3123
3124Some devices can produce high-quality rotated text, but those based on
3125bitmaps often cannot.  Those which can should set
3126@code{useRotatedTextInContour} to be true from graphics API version 4.
3127
3128Several other elements relate to the precise placement of text by the
3129graphics engine:
3130
3131@example
3132double xCharOffset;
3133double yCharOffset;
3134double yLineBias;
3135double cra[2];
3136@end example
3137
3138@noindent
3139These are more than a little mysterious.  Element @code{cra} provides an
3140indication of the character size, @code{par("cra")} in base graphics, in
3141device units.  The mystery is what is meant by `character size': which
3142character, which font at which size?  Some help can be obtained by
3143looking at what this is used for.  The first element, `width', is not
3144used by @R{} except to set the graphical parameters.  The second,
`height', is used to set the line spacing, that is the relationship
between @code{par("mar")} and @code{par("mai")} and so on.  It is
3147suggested that a good choice is
3148
3149@example
3150dd->cra[0] = 0.9 * fnsize;
3151dd->cra[1] = 1.2 * fnsize;
3152@end example
3153
3154@noindent
3155where @samp{fnsize} is the `size' of the standard font (@code{cex=1})
3156on the device, in device units.  So for a 12-point font (the usual
3157default for graphics devices), @samp{fnsize} should be 12 points in
3158device units.
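
As a worked example, the sketch below computes @samp{fnsize} and the
suggested @code{cra} values for a device whose units are pixels.  The
conversion assumes a point of 1/72 inch; the function names and the
@samp{dpi} parameter are hypothetical and only serve the illustration.

@verbatim
```c
#include <assert.h>

/* Size of the standard font in device units, for a device whose units
   are pixels at 'dpi' pixels per inch ('pointsize' in 1/72 inch). */
double font_size_device_units(double pointsize, double dpi)
{
    assert(pointsize > 0 && dpi > 0);
    return pointsize * dpi / 72.0;
}

/* The choice suggested in the text. */
void suggested_cra(double fnsize, double cra[2])
{
    cra[0] = 0.9 * fnsize;   /* 'width'  */
    cra[1] = 1.2 * fnsize;   /* 'height': drives the line spacing */
}
```
@end verbatim

For a 12-point font on a 96 pixels-per-inch device this gives
@samp{fnsize} of 16 and @code{cra} of roughly 14.4 by 19.2 device units.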
3159
3160The remaining elements are yet more mysterious.  The @code{postscript()}
3161device says
3162
3163@example
3164    /* Character Addressing Offsets */
3165    /* These offsets should center a single */
3166    /* plotting character over the plotting point. */
3167    /* Pure guesswork and eyeballing ... */
3168
3169    dd->xCharOffset =  0.4900;
3170    dd->yCharOffset =  0.3333;
3171    dd->yLineBias = 0.2;
3172@end example
3173
3174@noindent
3175It seems that @code{xCharOffset} is not currently used, and
3176@code{yCharOffset} is used by the base graphics system to set vertical
3177alignment in @code{text()} when @code{pos} is specified, and in
@code{identify()}.  It is occasionally used by the graphics engine when
3179attempting exact centring of text, such as character string values of
3180@code{pch} in @code{points()} or @code{grid.points()}---however, it is
3181only used when precise character metric information is not available or
3182for multi-line strings.
3183
3184@code{yLineBias} is used in the base graphics system in @code{axis()} and
3185@code{mtext()} to provide a default for their @samp{padj} argument.
3186
3187@node Conventions, 'Mode', Handling text, Graphics devices
3188@subsection Conventions
3189
3190The aim is to make the (default) output from graphics devices as similar
3191as possible.  Generally people follow the model of the @code{postscript}
3192and @code{pdf} devices (which share most of their internal code).
3193
3194The following conventions have become established:
3195
3196@itemize
3197
3198@item
3199The default size of a device should be 7 inches square.
3200
3201@item
3202There should be a @samp{pointsize} argument which defaults to 12, and it
3203should give the pointsize in big points (1/72 inch).  How exactly this
3204is interpreted is font-specific, but it should use a font which works
3205with lines packed 1/6 inch apart, and looks good with lines 1/5 inch
3206apart (that is with 2pt leading).
3207
3208@item
The default font family should be a sans serif font, e.g.@: Helvetica or
3210similar (e.g.@: Arial on Windows).
3211
3212@item
3213@code{lwd = 1} should correspond to a line width of 1/96 inch.  This
3214will be a problem with pixel-based devices, and generally there is a
3215minimum line width of 1 pixel (although this may not be appropriate
3216where anti-aliasing of lines is used, and @code{cairo} prefers a minimum
3217of 2 pixels).
3218
3219@item
3220Even very small circles should be visible, e.g.@: by using a minimum
3221radius of 1 pixel or replacing very small circles by a single filled
3222pixel.
3223
3224@item
3225How RGB colour values will be interpreted should be documented, and
3226preferably be sRGB.
3227
3228@item
3229The help page should describe its policy on these conventions.
3230
3231@end itemize
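
For a pixel-based device, the line-width and small-circle conventions
might be applied along the following lines.  This is a sketch under the
assumption of a one-pixel minimum in both cases (one of the options
mentioned above); the function names and the @samp{dpi} parameter are
hypothetical.

@verbatim
```c
#include <assert.h>

/* lwd = 1 corresponds to 1/96 inch; never draw thinner than one pixel. */
double lwd_to_pixels(double lwd, double dpi)
{
    assert(lwd >= 0 && dpi > 0);
    double w = lwd * dpi / 96.0;
    return w < 1.0 ? 1.0 : w;
}

/* Keep very small circles visible by enforcing a minimum radius. */
double visible_radius(double radius_pixels)
{
    assert(radius_pixels >= 0);
    return radius_pixels < 1.0 ? 1.0 : radius_pixels;
}
```
@end verbatim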
3232
3233These conventions are less clear-cut for bitmap devices, especially
3234where the bitmap format does not have a design resolution.
3235
The interpretation of the line texture (@code{par("lty")}) is described
3237in the header @file{GraphicsEngine.h} and in the help for @code{par}: note that the
3238`scale' of the pattern should be proportional to the line width (at
3239least for widths above the default).
3240
3241
3242@node 'Mode', Graphics events, Conventions, Graphics devices
3243@subsection `Mode'
3244
3245One of the device callbacks is a function @code{mode}, documented in
3246the header as
3247
3248@example
3249     * device_Mode is called whenever the graphics engine
3250     * starts drawing (mode=1) or stops drawing (mode=0)
3251     * GMode (in graphics.c) also says that
3252     * mode = 2 (graphical input on) exists.
3253     * The device is not required to do anything
3254@end example
3255
3256@noindent
@code{mode = 2} has only recently been documented at device level.
It could be used to change the graphics cursor, but devices currently do
that in the @code{locator} callback.  (In base graphics the mode is set
for the duration of a @code{locator} call, but if @code{type != "n"} it is
switched back for each point whilst annotation is being done.)
3262
3263Many devices do indeed do nothing on this call, but some screen devices
3264ensure that drawing is flushed to the screen when called with @code{mode
3265= 0}.  It is tempting to use it for some sort of buffering, but note
3266that `drawing' is interpreted at quite a low level and a typical single
3267figure will stop and start drawing many times.  The buffering introduced
3268in the @code{X11()} device makes use of @code{mode = 0} to indicate
3269activity: it updates the screen after @emph{ca} 100ms of inactivity.
3270
3271This callback need not be supplied if it does nothing.
3272
3273@node Graphics events, Specific devices, 'Mode', Graphics devices
3274@subsection Graphics events
3275
3276Graphics devices may be designed to handle user interaction: not all are.
3277
3278Users may use @code{grDevices::setGraphicsEventEnv} to set the
3279@code{eventEnv} environment in the device driver to hold event
3280handlers. When the user calls @code{grDevices::getGraphicsEvent}, R will
3281take three steps.  First, it sets the device driver member
3282@code{gettingEvent} to @code{true} for each device with a
3283non-@code{NULL} @code{eventEnv} entry, and calls @code{initEvent(dd,
3284true)} if the callback is defined.  It then enters an event loop.  Each
3285time through the loop R will process events once, then check whether any
3286device has set the @code{result} member of @code{eventEnv} to a
3287non-@code{NULL} value, and will save the first such value found to be
3288returned.  C functions @code{doMouseEvent} and @code{doKeybd} are
3289provided to call the R event handlers @code{onMouseDown},
3290@code{onMouseMove}, @code{onMouseUp}, and @code{onKeybd} and set
3291@code{eventEnv$result} during this step.  Finally, @code{initEvent} is
3292called again with @code{init=false} to inform the devices that the
3293loop is done, and the result is returned to the user.
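
The three steps can be modelled in a few lines of standalone C.  This
is a simulation for illustration only: @code{ev_dev} stands in for the
relevant device members, @code{pump_fn} for `process events once', and
the @samp{max_pass} limit is an assumption added so that the example
always terminates.  None of these names exist in @R{}.

@verbatim
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the device members involved. */
typedef struct {
    int has_event_env;   /* is eventEnv non-NULL? */
    int getting_event;   /* the gettingEvent member */
    const char *result;  /* the 'result' member of eventEnv */
    int init_calls;      /* how often initEvent was called */
} ev_dev;

/* Stand-in for one round of event processing. */
typedef void (*pump_fn)(ev_dev devs[], int n, int pass);

static void init_event(ev_dev *d, int on) { (void) on; d->init_calls++; }

const char *get_graphics_event(ev_dev devs[], int n, pump_fn pump,
                               int max_pass)
{
    const char *result = NULL;
    assert(devs != NULL && pump != NULL);

    /* Step 1: mark each device that has handlers and notify it. */
    for (int i = 0; i < n; i++)
        if (devs[i].has_event_env) {
            devs[i].getting_event = 1;
            init_event(&devs[i], 1);
        }

    /* Step 2: process events once per pass; keep the first result. */
    for (int pass = 0; pass < max_pass && result == NULL; pass++) {
        pump(devs, n, pass);
        for (int i = 0; i < n && result == NULL; i++)
            if (devs[i].has_event_env && devs[i].result != NULL)
                result = devs[i].result;
    }

    /* Step 3: tell the devices that the loop is done. */
    for (int i = 0; i < n; i++)
        if (devs[i].has_event_env) {
            init_event(&devs[i], 0);
            devs[i].getting_event = 0;  /* assumed reset */
        }
    return result;
}

/* Example event source: device 1 reports a key press on the third pass. */
void example_pump(ev_dev devs[], int n, int pass)
{
    if (pass == 2 && n > 1) devs[1].result = "keypress";
}
```
@end verbatim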
3294
3295@node Specific devices,  , Graphics events, Graphics devices
3296@subsection Specific devices
3297
3298Specific devices are mostly documented by comments in their sources,
3299although for devices of many years' standing those comments can be in
3300need of updating.  This subsection is a repository of notes on design
3301decisions.
3302
3303@menu
3304* X11()::
3305* windows()::
3306@end menu
3307
3308@node X11(), windows(), Specific devices, Specific devices
3309@subsubsection X11()
3310
3311The @code{X11(type="Xlib")} device dates back to the mid 1990's and was
3312written then in @code{Xlib}, the most basic X11 toolkit.  It has since
3313optionally made use of a few features from other toolkits: @code{libXt}
3314is used to read X11 resources, and @code{libXmu} is used in the handling
3315of clipboard selections.
3316
3317Using basic @code{Xlib} code makes drawing fast, but is limiting.  There
3318is no support of translucent colours (that came in the @code{Xrender}
3319toolkit of 2000) nor for rotated text (which @R{} implements by
3320rendering text to a bitmap and rotating the latter).
3321
3322The hinting for the X11 window asks for backing store to be used, and
some window managers may use it to handle repaints, but it seems that
3324most repainting is done by replaying the display list (and here the fast
3325drawing is very helpful).
3326
3327There are perennial problems with finding fonts.  Many users fail to
3328realize that fonts are a function of the X server and not of the machine
3329that @R{} is running on.  After many difficulties, @R{} tries first to
3330find the nearest size match in the sizes provided for Adobe fonts in the
3331standard 75dpi and 100dpi X11 font packages---even that will fail to
3332work when users of near-100dpi screens have only the 75dpi set
3333installed.  The 75dpi set allows sizes down to 6 points on a 100dpi
3334screen, but some users do try to use smaller sizes and even 6 and 8
3335point bitmapped fonts do not look good.
3336
3337Introduction of UTF-8 locales has caused another wave of difficulties.
3338X11 has very few genuine UTF-8 fonts, and produces composite fontsets
3339for the @code{iso10646-1} encoding.  Unfortunately these seem to have
3340low coverage apart from a few monospaced fonts in a few sizes (which are
3341not suitable for graph annotation), and where glyphs are missing what is
3342plotted is often quite unsatisfactory.
3343
3344The current approach is to make use of more modern toolkits, namely
3345@code{cairo} for rendering and @code{Pango} for font
3346management---because these are associated with @code{Gtk+2} they are
3347widely available.  Cairo supports translucent colours and alpha-blending
3348(@emph{via} @code{Xrender}), and anti-aliasing for the display of lines
3349and text.  Pango's font management is based on @code{fontconfig} and
3350somewhat mysterious, but it seems mainly to use Type 1 and TrueType
3351fonts on the machine running @R{} and send grayscale bitmaps to cairo.
3352
3353
3354@node windows(),  , X11(), Specific devices
3355@subsubsection windows()
3356
3357The @code{windows()} device is a family of devices: it supports plotting
3358to Windows (enhanced) metafiles, @code{BMP}, @code{JPEG}, @code{PNG} and
3359@code{TIFF} files as well as to Windows printers.
3360
3361In most of these cases the primary plotting is to a bitmap: this is used
3362for the (default) buffering of the screen device, which also enables the
3363current plot to be saved to BMP, JPEG, PNG or TIFF (it is the internal
3364bitmap which is copied to the file in the appropriate format).
3365
3366The device units are pixels (logical ones on a metafile device).
3367
3368The code was originally written by Guido Masarotto with extensive use of
3369macros, which can make it hard to disentangle.
3370
3371For a screen device, @code{xd->gawin} is the canvas of the screen, and
3372@code{xd->bm} is the off-screen bitmap.  So macro @code{DRAW} arranges
3373to plot to @code{xd->bm}, and if buffering is off, also to
@code{xd->gawin}.  For all other devices, @code{xd->gawin} is the canvas,
a bitmap for the @code{jpeg()} and @code{png()} devices, and an internal
representation of a Windows metafile for the @code{win.metafile()} and
@code{win.print()} devices.  Since `plotting' is done by Windows GDI calls
3378to the appropriate canvas, its precise nature is hidden by the GDI
3379system.
3380
3381Buffering on the screen device is achieved by running a timer, which
3382when it fires copies the internal bitmap to the screen.  This is set to
3383fire every 500ms (by default) and is reset to 100ms after plotting
3384activity.
3385
3386Repaint events are handled by copying the internal bitmap to the screen
3387canvas (and then reinitializing the timer), unless there has been a resize.
3388Resizes are handled by replaying the display list: this might not be
3389necessary if a fixed canvas with scrollbars is being used, but that is
3390the least popular of the three forms of resizing.
3391
3392Text on the device has moved to `Unicode' (UCS-2) in recent years.
3393UTF-8 is requested (@code{hasTextUTF8 = TRUE}) for standard text, and
3394converted to UCS-2 in the plotting functions in file
3395@file{src/extra/graphapp/gdraw.c}.  However, GDI has no support for
3396Unicode symbol fonts, and symbols are handled in Adobe Symbol encoding.
3397
Support for translucent colours (with alpha channel between 0 and 255)
is provided on the screen device and bitmap
3400devices.@footnote{It is technically possible to use alpha-blending on
3401metafile devices such as printers, but it seems few drivers have support
3402for this.} This is done by drawing on a further internal bitmap,
3403@code{xd->bm2}, in the opaque version of the colour then alpha-blending
3404that bitmap to @code{xd->bm}.  The alpha-blending routine is in a
3405separate DLL, @file{msimg32.dll}, which is loaded on first use.  As
3406small a rectangular region as reasonably possible is alpha-blended (this
3407is rectangle @code{r} in the code), but things like mitre joins make
3408estimation of a tight bounding box too much work for lines and polygonal
3409boundaries.  Translucent-coloured lines are not common, and the
3410performance seems acceptable.
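
The blending step can be pictured with the following minimal sketch.
It is not the routine in @file{msimg32.dll}: it assumes ordinary
`source over' blending with one constant opacity for the whole
rectangle, and the function names are hypothetical.

@verbatim
/* Blend one 0..255 channel of an opaque source pixel over a destination
   pixel with constant opacity alpha (0 = keep dst, 255 = take src).
   Hypothetical helper, for illustration only. */
static unsigned char sk_blend_channel(unsigned char src, unsigned char dst,
                                      unsigned char alpha)
{
    unsigned int v = (unsigned int)src * alpha
                   + (unsigned int)dst * (255u - alpha);
    return (unsigned char)((v + 127u) / 255u);   /* round to nearest */
}

/* Blend a rectangle of npix RGB pixels (3 bytes each) from src
   (playing the role of the extra bitmap) into dst (the main bitmap). */
static void sk_blend_rect(const unsigned char *src, unsigned char *dst,
                          unsigned int npix, unsigned char alpha)
{
    for (unsigned int i = 0; i < 3u * npix; i++)
        dst[i] = sk_blend_channel(src[i], dst[i], alpha);
}
@end verbatim

The cost is proportional to the number of pixels blended, which is why
a small rectangle is worthwhile.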
3411
3412The support for a transparent background in @code{png()} predates full
3413alpha-channel support in @code{libpng} (let alone in PNG viewers), so
3414makes use of the limited transparency support in earlier versions of
3415PNG.  Where 24-bit colour is used, this is done by marking a single
3416colour to be rendered as transparent.  @R{} chose @samp{#fdfefd}, and
uses this as the background colour (in @code{GA_NewPage}) if the
3418specified background colour is transparent (and all non-opaque
3419background colours are treated as transparent).  So this works by
3420marking that colour in the PNG file, and viewers without transparency
3421support see a slightly-off-white background, as if there were a
3422near-white canvas.  Where a palette is used in the PNG file (if less
3423than 256 colours were used) then this colour is recorded with full
3424transparency and the remaining colours as opaque.  If 32-bit colour were
3425available then we could add a full alpha channel, but this is dependent
3426on the graphics hardware and undocumented properties of GDI.
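
As a rough sketch of the rule just described (not the code in the
device itself), the choice of the colour painted as the page background
might look like this.  The macro and function names are hypothetical;
only the marker value @samp{#fdfefd} is taken from the text.

@verbatim
/* The marker colour named in the text, written as 0xRRGGBB. */
#define SK_PNG_TRANS 0xfdfefdu

/* Choose the colour actually painted as the page background.
   bg_rgb is 0xRRGGBB and bg_alpha is 0..255.  Any non-opaque background
   is treated as transparent and replaced by the marker colour; *mark is
   set to 1 when the marker should be recorded as transparent in the
   PNG file and to 0 otherwise.  Hypothetical helper. */
static unsigned int sk_page_background(unsigned int bg_rgb,
                                       unsigned int bg_alpha, int *mark)
{
    *mark = (bg_alpha != 255u);
    return *mark ? SK_PNG_TRANS : (bg_rgb & 0xFFFFFFu);
}
@end verbatim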
3427
3428
3429@node Colours, Base graphics, Graphics devices, Graphics Devices
3430@section Colours
3431
3432Devices receive colours as a @code{typedef} @code{rcolor} (an
3433@code{unsigned int}) defined in the header
@file{R_ext/GraphicsEngine.h}.  The 4 bytes are @emph{R}, @emph{G},
@emph{B} and @emph{alpha} from least to most significant.  So each of RGB
3436has 256 levels of luminosity from 0 to 255.  The alpha byte represents
3437opacity, so value 255 is fully opaque and 0 fully transparent: many but
3438not all devices handle semi-transparent colours.
3439
Colours can be created in C via the macro @code{R_RGBA}, and a set of
macros is defined in @file{R_ext/GraphicsDevice.h} to extract the
3442various components.
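
The byte layout described above can be illustrated by the following
minimal sketch.  It does not reproduce the definitions in the @R{}
headers: the type @code{sk_rcolor} and the @code{SK_} macros are
hypothetical names used only here, and an @code{unsigned int} of at
least 32 bits is assumed.

@verbatim
/* Hypothetical stand-in for rcolor: R, G, B and alpha occupy the four
   bytes, least significant byte first. */
typedef unsigned int sk_rcolor;

/* Pack four 0..255 components (illustrative only). */
#define SK_RGBA(r, g, b, a) \
    ((sk_rcolor)(r) | ((sk_rcolor)(g) << 8) | \
     ((sk_rcolor)(b) << 16) | ((sk_rcolor)(a) << 24))

/* Extract the individual components again. */
#define SK_RED(col)   (((col)      ) & 0xFFu)
#define SK_GREEN(col) (((col) >>  8) & 0xFFu)
#define SK_BLUE(col)  (((col) >> 16) & 0xFFu)
#define SK_ALPHA(col) (((col) >> 24) & 0xFFu)

/* Alpha 255 is fully opaque, alpha 0 fully transparent. */
static int sk_is_opaque(sk_rcolor col)      { return SK_ALPHA(col) == 255u; }
static int sk_is_transparent(sk_rcolor col) { return SK_ALPHA(col) == 0u; }
@end verbatim

With this layout opaque red is @code{0xFF0000FF} when written as a
single hexadecimal number, since red is the lowest byte and alpha the
highest.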
3443
3444Colours in the base graphics system were originally adopted from S (and
3445before that the GRZ library from Bell Labs), with the concept of a
3446(variable-sized) palette of colours referenced by numbers
3447@samp{1...@var{N}} plus @samp{0} (the background colour of the current
3448device).  @R{} introduced the idea of referring to colours by character
3449strings, either in the forms @samp{#RRGGBB} or @samp{#RRGGBBAA}
3450(representing the bytes in hex) as given by function @code{rgb()} or via
3451names: the 657 known names are given in the character vector
3452@code{colors} and in a table in file @file{colors.c} in package
3453@pkg{grDevices}.  Note that semi-transparent colours are not
3454`premultiplied', so 50% transparent white is @samp{#ffffff80}.
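
To illustrate the two string forms, here is a small self-contained
sketch of a parser for @samp{#RRGGBB} and @samp{#RRGGBBAA}.  It is not
the code in @file{colors.c}, and the function names are hypothetical.
The colour bytes are returned unchanged whatever the alpha byte, which
is what `not premultiplied' means in practice.

@verbatim
#include <string.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int sk_hexdigit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Parse "#RRGGBB" or "#RRGGBBAA" into out[0..3] = R, G, B, alpha.
   Returns 0 on success and -1 on a malformed string (in which case out
   may have been partly written).  A missing alpha byte is taken to
   mean fully opaque (255). */
static int sk_parse_hex_colour(const char *s, unsigned int out[4])
{
    size_t n = strlen(s);
    if (s[0] != '#' || (n != 7 && n != 9)) return -1;
    out[3] = 255u;
    for (size_t i = 0; i < (n - 1) / 2; i++) {
        int hi = sk_hexdigit(s[1 + 2 * i]);
        int lo = sk_hexdigit(s[2 + 2 * i]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (unsigned int)(hi * 16 + lo);
    }
    return 0;
}
@end verbatim

Parsing @code{"#ffffff80"} with this sketch gives 255 for each of red,
green and blue and 128 for alpha.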
3455
3456Integer or character @code{NA} colours are mapped internally to
3457transparent white, as is the character string @code{"NA"}.
3458
3459Negative colour numbers are an error.  Colours greater than
3460@samp{@var{N}} are wrapped around, so that for example with the default
3461palette of size 8, colour @samp{10} is colour @samp{2} in the palette.
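
The wrap-around amounts to modular arithmetic on a 1-based index.  A
minimal sketch, with a hypothetical function name and with colour
@samp{0} and negative numbers assumed to have been dealt with by the
caller:

@verbatim
#include <assert.h>

/* Map a positive colour number to a 1-based palette index, wrapping
   around a palette of size n.  Hypothetical helper. */
static int sk_palette_index(int colour, int n)
{
    assert(colour > 0 && n > 0);
    return (colour - 1) % n + 1;
}
@end verbatim

For a palette of size 8 this maps @samp{10} to @samp{2}, and leaves
@samp{8} as @samp{8} rather than wrapping it to the background colour
@samp{0}.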
3462
3463Integer colours have been used more widely than the base graphics
3464sub-system, as they are supported by package @pkg{grid} and hence by
3465@CRANpkg{lattice} and @CRANpkg{ggplot2}.  (They are also used by package
3466@CRANpkg{rgl}.)  @pkg{grid} did re-define colour @samp{0} to be
3467transparent white, but @CRANpkg{rgl} used @code{col2rgb} and hence the
3468background colour of base graphics.
3469
3470Note that positive integer colours refer to the current palette and
3471colour @samp{0} to the current device (and a device is opened if needs
3472be).  These are mapped to type @code{rcolor} at the time of use: this
3473matters when re-playing the display list, e.g.@: when a device is
3474resized or @code{dev.copy} is used.  The palette should be thought of as
3475per-session: it is stored in package @pkg{grDevices}.
3476
3477The convention is that devices use the colorspace `sRGB'. This is an
3478industry standard: it is used by Web browsers and JPEGs from all but
3479high-end digital cameras.  The interpretation is a matter for graphics
3480devices and for code that manipulates colours, but not for the graphics
3481engine or subsystems.
3482
3483@R{} uses a painting model similar to PostScript and PDF.  This means
3484that where shapes (circles, rectangles and polygons) can both be filled
3485and have a stroked border, the fill should be painted first and then the
3486border (or otherwise only half the border will be visible).  Where both
3487the fill and the border are semi-transparent there is some room for
3488interpretation of the intention.  Most devices first paint the fill and
3489then the border, alpha-blending at each step.  However, PDF does some
3490automatic grouping of objects, and @emph{when the fill and the border
3491have the same alpha}, they are painted onto the same layer and then
3492alpha-blended in one step.  (See p. 569 of the PDF Reference Sixth
3493Edition, version 1.7.  Unfortunately, although this is what the PDF
3494standard says should happen, it is not correctly implemented by some
3495viewers.)
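
The difference between the two interpretations shows up in the pixels
under the inner half of the border, where the border overlaps the fill.
The following sketch works on one colour channel and assumes simple
`source over' blending; the function names are hypothetical and it is
not taken from any device.

@verbatim
/* Source-over blend of one 0..255 channel with opacity alpha (0..255). */
static unsigned int sk_over(unsigned int src, unsigned int dst,
                            unsigned int alpha)
{
    return (src * alpha + dst * (255u - alpha) + 127u) / 255u;
}

/* Pixel under the inner half of the border when the fill and then the
   border are each blended onto the background in turn. */
static unsigned int sk_two_steps(unsigned int fill, unsigned int border,
                                 unsigned int bg, unsigned int alpha)
{
    return sk_over(border, sk_over(fill, bg, alpha), alpha);
}

/* The same pixel when fill and border share a layer: the border simply
   replaces the fill on that layer, and the layer is blended once. */
static unsigned int sk_one_step(unsigned int fill, unsigned int border,
                                unsigned int bg, unsigned int alpha)
{
    (void)fill;               /* hidden by the border within the layer */
    return sk_over(border, bg, alpha);
}
@end verbatim

With black fill and border at alpha 128 on a white background, the
two-step version gives 63 for the overlapped pixels and the one-step
version 127, so the inner half of the border looks darker on devices
that blend at each step.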
3496
3497The mapping from colour numbers to type @code{rcolor} is primarily done
3498by function @code{RGBpar3}: this is exported from the @R{} binary but
3499linked to code in package @pkg{grDevices}.  The first argument is a
3500@code{SEXP} pointing to a character, integer or double vector, and the
3501second is the @code{rcolor} value for colour @code{0} (or @code{"0"}).
3502C entry point @code{RGBpar} is a wrapper that takes @code{0} to be
3503transparent white: it is often used to set colour defaults for devices.
3504The @R{}-level wrapper is @code{col2rgb}.
3505
3506There is also @code{R_GE_str2col} which takes a C string and converts to
type @code{rcolor}: @code{"0"} is converted to transparent white.
3508
There is an @R{}-level conversion of colours to @samp{#RRGGBBAA} by
3510@code{image.default(useRaster = TRUE)}.
3511
The other colour-conversion entry point in the API is @code{name2col}
3513which takes a colour name (a C string) and returns a value of type
3514@code{rcolor}.  This handles @code{"NA"}, @code{"transparent"} and the
3515657 colours known to the @R{} function @code{colors()}.
3516
3517@node Base graphics, Grid graphics, Colours, Graphics Devices
3518@section Base graphics
3519
3520The base graphics system was migrated to package @pkg{graphics} in @R{}
35213.0.0: it was previously implemented in files in @file{src/main}.
3522
3523For historical reasons it is largely implemented in two layers.
3524Files @file{plot.c}, @file{plot3d.c} and @file{par.c} contain the code
for the roughly 30 @code{.External} calls that implement the basic
3526graphics operations.  This code then calls functions with names starting
3527with @code{G} and declared in header @file{Rgraphics.h} in file
3528@file{graphics.c}, which in turn call the graphics engine (whose
3529functions almost all have names starting with @code{GE}).
3530
The graphics parameters (as set/read by @code{par()}) form a large part
of the infrastructure of the base graphics subsystem.  These are stored
3533in a @code{GPar} structure declared in the private header
3534@file{Graphics.h}.  This structure has two variables (@code{state} and
3535@code{valid}) tracking the state of the base subsystem on the device,
3536and many variables recording the graphics parameters and functions of
3537them.
3538
The base system state is contained in a @code{baseSystemState} structure
3540defined in @file{R_ext/GraphicsBase.h}.  This contains three @code{GPar}
3541structures and a Boolean variable used to record if @code{plot.new()}
3542(or @code{persp}) has been used successfully on the device.
3543
3544The three copies of the @code{GPar} structure are used to store the
3545current parameters (accessed via @code{gpptr}), the `device copy'
3546(accessed via @code{dpptr}) and space for a saved copy of the `device
3547copy' parameters.  The current parameters are, clearly, those currently
3548in use and are copied from the `device copy' whenever @code{plot.new()}
3549is called (whether or not that advances to the next `page'). The saved
3550copy keeps the state when the device was last completely cleared (e.g.@:
3551when @code{plot.new()} was called with @code{par(new=TRUE)}), and is
3552used to replay the display list.
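
The relationship between the three copies can be pictured with a much
reduced sketch.  The types and the function below are hypothetical
stand-ins written for this illustration: the real @code{GPar} structure
has many more fields, and this is only one reading of the description
above, not the code in @file{graphics.c}.

@verbatim
/* Much reduced stand-in for GPar. */
typedef struct { double cex; int col; } sk_GPar;

/* Stand-in for the base system state: three parameter sets and a flag. */
typedef struct {
    sk_GPar dp;        /* `device copy' (cf. dpptr)               */
    sk_GPar gp;        /* current parameters (cf. gpptr)          */
    sk_GPar dpSaved;   /* saved `device copy', used for replaying */
    int     plotted;   /* has plot.new() succeeded on the device? */
} sk_baseState;

/* On plot.new() the current parameters are reset from the device copy;
   when the page is completely cleared the device copy is saved too. */
static void sk_plot_new(sk_baseState *bss, int clear_page)
{
    if (clear_page) bss->dpSaved = bss->dp;
    bss->gp = bss->dp;
    bss->plotted = 1;
}
@end verbatim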
3553
3554The separation is not completely clean: the `device copy' is altered if
3555a plot with log scale(s) is set up via @code{plot.window()}.
3556
3557There is yet another copy of most of the graphics parameters in
3558@code{static} variables in @file{graphics.c} which are used to preserve
3559the current parameters across the processing of inline parameters in
3560high-level graphics calls (handled by @code{ProcessInlinePars}).
3561
3562Snapshots of the base subsystem record the `saved device copy' of the
3563@code{GPar} structure.
3564
3565@menu
3566* Arguments and parameters::
3567@end menu
3568
3569@node Arguments and parameters,  , Base graphics, Base graphics
3570@subsection Arguments and parameters
3571
3572There is an unfortunate confusion between some of the graphical
3573parameters (as set by @code{par}) and arguments to base graphic
3574functions of the same name.  This description may help set the record
3575straight.
3576
3577Most of the high-level plotting functions accept graphical parameters as
3578additional arguments, which are then often passed to lower-level
3579functions if not already named arguments (which is the main source of
3580confusion).
3581
3582Graphical parameter @code{bg} is the background colour of the plot.
3583Argument @code{bg} refers to the fill colour for the filled symbols
3584@code{21} to @code{25}.  It is an argument to the function
3585@code{plot.xy}, but normally passed by the default method of
3586@code{points}, often from a @code{plot} method.
3587
3588Graphics parameters @code{cex}, @code{col}, @code{lty}, @code{lwd} and
3589@code{pch} also appear as arguments of @code{plot.xy} and so are often
3590passed as arguments from higher-level plot functions such as
3591@code{lines}, @code{points} and @code{plot} methods.  They appear as
arguments of @code{legend}; @code{col}, @code{lty} and @code{lwd} are
arguments of @code{arrows} and @code{segments}.  When used as arguments
they can be vectors, recycled to control the various lines, points and
segments.  When set as graphical parameters they set the default
rendering: in addition @code{par(cex=)} sets the overall character
expansion which subsequent calls (as arguments or in-line graphical
3598parameters) multiply.
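
For @code{cex} the combination of the two uses can be written down
directly.  This is a minimal sketch with a hypothetical function name,
assuming a @code{cex} argument vector of positive length that is
recycled over the points:

@verbatim
/* Effective character expansion for point i (0-based), given the
   overall par(cex=) value and a cex argument vector of length n_cex
   that is recycled.  Hypothetical helper. */
static double sk_effective_cex(double par_cex, const double *cex,
                               int n_cex, int i)
{
    return par_cex * cex[i % n_cex];
}
@end verbatim

So with an overall expansion of 1.5 and an argument @code{c(1, 2)}, the
fourth point is drawn at an expansion of 3.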
3599
3600The handling of missing values differs in the two classes of uses.
3601Generally these are errors when used in @code{par} but cause the
3602corresponding element of the plot to be omitted when used as an element
3603of a vector argument.  Originally the interpretation of arguments was
3604mainly left to the device, but nowadays some of this is pre-empted in
3605the graphics engine (but for example the handling of @code{lwd = 0}
3606remains device-specific, with some interpreting it as a `thinnest
3607possible' line).
3608
3609@node Grid graphics,  , Base graphics, Graphics Devices
3610@section Grid graphics
3611
3612[At least pointers to documentation.]
3613
3614@node GUI consoles, Tools, Graphics Devices, Top
3615@chapter GUI consoles
3616
3617The standard @R{} front-ends are programs which run in a terminal, but
3618there are several ways to provide a GUI console.
3619
3620This can be done by a package which is loaded from terminal-based @R{}
3621and launches a console as part of its startup code or by the user
3622running a specific function: package @CRANpkg{Rcmdr} is a well-known
3623example with a Tk-based GUI.
3624
3625There used to be a Gtk-based console invoked by @command{R --gui=GNOME}:
3626this relied on special-casing in the front-end shell script to launch a
3627different executable.  There still is @command{R --gui=Tk}, which starts
3628terminal-based @R{} and runs @code{tcltk::tkStartGui()} as part of the
3629modified startup sequence.
3630
3631However, the main way to run a GUI console is to launch a separate
3632program which runs embedded @R{}: this is done by @command{Rgui.exe} on
3633Windows and @command{R.app} on macOS.  The first is an integral part
3634of @R{} and the code for the console is currently in @file{R.dll}.
3635
3636@menu
3637* R.app::
3638@end menu
3639
3640@node R.app,  , GUI consoles, GUI consoles
3641@section R.app
3642
3643@command{R.app} is a macOS application which provides a console.  Its
3644sources are a separate project@footnote{an Xcode project, in SVN at
3645@uref{https://svn.r-project.org/R-packages/trunk/Mac-GUI/}.}, and its binaries
3646link to an @R{} installation which it runs as a dynamic library
3647@file{libR.dylib}.  The standard @acronym{CRAN} distribution of @R{} for
3648macOS bundles the GUI and @R{} itself, but installing the GUI is optional
3649and either component can be updated separately.
3650
3651@command{R.app} relies on @file{libR.dylib} being in a specific place,
and hence on @R{} having been built and installed as a macOS
3653`framework'.  Specifically, it uses
3654@file{/Library/Frameworks/R.framework/R}.  This is a symbolic link, as
3655frameworks can contain multiple versions of @R{}.  It eventually
3656resolves to
3657@file{/Library/Frameworks/R.framework/Versions/Current/Resources/lib/libR.dylib},
3658which is (in the @acronym{CRAN} distribution) a `fat' binary containing
3659multiple sub-architectures.
3660
3661macOS applications are directory trees: each @command{R.app} contains
3662a front-end written in Objective-C for one sub-architecture: in the
3663standard distribution there are separate applications for 32- and 64-bit
3664Intel architectures.
3665
3666Originally the @R{} sources contained quite a lot of code used only by
3667the macOS GUI, but this was migrated to the @command{R.app} sources.
3668
3669@command{R.app} starts @R{} as an embedded application with a
3670command-line which includes @option{--gui=aqua} (see below).  It uses
3671most of the interface pointers defined in the header
3672@file{Rinterface.h}, plus a private interface pointer in file
3673@file{src/main/sysutils.c}.  It adds an environment
3674it names @code{tools:RGUI} to the second position in the search path.
3675This contains a number of utility functions used to support the menu
3676items, for example @code{package.manager()}, plus functions @code{q()}
3677and @code{quit()} which mask those in package @pkg{base}---the custom
3678versions save the history in a way specific to @code{R.app}.
3679
3680There is a @command{configure} option @option{--with-aqua} for @R{}
3681which customizes the way @R{} is built: this is distinct from the
3682@option{--enable-R-framework} option which causes @command{make install}
3683to install @R{} as the framework needed for use with @code{R.app}.  (The
3684option @option{--with-aqua} is the default on macOS.)  It sets the
3685macro @code{HAVE_AQUA} in @file{config.h} and the make variable
3686@code{BUILD_AQUA_TRUE}.  These have several consequences:
3687
3688@itemize
3689@item
3690The @code{quartz()} device is built (other than as a stub) in package
3691@pkg{grDevices}: this needs an Objective-C compiler.  Then
3692@code{quartz()} can be used with terminal @R{} provided the latter has
3693access to the macOS screen.
3694
3695@item
3696File @file{src/unix/aqua.c} is compiled.  This now only contains an
3697interface pointer for the @code{quartz()} device(s).
3698
3699@item
3700@code{capabilities("aqua")} is set to @code{TRUE}.
3701
3702@item
3703The default path for a personal library directory is set as
3704@file{~/Library/R/arch/x.y/library}.
3705@c This is done in @file{etc/Renviron}.
3706
3707@item
3708There is support for setting a `busy' indicator whilst waiting for
3709@code{system()} to return.
3710
3711@item
3712@code{R_ProcessEvents} is inhibited in a forked child from package
3713@pkg{parallel}.  The associated callback in @code{R.app} does things
3714which should not be done in a child, and forking forks the whole process
3715including the console.
3716
3717@item
3718There is support for starting the embedded @R{} with the option
3719@option{--gui=aqua}: when this is done the global C variable
3720@code{useaqua} is set to a true value.  This has consequences:
3721
3722@itemize
3723@item
3724The @R{} session is asserted to be interactive @emph{via} @code{R_Interactive}.
3725
3726@item
3727@code{.Platform$GUI} is set to @code{"AQUA"}.  That has consequences:
3728@itemize
3729@item
3730The environment variable @env{DISPLAY} is set to @samp{:0} if not
3731already set.
3732
3733@item
3734@file{/usr/local/bin} is appended to @env{PATH} since that is where
3735@command{gfortran} is installed.
3736
3737@item
3738The default @HTML{} browser is switched to the one in @command{R.app}.
3739
3740@item
3741Various widgets are switched to the versions provided in
3742@command{R.app}: these include graphical menus, the data editor (but not
3743the data viewer used by @code{View()}) and the workspace browser invoked
3744by @code{browseEnv()}.
3745
3746@item
3747The @pkg{grDevices} package when loaded knows that it is being run
3748under @command{R.app} and so informs any @code{quartz} devices that a
3749Quartz event loop is already running.
3750@end itemize
3751
3752@item
3753The use of the OS's @code{system} function (including by @code{system()}
3754and @code{system2()}, and to launch editors and pagers) is replaced by a
3755version in @code{R.app} (which by default just calls the OS's
3756@code{system} with various signal handlers reset).
3757
3758@end itemize
3759
3760@item
3761If either @R{} was started by @option{--gui=aqua} or @R{} is running in
3762a terminal which is not of type @samp{dumb}, the standard output to
3763files @file{stdout} and @file{stderr} is directed through the C function
3764@code{Rstd_WriteConsoleEx}.  This uses ANSI terminal escapes to render
3765lines sent to @code{stderr} as bold on @code{stdout}.
3766
3767@item
3768For historical reasons the startup option @code{-psn} is allowed but
3769ignored.  (It seems that in 2003, @samp{r27492}, this was added by Finder.)
3770
3771@end itemize
3772
3773
3774
3775@node Tools, R coding standards, GUI consoles, Top
3776@chapter Tools
3777
3778The behavior of @command{R CMD check} can be controlled through a
3779variety of command line arguments and environment variables.
3780
3781There is an internal @option{--install=@var{value}} command line
3782argument not shown by @command{R CMD check --help}, with possible values
3783
3784@table @code
3785@item check:@var{file}
3786Assume that installation was already performed with stdout/stderr to
3787@var{file}, the contents of which need to be checked (without repeating
3788the installation).  This is useful for checks applied by repository
3789maintainers: it reduces the check time by the installation time given
3790that the package has already been installed.  In this case, one also
3791needs to specify @emph{where} the package was installed to using command
3792line option @option{--library}.
3793@item fake
3794Fake installation, and turn off the run-time tests.
3795@item skip
3796Skip installation, e.g., when testing recommended packages bundled with
3797R.
3798@item no
The same as @option{--no-install}: turns off installation and the tests
3800which require the package to be installed.
3801@end table
3802
3803The following environment variables can be used to customize the
3804operation of @command{check}: a convenient place to set these is the
3805check environment file (default, @file{~/.R/check.Renviron}).
3806
3807@vtable @code
3808@item _R_CHECK_ALL_NON_ISO_C_
3809If true, do not ignore compiler (typically GCC) warnings about non ISO C
3810code in @emph{system} headers.  Note that this may also show additional
3811ISO C++ warnings.
3812Default: false.
3813@item _R_CHECK_FORCE_SUGGESTS_
3814If true, give an error if suggested packages are not available.
3815Default: true (but false for CRAN submission checks).
3816@item _R_CHECK_RD_CONTENTS_
3817If true, check @file{Rd} files for auto-generated content which needs
3818editing, and missing argument documentation.
3819Default: true.
3820@item _R_CHECK_RD_LINE_WIDTHS_
3821If true, check @file{Rd} line widths in usage and examples sections.
3822Default: false (but true for CRAN submission checks).
3823@item _R_CHECK_RD_STYLE_
3824If true, check whether @file{Rd} usage entries for S3 methods use the full
3825function name rather than the appropriate @code{\method} markup.
3826Default: true.
3827@item _R_CHECK_RD_XREFS_
3828If true, check the cross-references in @file{.Rd} files.
3829Default: true.
3830@item _R_CHECK_SUBDIRS_NOCASE_
3831If true, check the case of directories such as @file{R} and @file{man}.
3832Default: true.
3833@item _R_CHECK_SUBDIRS_STRICT_
3834Initial setting for @option{--check-subdirs}.
3835Default: @samp{default} (which checks only tarballs, and checks in the
3836@file{src} only if there is no @file{configure} file).
3837@item _R_CHECK_USE_CODETOOLS_
3838If true, make use of the @CRANpkg{codetools} package, which provides a
3839detailed analysis of visibility of objects (but may give false
3840positives).
3841Default: true (if recommended packages are installed).
3842@item _R_CHECK_USE_INSTALL_LOG_
3843If true, record the output from installing a package as part of its
3844check to a log file (@file{00install.out} by default), even when running
3845interactively.
3846Default: true.
3847@item _R_CHECK_VIGNETTES_NLINES_
3848Maximum number of lines to show from the bottom of the output when
reporting errors in running or re-building vignettes.  (Value @code{0}
3850means all lines will be shown.)
3851Default: 10 for running, 25 for re-building.
3852@item _R_CHECK_CODOC_S4_METHODS_
3853Control whether @code{codoc()} testing is also performed on S4 methods.
3854Default: true.
3855@item _R_CHECK_DOT_INTERNAL_
3856Control whether the package code is scanned for @code{.Internal} calls,
3857which should only be used by base (and occasionally by recommended) packages.
3858Default: true.
3859@item _R_CHECK_EXECUTABLES_
3860Control checking for executable (binary) files.
3861Default: true.
3862@item _R_CHECK_EXECUTABLES_EXCLUSIONS_
3863Control whether checking for executable (binary) files ignores files
3864listed in the package's @file{BinaryFiles} file.
3865Default: true (but false for CRAN submission checks).
3866However, most likely this package-level override mechanism will be
3867removed eventually.
3868@item _R_CHECK_PERMISSIONS_
3869Control whether permissions of files should be checked.
3870Default: true iff @code{.Platform$OS.type == "unix"}.
3871@item _R_CHECK_FF_CALLS_
3872Allows turning off @code{checkFF()} testing. If set to
3873@samp{registration}, checks the registration information (number of
3874arguments, correct choice of @code{.C/.Fortran/.Call/.External}) for
3875such calls provided the package is installed.
3876Default: true.
3877@item _R_CHECK_FF_DUP_
Controls @code{checkFF(check_DUP)}.
3879Default: true (and forced to be true for CRAN submission checks).
3880@item _R_CHECK_LICENSE_
3881Control whether/how license checks are performed. A possible value is
3882@samp{maybe} (warn in case of problems, but not about standardizable
3883non-standard license specs).
3884Default: true.
3885@item _R_CHECK_RD_EXAMPLES_T_AND_F_
3886Control whether @code{check_T_and_F()} also looks for ``bad'' (global)
3887@samp{T}/@samp{F} uses in examples.
3888Off by default because this can result in false positives.
3889@item _R_CHECK_RD_CHECKRD_MINLEVEL_
3890Controls the minimum level for reporting warnings from @code{checkRd}.
3891Default: -1.
3892@item _R_CHECK_XREFS_REPOSITORIES_
3893If set to a non-empty value, a space-separated list of repositories to
use to determine known packages.  Default: empty, when the CRAN
and Bioconductor repositories known to @R{} are used.
3896@item _R_CHECK_SRC_MINUS_W_IMPLICIT_
3897Control whether installation output is checked for compilation warnings
3898about implicit function declarations (as spotted by GCC with command
3899line option @option{-Wimplicit-function-declaration}, which is implied
3900by @option{-Wall}).
3901Default: false.
3902@item _R_CHECK_SRC_MINUS_W_UNUSED_
3903Control whether installation output is checked for compilation warnings
3904about unused code constituents (as spotted by GCC with command line
3905option @option{-Wunused}, which is implied by @option{-Wall}).
3906Default: true.
3907@item _R_CHECK_WALL_FORTRAN_
3908Control whether gfortran 4.0 or later @option{-Wall} warnings are used in
3909the analysis of installation output.
3910Default: false, even though the warnings are justifiable.
3911@item _R_CHECK_ASCII_CODE_
3912If true, check @R{} code for non-ascii characters.
3913Default: true.
3914@item _R_CHECK_ASCII_DATA_
3915If true, check data for non-ascii characters.  @emph{En route}, checks
3916that all the datasets can be loaded and that their components can be
3917accessed.
3918Default: true.
3919@item _R_CHECK_COMPACT_DATA_
3920If true, check data for ascii and uncompressed saves, and also check if
3921using @command{bzip2} or @code{xz} compression would be significantly
3922better.
3923Default: true.
3924@item _R_CHECK_SKIP_ARCH_
3925Comma-separated list of architectures that will be omitted from
3926checking in a multi-arch setup.
3927Default: none.
3928@item _R_CHECK_SKIP_TESTS_ARCH_
3929Comma-separated list of architectures that will be omitted from
3930running tests in a multi-arch setup.
3931Default: none.
3932@item _R_CHECK_SKIP_EXAMPLES_ARCH_
3933Comma-separated list of architectures that will be omitted from
3934running examples in a multi-arch setup.
3935Default: none.
3936@item _R_CHECK_VC_DIRS_
3937Should the unpacked package directory be checked for version-control
3938directories (@file{CVS}, @file{.svn} @dots{})?
3939Default: true for tarballs.
3940@item _R_CHECK_PKG_SIZES_
3941Should @command{du} be used to find the installed sizes of packages?
@command{R CMD check} does check for the availability of @command{du},
3943but this option allows the check to be overruled if an unsuitable
3944command is found (including one that does not respect the @option{-k}
3945flag to report in units of 1Kb, or reports in a different format -- the
3946GNU, macOS and Solaris @command{du} commands have been tested).
3947Default: true if @command{du} is found.
3948@item _R_CHECK_PKG_SIZES_THRESHOLD_
3949Threshold used for @env{_R_CHECK_PKG_SIZES_} (in Mb).
3950Default: 5
3951@item _R_CHECK_DOC_SIZES_
3952Should @command{qpdf} be used to check the installed sizes of PDFs?
3953Default: true if @command{qpdf} is found.
3954@item _R_CHECK_DOC_SIZES2_
3955Should @command{gs} be used to check the installed sizes of PDFs?  This
3956is slower than (and in addition to) the previous check, but does detect
3957figures with excessive detail (often hidden by over-plotting) or bitmap
3958figures with too high a resolution.  Requires that @env{R_GSCMD} is set
3959to a valid program, or @command{gs} (or on Windows,
3960@command{gswin32.exe} or @command{gswin64c.exe}) is on the path.
3961Default: false (but true for CRAN submission checks).
3962@item _R_CHECK_ALWAYS_LOG_VIGNETTE_OUTPUT_
3963By default the output from running the @R{} code in the vignettes is
3964kept only if there is an error.  This also applies to the
3965@file{build_vignettes.log} log from the re-building of vignettes.
3966Default: false.
3967@item _R_CHECK_CLEAN_VIGN_TEST_
3968Should the @file{vign_test} directory be removed if the test is
3969successful?
3970Default: true.
3971@item _R_CHECK_REPLACING_IMPORTS_
3972Should warnings about replacing imports be reported?  These sometimes come
3973from auto-generated @file{NAMESPACE} files in other packages, but most
3974often from importing the whole of a namespace rather than using
3975@code{importFrom}.
3976Default: true.
3977@item _R_CHECK_UNSAFE_CALLS_
3978Check for calls that appear to tamper with (or allow tampering with)
3979already loaded code not from the current package: such calls may well
3980contravene CRAN policies.
3981Default: true.
3982@item _R_CHECK_TIMINGS_
3983Optionally report timings for installation, examples, tests and
3984running/re-building vignettes as part of the check log.  The format is
3985@samp{[as/bs]} for the total CPU time (including child processes)
3986@samp{a} and elapsed time @samp{b}, except on Windows, when it is
3987@samp{[bs]}.  In most cases timings are only given for @samp{OK} checks.
3988Times with an elapsed component over 10 mins are reported in minutes
3989(with abbreviation @samp{m}).  The value is the smallest numerical value
3990in elapsed seconds that should be reported: non-numerical values
3991indicate that no report is required, a value of @samp{0} that a report
3992is always required.
3993Default: @code{""}. (@code{10} for CRAN checks.)
3994
3995@item _R_CHECK_EXAMPLE_TIMING_THRESHOLD_
3996If timings are being recorded, set the threshold in seconds for
3997reporting long-running examples (either user+system CPU time or elapsed
3998time).  Default: @code{"5"}.
3999
4000@item _R_CHECK_EXAMPLE_TIMING_CPU_TO_ELAPSED_THRESHOLD_
4001For checks with timings enabled, report examples where the ratio of CPU
4002time to elapsed time exceeds this threshold (and the CPU time is at
4003least one second).  This can help detect the simultaneous use of
4004multiple CPU cores.
4005Default: @code{NA}.
4006
4007@item _R_CHECK_TEST_TIMING_CPU_TO_ELAPSED_THRESHOLD_
4008Report for running an individual test if the ratio of CPU time to
4009elapsed time exceeds this threshold (and the CPU time is at least one
4010second).  Not supported on Windows.
4011Default: @code{NA}.
4012
4013@item _R_CHECK_VIGNETTE_TIMING_CPU_TO_ELAPSED_THRESHOLD_
Report if, when running/re-building vignettes (individually or in
aggregate), the ratio of CPU time to elapsed time exceeds this threshold
4016(and the CPU time is at least one second).  Not supported on
4017Windows.
4018Default: @code{NA}.
4019
4020@item _R_CHECK_INSTALL_DEPENDS_
4021If set to a true value and a test installation is to be done, this is
4022done with @code{.libPaths()} containing just a temporary library
4023directory and @code{.Library}.  The temporary library is populated by
4024symbolic links@footnote{under Windows, junction points, or copies if
4025environment variable @env{R_WIN_NO_JUNCTIONS} has a non-empty value.}
4026to the installed copies of all the Depends/Imports/LinkingTo packages
4027which are not in @code{.Library}.  Default: false (but true for CRAN
4028submission checks).
4029
4030Note that this is actually implemented in @command{R CMD INSTALL}, so it
4031is available to those who first install recording to a log, then call
4032@command{R CMD check}.
4033
4034@item _R_CHECK_DEPENDS_ONLY_
4035@itemx _R_CHECK_SUGGESTS_ONLY_
4036If set to a true value, running examples, tests and vignettes is done
4037with @code{.libPaths()} containing just a temporary library directory
4038and @code{.Library}.  The temporary library is populated by symbolic
4039links@footnote{see the previous footnote.} to the installed copies of
4040all the Depends/Imports and (for the second only) Suggests packages
4041which are not in @code{.Library}.  (As exceptions, packages in a
4042@samp{VignetteBuilder} field and test-suite managers in @samp{Suggests}
4043are always made available.)
4044Default: false (but
4045@env{_R_CHECK_SUGGESTS_ONLY_} is true for CRAN submission checks: some
4046of the regular checks use true
4047@c Solaris and Windows
4048and some use false).
4049
4050@item _R_CHECK_DEPENDS_ONLY_DATA_
4051Apply @env{_R_CHECK_DEPENDS_ONLY_} only to the check of loading from
4052the @file{data} directory, so checks if any dataset depends on
4053packages which are in Suggests or undeclared.  Default: false (but
true for CRAN submission checks).
4055
4056@item _R_CHECK_NO_RECOMMENDED_
4057If set to a true value, augment the previous checks to make recommended
4058packages unavailable unless declared.
4059Default: false (but true for CRAN submission checks).
4060
4061This may give false positives on code which uses
4062@code{grDevices::densCols} and @code{stats:::asSparse} as these invoke
4063@CRANpkg{KernSmooth} and @CRANpkg{Matrix} respectively.
4064
4065@item _R_CHECK_CODETOOLS_PROFILE_
4066A string with comma-separated @code{@var{name}=@var{value}} pairs (with
4067@var{value} a logical constant) giving additional arguments for the
4068@CRANpkg{codetools} functions used for analyzing package code.  E.g.,
4069use @code{_R_CHECK_CODETOOLS_PROFILE_="suppressLocalUnused=FALSE"} to
4070turn off suppressing warnings about unused local variables.  Default: no
4071additional arguments, corresponding to using @code{skipWith = TRUE},
4072@code{suppressPartialMatchArgs = FALSE} and @code{suppressLocalUnused =
4073TRUE}.
4074
4075@item _R_CHECK_CRAN_INCOMING_
4076Check whether package is suitable for publication on CRAN.
4077Default: false, except for CRAN submission checks.
4078
4079@item _R_CHECK_CRAN_INCOMING_REMOTE_
4080Include checks that require remote access among the above.
Default: same as @code{_R_CHECK_CRAN_INCOMING_}.
4082
4083@item _R_CHECK_XREFS_USE_ALIASES_FROM_CRAN_
4084When checking anchored Rd xrefs, use Rd aliases from the CRAN package
4085web areas in addition to those in the packages installed locally.
4086Default: false.
4087
4088@item _R_SHLIB_BUILD_OBJECTS_SYMBOL_TABLES_
4089Make the checks of compiled code more accurate by recording the symbol
4090tables for objects (@file{.o} files) at installation in a file
4091@file{symbols.rds}.  (Only currently supported on Linux, Solaris, macOS,
4092Windows and FreeBSD.)
4093Default: true.
4094
4095@item _R_CHECK_CODE_ASSIGN_TO_GLOBALENV_
4096Should the package code be checked for assignments to the global
4097environment?
4098Default: false (but true for CRAN submission checks).
4099
4100@item _R_CHECK_CODE_ATTACH_
4101Should the package code be checked for calls to @code{attach()}?
4102Default: false (but true for CRAN submission checks).
4103
4104@item _R_CHECK_CODE_DATA_INTO_GLOBALENV_
4105Should the package code be checked for calls to @code{data()} which load
4106into the global environment?
4107Default: false (but true for CRAN submission checks).
4108
4109@item _R_CHECK_DOT_FIRSTLIB_
4110Should the package code be checked for the presence of the obsolete function
4111@code{.First.lib()}?
4112Default: false (but true for CRAN submission checks).
4113
4114@item _R_CHECK_DEPRECATED_DEFUNCT_
4115Should the package code be checked for the presence of recently deprecated
or defunct functions (including completely removed functions)?  This
also checks for the use of platform-specific graphics devices.
4118Default: false (but true for CRAN submission checks).
4119
4120@item _R_CHECK_SCREEN_DEVICE_
4121If set to @samp{warn}, give a warning if examples etc open a screen
4122device.  If set to @samp{stop}, give an error.
4123Default: empty (but @samp{stop} for CRAN submission checks).
4124
4125@item _R_CHECK_WINDOWS_DEVICE_
4126If set to @samp{stop}, give an error if a Windows-only device is used in
examples etc.  This is only useful on Windows: the devices do not exist
4128elsewhere.
4129Default: empty (but @samp{stop} for CRAN submission checks on Windows).
4130
4131@item _R_CHECK_TOPLEVEL_FILES_
4132Report on top-level files in the package sources that are not described
4133in `Writing R Extensions' nor are commonly understood (like
4134@file{ChangeLog}).  Variations on standard names (e.g.@:
4135@file{COPYRIGHT}) are also reported.
4136Default: false (but true for CRAN submission checks).
4137
4138@item _R_CHECK_GCT_N_
Should the @option{--use-gct} option use @code{gctorture2(@var{n})} rather than
4140@code{gctorture(TRUE)}?  Use a positive integer to enable this.
4141Default: @code{0}.
4142
4143@item _R_CHECK_LIMIT_CORES_
If set, check the usage of too many cores in package @pkg{parallel}.  If
set to @samp{warn}, a warning is given; if set to @samp{false} or
@samp{FALSE}, the check is skipped; any other non-empty value gives an
error when more than 2 children are spawned.
4148Default: unset (but @samp{TRUE} for CRAN submission checks).
4149
4150@item _R_CHECK_CODE_USAGE_VIA_NAMESPACES_
4151If set, check code usage (via @CRANpkg{codetools}) directly on the
4152package namespace without loading and attaching the package and its
4153suggests and enhances.
4154Default: true (and true for CRAN submission checks).
4155
4156@item _R_CHECK_CODE_USAGE_WITH_ONLY_BASE_ATTACHED_
4157If set, check code usage (via @CRANpkg{codetools}) with only the base
4158package attached.
4159Default: true.
4160
4161@item _R_CHECK_EXIT_ON_FIRST_ERROR_
4162If set to a true value, the check will exit on the first error.
4163Default: false.
4164
4165@item _R_CHECK_S3_METHODS_NOT_REGISTERED_
4166If set to a true value, report (apparent) S3 methods exported but not
4167registered.
4168Default: true.
4169
4170@item _R_CHECK_OVERWRITE_REGISTERED_S3_METHODS_
4171If set to a true value, report already registered S3 methods in
4172base/recommended packages which are overwritten when this package's
4173namespace is loaded.
4174Default: false (but true for CRAN submission checks).
4175
4176@item _R_CHECK_TESTS_NLINES_
4177Number of trailing lines of test output to reproduce in the log.  If
4178@code{0} all lines except the @R{} preamble are reproduced.
4179Default: 13.
4180
4181@item _R_CHECK_NATIVE_ROUTINE_REGISTRATION_
4182If set to a true value, report if the entry points to register native
4183routines and to suppress dynamic search are not found in a package's
DLL.  (@strong{NB:} this requires the system command @command{nm} to be on
the @env{PATH}.  On Windows, @command{objdump.exe} is first searched for in
the compiler toolchain specified via @code{Makeconf} (which can be
customized by environment variable @env{BINPREF}); if not found there, it
must be on the @env{PATH}.  On Unix this would be normal when using a
package with compiled code (which are the only ones this checks), but
Windows users should check.)
4190Default: false (but true for CRAN submission checks).
4191
4192@item _R_CHECK_NO_STOP_ON_TEST_ERROR_
4193If set to a true value, do not stop running tests after first error (as
4194if command line option @option{--no-stop-on-test-error} had been given).
4195Default: false (but true for CRAN submission checks).
4196
4197@item _R_CHECK_PRAGMAS_
4198Run additional checks on the pragmas in C/C++ source code and headers.
4199Default: false (but true for CRAN submission checks).
4200
4201@item _R_CHECK_COMPILATION_FLAGS_
4202If the package is installed and has C/C++/Fortran code, check the
4203install log for non-portable flags (for example those added to
4204@file{src/Makevars} during configuration).  Currently @option{-W} flags
4205are reported, except @option{-Wall}, @option{-Wextra} and
4206@option{-Weverything}, and flags which appear to be attempts to suppress
4207warnings are highlighted.
4208See
4209@ifset UseExternalXrefs
4210@ref{Writing portable packages, , Writing portable packages, R-exts, Writing R Extensions}
4211@end ifset
4212@ifclear UseExternalXrefs
4213`Writing R Extensions'
4214@end ifclear
4215for the rationale of this check (and why even @option{-Werror} is
4216unsafe).
4217
4218Environment variable @env{_R_CHECK_COMPILATION_FLAGS_KNOWN_} can be set
4219to a space-separated set of flags which come from the @R{} build used
4220for testing (flags such as @option{-Wall} and @option{-Wextra} are
already known).  For example, for the CRAN build of @R{} >= 4.0.0 on macOS
4222one could use
4223@example
4224_R_CHECK_COMPILATION_FLAGS_KNOWN_="-mmacosx-version-min=10.13"
4225@end example
4226@noindent
4227Default: false (but true for CRAN submission checks).
4228
4229@item _R_CHECK_R_DEPENDS_
4230Check that any dependence on R is not on a recent patch-level version
4231such as @code{R (>= 3.3.3)} since blocking installation of a package
4232will also block its reverse dependencies.  Possible values
4233@samp{"note"}, @samp{"warn"} and logical values (where currently true
4234values are equivalent to @samp{"note"}).
4235Default: false (but @samp{"warn"} for @option{--as-cran}).
4236
4237@item _R_CHECK_SERIALIZATION_
4238Check that serialized @R{} objects in the package sources were
4239serialized with version 2 and there is no dependence on @samp{R >=
42403.5.0}.  (Version 3 is in use as from @R{} 3.5.0 but should only be used
4241when necessary.)
4242Default: false (but true for CRAN submission checks).
4243
4244@item _R_CHECK_R_ON_PATH_
4245This checks if the package attempts to use @command{R} or
4246@command{Rscript} from the path rather than that under test.
4247It does so by putting scripts at the head of the path which print a
4248message and fail.
4249Default: false (but true for CRAN submission checks).
4250
4251@item _R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_
4252If set to a true value, also check the R code in common unit test
4253subdirectories of @file{tests} for undeclared package dependencies.
4254Default: false (but true for CRAN submission checks).
4255
4256@item _R_CHECK_SHLIB_OPENMP_FLAGS_
4257Check correct and portable use of @code{SHLIB_OPENMP_*FLAGS} in
4258@file{src/Makevars} (and similar).
4259Default: false (but true for CRAN submission checks).
4260
4261@item _R_CHECK_CONNECTIONS_LEFT_OPEN_
4262When checking examples, check for each example if connections are left
4263open: if any are found, this is reported with a fatal error.  NB:
4264`connections' includes most use of files and any parallel clusters which
have not been stopped by @code{stopCluster()}.
4266Default: false (but true for CRAN submission checks).
4267
4268@item _R_CHECK_FUTURE_FILE_TIMESTAMPS_
4269Check if any of the input files has a timestamp in the future (and to do
4270so, checks that the system clock is correct to within 5 minutes).
4271Default: false (but true for CRAN submission checks).
4272@c _R_CHECK_SYSTEM_CLOCK_ can be used to disable the clock check, for
4273@c use on a check farm.
4274
4275@item _R_CHECK_LENGTH_1_CONDITION_
4276Optionally check if the condition in @code{if} and @code{while} statements
4277has length greater than one.  For a true value (@samp{T}, @samp{True},
4278@samp{TRUE} or @samp{true}), give an error.  For a false value (@samp{F},
4279@samp{False}, @samp{FALSE} or @samp{false}) or when unset, print a warning.
4280Any other non-true non-empty value needs to be a list of commands separated
4281by comma: @samp{abort} causes R to terminate unconditionally instead of
signalling an error, @samp{verbose} prints a very detailed diagnostic message,
4283@samp{package:pkg} restricts the check to if/while statements executing in
4284the namespace of package @samp{pkg}, @samp{package:_R_CHECK_PACKAGE_NAME_}
4285restricts the check to if/while statements executing in the package that is
4286currently being checked by @code{R CMD check}, @samp{warn} causes R to
4287report a warning instead of signalling an error.
4288Default: unset (warning is reported, but
4289@samp{package:_R_CHECK_PACKAGE_NAME_,[abort,]verbose} for the CRAN submission checks).
4290
4291@item _R_CHECK_LENGTH_1_LOGIC2_
4292Optionally check if an argument of the binary operators @code{&&} and
4293@code{||} has length greater than one, checked only if it is used.  The
4294format is the same as for @samp{_R_CHECK_LENGTH_1_CONDITION_}.
4295Default: unset (nothing is reported, but
4296@samp{package:_R_CHECK_PACKAGE_NAME_,[abort,]verbose} for the CRAN
4297submission checks).
4298
4299@item _R_CHECK_BUILD_VIGNETTES_SEPARATELY_
4300Prior to @R{} 3.6.0, re-building the vignette outputs was done in a
4301single @R{} session which allowed accidental reliance of one vignette on
4302another (for example, in the loading of packages).  The current default
4303is to use a separate session for each vignette; this option allows
testing the older behaviour.
Default: true.
4306
4307@item _R_CHECK_SYSTEM_CLOCK_
4308As part of the `checking for future file timestamps' enabled by
4309@option{--as-cran}, check the system clock against an external clock to
4310catch errors such as the wrong day or even year.  Not necessary on
4311systems doing repeated checks.
Default: true (but false for CRAN checking).
4313
4314@item _R_CHECK_AUTOCONF_
4315For packages with a @file{configure} file generated by GNU
4316@command{autoconf} and either @file{configure.ac} or
@file{configure.in}, check that @command{autoreconf} can, if available,
4318be run in a copy of the sources (this will detect missing source files
4319and report @command{autoconf} warnings).
4320Default: false (but true for CRAN submission checks).
4321
4322@item _R_CHECK_DATALIST_
4323Check whether file @file{data/datalist} is out-of-date.
4324Default: false (but true for CRAN submission checks).
4325
4326@item _R_CHECK_THINGS_IN_CHECK_DIR_
4327Check and report at the end of the check run if files have been left in
4328the check directory.
4329Default: false (but true for CRAN submission checks).
4330
4331@item _R_CHECK_THINGS_IN_TEMP_DIR_
Check and report at the end of the check run if files would have been
left in the temporary directory (usually @file{/tmp} on a Unix-alike).
It does this by setting the environment variable @env{TMPDIR} to a
subdirectory of the @R{} session directory for the @code{check} process:
if any files or directories are left there they are removed.  Since some
of these might be out of the user's control, environment variable
@env{_R_CHECK_THINGS_IN_TEMP_DIR_EXCLUDE_} can specify an (extended
regex) pattern of file names not to be reported -- CRAN uses
@samp{^ompi.} for directories left behind by OpenMPI.  There are rare
instances where @env{TMPDIR} is not respected and so files are left in
4342@file{/tmp} (and not reported): one example is
4343@file{/tmp/boost_interprocess} on some OSes.
4344@c macOS is one.
4345Default: false (but true for CRAN submission checks).
4346
4347@item _R_CHECK_BASHISMS_
4348Check the top-level scripts @file{configure} (unless generated by
@command{autoconf}) and @file{cleanup} for non-Bourne-shell code, using the
4350Perl script @command{checkbashisms} if available.  This includes
4351reporting scripts using the non-portable @code{#! /bin/bash}.
4352(Script @command{checkbashisms} is available in most Linux distributions
4353in a package named either @samp{devscripts} or @samp{devscripts-checkbashisms}
4354and from @uref{https://sourceforge.net/projects/checkbaskisms/files}.)
4355Default: false (but true for CRAN submission checks except on Windows).
4356
4357@item _R_CHECK_ORPHANED_
4358Check if dependencies are orphaned packages.  As from @R{} 4.1.0 this
4359checks strict dependencies recursively, so will report any orphaned
4360packages which are needed to attach the package by @code{library()} as
4361well as any orphaned packages which are suggested.
4362Default: false (but true for CRAN submission checks).
4363
4364@item _R_CHECK_EXCESSIVE_IMPORTS_
4365A positive integer.  If set, give a NOTE if the number of imports from
non-base packages exceeds this threshold.  Large numbers of imports
make a package vulnerable to any of them becoming unavailable.
Default: unset (but 20 for CRAN submission checks).
4369
4370@item _R_CHECK_DONTTEST_EXAMPLES_
4371If true and examples are found with @code{\donttest} sections, the
4372tests are run in one pass with these commented out and then in a
second pass including the @code{\donttest} sections (for the main
4374architecture only).  Only for the first pass are the results compared
4375to any @file{.Rout.save} file and timings analysed.  Overridden by
4376@option{--run-donttest}.
Default: false unless @option{--as-cran} is specified (which can be
4378overridden by setting @samp{_R_CHECK_DONTTEST_EXAMPLES_=false}).
4379
4380@item _R_CHECK_XREFS_PKGS_ARE_DECLARED_
4381Check if packages used in `anchored' cross-references in @file{.Rd}
4382files (those of the form @code{\link[@var{pkg}]@{@var{foo}@}} and
4383@code{\link[@var{pkg:bar}]@{@var{foo}@}}) are declared in the
4384@file{DESCRIPTION} file and so these links can be checked.
4385Default: false.
4386
4387@item _R_CHECK_XREFS_MIND_SUSPECT_ANCHORS_
4388Check if package-anchored Rd cross-references are to @emph{files} (and
4389not aliases).
4390Default: false.
4391
4392@item _R_CHECK_BOGUS_RETURN_
4393If true and @env{_R_CHECK_USE_CODETOOLS_} is also true, functions are
4394scanned for use of @code{return} rather than @code{return()}.
4395Default: false (but true for CRAN submission checks).
4396@end vtable
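
The value format described above for @env{_R_CHECK_LENGTH_1_CONDITION_}
can be illustrated by a small sketch.  The Python code below is not how
@R{} implements the check: the function name
@code{parse_length1_setting} and the returned triple are hypothetical,
and treating an empty string like an unset variable, ignoring unknown
commands and letting the last of @samp{abort} and @samp{warn} win are
assumptions of the sketch.

@example
```python
TRUE_VALUES = ("T", "True", "TRUE", "true")
FALSE_VALUES = ("F", "False", "FALSE", "false")


def parse_length1_setting(value):
    """Return (action, verbose, packages) for a setting string or None."""
    # Unset or false: the text says a warning is printed.
    # Treating "" like an unset variable is an assumption of this sketch.
    if value is None or value == "" or value in FALSE_VALUES:
        return ("warning", False, [])
    # A true value: signal an error.
    if value in TRUE_VALUES:
        return ("error", False, [])
    # Anything else is read as a comma-separated list of commands.
    action = "error"   # what a command list does unless told otherwise
    verbose = False
    packages = []
    for command in value.split(","):
        if command == "abort":
            action = "abort"        # terminate instead of an error
        elif command == "warn":
            action = "warning"      # warn instead of an error
        elif command == "verbose":
            verbose = True          # detailed diagnostic message
        elif command.startswith("package:"):
            packages.append(command[len("package:"):])
        # Unknown commands are ignored here (an assumption).
    return (action, verbose, packages)


cran_like = parse_length1_setting(
    "package:_R_CHECK_PACKAGE_NAME_,abort,verbose")
```
@end example

@noindent
Under these assumptions the CRAN-style value is read as `abort, with
verbose diagnostics, restricted to the package being checked'.  The
defaults for @env{_R_CHECK_LENGTH_1_LOGIC2_} differ (nothing is reported
when it is unset), so the first branch would need changing for that
variable.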
4397
4398CRAN's submission checks use something like
4399
4400@example
4401_R_CHECK_CRAN_INCOMING_=TRUE
4402_R_CHECK_CRAN_INCOMING_REMOTE_=TRUE
4403_R_CHECK_VC_DIRS_=TRUE
4404_R_CHECK_TIMINGS_=10
4405_R_CHECK_INSTALL_DEPENDS_=TRUE
4406_R_CHECK_SUGGESTS_ONLY_=TRUE
4407_R_CHECK_NO_RECOMMENDED_=TRUE
4408_R_CHECK_EXECUTABLES_EXCLUSIONS_=FALSE
4409_R_CHECK_DOC_SIZES2_=TRUE
4410_R_CHECK_CODE_ASSIGN_TO_GLOBALENV_=TRUE
4411_R_CHECK_CODE_ATTACH_=TRUE
4412_R_CHECK_CODE_DATA_INTO_GLOBALENV_=TRUE
4413_R_CHECK_CODE_USAGE_VIA_NAMESPACES_=TRUE
4414_R_CHECK_DOT_FIRSTLIB_=TRUE
4415_R_CHECK_DEPRECATED_DEFUNCT_=TRUE
4416_R_CHECK_REPLACING_IMPORTS_=TRUE
4417_R_CHECK_SCREEN_DEVICE_=stop
4418_R_CHECK_TOPLEVEL_FILES_=TRUE
4419_R_CHECK_S3_METHODS_NOT_REGISTERED_=TRUE
4420_R_CHECK_OVERWRITE_REGISTERED_S3_METHODS_=TRUE
4421_R_CHECK_PRAGMAS_=TRUE
4422_R_CHECK_COMPILATION_FLAGS_=TRUE
4423_R_CHECK_R_DEPENDS_=warn
4424_R_CHECK_SERIALIZATION_=TRUE
4425_R_CHECK_R_ON_PATH_=TRUE
4426_R_CHECK_PACKAGES_USED_IN_TESTS_USE_SUBDIRS_=TRUE
4427_R_CHECK_SHLIB_OPENMP_FLAGS_=TRUE
4428_R_CHECK_CONNECTIONS_LEFT_OPEN_=TRUE
4429_R_CHECK_FUTURE_FILE_TIMESTAMPS_=TRUE
4430_R_CHECK_LENGTH_1_CONDITION_=package:_R_CHECK_PACKAGE_NAME_,abort,verbose
4431_R_CHECK_LENGTH_1_LOGIC2_=package:_R_CHECK_PACKAGE_NAME_,abort,verbose
4432_R_CHECK_AUTOCONF_=true
4433_R_CHECK_DATALIST_=true
4434_R_CHECK_THINGS_IN_CHECK_DIR_=true
4435_R_CHECK_THINGS_IN_TEMP_DIR_=true
4436_R_CHECK_BASHISMS_=true
_R_CLASS_MATRIX_ARRAY_=true
4438_R_CHECK_ORPHANED_=true
4439_R_CHECK_BOGUS_RETURN_=true
4440@end example
4441
4442@noindent
4443These are turned on by @command{R CMD check --as-cran}: the incoming
4444checks also use
4445@example
4446_R_CHECK_FORCE_SUGGESTS_=FALSE
4447@end example
4448
4449@noindent
4450since some packages do suggest other packages not available on CRAN or
4451other commonly-used repositories.
4452
4453Several environment variables can be used to set `timeouts': limits for
4454the elapsed time taken by the sub-processes used for parts of the
4455checks.  A value of @code{0} indicates no limit, and is the default.
4456Character strings ending in @samp{s}, @samp{m} or @samp{h} indicate a
4457number of seconds, minutes or hours respectively: other values are
4458interpreted as a whole number of seconds (with invalid inputs being
4459treated as no limit).
4460@vtable @code
4461@item _R_CHECK_ELAPSED_TIMEOUT_
4462The default timeout for sub-processes not otherwise mentioned, and the
4463default value for all except @env{_R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_}.
4464(This is also used by @code{tools::check_packages_in_dir}.)
4465
4466@item _R_CHECK_INSTALL_ELAPSED_TIMEOUT_
4467Limit for when @command{R CMD INSTALL} is run by @command{check}.
4468
4469@item _R_CHECK_EXAMPLES_ELAPSED_TIMEOUT_
4470Limit for running all the examples for one sub-architecture.
4471
4472@item _R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_
4473Limit for running one test for one sub-architecture.  Default
4474@env{_R_CHECK_TESTS_ELAPSED_TIMEOUT_}.
4475
4476@item _R_CHECK_TESTS_ELAPSED_TIMEOUT_
4477Limit for running all the tests for one sub-architecture (and the
4478default limit for running one test).
4479
4480@item _R_CHECK_ONE_VIGNETTE_ELAPSED_TIMEOUT_
4481Limit for running the @R{} code in one vignette, including for
4482re-building each vignette separately.
4483
4484@item _R_CHECK_BUILD_VIGNETTES_ELAPSED_TIMEOUT_
4485Limit for re-building all vignettes.
4486
4487@item _R_CHECK_PKGMAN_ELAPSED_TIMEOUT_
4488Limit for each attempt at building the PDF package manual.
4489@end vtable
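
The rules for timeout values given above can be sketched as follows.
This is an illustration in Python rather than the code used by
@command{R CMD check}: the names @code{parse_timeout} and
@code{timeout_for} are hypothetical, and accepting fractional numbers,
treating negative or non-finite numbers as `no limit' and treating an
empty setting as unset are assumptions.  The fallback order (one test,
then all tests, then @env{_R_CHECK_ELAPSED_TIMEOUT_}) is one reading of
the table.

@example
```python
import math

SUFFIX_SECONDS = dict(s=1, m=60, h=3600)


def parse_timeout(value):
    """Return the limit in seconds; 0 means no limit."""
    if value is None:
        return 0
    text = value.strip()
    factor = 1
    # A trailing s, m or h selects seconds, minutes or hours.
    if text and text[-1] in SUFFIX_SECONDS:
        factor = SUFFIX_SECONDS[text[-1]]
        text = text[:-1]
    try:
        number = float(text)
    except ValueError:
        return 0  # invalid input: treated as no limit
    if not math.isfinite(number) or number < 0:
        return 0  # assumption: these also mean no limit
    return number * factor


def timeout_for(name, env):
    """Look up a timeout in a mapping, applying the documented defaults."""
    candidates = [name]
    if name == "_R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_":
        candidates.append("_R_CHECK_TESTS_ELAPSED_TIMEOUT_")
    candidates.append("_R_CHECK_ELAPSED_TIMEOUT_")
    for key in candidates:
        if env.get(key):
            return parse_timeout(env[key])
    return 0


settings = dict(_R_CHECK_TESTS_ELAPSED_TIMEOUT_="5m",
                _R_CHECK_ELAPSED_TIMEOUT_="1h")
one_test = timeout_for("_R_CHECK_ONE_TEST_ELAPSED_TIMEOUT_", settings)
install = timeout_for("_R_CHECK_INSTALL_ELAPSED_TIMEOUT_", settings)
```
@end example

@noindent
With these hypothetical settings, @code{one_test} is 300 seconds (taken
from the limit for all tests) and @code{install} is 3600 seconds (taken
from the general default).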
4490
Another way to enable stricter checks is to set
@env{R_CHECK_CONSTANTS} to @code{5}.  This checks that
4493nothing@footnote{The usual culprits are calls to compiled code
4494@emph{via} @code{.Call} or @code{.External} which alter their
4495arguments.} changes the values of `constants'@footnote{things which the
4496byte compiler assumes do not change, e.g.@: function bodies.} in @R{}
4497code.  This is best used in conjunction with setting
4498@env{R_JIT_STRATEGY} to @code{3}, which checks code on first use (by
4499default most code is only checked after byte-compilation on second use).
4500Unfortunately these checks slow down checking of examples, tests and
4501vignettes, typically two-fold but in the worst cases at least a
4502hundred-fold.
4503
4504The following environment variables can be used to customize the
4505operation of @command{INSTALL}.
4506
4507@vtable @code
4508@item _R_INSTALL_LIBS_ONLY_FORCE_DEPENDS_IMPORTS_
4509If true, give an error if installing only package libraries via
4510@option{--libs-only} and some package imported or depended on is not
4511available.
4512Default: true (false only for special applications, which analyze native
4513code of packages).
4514@end vtable
4515
4516@node R coding standards, Testing R code, Tools, Top
4517@chapter R coding standards
4518
4519@cindex coding standards
4520@R{} is meant to run on a wide variety of platforms, including Linux and
4521most variants of Unix as well as Windows and macOS.
4522Therefore, when extending @R{} by either adding to the @R{} base
4523distribution or by providing an add-on package, one should not rely on
4524features specific to only a few supported platforms, if this can be
4525avoided.  In particular, although most @R{} developers use @acronym{GNU}
4526tools, they should not employ the @acronym{GNU} extensions to standard
4527tools.  Whereas some other software packages explicitly rely on e.g.@:
4528@acronym{GNU} make or the @acronym{GNU} C++ compiler, @R{} does not.
4529Nevertheless, @R{} is a @acronym{GNU} project, and the spirit of the
4530@cite{@acronym{GNU} Coding Standards} should be followed if possible.
4531
4532The following tools can ``safely be assumed'' for @R{} extensions.
4533
4534@itemize @bullet
4535@item
4536An ISO C99 C compiler.  Note that extensions such as @acronym{POSIX}
45371003.1 must be tested for, typically using Autoconf unless you are sure
4538they are supported on all mainstream @R{} platforms (including Windows
4539and macOS).
4540
4541@item
4542A fixed-form Fortran compiler.
4543
4544@item
4545A simple @command{make}, considering the features of @command{make} in
45464.2 @acronym{BSD} systems as a baseline.
4547@findex make
4548
4549@acronym{GNU} or other extensions, including pattern rules using
4550@samp{%}, the automatic variable @samp{$^}, the @samp{+=} syntax to
4551append to the value of a variable, the (``safe'') inclusion of makefiles
4552with no error, conditional execution, and many more, must not be used
4553(see Chapter ``Features'' in the @cite{@acronym{GNU} Make Manual} for
4554more information).  On the other hand, building @R{} in a separate
4555directory (not containing the sources) should work provided that
4556@command{make} supports the @code{VPATH} mechanism.
4557
4558Windows-specific makefiles can assume @acronym{GNU} @command{make} 3.79
4559or later, as no other @command{make} is viable on that platform.
4560
4561@item
4562A Bourne shell and the ``traditional'' Unix programming tools, including
4563@command{grep}, @command{sed}, and @command{awk}.
4564
4565There are @acronym{POSIX} standards for these tools, but these may not
4566be fully supported.  Baseline features could be determined from a book
4567such as @cite{The UNIX Programming Environment} by Brian W. Kernighan &
4568Rob Pike.  Note in particular that @samp{|} in a regexp is an extended
4569regexp, and is not supported by all versions of @command{grep} or
4570@command{sed}.  The Open Group Base Specifications, Issue 7, which are
technically identical to IEEE Std 1003.1 (POSIX), 2008,
4572are available at
4573@uref{https://pubs.opengroup.org/onlinepubs/9699919799/mindex.html}.
4574@end itemize
4575
4576Under Windows, most users will not have these tools installed, and you
4577should not require their presence for the operation of your package.
4578However, users who install your package from source will have them, as
4579they can be assumed to have followed the instructions in ``the Windows
4580toolset'' appendix of the ``R Installation and Administration'' manual
4581to obtain them.  Redirection cannot be assumed to be available via
4582@command{system} as this does not use a standard shell (let alone a
4583Bourne shell).
4584
4585@noindent
4586In addition, the following tools are needed for certain tasks.
4587
4588@itemize @bullet
4589@item
4590Perl version 5 is only needed for the maintainer-only script
4591@file{tools/help2man.pl}.
4592@findex Perl
4593
4594@item
4595Makeinfo version 4.7 or later is needed to build the Info files for the
4596@R{} manuals written in the @acronym{GNU} Texinfo system.
4597@findex makeinfo
4598@end itemize
4599
4600It is also important that code is written in a way that allows others to
4601understand it.  This is particularly helpful for fixing problems, and
4602includes using self-descriptive variable names, commenting the code, and
also formatting it properly.  The @R{} Core Team recommends using a
basic indentation of 4 for @R{} and C (and most likely also Perl) code,
and 2 for documentation in Rd format.  Emacs (21 or later) users can
implement this indentation style by putting the following in one of
their startup files, and using customization to set the
@code{c-default-style} to @code{"bsd"} and @code{c-basic-offset} to
@code{4}.
4610@findex emacs
4611
4612@smallexample
4613@group
4614;;; ESS
4615(add-hook 'ess-mode-hook
4616          (lambda ()
4617            (ess-set-style 'C++ 'quiet)
4618            ;; Because
4619            ;;                                 DEF GNU BSD K&R C++
4620            ;; ess-indent-level                  2   2   8   5   4
4621            ;; ess-continued-statement-offset    2   2   8   5   4
4622            ;; ess-brace-offset                  0   0  -8  -5  -4
4623            ;; ess-arg-function-offset           2   4   0   0   0
4624            ;; ess-expression-offset             4   2   8   5   4
4625            ;; ess-else-offset                   0   0   0   0   0
4626            ;; ess-close-brace-offset            0   0   0   0   0
4627            (add-hook 'local-write-file-hooks
4628                      (lambda ()
4629                        (ess-nuke-trailing-whitespace)))))
4630(setq ess-nuke-trailing-whitespace-p 'ask)
4631;; or even
4632;; (setq ess-nuke-trailing-whitespace-p t)
4633@end group
4634@group
4635;;; Perl
4636(add-hook 'perl-mode-hook
4637          (lambda () (setq perl-indent-level 4)))
4638@end group
4639@end smallexample
4640
4641@noindent
4642(The `GNU' styles for Emacs' C and R modes use a basic indentation of 2,
4643which has been determined not to display the structure clearly enough
4644when using narrow fonts.)
4645
4646@node Testing R code, Use of TeX dialects, R coding standards, Top
4647@chapter Testing R code
4648
When you (as an @R{} developer) add new functions to the R base (all the
packages distributed with @R{}), be careful to check that @kbd{make
test-Specific} or, in particular, @kbd{cd tests; make no-segfault.Rout}
still works (without interactive user intervention, and on a standalone
4653computer).  If the new function, for example, accesses the Internet, or
4654requires @acronym{GUI} interaction, please add its name to the ``stop
4655list'' in @file{tests/no-segfault.Rin}.
4656
4657[To be revised: use @command{make check-devel}, check the write barrier
4658if you change internal structures.]
4659
4660@node Use of TeX dialects, Current and future directions, Testing R code, Top
4661@chapter Use of TeX dialects
4662
4663Various dialects of TeX are used for different purposes in @R{}.  The
4664policy is that manuals be written in @samp{texinfo}, and for convenience
the main and Windows FAQs are also.  This has the advantage that it is
4666easy to produce @HTML{} and plain text versions as well as typeset manuals.
4667
4668@LaTeX{} is not used directly, but rather as an intermediate format for
4669typeset help documents and for vignettes.
4670
4671Care needs to be taken about the assumptions made about the @R{} user's
4672system: it may not have either @samp{texinfo} or a TeX system
4673installed.  We have attempted to abstract out the cross-platform
4674differences, and almost all the setting of typeset documents is done by
4675@code{tools::texi2dvi}.  This is used for offline printing of help
4676documents, preparing vignettes and for package manuals via @command{R
4677CMD Rd2pdf}.  It is not currently used for the @R{} manuals created in
4678directory @file{doc/manual}.
4679
4680@code{tools::texi2dvi} makes use of a system command @command{texi2dvi}
4681where available.  On a Unix-alike this is usually part of
4682@samp{texinfo}, whereas on Windows if it exists at all it would be an
4683executable, part of MiKTeX.  If none is available, the @R{} code runs
4684a sequence of @command{(pdf)latex}, @command{bibtex} and
4685@command{makeindex} commands.
4686
4687This process has been rather vulnerable to the versions of the external
4688software used: particular issues have been @command{texi2dvi} and
4689@file{texinfo.tex} updates, mismatches between the two@footnote{Linux
4690distributions tend to unbundle @file{texinfo.tex} from @samp{texinfo}.},
4691versions of the @LaTeX{} package @samp{hyperref} and quirks in index
4692production.  The licenses used for @LaTeX{} and latterly @samp{texinfo}
4693prohibit us from including `known good' versions in the @R{}
4694distribution.
4695
4696On a Unix-alike @command{configure} looks for the executables for TeX and
4697friends and if found records the absolute paths in the system
4698@file{Renviron} file.  This used to record @samp{false} if no command
4699was found, but it nowadays records the name for looking up on the path
4700at run time.  The latter can be important for binary distributions: one
4701does not want to be tied to, for example, TeX Live 2007.
4702
4703
4704@node Current and future directions, Function and variable index, Use of TeX dialects, Top
4705@chapter Current and future directions
4706
4707This chapter is for notes about possible in-progress and future changes
4708to @R{}: there is no commitment to release such changes, let alone to a
4709timescale.
4710
4711@menu
4712* Long vectors::
4713* 64-bit types::
4714* Large matrices::
4715@end menu
4716
4717@node Long vectors, 64-bit types, Current and future directions, Current and future directions
4718@section Long vectors
4719
4720Vectors in @R{} 2.x.y were limited to a length of 2^31 - 1 elements
4721(about 2 billion), as the length is stored in the @code{SEXPREC} as a C
4722@code{int}, and that type is used extensively to record lengths and
4723element numbers, including in packages.
4724
Note that longer vectors are effectively impossible on 32-bit
platforms because of their address limit, so this section applies only
4727on 64-bit platforms.  The internals are unchanged on a 32-bit build of
4728@R{}.
4729
4730A single object with 2^31 or more elements will take up at least 8GB of
4731memory if integer or logical and 16GB if numeric or character, so
4732routine use of such objects is still some way off.
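
As a rough check of these figures, the following C sketch computes the
size of the data part of a vector.  It assumes 4-byte elements for
integer and logical vectors and 8-byte elements for numeric and
character vectors (one pointer per element on a 64-bit platform), and it
ignores the vector header and, for character vectors, the strings
themselves.  The function name @code{payload_gib} is hypothetical and is
not part of @R{}.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of R): size in units of 2^30 bytes of
   the data part of a vector with n elements of elt_size bytes each.
   The vector header is ignored. */
double payload_gib(int64_t n, int64_t elt_size)
{
    assert(n >= 0 && elt_size > 0);
    return (double) n * (double) elt_size / 1073741824.0;
}
```
@end verbatim

Under these assumptions a vector of 2^31 elements needs 8 such units
with 4-byte elements and 16 with 8-byte elements, which matches the
figures quoted above.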
4733
4734There is now some support for long vectors.  This applies to raw,
4735logical, integer, numeric and character vectors, and lists and
4736expression vectors.  (Elements of character vectors (@code{CHARSXP}s)
4737remain limited to 2^31 - 1 bytes.)  Some considerations:
4738
4739
4740@itemize
4741
4742@item
4743This has been implemented by recording the length (and true length) as
4744@code{-1} and recording the actual length as a 64-bit field at the
4745beginning of the header.  Because a fair amount of code in @R{} uses a
4746signed type for the length, the `long length' is recorded using the
4747signed C99 type @code{ptrdiff_t}, which is typedef-ed to
4748@code{R_xlen_t}.
4749
4750@item
4751These can in theory have 63-bit lengths, but note that current 64-bit
4752OSes do not even theoretically offer 64-bit address spaces and there is
4753currently a 52-bit limit (which exceeds the theoretical limit of current
4754OSes and ensures that such lengths can be stored exactly in doubles).
4755
4756@item
4757The serialization format has been changed to accommodate longer lengths,
4758but vectors of lengths up to 2^31-1 are stored in the same way as
before.  Longer vectors have their length field set to @code{-1},
followed by two 32-bit fields giving the upper and lower 32 bits of the
actual length.  There is currently a sanity check which limits lengths
4762to 2^48 on unserialization.
4763
4764@item
4765The type @code{R_xlen_t} is made available to packages in C header
4766@file{Rinternals.h}: this should be fine in C code since C99 is
4767required.  People do try to use @R{} internals in C++, but C++98
4768compilers are not required to support these types.
4769
4770@item
4771Indexing can be done via the use of doubles.  The internal indexing code
4772used to work with positive integer indices (and negative, logical and
4773matrix indices were all converted to positive integers): it now works
4774with either @code{INTSXP} or @code{REALSXP} indices.
4775
4776@item
4777The @R{} function @code{length} returns a double value if the length
4778exceeds 2^31-1. Code calling @code{as.integer(length(x))} before passing
to @code{.C}/@code{.Fortran} should check for an @code{NA} result.
4780
4781@end itemize
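
The serialized length fields described in the list above can be
sketched in C as follows.  This is an illustration only and is not the
code used by @R{}: the names @code{len_fields}, @code{encode_length} and
@code{decode_length} are hypothetical, and the exact form of the sanity
check (returning @code{-1} for lengths above 2^48) is an assumption.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the length fields in the serialized stream. */
struct len_fields {
    int32_t length;   /* ordinary length, or -1 for a long vector */
    uint32_t upper;   /* upper 32 bits of the length (long vectors only) */
    uint32_t lower;   /* lower 32 bits of the length (long vectors only) */
};

/* Lengths up to 2^31-1 are stored as before; longer ones are split. */
struct len_fields encode_length(int64_t len)
{
    struct len_fields f = { 0, 0, 0 };
    assert(len >= 0);
    if (len <= INT32_MAX) {
        f.length = (int32_t) len;
    } else {
        f.length = -1;
        f.upper = (uint32_t) (len >> 32);
        f.lower = (uint32_t) (len & 0xFFFFFFFF);
    }
    return f;
}

/* Returns the decoded length, or -1 if it fails the sanity check
   (assumed here to reject anything above 2^48). */
int64_t decode_length(struct len_fields f)
{
    int64_t len;
    if (f.length != -1)
        return f.length;
    if (f.upper > (UINT32_C(1) << 16))  /* also keeps the shift in range */
        return -1;
    len = ((int64_t) f.upper << 32) + f.lower;
    return len > (INT64_C(1) << 48) ? -1 : len;
}
```
@end verbatim

Keeping the short form unchanged means that a reader of the older
format still sees the same field for every vector of length up to
2^31-1, and only long vectors use the two extra fields.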
4782
4783@node 64-bit types, Large matrices, Long vectors, Current and future directions
4784@section 64-bit types
4785
4786There is also some desire to be able to store larger integers in @R{},
4787although the possibility of storing these as @code{double} is often
4788overlooked (and e.g.@: file pointers as returned by @code{seek} are
4789already stored as @code{double}).
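
The point about @code{double} can be illustrated by a small C sketch.
It assumes IEEE 754 doubles with a 53-bit significand, in which case
every integer of magnitude up to 2^53 is represented exactly; this
covers the 52-bit limit on vector lengths mentioned in the previous
section.  The function name @code{round_trips_via_double} is
hypothetical.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: does x survive conversion to double and back?
   The argument is restricted to magnitudes below 2^62 so that the
   conversion back to int64_t is always well defined. */
int round_trips_via_double(int64_t x)
{
    double d;
    assert(x > -(INT64_C(1) << 62) && x < (INT64_C(1) << 62));
    d = (double) x;
    return (int64_t) d == x;
}
```
@end verbatim

Under the stated assumption, 2^52 and 2^53 - 1 survive the round trip
but 2^53 + 1 does not, since it has no exact @code{double}
representation.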
4790
4791Different routes have been proposed:
4792
4793@itemize
4794
4795@item
4796Add a new type to @R{} and use that for lengths and indices---most likely
4797this would be a 64-bit signed type, say @code{longint}.  @R{}'s usual
4798implicit coercion rules would ensure that supplying an @code{integer}
4799vector for indexing or @code{length<-} would work.
4800
4801@item
A more radical alternative is to change the existing @code{integer} type
to be 64-bit on 64-bit platforms (which was the approach taken by S-PLUS
for DEC/Compaq Alpha systems), or even on all platforms.
4805
4806@item
4807Allow either @code{integer} or @code{double} values for lengths and
4808indices, and return @code{double} only when necessary.
4809
4810@end itemize
4811
4812The third has the advantages of minimal disruption to existing code and
4813not increasing memory requirements. In the first and third scenarios
4814both @R{}'s own code and user code would have to be adapted for lengths
4815that were not of type @code{integer}, and in the third code branches for
4816long vectors would be tested rarely.
4817
4818Most users of the @code{.C} and @code{.Fortran} interfaces use
@code{as.integer} for lengths and element numbers, but a few omit this
coercion in the knowledge that the values were already of type
@code{integer}.  It may be
4821reasonable to assume that these are never intended to be used with long
4822vectors.
4823
4824The remaining interfaces will need to cope with the changed
4825@code{VECTOR_SEXPREC} types.  It seems likely that in most cases lengths
are accessed by the @code{length} and @code{LENGTH}
functions.@footnote{But @code{LENGTH} is a macro under some internal
uses.}  The current approach is to keep these returning 32-bit lengths and
4829introduce `long' versions @code{xlength} and @code{XLENGTH} which return
4830@code{R_xlen_t} values.
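
The split between 32-bit and `long' accessors can be sketched in C with
a much simplified stand-in for the vector header.  Everything named
@code{toy_} below is hypothetical and does not reproduce the layout or
behaviour of @R{}'s own structures; in particular, returning @code{-1}
from the 32-bit accessor for a long vector is a choice made only for
this sketch.  A 64-bit platform is assumed.

@verbatim
```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Stand-in for R_xlen_t, described in the text as a typedef of
   ptrdiff_t. */
typedef ptrdiff_t toy_xlen_t;

/* Hypothetical, much simplified vector header (not R's layout). */
struct toy_vector {
    toy_xlen_t long_length;  /* actual length, used for long vectors */
    int length;              /* ordinary length, or -1 if long */
};

/* Record a length n >= 0 in the header. */
void toy_set_length(struct toy_vector *v, toy_xlen_t n)
{
    assert(n >= 0);
    if (n <= INT_MAX) {
        v->length = (int) n;
        v->long_length = 0;
    } else {
        v->length = -1;
        v->long_length = n;
    }
}

/* 'Long' accessor: valid for every vector. */
toy_xlen_t toy_xlength(const struct toy_vector *v)
{
    return v->length == -1 ? v->long_length : (toy_xlen_t) v->length;
}

/* 32-bit accessor: yields -1 for a long vector (sketch-only choice). */
int toy_length(const struct toy_vector *v)
{
    return v->length;
}
```
@end verbatim

Code that only ever handles short vectors can keep using the 32-bit
accessor unchanged, while code that must cope with long vectors
switches to the `long' one.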
4831
4832
4833See also @uref{https://homepage.cs.uiowa.edu/~luke/talks/useR10.pdf}.
4834
4835@node Large matrices,  , 64-bit types, Current and future directions
4836@section Large matrices
4837
4838Matrices are stored as vectors and so were also limited to 2^31-1
elements.  Now that longer vectors are allowed on 64-bit platforms, matrices
4840with more elements are supported provided that each of the dimensions is
4841no more than 2^31-1.  However, not all applications can be supported.
4842
4843The main problem is linear algebra done by Fortran code compiled
4844with 32-bit @code{INTEGER}.  Although not guaranteed, it seems that all
4845the compilers currently used with @R{} on a 64-bit platform allow
4846matrices each of whose dimensions is less than 2^31 but with more than
2^31 elements, and index them correctly, and a substantial part of the
support software (such as @acronym{BLAS} and @acronym{LAPACK}) also
works.
4850
4851There are exceptions: for example some complex @acronym{LAPACK}
4852auxiliary routines do use a single @code{INTEGER} index and hence
4853overflow silently and segfault or give incorrect results.  One example
4854is @code{svd()} on a complex matrix.
4855
4856Since this is implementation-dependent, it is possible that optimized
4857@acronym{BLAS} and @acronym{LAPACK} may have further restrictions,
4858although none have yet been encountered.  For matrix algebra on large
4859matrices one almost certainly wants a machine with a lot of RAM (100s of
4860gigabytes), many cores and a multi-threaded @acronym{BLAS}.
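
The single-index overflow mentioned above can be illustrated with a
short C sketch.  It computes the column-major position of a matrix
element in 64 bits and then asks whether that position would fit in a
32-bit signed integer, the size assumed here for a Fortran
@code{INTEGER}.  Positions are 0-based for simplicity (Fortran indices
are 1-based), and both function names are hypothetical.

@verbatim
```c
#include <assert.h>
#include <stdint.h>

/* 0-based column-major position of element (i, j) in a matrix with
   nrow rows, computed in 64 bits so that it cannot overflow here. */
int64_t linear_index(int64_t i, int64_t j, int64_t nrow)
{
    assert(i >= 0 && i < nrow && nrow <= INT32_MAX);
    assert(j >= 0 && j <= INT32_MAX);
    return i + j * nrow;
}

/* Would this position fit in a 32-bit signed integer? */
int fits_in_int32(int64_t idx)
{
    return idx >= 0 && idx <= INT32_MAX;
}
```
@end verbatim

For a 50000 by 50000 matrix both dimensions are far below 2^31, yet the
last element has position 2499999999, which does not fit in a 32-bit
signed integer.  Code that indexes by row and column separately is
unaffected, whereas code that forms a single index of this kind is not.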
4861
4862
4863
4864@node Function and variable index, Concept index, Current and future directions, Top
4865@unnumbered Function and variable index
4866
4867@printindex vr
4868
4869@node Concept index,  , Function and variable index, Top
4870@unnumbered Concept index
4871
4872@printindex cp
4873
4874@bye
4875
4876@c Local Variables: ***
4877@c mode: TeXinfo ***
4878@c End: ***
4879