% File src/library/utils/man/data.Rd
% Part of the R package, https://www.R-project.org
% Copyright 1995-2021 R Core Team
% Distributed under GPL 2 or later

\name{data}
\alias{data}
\alias{print.packageIQR}
\title{Data Sets}
\description{
  Loads specified data sets, or lists the available data sets.
}
\usage{
data(\dots, list = character(), package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"), envir = .GlobalEnv,
     overwrite = TRUE)
}
\arguments{
  \item{\dots}{literal character strings or names.}
  \item{list}{a character vector.}
  \item{package}{
    a character vector giving the package(s) to look
    in for data sets, or \code{NULL}.

    By default, all packages in the search path are used, then
    the \file{data} subdirectory (if present) of the current working
    directory.
  }
  \item{lib.loc}{a character vector of directory names of \R libraries,
    or \code{NULL}.  The default value of \code{NULL} corresponds to all
    libraries currently known.}
  \item{verbose}{a logical.  If \code{TRUE}, additional diagnostics are
    printed.}
  \item{envir}{the \link{environment} where the data should be loaded.}
  \item{overwrite}{logical: should existing objects of the same name in
    \code{envir} be replaced?}
}
\details{
  Currently, four formats of data files are supported:

  \enumerate{
    \item files ending \file{.R} or \file{.r} are
    \code{\link{source}()}d in, with the \R working directory changed
    temporarily to the directory containing the respective file.
    (\code{data} ensures that the \pkg{utils} package is attached, in
    case it had been run \emph{via} \code{utils::data}.)

    \item files ending \file{.RData} or \file{.rda} are
    \code{\link{load}()}ed.

    \item files ending \file{.tab}, \file{.txt} or \file{.TXT} are read
    using \code{\link{read.table}(\dots, header = TRUE, as.is = FALSE)},
    and hence result in a data frame.

    \item files ending \file{.csv} or \file{.CSV} are read using
    \code{\link{read.table}(\dots, header = TRUE, sep = ";", as.is = FALSE)},
    and also result in a data frame.
  }
  If more than one matching file name is found, the first on this list
  is used.  (Files with extensions \file{.txt}, \file{.tab} or
  \file{.csv} can be compressed, with or without further extension
  \file{.gz}, \file{.bz2} or \file{.xz}.)

  The data sets to be loaded can be specified as a set of character
  strings or names, or as the character vector \code{list}, or as both.

  For each given data set, the first two types (\file{.R} or \file{.r},
  and \file{.RData} or \file{.rda} files) can create several variables
  in the load environment, which might all be named differently from the
  data set.  The third and fourth types will always result in the
  creation of a single variable with the same name (without extension)
  as the data set.

  If no data sets are specified, \code{data} lists the available data
  sets.  It looks for a new-style data index in the \file{Meta}
  directory or, if this is not found, an old-style \file{00Index} file
  in the \file{data} directory of each specified package, and uses these
  files to prepare a listing.  If there is a \file{data} area but no
  index, available data files for loading are computed and included in
  the listing, and a warning is given: such packages are incomplete.
  The information about available data sets is returned in an object of
  class \code{"packageIQR"}.  The structure of this class is
  experimental.  Where the datasets have a different name from the
  argument that should be used to retrieve them, the index will have an
  entry like \code{beaver1 (beavers)}, which tells us that dataset
  \code{beaver1} can be retrieved by the call \code{data(beavers)}.
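
  As a minimal sketch of the listing case (assuming the \pkg{datasets}
  package is installed, as in a standard \R installation; the
  \code{results} component belongs to the experimental structure of the
  class and may change):
  \preformatted{
## List the data sets of one package without loading any of them.
idx <- utils::data(package = "datasets")
class(idx)    # "packageIQR"
## 'results' is a character matrix with one row per data set.
head(idx$results[, c("Package", "Item")])
  }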

  If \code{lib.loc} and \code{package} are both \code{NULL} (the
  default), the data sets are searched for in all the currently loaded
  packages then in the \file{data} directory (if any) of the current
  working directory.

  If \code{lib.loc = NULL} but \code{package} is specified as a
  character vector, the specified package(s) are searched for first
  amongst loaded packages and then in the default library/ies
  (see \code{\link{.libPaths}}).

  If \code{lib.loc} \emph{is} specified (and not \code{NULL}), packages
  are searched for in the specified library/ies, even if they are
  already loaded from another library.

  To just look in the \file{data} directory of the current working
  directory, set \code{package = character(0)}
  (and \code{lib.loc = NULL}, the default).
}
\value{
  A character vector of all data sets specified (whether found or not),
  or information about all available data sets in an object of class
  \code{"packageIQR"} if none were specified.
}
\section{Good practice}{
  There is no requirement for \code{data(\var{foo})} to create an object
  named \code{\var{foo}} (nor to create one object), although it much
  reduces confusion if this convention is followed (and it is enforced
  if datasets are lazy-loaded).

  \code{data()} was originally intended to allow users to load datasets
  from packages for use in their examples, and as such it loaded the
  datasets into the workspace \code{\link{.GlobalEnv}}.  This avoided
  having large datasets in memory when not in use: that need has been
  almost entirely superseded by lazy-loading of datasets.

  The ability to specify a dataset by name (without quotes) is a
  convenience: in programming the datasets should be specified by
  character strings (with quotes).
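
  For instance, a sketch of programmatic use with a quoted name (the
  helper \code{load_one} is hypothetical and introduced only for
  illustration; it assumes the \pkg{datasets} package is available):
  \preformatted{
## Hypothetical helper: load one data set, given as a character string,
## into a private environment and return the objects it created.
load_one <- function(name, package = "datasets") {
    e <- new.env()
    utils::data(list = name, package = package, envir = e)
    mget(ls(e), envir = e)
}
str(load_one("USArrests"))
  }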

  Use of \code{data} within a function without an \code{envir} argument
  has the almost always undesirable side-effect of putting an object in
  the user's workspace (and indeed, of replacing any object of that name
  already there).  It would almost always be better to put the object in
  the current evaluation environment by
  \code{data(\dots, envir = environment())}.
  However, two alternatives are usually preferable,
  both described in the \sQuote{Writing R Extensions} manual.
  \itemize{
    \item For sets of data, set up a package to use lazy-loading of data.
    \item For objects which are system data, for example lookup tables
    used in calculations within the function, use a file
    \file{R/sysdata.rda} in the package sources or create the objects by
    \R code at package installation time.
  }
  A sometimes important distinction is that the second approach places
  objects in the namespace but the first does not.  So if it is important
  that the function sees \code{mytable} as an object from the package,
  it is system data and the second approach should be used.  In the
  unusual case that a package uses a lazy-loaded dataset as a default
  argument to a function, that needs to be specified by \code{\link{::}},
  e.g., \code{survival::survexp.us}.
}
\note{
  One can take advantage of the search order and the fact that a
  \file{.R} file will change directory.  If raw data are stored in
  \file{mydata.txt} then one can set up \file{mydata.R} to read
  \file{mydata.txt} and pre-process it, e.g., using \code{\link{transform}()}.
  For instance one can convert numeric vectors to factors with the
  appropriate labels.  Thus, the \file{.R} file can effectively contain
  a metadata specification for the plaintext formats.
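
  A minimal sketch of such a \file{mydata.R} (the file names follow the
  text above; the column \code{group} and its factor labels are
  hypothetical and stand in for whatever the raw file contains):
  \preformatted{
## data/mydata.R: sourced by data(mydata), with the working directory
## temporarily set to the directory containing this file.
mydata <- read.table("mydata.txt", header = TRUE)
## Hypothetical pre-processing: recode a numeric column as a factor.
mydata <- transform(mydata,
                    group = factor(group, levels = 1:2,
                                   labels = c("control", "treated")))
  }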

  In older versions of \R, up to 3.6.x, both \code{package = "base"} and
  \code{package = "stats"} were treated as \code{package = "datasets"}
  (with a warning), because before 2004 (most of) the datasets in
  \pkg{datasets} were in either \pkg{base} or \pkg{stats}.  For these
  packages, the result is now empty as they contain no data sets.
}
\section{Warning}{
  This function creates objects in the \code{envir} environment (by
  default the user's workspace), replacing any which already
  existed.  \code{data("foo")} can silently create objects other than
  \code{foo}: there have been instances in published packages where it
  created/replaced \code{\link{.Random.seed}} and hence changed the seed
  for the session.
}
\seealso{
  \code{\link{help}} for obtaining documentation on data sets,
  \code{\link{save}} for \emph{creating} the second (\file{.rda}) kind
  of data, typically the most efficient one.

  The \sQuote{Writing R Extensions} manual for considerations in
  preparing the \file{data} directory of a package.
}
\examples{
require(utils)
data()                         # list all available data sets
try(data(package = "rpart"), silent = TRUE) # list the data sets in the rpart package
data(USArrests, "VADeaths")    # load the data sets 'USArrests' and 'VADeaths'
\dontrun{## Alternatively
ds <- c("USArrests", "VADeaths"); data(list = ds)}
help(USArrests)                # give information on data set 'USArrests'
}
\keyword{documentation}
\keyword{datasets}