% File src/library/utils/man/data.Rd
% Part of the R package, https://www.R-project.org
% Copyright 1995-2021 R Core Team
% Distributed under GPL 2 or later

\name{data}
\alias{data}
\alias{print.packageIQR}
\title{Data Sets}
\description{
  Loads specified data sets, or lists the available data sets.
}
\usage{
data(\dots, list = character(), package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"), envir = .GlobalEnv,
     overwrite = TRUE)
}
\arguments{
  \item{\dots}{literal character strings or names giving the data sets
    to load.}
  \item{list}{a character vector of data set names.}
  \item{package}{
    a character vector giving the package(s) to look
    in for data sets, or \code{NULL}.

    By default, all packages in the search path are used, then
    the \file{data} subdirectory (if present) of the current working
    directory.
  }
  \item{lib.loc}{a character vector of directory names of \R libraries,
    or \code{NULL}.  The default value of \code{NULL} corresponds to all
    libraries currently known.}
  \item{verbose}{a logical.  If \code{TRUE}, additional diagnostics are
    printed.}
  \item{envir}{the \link{environment} where the data should be loaded.}
  \item{overwrite}{logical: should existing objects of the same name in
    \env{envir} be replaced?}
}
\details{
  Currently, four formats of data files are supported:

  \enumerate{
    \item files ending \file{.R} or \file{.r} are
    \code{\link{source}()}d in, with the \R working directory changed
    temporarily to the directory containing the respective file.
    (\code{data} ensures that the \pkg{utils} package is attached, in
    case it had been run \emph{via} \code{utils::data}.)

    \item files ending \file{.RData} or \file{.rda} are
    \code{\link{load}()}ed.

    \item files ending \file{.tab}, \file{.txt} or \file{.TXT} are read
    using \code{\link{read.table}(\dots, header = TRUE, as.is = FALSE)},
    and hence result in a data frame.

    \item files ending \file{.csv} or \file{.CSV} are read using
    \code{\link{read.table}(\dots, header = TRUE, sep = ";", as.is = FALSE)},
    and also result in a data frame.
  }
  If more than one matching file name is found, the first on this list
  is used.  (Files with extensions \file{.txt}, \file{.tab} or
  \file{.csv} can be compressed, with or without further extension
  \file{.gz}, \file{.bz2} or \file{.xz}.)
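
  As a minimal sketch of the plain-text formats, the following writes a
  semicolon-separated file into a \file{data} subdirectory of a
  temporary working directory and loads it.  The file name
  \file{demoCsv.csv}, its contents and the scratch environment
  \code{e} are made up for illustration.

  \preformatted{
old <- setwd(tempdir())                  # work in a scratch directory
dir.create("data", showWarnings = FALSE)
# hypothetical data file; note the ';' separator used for .csv files
writeLines(c("x;y", "1;a", "2;b"), file.path("data", "demoCsv.csv"))
e <- new.env()
data(list = "demoCsv", package = character(0), envir = e)
str(e$demoCsv)                           # a data frame with columns x and y
setwd(old)                               # restore the working directory
  }

  Because the file is read with \code{as.is = FALSE}, the character
  column \code{y} should arrive as a factor.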

  The data sets to be loaded can be specified as a set of character
  strings or names, or as the character vector \code{list}, or as both.

  For each given data set, the first two types (\file{.R} or \file{.r},
  and \file{.RData} or \file{.rda} files) can create several variables
  in the load environment, which might all be named differently from the
  data set.  The third and fourth types will always result in the
  creation of a single variable with the same name (without extension)
  as the data set.
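
  The point about several variables can be illustrated with a small
  sketch.  The file name \file{demoPair.rda} and the objects \code{u}
  and \code{v} are hypothetical; the file is saved into a \file{data}
  subdirectory of a temporary working directory.

  \preformatted{
old <- setwd(tempdir())
dir.create("data", showWarnings = FALSE)
u <- 1:3
v <- letters[1:3]
# one .rda file holding two objects, neither named like the file
save(u, v, file = file.path("data", "demoPair.rda"))
e <- new.env()
data(list = "demoPair", package = character(0), envir = e)
ls(e)                                    # lists 'u' and 'v', not 'demoPair'
setwd(old)
  }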

  If no data sets are specified, \code{data} lists the available data
  sets.  It looks for a new-style data index in the \file{Meta}
  directory or, if this is not found, an old-style \file{00Index} file
  in the \file{data} directory of each specified package, and uses these
  files to prepare a listing.  If there is a \file{data} area but no
  index, the data files available for loading are computed and included
  in the listing, and a warning is given: such packages are incomplete.
  The information about available data sets is returned in an object of
  class \code{"packageIQR"}.  The structure of this class is
  experimental.  Where the datasets have a different name from the
  argument that should be used to retrieve them, the index will have an
  entry like \code{beaver1 (beavers)}, which tells us that dataset
  \code{beaver1} can be retrieved by the call \code{data(beavers)}.
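
  A short sketch of reading such index entries follows.  It assumes
  that the returned object has a \code{results} matrix with an
  \code{"Item"} column, which reflects the current (experimental)
  structure and may change.

  \preformatted{
idx <- data(package = "datasets")        # object of class "packageIQR"
items <- idx$results[, "Item"]           # assumed layout, see above
grep("beaver", items, value = TRUE)      # entries of the form 'name (argument)'
e <- new.env()
data("beavers", package = "datasets", envir = e)
ls(e)                                    # the objects created by 'beavers'
  }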

  If \code{lib.loc} and \code{package} are both \code{NULL} (the
  default), the data sets are searched for in all the currently loaded
  packages then in the \file{data} directory (if any) of the current
  working directory.

  If \code{lib.loc = NULL} but \code{package} is specified as a
  character vector, the specified package(s) are searched for first
  amongst loaded packages and then in the default library/ies
  (see \code{\link{.libPaths}}).

  If \code{lib.loc} \emph{is} specified (and not \code{NULL}), packages
  are searched for in the specified library/ies, even if they are
  already loaded from another library.

  To just look in the \file{data} directory of the current working
  directory, set \code{package = character(0)}
  (and \code{lib.loc = NULL}, the default).
}
\value{
  A character vector of all data sets specified (whether found or not),
  or information about all available data sets in an object of class
  \code{"packageIQR"} if none were specified.
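
  As an illustration of the first case, the sketch below requests one
  existing data set and one deliberately nonexistent name
  (\code{noSuchDataSet}, made up for this example).  Both names should
  appear in the result even though only one object is created.

  \preformatted{
e <- new.env()
# a "not found" warning is expected for the made-up name, so suppress it
res <- suppressWarnings(
    data(list = c("cars", "noSuchDataSet"), package = "datasets", envir = e))
res                                      # the names requested
exists("cars", envir = e, inherits = FALSE)
exists("noSuchDataSet", envir = e, inherits = FALSE)
  }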
}
\section{Good practice}{
  There is no requirement for \code{data(\var{foo})} to create an object
  named \code{\var{foo}} (nor to create one object), although it much
  reduces confusion if this convention is followed (and it is enforced
  if datasets are lazy-loaded).

  \code{data()} was originally intended to allow users to load datasets
  from packages for use in their examples, and as such it loaded the
  datasets into the workspace \code{\link{.GlobalEnv}}.  This avoided
  having large datasets in memory when not in use: that need has been
  almost entirely superseded by lazy-loading of datasets.

  The ability to specify a dataset by name (without quotes) is a
  convenience: in programming the datasets should be specified by
  character strings (with quotes).

  Use of \code{data} within a function without an \code{envir} argument
  has the almost always undesirable side-effect of putting an object in
  the user's workspace (and indeed, of replacing any object of that name
  already there).  It would almost always be better to put the object in
  the current evaluation environment by
  \code{data(\dots, envir = environment())}.
  However, two alternatives are usually preferable,
  both described in the \sQuote{Writing R Extensions} manual.
  \itemize{
    \item For sets of data, set up a package to use lazy-loading of data.
    \item For objects which are system data, for example lookup tables
    used in calculations within the function, use a file
    \file{R/sysdata.rda} in the package sources or create the objects by
    \R code at package installation time.
  }
  A sometimes important distinction is that the second approach places
  objects in the namespace but the first does not.  So if it is important
  that the function sees \code{mytable} as an object from the package,
  it is system data and the second approach should be used.  In the
  unusual case that a package uses a lazy-loaded dataset as a default
  argument to a function, that needs to be specified by \code{\link{::}},
  e.g., \code{survival::survexp.us}.
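
  Where \code{data} is nevertheless called inside a function, the
  \code{envir = environment()} idiom described above might look like the
  following sketch.  The helper \code{meanSpeed} is hypothetical, and
  \code{cars} is taken from the \pkg{datasets} package.

  \preformatted{
meanSpeed <- function() {
    # load into this call's own environment, not the user's workspace
    data("cars", package = "datasets", envir = environment())
    mean(cars$speed)
}
meanSpeed()
  }

  The copy of \code{cars} created here disappears when the call
  returns, so nothing is left behind in \code{.GlobalEnv}.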
}
\note{
  One can take advantage of the search order and the fact that a
  \file{.R} file will change directory.  If raw data are stored in
  \file{mydata.txt} then one can set up \file{mydata.R} to read
  \file{mydata.txt} and pre-process it, e.g., using \code{\link{transform}()}.
  For instance one can convert numeric vectors to factors with the
  appropriate labels.  Thus, the \file{.R} file can effectively contain
  a metadata specification for the plaintext formats.
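
  A minimal sketch of this arrangement, using a temporary working
  directory, is given below.  The column names \code{id} and
  \code{grp} and the labels \code{"ctl"} and \code{"trt"} are invented
  for illustration.

  \preformatted{
old <- setwd(tempdir())
dir.create("data", showWarnings = FALSE)
# raw data, with the group stored as a numeric code
writeLines(c("id grp", "1 1", "2 2"), file.path("data", "mydata.txt"))
# companion script: found first, and run with 'data' as working directory
writeLines(c(
    'mydata <- read.table("mydata.txt", header = TRUE)',
    'mydata <- transform(mydata,',
    '    grp = factor(grp, levels = 1:2, labels = c("ctl", "trt")))'
), file.path("data", "mydata.R"))
e <- new.env()
data(list = "mydata", package = character(0), envir = e)
levels(e$mydata$grp)                     # the labels set in mydata.R
setwd(old)
  }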

  In older versions of \R, up to 3.6.x, both \code{package = "base"} and
  \code{package = "stats"} were treated as \code{package = "datasets"}
  (with a warning), because before 2004 most of the datasets in
  \pkg{datasets} were in either \pkg{base} or \pkg{stats}.  For these
  packages, the result is now empty as they contain no data sets.
}
\section{Warning}{
  This function creates objects in the \code{envir} environment (by
  default the user's workspace), replacing any which already
  existed.  \code{data("foo")} can silently create objects other than
  \code{foo}: there have been instances in published packages where it
  created/replaced \code{\link{.Random.seed}} and hence changed the seed
  for the session.
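
  One cautious pattern, sketched here under the assumption that the
  data set of interest is \code{beavers} from \pkg{datasets}, is to
  load into a scratch environment and inspect what was created before
  using it.  The second part shows \code{overwrite = FALSE} leaving an
  existing object alone; a warning may be issued in that case, so it is
  suppressed.  The names \code{scratch} and \code{e} are illustrative.

  \preformatted{
scratch <- new.env()
data("beavers", package = "datasets", envir = scratch)
ls(scratch, all.names = TRUE)            # includes names starting with '.'

e <- new.env()
assign("cars", "my own object", envir = e)
suppressWarnings(
    data("cars", package = "datasets", envir = e, overwrite = FALSE))
e$cars                                   # the existing object is kept
  }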
}
\seealso{
  \code{\link{help}} for obtaining documentation on data sets,
  \code{\link{save}} for \emph{creating} the second (\file{.rda}) kind
  of data, typically the most efficient one.

  The \sQuote{Writing R Extensions} manual for considerations in
  preparing the \file{data} directory of a package.
}
\examples{
require(utils)
data()                         # list all available data sets
try(data(package = "rpart"), silent = TRUE) # list the data sets in the rpart package
data(USArrests, "VADeaths")    # load the data sets 'USArrests' and 'VADeaths'
\dontrun{## Alternatively
ds <- c("USArrests", "VADeaths"); data(list = ds)}
help(USArrests)                # give information on data set 'USArrests'
}
\keyword{documentation}
\keyword{datasets}
