1% File src/library/utils/man/download.file.Rd
2% Part of the R package, https://www.R-project.org
3% Copyright 1995-2021 R Core Team
4% Distributed under GPL 2 or later
5
6\name{download.file}
7\alias{download.file}
8\concept{proxy}
9\concept{ftp}
10\concept{http}
11\title{Download File from the Internet}
12\description{
13  This function can be used to download a file from the Internet.
14}
15\usage{
16download.file(url, destfile, method, quiet = FALSE, mode = "w",
17              cacheOK = TRUE,
18              extra = getOption("download.file.extra"),
19              headers = NULL, \dots)
20}
21\arguments{
22  \item{url}{a \code{\link{character}} string (or longer vector e.g.,
23    for the \code{"libcurl"} method) naming the URL of a resource to be
24    downloaded.}
25
26  \item{destfile}{a character string (or vector, see the \code{url}
27    argument) with the file path where the downloaded file is to be
28    saved.  Tilde-expansion is performed.}
29
30  \item{method}{Method to be used for downloading files.  Current
31    download methods are \code{"internal"}, \code{"wininet"} (Windows
32    only), \code{"libcurl"}, \code{"wget"} and \code{"curl"}, and there
33    is a value \code{"auto"}: see \sQuote{Details} and \sQuote{Note}.
34
35    The method can also be set through the option
36    \code{"download.file.method"}: see \code{\link{options}()}.
37  }
38
39  \item{quiet}{If \code{TRUE}, suppress status messages (if any), and
40    the progress bar.}
41
42  \item{mode}{character.  The mode with which to write the file.  Useful
43    values are \code{"w"}, \code{"wb"} (binary), \code{"a"} (append) and
44    \code{"ab"}.  Not used for methods \code{"wget"} and \code{"curl"}.
45    See also \sQuote{Details}, notably about using \code{"wb"} for Windows.
46  }
47  \item{cacheOK}{logical.  Is a server-side cached value acceptable?}
48
49  \item{extra}{character vector of additional command-line arguments for
50    the \code{"wget"} and \code{"curl"} methods.}
51
52  \item{headers}{named character vector of HTTP headers to use in HTTP
53    requests.  It is ignored for non-HTTP URLs.  The \code{User-Agent}
54    header, coming from the \code{HTTPUserAgent} option (see
55    \code{\link{options}}) is used as the first header, automatically.}
56
57  \item{\dots}{allow additional arguments to be passed, unused.}
58}
59\details{
60  The function \code{download.file} can be used to download a single
61  file as described by \code{url} from the internet and store it in
62  \code{destfile}.
63
64  The \code{url} must start with a scheme such as \samp{http://},
65  \samp{https://}, \samp{ftp://} or \samp{file://}.  Which methods
66  support which schemes varies by \R version.
67
68  If \code{method = "auto"} is chosen (the default), the behavior
69  depends on the platform:
70  \itemize{
71    \item On a Unix-alike, method \code{"libcurl"} is used, except that
72    \code{"internal"} is used for \samp{file://} URLs.  Method \code{"libcurl"}
73    uses the library of that name (\url{https://curl.se/libcurl/}).
74
75    \item On Windows the \code{"wininet"} method is used, except for
76    \samp{ftp://} and \samp{ftps://} URLs, for which \code{"libcurl"} is
77    tried.  The \code{"wininet"} method uses the WinINet functions (part
78    of the OS).  However, it is deprecated for \samp{http://} and
79    \samp{https://} URLs in favour of \code{"libcurl"}.
80    % https://msdn.microsoft.com/en-us/library/windows/desktop/aa383630%28v=vs.85%29.aspx
81
82    Support for method \code{"libcurl"} is optional on Windows: use
83    \code{\link{capabilities}("libcurl")} to see if it is supported on
84    your build (it is on \acronym{CRAN} builds).
85  }
86
87  When method \code{"libcurl"} is used, it provides
88  (non-blocking) access to \samp{https://} and (usually) \samp{ftps://}
89  URLs.  There is support for simultaneous downloads, so \code{url} and
90  \code{destfile} can be character vectors of the same length greater
91  than one (but the method has to be specified explicitly and not
92  \emph{via} \code{"auto"}).  For a single URL and \code{quiet = FALSE}
93  a progress bar is shown in interactive use.
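
  As a minimal sketch of simultaneous downloads, the following uses
  two small local files as stand-ins for remote resources (an
  assumption made so that the code runs offline; it also assumes that
  the \code{"libcurl"} build in use handles \samp{file://} URLs).  The
  file names are illustrative only.
  \preformatted{
    ## Create two small local files to stand in for remote resources.
    src <- file.path(tempdir(), c("a.txt", "b.txt"))
    for (f in src) writeLines(basename(f), f)

    ## Build file:// URLs from the absolute paths.
    urls  <- paste0("file:///",
                    sub("^/", "", normalizePath(src, winslash = "/")))
    dests <- file.path(tempdir(), paste0("copy-", basename(src)))

    ## Vectors of URLs need the method to be named explicitly.
    if (capabilities("libcurl")) {
        status <- download.file(urls, dests, method = "libcurl",
                                quiet = TRUE)
    }
  }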
94
95  For methods \code{"wget"} and \code{"curl"} a system call is made to
96  the tool given by \code{method}, and the respective program must be
97  installed on your system and be in the search path for executables.
98  They will block all other activity on the \R process until they
99  complete: this may make a GUI unresponsive.
100
101  \code{cacheOK = FALSE} is useful for \samp{http://} and
102  \samp{https://} URLs: it will attempt to get a copy directly from the
103  site rather than from an intermediate cache.  It is used by
104  \code{\link{available.packages}}.
105
106  The \code{"libcurl"} and \code{"wget"} methods follow \samp{http://}
107  and \samp{https://} redirections to any scheme they support: the
108  \code{"internal"} method follows \samp{http://} to \samp{http://}
109  redirections only.  (For method \code{"curl"} use argument
110  \code{extra = "-L"}.  To disable redirection in \command{wget}, use
111  \code{extra = "--max-redirect=0"}.)
112  The \code{"wininet"} method supports some
113  redirections but not all.  (For method \code{"libcurl"}, messages will
114  quote the endpoint of redirections.)
115
116  Note that \samp{https://} URLs are not supported by the
117  \code{"internal"} method but are supported by the \code{"libcurl"}
118  method and the \code{"wininet"} method on Windows.
119
120  Support for \samp{ftp://} URLs in the \code{"internal"} method was
121  deprecated in \R 4.1.1.
122
123  See \code{\link{url}} for how \samp{file://} URLs are interpreted,
124  especially on Windows.  The \code{"internal"} and \code{"wininet"}
125  methods do not percent-decode, but the \code{"libcurl"} and
126  \code{"curl"} methods do.  Method \code{"wget"} does not support \samp{file://} URLs.
127
128  Most methods do not percent-encode special characters such as spaces
129  in URLs (see \code{\link{URLencode}}), but it seems the
130  \code{"wininet"} method does.
131
132  The remaining details apply to the \code{"internal"}, \code{"wininet"}
133  and \code{"libcurl"} methods only.
134
135  The timeout for many parts of the transfer can be set by the option
136  \code{timeout} which defaults to 60 seconds.  This is often
137  insufficient for downloads of large files (50MB or more) and
138  so should be increased when packages use \code{download.file} for
139  such downloads.  Note that the user can set the default timeout by the
140  environment variable \env{R_DEFAULT_INTERNET_TIMEOUT} in recent
141  versions of \R, so to ensure that this is not decreased packages should
142  use something like
143  \preformatted{
144    options(timeout = max(300, getOption("timeout")))
145  }
146  (It is unrealistic to require download times of less than 1s/MB.)
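
  A package may prefer to raise the timeout only for the duration of
  one download and then restore the user's setting.  The following is
  a sketch of one way to do that; \code{fetch_large} and its
  \code{min_timeout} argument are hypothetical names, not part of \R.
  \preformatted{
    ## fetch_large() is a hypothetical helper, not part of R.
    fetch_large <- function(url, destfile, min_timeout = 300) {
        ## Never lower a timeout the user has already raised.
        old <- options(timeout = max(min_timeout, getOption("timeout")))
        on.exit(options(old), add = TRUE)  # restore the previous value
        download.file(url, destfile, mode = "wb", quiet = TRUE)
    }
  }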
147
148  The level of detail provided during transfer can be set by the
149  \code{quiet} argument and the \code{internet.info} option: the details
150  depend on the platform and scheme.  For the \code{"internal"} method
151  setting option \code{internet.info} to 0 gives all available details,
152  including all server responses.  Using 2 (the default) gives only
153  serious messages, and 3 or more suppresses all messages.  For the
154  \code{"libcurl"} method values of the option less than 2 give verbose
155  output.
156
157  A progress bar tracks the transfer in a platform-specific way:
158  \describe{
159    \item{On Windows}{If the file length is known, the
160      full width of the bar is the known length.  Otherwise the initial
161      width represents 100 Kbytes and is doubled whenever the current width
162      is exceeded.  (In non-interactive use this uses a text version.  If the
163      file length is known, an equals sign represents 2\% of the transfer
164      completed: otherwise a dot represents 10Kb.)}
165    \item{On a Unix-alike}{If the file length is known, an
166      equals sign represents 2\% of the transfer completed: otherwise a dot
167      represents 10Kb.}
168  }
169
170
171  The choice of binary transfer (\code{mode = "wb"} or \code{"ab"}) is
172  important on Windows, since unlike Unix-alikes it does distinguish
173  between text and binary files and for text transfers changes \code{\\n}
174  line endings to \code{\\r\\n} (aka \file{CRLF}).
175
176  On Windows, if \code{mode} is not supplied (\code{\link{missing}()})
177  and \code{url} ends in one of \code{.gz}, \code{.bz2}, \code{.xz},
178  \code{.tgz}, \code{.zip}, \code{.jar}, \code{.rda}, \code{.rds} or
179  \code{.RData}, \code{mode = "wb"} is set so that a binary transfer
180  is done to help unwary users.
181
182  Code written to download binary files must use \code{mode = "wb"} (or
183  \code{"ab"}), but the problems incurred by a text transfer will only
184  be seen on Windows.
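
  The following sketch illustrates a binary transfer.  A local
  \file{.rds} file stands in for a remote binary resource (an
  assumption made so that the code runs offline); with a real
  \samp{https://} URL only the \code{url} value would change.
  \preformatted{
    ## A local .rds file stands in for a remote binary resource.
    src <- tempfile(fileext = ".rds")
    saveRDS(mtcars, src)
    u <- paste0("file:///",
                sub("^/", "", normalizePath(src, winslash = "/")))

    dest <- tempfile(fileext = ".rds")
    ## Ask for a binary transfer explicitly rather than relying on
    ## the file extension being recognized.
    download.file(u, dest, mode = "wb", quiet = TRUE)

    ## The copy should read back as the original object.
    same <- identical(readRDS(dest), mtcars)
  }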
185}
186\note{
187  Files of more than 2GB are supported on 64-bit builds of \R; they
188  may be truncated on some 32-bit builds.
189
190  Methods \code{"wget"} and \code{"curl"} are mainly for historical
191  compatibility but may provide capabilities not supported by
192  the \code{"libcurl"} or \code{"wininet"} methods.
193
194  Method \code{"wget"} can be used with proxy firewalls which require
195  user/password authentication if proper values are stored in the
196  configuration file for \code{wget}.
197
198  \command{wget} (\url{https://www.gnu.org/software/wget/}) is commonly
199  installed on Unix-alikes (but not macOS).  Windows binaries are
200  available from Cygwin, gnuwin32 and elsewhere.
201
202  \command{curl} (\url{https://curl.se/}) is installed on macOS and
203  commonly on Unix-alikes.  Windows binaries are available at that URL.
204}
205\section{Setting Proxies}{
206  For the Windows-only method \code{"wininet"}, the \sQuote{Internet
207  Options} of the system are used to choose proxies and so on; these are
208  set in the Control Panel and are those used for system browsers.
209
210  The next two paragraphs apply to the internal code only.
211
212  Proxies can be specified via environment variables.
213  Setting \env{no_proxy} to \code{*} stops any proxy being tried.
214  Otherwise the setting of \env{http_proxy} or \env{ftp_proxy}
215  (or failing that, the all upper-case version) is consulted and if
216  non-empty used as a proxy site.  For FTP transfers, the username
217  and password on the proxy can be specified by \env{ftp_proxy_user}
218  and \env{ftp_proxy_password}.  The form of \env{http_proxy}
219  should be \code{http://proxy.dom.com/} or
220  \code{http://proxy.dom.com:8080/} where the port defaults to
221  \code{80} and the trailing slash may be omitted.  For
222  \env{ftp_proxy} use the form \code{ftp://proxy.dom.com:3128/}
223  where the default port is \code{21}.  These environment variables
224  must be set before the download code is first used: they cannot be
225  altered later by calling \code{\link{Sys.setenv}}.
226
227  Usernames and passwords can be set for HTTP proxy transfers via
228  environment variable \env{http_proxy_user} in the form
229  \code{user:passwd}.  Alternatively, \env{http_proxy} can be of the
230  form \code{http://user:pass@proxy.dom.com:8080/} for compatibility
231  with \code{wget}.  Only the HTTP/1.0 basic authentication scheme is
232  supported.
233  \cr
234  Under Windows, if \env{http_proxy_user} is set to \code{ask} then
235  a dialog box will come up for the user to enter the username and
236  password.  \bold{NB:} you will be given only one opportunity to enter this,
237  but if proxy authentication is required and fails there will be one
238  further prompt per download.
239
240  Much the same scheme is supported by \code{method = "libcurl"}, including
241  \env{no_proxy}, \env{http_proxy} and \env{ftp_proxy}; the last two
242  may have contents of the form \code{[user:password@]machine[:port]}, where the
243  parts in brackets are optional.  See
244  \url{https://curl.se/libcurl/c/libcurl-tutorial.html} for details.
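
  As a sketch, the variables can be set from \R itself provided this
  happens before any download in the session, for example from a
  startup file (see \code{\link{Startup}}).  The proxy host below is
  the placeholder used above, not a real server.
  \preformatted{
    ## Run before the first download in the session.
    ## proxy.dom.com is a placeholder host, not a real proxy.
    Sys.setenv(http_proxy = "http://proxy.dom.com:8080/",
               ftp_proxy  = "ftp://proxy.dom.com:3128/")

    ## Confirm what the download code will see.
    proxies <- Sys.getenv(c("http_proxy", "ftp_proxy"))
  }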
245}
246\section{Secure URLs}{
247  Methods which access \samp{https://} and \samp{ftps://} URLs should
248  try to verify the site certificates.  This is usually done using the CA
249  root certificates installed by the OS (although we have seen instances
250  in which these got removed rather than updated). For further information
251  see \url{https://curl.se/docs/sslcerts.html}.
252
253  This is an issue for \code{method = "libcurl"} on Windows, where the
254  OS does not provide a suitable CA certificate bundle, so by default on
255  Windows certificates are not verified.  To turn verification on, set
256  environment variable \env{CURL_CA_BUNDLE} to the path to a certificate
257  bundle file, usually named \file{ca-bundle.crt} or
258  \file{curl-ca-bundle.crt}.  (This is normally done for a binary
259  installation of \R, which installs
260  \file{\var{R_HOME}/etc/curl-ca-bundle.crt} and sets
261  \env{CURL_CA_BUNDLE} to point to it if that environment variable is not
262  already set.)  For an updated certificate bundle, see
263  \url{https://curl.se/docs/sslcerts.html}.
264  Currently one can download a copy from
265  \url{https://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt}
266  and set \env{CURL_CA_BUNDLE} to the full path to the downloaded file.
267
268  Note that the root certificates used by \R may or may not be the same
269  as used in a browser, and indeed different browsers may use different
270  certificate bundles (there is typically a build option to choose
271  either their own or the system ones).
272}
273\section{FTP sites}{
274  \samp{ftp:} URLs are accessed using the FTP protocol which has a
275  number of variants.  One distinction is between \sQuote{active} and
276  \sQuote{(extended) passive} modes: which is used is chosen by the
277  client.  The \code{"internal"} and \code{"libcurl"} methods use passive
278  mode, and that is almost universally used by browsers.  The
279  \code{"wininet"} method first tries passive and then active.
280}
281\section{Good practice}{
282  Setting the \code{method} should be left to the end user.  Neither
283  the \command{wget} nor the \command{curl} command is guaranteed to be
284  available: you can check if one is available \emph{via}
285  \code{\link{Sys.which}}, and should do so in a package or script.
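
  Such a check might look like the following sketch, in which
  \code{tool_available} is a hypothetical helper name, not part of \R.
  \preformatted{
    ## tool_available() is a hypothetical helper, not part of R.
    ## Sys.which() returns "" for a command that is not on the path.
    tool_available <- function(tool) unname(nzchar(Sys.which(tool)))

    if (!tool_available("wget"))
        message("'wget' is not on the search path")
  }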
286
287  If you use \code{download.file} in a package or script, you must check
288  the return value, since it is possible that the download will fail
289  with a non-zero status but not an \R error.
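
  One way to handle both kinds of failure is sketched below.  The
  wrapper \code{safe_download} is a hypothetical name, and the value
  \code{-1L} used to mark an \R error is an arbitrary choice.
  \preformatted{
    ## safe_download() is a hypothetical wrapper, not part of R.
    safe_download <- function(url, destfile, ...) {
        status <- tryCatch(
            download.file(url, destfile, ...),
            error = function(e) {
                message("download error: ", conditionMessage(e))
                -1L   # arbitrary marker for an R error
            })
        ## A non-zero status is a failure even without an R error.
        if (status != 0L) {
            warning("download failed with status ", status)
            return(FALSE)
        }
        TRUE
    }
  }
  The caller can then test the logical result before using
  \code{destfile}.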
290
291  The supported \code{method}s do change: method \code{libcurl} was
292  introduced in \R 3.2.0 and is still optional on Windows -- use
293  \code{\link{capabilities}("libcurl")} in a program to see if it is
294  available.
295}
296\value{
297  An (invisible) integer code, \code{0} for success and non-zero for
298  failure.  For the \code{"wget"} and \code{"curl"} methods this is the
299  status code returned by the external program.  The \code{"internal"}
300  method can return \code{1}, but will in most cases throw an error.
301
302  What happens to the destination file(s) in the case of error depends
303  on the method and \R{} version. Currently the \code{"internal"},
304  \code{"wininet"} and \code{"libcurl"} methods will remove the file if
305  the URL is unavailable, except when \code{mode} specifies appending,
306  in which case the file should be unchanged.
307}
308\seealso{
309  \code{\link{options}} to set the \code{HTTPUserAgent}, \code{timeout}
310  and \code{internet.info} options used by some of the methods.
311
312  \code{\link{url}} for a finer-grained way to read data from URLs.
313
314  \code{\link{url.show}}, \code{\link{available.packages}},
315  \code{\link{download.packages}} for applications.
316
317  Contributed packages \CRANpkg{RCurl} and \CRANpkg{curl} provide more
318  comprehensive facilities to download from URLs.
319}
320\keyword{utilities}
321