% File src/library/utils/man/download.file.Rd
% Part of the R package, https://www.R-project.org
% Copyright 1995-2021 R Core Team
% Distributed under GPL 2 or later

\name{download.file}
\alias{download.file}
\concept{proxy}
\concept{ftp}
\concept{http}
\title{Download File from the Internet}
\description{
  This function can be used to download a file from the Internet.
}
\usage{
download.file(url, destfile, method, quiet = FALSE, mode = "w",
              cacheOK = TRUE,
              extra = getOption("download.file.extra"),
              headers = NULL, \dots)
}
\arguments{
  \item{url}{a \code{\link{character}} string (or longer vector e.g.,
    for the \code{"libcurl"} method) naming the URL of a resource to be
    downloaded.}

  \item{destfile}{a character string (or vector, see the \code{url}
    argument) with the file path where the downloaded file is to be
    saved.  Tilde-expansion is performed.}

  \item{method}{Method to be used for downloading files.  Current
    download methods are \code{"internal"}, \code{"wininet"} (Windows
    only), \code{"libcurl"}, \code{"wget"} and \code{"curl"}, and there
    is a value \code{"auto"}: see \sQuote{Details} and \sQuote{Note}.

    The method can also be set through the option
    \code{"download.file.method"}: see \code{\link{options}()}.
  }

  \item{quiet}{If \code{TRUE}, suppress status messages (if any), and
    the progress bar.}

  \item{mode}{character.  The mode with which to write the file.  Useful
    values are \code{"w"}, \code{"wb"} (binary), \code{"a"} (append) and
    \code{"ab"}.  Not used for methods \code{"wget"} and \code{"curl"}.
    See also \sQuote{Details}, notably about using \code{"wb"} for Windows.
  }
  \item{cacheOK}{logical.  Is a server-side cached value acceptable?}

  \item{extra}{character vector of additional command-line arguments for
    the \code{"wget"} and \code{"curl"} methods.}

  \item{headers}{named character vector of HTTP headers to use in HTTP
    requests.
    It is ignored for non-HTTP URLs.  The \code{User-Agent}
    header, coming from the \code{HTTPUserAgent} option (see
    \code{\link{options}}), is used as the first header automatically.}

  \item{\dots}{allow additional arguments to be passed, unused.}
}
\details{
  The function \code{download.file} can be used to download a single
  file as described by \code{url} from the internet and store it in
  \code{destfile}.

  The \code{url} must start with a scheme such as \samp{http://},
  \samp{https://}, \samp{ftp://} or \samp{file://}.  Which methods
  support which schemes varies by \R version.

  If \code{method = "auto"} is chosen (the default), the behavior
  depends on the platform:
  \itemize{
    \item On a Unix-alike, method \code{"libcurl"} is used, except that
    \code{"internal"} is used for \samp{file://} URLs.  Method
    \code{"libcurl"} uses the library of that name
    (\url{https://curl.se/libcurl/}).

    \item On Windows the \code{"wininet"} method is used, except for
    \samp{ftp://} and \samp{ftps://} URLs, where \code{"libcurl"} is
    tried.  The \code{"wininet"} method uses the WinINet functions (part
    of the OS).  However, it is deprecated for \samp{http://} and
    \samp{https://} URLs in favour of \code{"libcurl"}.
    % https://msdn.microsoft.com/en-us/library/windows/desktop/aa383630%28v=vs.85%29.aspx

    Support for method \code{"libcurl"} is optional on Windows: use
    \code{\link{capabilities}("libcurl")} to see if it is supported on
    your build (it is on \acronym{CRAN} builds).
  }

  When method \code{"libcurl"} is used, it provides
  (non-blocking) access to \samp{https://} and (usually) \samp{ftps://}
  URLs.  There is support for simultaneous downloads, so \code{url} and
  \code{destfile} can be character vectors of the same length greater
  than one (but the method has to be specified explicitly and not
  \emph{via} \code{"auto"}).  For a single URL and \code{quiet = FALSE}
  a progress bar is shown in interactive use.
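
  A minimal sketch of simultaneous downloads with method
  \code{"libcurl"}: the URLs are hypothetical placeholders and the
  variable names are illustrative only.
  \preformatted{
  if (capabilities("libcurl")) {
      ## hypothetical URLs, one destination file per URL
      urls  <- c("https://example.org/a.csv", "https://example.org/b.csv")
      dests <- file.path(tempdir(), basename(urls))
      ## the method must be named explicitly for vector arguments
      status <- download.file(urls, dests, method = "libcurl", mode = "wb")
  }
  }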

  For methods \code{"wget"} and \code{"curl"} a system call is made to
  the tool given by \code{method}, and the respective program must be
  installed on your system and be in the search path for executables.
  They will block all other activity on the \R process until they
  complete: this may make a GUI unresponsive.

  \code{cacheOK = FALSE} is useful for \samp{http://} and
  \samp{https://} URLs: it will attempt to get a copy directly from the
  site rather than from an intermediate cache.  It is used by
  \code{\link{available.packages}}.

  The \code{"libcurl"} and \code{"wget"} methods follow \samp{http://}
  and \samp{https://} redirections to any scheme they support: the
  \code{"internal"} method follows \samp{http://} to \samp{http://}
  redirections only.  (For method \code{"curl"}, use argument
  \code{extra = "-L"} to follow redirections.  To disable redirection in
  \command{wget}, use \code{extra = "--max-redirect=0"}.)
  The \code{"wininet"} method supports some
  redirections but not all.  (For method \code{"libcurl"}, messages will
  quote the endpoint of redirections.)

  Note that \samp{https://} URLs are not supported by the
  \code{"internal"} method but are supported by the \code{"libcurl"}
  method and the \code{"wininet"} method on Windows.

  Support for \samp{ftp://} URLs in the \code{"internal"} method was
  deprecated in \R 4.1.1.

  See \code{\link{url}} for how \samp{file://} URLs are interpreted,
  especially on Windows.  The \code{"internal"} and \code{"wininet"}
  methods do not percent-decode, but the \code{"libcurl"} and
  \code{"curl"} methods do: method \code{"wget"} does not support
  \samp{file://} URLs.

  Most methods do not percent-encode special characters such as spaces
  in URLs (see \code{\link{URLencode}}), but it seems the
  \code{"wininet"} method does.

  The remaining details apply to the \code{"internal"}, \code{"wininet"}
  and \code{"libcurl"} methods only.

  The timeout for many parts of the transfer can be set by the option
  \code{timeout} which defaults to 60 seconds.  This is often
  insufficient for downloads of large files (50MB or more) and
  so should be increased when \code{download.file} is used in packages
  to do so.  Note that the user can set the default timeout by the
  environment variable \env{R_DEFAULT_INTERNET_TIMEOUT} in recent
  versions of \R, so to ensure that this is not decreased packages should
  use something like
  \preformatted{
    options(timeout = max(300, getOption("timeout")))
  }
  (It is unrealistic to require download times of less than 1s/MB.)

  The level of detail provided during transfer can be set by the
  \code{quiet} argument and the \code{internet.info} option: the details
  depend on the platform and scheme.  For the \code{"internal"} method
  setting option \code{internet.info} to 0 gives all available details,
  including all server responses.  Using 2 (the default) gives only
  serious messages, and 3 or more suppresses all messages.  For the
  \code{"libcurl"} method values of the option less than 2 give verbose
  output.

  A progress bar tracks the transfer platform-specifically:
  \describe{
    \item{On Windows}{If the file length is known, the
      full width of the bar is the known length.  Otherwise the initial
      width represents 100 Kbytes and is doubled whenever the current width
      is exceeded.  (In non-interactive use this uses a text version.
      If the
      file length is known, an equals sign represents 2\% of the transfer
      completed: otherwise a dot represents 10Kb.)}
    \item{On a Unix-alike}{If the file length is known, an
      equals sign represents 2\% of the transfer completed: otherwise a dot
      represents 10Kb.}
  }


  The choice of binary transfer (\code{mode = "wb"} or \code{"ab"}) is
  important on Windows, since unlike Unix-alikes it does distinguish
  between text and binary files and for text transfers changes \code{\\n}
  line endings to \code{\\r\\n} (aka \file{CRLF}).

  On Windows, if \code{mode} is not supplied (\code{\link{missing}()})
  and \code{url} ends in one of \code{.gz}, \code{.bz2}, \code{.xz},
  \code{.tgz}, \code{.zip}, \code{.jar}, \code{.rda}, \code{.rds} or
  \code{.RData}, \code{mode = "wb"} is set so that a binary transfer
  is done to help unwary users.

  Code written to download binary files must use \code{mode = "wb"} (or
  \code{"ab"}), but the problems incurred by a text transfer will only
  be seen on Windows.
}
\note{
  Files of more than 2GB are supported on 64-bit builds of \R; they
  may be truncated on some 32-bit builds.

  Methods \code{"wget"} and \code{"curl"} are mainly for historical
  compatibility but may provide capabilities not supported by
  the \code{"libcurl"} or \code{"wininet"} methods.

  Method \code{"wget"} can be used with proxy firewalls which require
  user/password authentication if proper values are stored in the
  configuration file for \code{wget}.

  \command{wget} (\url{https://www.gnu.org/software/wget/}) is commonly
  installed on Unix-alikes (but not macOS).  Windows binaries are
  available from Cygwin, gnuwin32 and elsewhere.

  \command{curl} (\url{https://curl.se/}) is installed on macOS and
  commonly on Unix-alikes.  Windows binaries are available at that URL.
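
  Whether either program is on the search path can be checked from \R.
  A minimal sketch, in which the variable names are illustrative only:
  \preformatted{
  tools <- Sys.which(c("wget", "curl"))
  ## Sys.which() gives "" for a program that is not found
  available <- names(tools)[nzchar(tools)]
  }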
}
\section{Setting Proxies}{
  For the Windows-only method \code{"wininet"}, the \sQuote{Internet
  Options} of the system are used to choose proxies and so on; these are
  set in the Control Panel and are those used for system browsers.

  The next two paragraphs apply to the internal code only.

  Proxies can be specified via environment variables.
  Setting \env{no_proxy} to \code{*} stops any proxy being tried.
  Otherwise the setting of \env{http_proxy} or \env{ftp_proxy}
  (or failing that, the all upper-case version) is consulted and if
  non-empty used as a proxy site.  For FTP transfers, the username
  and password on the proxy can be specified by \env{ftp_proxy_user}
  and \env{ftp_proxy_password}.  The form of \env{http_proxy}
  should be \code{http://proxy.dom.com/} or
  \code{http://proxy.dom.com:8080/} where the port defaults to
  \code{80} and the trailing slash may be omitted.  For
  \env{ftp_proxy} use the form \code{ftp://proxy.dom.com:3128/}
  where the default port is \code{21}.  These environment variables
  must be set before the download code is first used: they cannot be
  altered later by calling \code{\link{Sys.setenv}}.

  Usernames and passwords can be set for HTTP proxy transfers via
  environment variable \env{http_proxy_user} in the form
  \code{user:passwd}.  Alternatively, \env{http_proxy} can be of the
  form \code{http://user:pass@proxy.dom.com:8080/} for compatibility
  with \code{wget}.  Only the HTTP/1.0 basic authentication scheme is
  supported.
  \cr
  Under Windows, if \env{http_proxy_user} is set to \code{ask} then
  a dialog box will come up for the user to enter the username and
  password.  \bold{NB:} you will be given only one opportunity to enter this,
  but if proxy authentication is required and fails there will be one
  further prompt per download.

  Much the same scheme is supported by \code{method = "libcurl"}, including
  \env{no_proxy}, \env{http_proxy} and \env{ftp_proxy}, and for the last
  two a value of the form \code{[user:password@]machine[:port]} where the
  parts in brackets are optional.  See
  \url{https://curl.se/libcurl/c/libcurl-tutorial.html} for details.
}
\section{Secure URLs}{
  Methods which access \samp{https://} and \samp{ftps://} URLs should
  try to verify the site certificates.  This is usually done using the CA
  root certificates installed by the OS (although we have seen instances
  in which these got removed rather than updated).  For further information
  see \url{https://curl.se/docs/sslcerts.html}.

  This is an issue for \code{method = "libcurl"} on Windows, where the
  OS does not provide a suitable CA certificate bundle, so by default on
  Windows certificates are not verified.  To turn verification on, set
  environment variable \env{CURL_CA_BUNDLE} to the path to a certificate
  bundle file, usually named \file{ca-bundle.crt} or
  \file{curl-ca-bundle.crt}.  (This is normally done for a binary
  installation of \R, which installs
  \file{\var{R_HOME}/etc/curl-ca-bundle.crt} and sets
  \env{CURL_CA_BUNDLE} to point to it if that environment variable is not
  already set.)  For an updated certificate bundle, see
  \url{https://curl.se/docs/sslcerts.html}.
  Currently one can download a copy from
  \url{https://raw.githubusercontent.com/bagder/ca-bundle/master/ca-bundle.crt}
  and set \env{CURL_CA_BUNDLE} to the full path to the downloaded file.

  Note that the root certificates used by \R may or may not be the same
  as used in a browser, and indeed different browsers may use different
  certificate bundles (there is typically a build option to choose
  either their own or the system ones).
}
\section{FTP sites}{
  \samp{ftp:} URLs are accessed using the FTP protocol which has a
  number of variants.
  One distinction is between \sQuote{active} and
  \sQuote{(extended) passive} modes: which is used is chosen by the
  client.  The \code{"internal"} and \code{"libcurl"} methods use passive
  mode, and that is almost universally used by browsers.  The
  \code{"wininet"} method first tries passive and then active.
}
\section{Good practice}{
  Setting the \code{method} should be left to the end user.  Neither
  the \command{wget} nor the \command{curl} command is widely available:
  you can check if one is available \emph{via} \code{\link{Sys.which}},
  and should do so in a package or script.

  If you use \code{download.file} in a package or script, you must check
  the return value, since it is possible that the download will fail
  with a non-zero status but not an \R error.

  The supported \code{method}s do change: method \code{libcurl} was
  introduced in \R 3.2.0 and is still optional on Windows -- use
  \code{\link{capabilities}("libcurl")} in a program to see if it is
  available.
}
\value{
  An (invisible) integer code, \code{0} for success and non-zero for
  failure.  For the \code{"wget"} and \code{"curl"} methods this is the
  status code returned by the external program.  The \code{"internal"}
  method can return \code{1}, but will in most cases throw an error.

  What happens to the destination file(s) in the case of error depends
  on the method and \R{} version.  Currently the \code{"internal"},
  \code{"wininet"} and \code{"libcurl"} methods will remove the file if
  the URL is unavailable, except when \code{mode} specifies
  appending, in which case the file should be unchanged.
}
\seealso{
  \code{\link{options}} to set the \code{HTTPUserAgent}, \code{timeout}
  and \code{internet.info} options used by some of the methods.

  \code{\link{url}} for a finer-grained way to read data from URLs.

  \code{\link{url.show}}, \code{\link{available.packages}},
  \code{\link{download.packages}} for applications.

  Contributed packages \CRANpkg{RCurl} and \CRANpkg{curl} provide more
  comprehensive facilities to download from URLs.
}
\keyword{utilities}
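\examples{
\dontrun{
## A minimal sketch of the advice given under Details and Good practice:
## raise the timeout without lowering a larger user setting, request a
## binary transfer, and check the returned status code.
## The URL is a hypothetical placeholder and the variable names are
## illustrative only.
old <- options(timeout = max(300, getOption("timeout")))
dest <- tempfile(fileext = ".zip")
status <- tryCatch(
    download.file("https://example.org/data.zip", dest, mode = "wb"),
    error = function(e) 1L)   # treat an R error like a non-zero status
if (status != 0L) warning("download failed with status ", status)
options(old)                  # restore the previous timeout
}
}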