# URL syntax and their use in curl

## Specifications

The official "URL syntax" is primarily defined in these two different
specifications:

 - [RFC 3986](https://tools.ietf.org/html/rfc3986) (although URL is called
   "URI" in there)
 - [The WHATWG URL Specification](https://url.spec.whatwg.org/)

RFC 3986 is the earlier one, and curl has always tried to adhere to that one
(since it shipped in January 2005).

The WHATWG URL spec was written later, is incompatible with RFC 3986 and
changes over time.

## Variations

URL parsers as implemented in browsers, libraries and tools usually opt to
support one of the mentioned specifications. Bugs, differences in
interpretations and the moving nature of the WHATWG spec do however make it
unlikely that multiple parsers treat URLs the exact same way!

## Security

Due to the inherent differences between URL parser implementations, it is
considered a security risk to mix different implementations and assume the
same behavior!

For example, if you use one parser to check if a URL uses a good host name or
the correct auth field, and then pass on that same URL to a *second* parser,
there will always be a risk it treats the same URL differently. There is no
right and wrong in URL land, only differences of opinions.

libcurl offers a separate API to its URL parser for this reason, among others.

Applications may at times find it convenient to allow users to specify URLs
for various purposes and that string would then end up fed to curl. Getting a
URL from an external untrusted party and using it with curl brings several
security concerns:

1. If you have an application that runs as or in a server application, getting
   an unfiltered URL can trick your application into accessing a local resource
   instead of a remote resource. Protecting yourself against localhost accesses
   is hard when accepting user provided URLs.

2. Such custom URLs can access other ports than you planned as port numbers
   are part of the regular URL format. The combination of a local host and a
   custom port number can allow external users to play tricks with your local
   services.

3. Such a URL might use other schemes than you thought of or planned for.

## "RFC3986 plus"

curl recognizes a URL syntax that we call "RFC 3986 plus". It is grounded on
the well-established RFC 3986 to make sure previously written command lines and
scripts that use curl will remain working.

curl's URL parser allows a few deviations from the spec in order to
inter-operate better with URLs that appear in the wild.

### spaces

In particular `Location:` headers that indicate to the client where a resource
has been redirected to, sometimes contain spaces. This is a violation of RFC
3986 but is fine in the WHATWG spec. curl handles these by re-encoding them to
`%20`.

### non-ASCII

Byte values in a provided URL that are outside of the printable ASCII range
are percent-encoded by curl.

### multiple slashes

An absolute URL always starts with a "scheme" followed by a colon. For all the
schemes curl supports, the colon must be followed by two slashes according to
RFC 3986 but not according to the WHATWG spec - which allows anywhere from one
to an unlimited number of slashes.

curl allows one, two or three slashes after the colon to still be considered a
valid URL.

### "scheme-less"

curl supports "URLs" that do not start with a scheme. This is not supported by
any of the specifications. This is a shortcut to entering URLs that was
supported by browsers early on and has been mimicked by curl.

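
curl picks the protocol for such a string from the first part of the host
name, using the prefixes listed in this section. As a rough picture of that
lookup, here is a Python sketch. It is not curl's implementation:
`guess_scheme` and `GUESS_TABLE` are hypothetical names, the input is assumed
to have no userinfo part, and the case-insensitive comparison is an assumption
as well.

```python
# Illustrative only: mimics the documented prefix-based guessing, not curl's code.
GUESS_TABLE = (
    ("ftp.", "ftp"),
    ("dict.", "dict"),
    ("ldap.", "ldap"),
    ("imap.", "imap"),
    ("smtp.", "smtp"),
    ("pop3.", "pop3"),
)


def guess_scheme(schemeless_url: str) -> str:
    """Return the scheme suggested by the start of the host name."""
    # Assumption: prefixes are compared case-insensitively.
    host = schemeless_url.lower()
    for prefix, scheme in GUESS_TABLE:
        if host.startswith(prefix):
            return scheme
    # Anything that matches no prefix is treated as HTTP.
    return "http"


if __name__ == "__main__":
    for url in ("ftp.example.com/file.txt", "www.example.com"):
        print(url, "->", guess_scheme(url))
```

An application that must not rely on guessing can simply require an explicit
scheme before handing the string to curl.
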

Based on what the host name starts with, curl will "guess" what protocol to
use:

 - `ftp.` means FTP
 - `dict.` means DICT
 - `ldap.` means LDAP
 - `imap.` means IMAP
 - `smtp.` means SMTP
 - `pop3.` means POP3
 - anything else means HTTP

### globbing letters

The curl command line tool supports "globbing" of URLs. It means that you can
create ranges and lists using `[N-M]` and `{one,two,three}` sequences. The
letters used for this (`[]{}`) are reserved in RFC 3986 and can therefore not
legitimately be part of such a URL.

They are however not reserved or special in the WHATWG specification, so
globbing can mess up such URLs. Globbing can be turned off for such occasions
(using `--globoff`).

# URL syntax details

A URL may consist of the following components - many of them are optional:

    [scheme][divider][userinfo][hostname][port number][path][query][fragment]

Each component is separated from the following component with a divider
character or string.

For example, this could look like:

    http://user:password@www.example.com:80/index.html?foo=bar#top

## Scheme

The scheme specifies the protocol to use. A curl build can support a few or
many different schemes. You can limit what schemes curl should accept.

curl supports the following schemes on URLs specified to transfer.
They are matched case insensitively:

`dict`, `file`, `ftp`, `ftps`, `gopher`, `gophers`, `http`, `https`, `imap`,
`imaps`, `ldap`, `ldaps`, `mqtt`, `pop3`, `pop3s`, `rtmp`, `rtmpe`, `rtmps`,
`rtmpt`, `rtmpte`, `rtmpts`, `rtsp`, `scp`, `sftp`, `smb`, `smbs`, `smtp`,
`smtps`, `telnet`, `tftp`

When the URL is specified to identify a proxy, curl recognizes the following
schemes:

`http`, `https`, `socks4`, `socks4a`, `socks5`, `socks5h`, `socks`

## Userinfo

The userinfo field can be used to set user name and password for
authentication purposes in this transfer. The use of this field is discouraged
since it often means passing around the password in plain text and is thus a
security risk.

URLs for IMAP, POP3 and SMTP also support *login options* as part of the
userinfo field. They are provided as a semicolon after the password, followed
by the options.

## Hostname

The hostname part of the URL contains the address of the server that you want
to connect to. This can be the fully qualified domain name of the server, the
local network name of the machine on your network or the IP address of the
server or machine represented by either an IPv4 or IPv6 address (within
brackets). For example:

    http://www.example.com/

    http://hostname/

    http://192.168.0.1/

    http://[2001:1890:1112:1::20]/

### "localhost"

Starting in curl 7.77.0, curl will use loopback IP addresses for the name
`localhost`: `127.0.0.1` and `::1`. It will not try to resolve the name using
the resolver functions.

This is done to make sure the host accessed is truly the localhost - the local
machine.

### IDNA

If curl was built with International Domain Name (IDN) support, it can also
handle host names using non-ASCII characters.

When built with libidn2, curl uses the IDNA 2008 standard.
This is equivalent
to the WHATWG URL spec, but differs from certain browsers that use IDNA 2003
Transitional Processing. The two standards have a huge overlap but differ
slightly, perhaps most famously in how they deal with the German "double s"
(`ß`).

When winidn is used, curl uses IDNA 2003 Transitional Processing, like the rest
of Windows.

## Port number

If there's a colon after the hostname, that should be followed by the port
number to use, in the range 1 - 65535. curl also supports a blank port number
field - but only if the URL starts with a scheme.

If the port number is not specified in the URL, curl will use a default port
based on the provided scheme:

DICT 2628, FTP 21, FTPS 990, GOPHER 70, GOPHERS 70, HTTP 80, HTTPS 443,
IMAP 143, IMAPS 993, LDAP 389, LDAPS 636, MQTT 1883, POP3 110, POP3S 995,
RTMP 1935, RTMPS 443, RTMPT 80, RTSP 554, SCP 22, SFTP 22, SMB 445, SMBS 445,
SMTP 25, SMTPS 465, TELNET 23, TFTP 69

# Scheme specific behaviors

## FTP

The path part of an FTP request specifies the file to retrieve and from which
directory. If the file part is omitted then libcurl downloads the directory
listing for the directory specified. If the directory is omitted then the
directory listing for the root / home directory will be returned.

FTP servers typically put the user in their "home directory" after login, which
then differs between users. To explicitly specify the root directory of an FTP
server start the path with double slash `//` or `/%2f` (2F is the hexadecimal
value of the ASCII code for the slash).

## FILE

When a `FILE://` URL is accessed on Windows systems, it can be crafted in a
way so that Windows attempts to connect to a (remote) machine when curl wants
to read or write such a path.


curl only allows the hostname part of a FILE URL to be one of these three
alternatives: `localhost`, `127.0.0.1` or blank ("", zero characters).
Anything else will make curl fail to parse the URL.

### Windows-specific FILE details

curl accepts that the FILE URL's path starts with a "drive letter". That is a
single letter `a` to `z` followed by a colon or a pipe character (`|`).

The Windows operating system itself will convert some file accesses to perform
network accesses over SMB/CIFS, through several different file path patterns.
This way, a `file://` URL passed to curl *might* be converted into a network
access inadvertently and unknowingly to curl. This is a Windows feature curl
cannot control or disable.

## IMAP

The path part of an IMAP request not only specifies the mailbox to list or
select, but can also be used to check the `UIDVALIDITY` of the mailbox, to
specify the `UID`, `SECTION` and `PARTIAL` octets of the message to fetch and
to specify what messages to search for.

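
How these path parameters fit together can be sketched with a small Python
helper. This is only an illustration under stated assumptions:
`build_imap_url` is a hypothetical function, not part of curl, credentials are
left out, and the parameter order simply mirrors the example URLs in this
section.

```python
from urllib.parse import quote


def build_imap_url(host, mailbox, uidvalidity=None, uid=None,
                   section=None, partial=None):
    """Assemble an IMAP URL from optional path parameters (hypothetical helper)."""
    # Percent-encode the mailbox name, keeping "/" as a hierarchy separator.
    url = f"imap://{host}/{quote(mailbox, safe='/')}"
    if uidvalidity is not None:
        # UIDVALIDITY is attached directly to the mailbox name.
        url += f";UIDVALIDITY={uidvalidity}"
    if uid is not None:
        url += f"/;UID={uid}"
    if section is not None:
        url += f"/;SECTION={quote(section)}"
    if partial is not None:
        url += f"/;PARTIAL={partial}"
    return url


if __name__ == "__main__":
    print(build_imap_url("mail.example.com", "INBOX", uidvalidity=50, uid=2))
    print(build_imap_url("mail.example.com", "INBOX", uid=4, partial="0.1024"))
```

Building the path from separate values also keeps characters such as spaces in
a mailbox name percent-encoded before the URL reaches curl.
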

A top level folder list:

    imap://user:password@mail.example.com

A folder list on the user's inbox:

    imap://user:password@mail.example.com/INBOX

Select the user's inbox and fetch message with uid = 1:

    imap://user:password@mail.example.com/INBOX/;UID=1

Select the user's inbox and fetch the first message in the mail box:

    imap://user:password@mail.example.com/INBOX/;MAILINDEX=1

Select the user's inbox, check the `UIDVALIDITY` of the mailbox is 50 and
fetch message 2 if it is:

    imap://user:password@mail.example.com/INBOX;UIDVALIDITY=50/;UID=2

Select the user's inbox and fetch the text portion of message 3:

    imap://user:password@mail.example.com/INBOX/;UID=3/;SECTION=TEXT

Select the user's inbox and fetch the first 1024 octets of message 4:

    imap://user:password@mail.example.com/INBOX/;UID=4/;PARTIAL=0.1024

Select the user's inbox and check for NEW messages:

    imap://user:password@mail.example.com/INBOX?NEW

Select the user's inbox and search for messages containing "shadows" in the
subject line:

    imap://user:password@mail.example.com/INBOX?SUBJECT%20shadows

Searching via the query part of the URL `?` is a search request for the results
to be returned as message sequence numbers (MAILINDEX). It is possible to make
a search request for results to be returned as unique ID numbers (UID) by using
a custom curl request via `-X`. UID numbers are unique per session (and
multiple sessions when UIDVALIDITY is the same). For example, if you are
searching for `"foo bar"` in header+body (TEXT) and you want the matching
MAILINDEX numbers returned then you could search via URL:

    imap://user:password@mail.example.com/INBOX?TEXT%20%22foo%20bar%22

...
but if you wanted matching UID numbers you would have to use a custom request:

    imap://user:password@mail.example.com/INBOX -X "UID SEARCH TEXT \"foo bar\""

For more information about IMAP commands please see RFC 9051. For more
information about the individual components of an IMAP URL please see RFC 5092.

* Note: old curl versions would FETCH by message sequence number when UID was
specified in the URL. That was a bug fixed in 7.62.0, which added MAILINDEX to
FETCH by mail sequence number.

## LDAP

The path part of an LDAP request can be used to specify the: Distinguished
Name, Attributes, Scope, Filter and Extension for an LDAP search. Each field is
separated by a question mark and when that field is not required an empty
string with the question mark separator should be included.

Search for the DN as `My Organisation`:

    ldap://ldap.example.com/o=My%20Organisation

The same search, but returning only postalAddress attributes:

    ldap://ldap.example.com/o=My%20Organisation?postalAddress

Search for an empty DN and request information about the
`rootDomainNamingContext` attribute for an Active Directory server:

    ldap://ldap.example.com/?rootDomainNamingContext

For more information about the individual components of an LDAP URL please
see [RFC 4516](https://tools.ietf.org/html/rfc4516).

## POP3

The path part of a POP3 request specifies the message ID to retrieve. If the
ID is not specified then a list of waiting messages is returned instead.

## SCP

The path part of an SCP URL specifies the path and file to retrieve or
upload. The file is taken as an absolute path from the root directory on the
server.

To specify a path relative to the user's home directory on the server, prepend
`~/` to the path portion.

## SFTP

The path part of an SFTP URL specifies the file to retrieve or upload.
If the
path ends with a slash (`/`) then a directory listing is returned instead of a
file. If the path is omitted entirely then the directory listing for the root
/ home directory will be returned.

## SMB

The path part of an SMB request specifies the file to retrieve and from what
share and directory or the share to upload to and as such, may not be omitted.
If the user name is embedded in the URL then it must contain the domain name
and as such, the backslash must be URL encoded as `%5c`.

curl supports SMB version 1 (only).

## SMTP

The path part of an SMTP request specifies the host name to present during
communication with the mail server. If the path is omitted, then libcurl will
attempt to resolve the local computer's host name. However, this may not
return the fully qualified domain name that is required by some mail servers
and specifying this path allows you to set an alternative name, such as your
machine's fully qualified domain name, which you might have obtained from an
external function such as gethostname or getaddrinfo.

The default SMTP port is 25. Some servers use port 587 as an alternative.

## RTMP

There's no official URL spec for RTMP so libcurl uses the URL syntax supported
by the underlying librtmp library. It has a syntax where it wants a
traditional URL, followed by a space and a series of space-separated
`name=value` pairs.

While space is not typically a "legal" character, libcurl accepts them. When a
user wants to pass in a `#` (hash) character it will be treated as a fragment
and get cut off by libcurl if provided literally. You will instead have to
escape it by providing it as backslash and its ASCII value in hexadecimal:
`\23`.
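
The hash escaping described above can be sketched in Python. This is a minimal
illustration, not code from curl or librtmp: `escape_rtmp_hash`, the host name
and the `name` key are all hypothetical.

```python
def escape_rtmp_hash(value: str) -> str:
    """Replace each literal '#' with a backslash and its hex ASCII value (23)."""
    return value.replace("#", "\\23")


if __name__ == "__main__":
    base = "rtmp://media.example.com/app/stream"  # hypothetical URL
    # "name" stands in for whichever name=value pair needs the '#'.
    option = "name=" + escape_rtmp_hash("part1#part2")
    # The traditional URL and the name=value pair are separated by a space.
    print(base + " " + option)
```

Escaping only the values that are appended after the traditional URL keeps a
real fragment in the base URL, if one is wanted, untouched.
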