# Proxy support in Chrome

This document establishes basic proxy terminology and describes Chrome-specific
proxy behaviors.

[TOC]

## Proxy server identifiers

A proxy server is an intermediary used for network requests. A proxy server can
be described by its address, along with the proxy scheme that should be used to
communicate with it.

This can be written as a string using either the "PAC format" or the "URI
format".

The PAC format is how one names a proxy server in [Proxy
auto-config](https://en.wikipedia.org/wiki/Proxy_auto-config) scripts. For
example:
* `PROXY foo:2138`
* `SOCKS5 foo:1080`
* `DIRECT`

The "URI format" instead encodes the information as a URL. For example:
* `foo:2138`
* `http://foo:2138`
* `socks5://foo:1080`
* `direct://`

The port number is optional in both formats. When omitted, a per-scheme default
is used.

See the [Proxy server schemes](#Proxy-server-schemes) section for details on
what schemes Chrome supports, and how to write them in the PAC and URI formats.
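As a rough illustration of how the two formats relate, the sketch below converts a single PAC-format identifier into the URI format and fills in the per-scheme default port. The helper name `pacToUri` is hypothetical and not part of Chrome. The keyword and port tables are taken from the scheme sections of this document, and the port detection assumes IPv6 addresses are bracketed.

```javascript
// Default ports per proxy scheme, as listed in the scheme sections of this document.
const DEFAULT_PORTS = new Map([
  ["http", 80], ["https", 443], ["socks4", 1080], ["socks5", 1080], ["quic", 443],
]);

// PAC keyword -> URI scheme. In PAC format, plain "SOCKS" means SOCKSv4.
const PAC_KEYWORDS = new Map([
  ["PROXY", "http"], ["HTTPS", "https"], ["SOCKS", "socks4"],
  ["SOCKS4", "socks4"], ["SOCKS5", "socks5"], ["QUIC", "quic"],
]);

// Hypothetical helper: converts one PAC-format identifier to URI format.
function pacToUri(pacEntry) {
  const parts = pacEntry.trim().split(/\s+/);
  if (parts.length === 1 && parts[0] === "DIRECT") return "direct://";
  const scheme = PAC_KEYWORDS.get(parts[0]);
  if (scheme === undefined || parts.length !== 2) {
    throw new Error("Unrecognized PAC entry: " + pacEntry);
  }
  // Assumption: an explicit port is a trailing ":<digits>". This holds for
  // hostnames, IPv4 literals, and bracketed IPv6 literals.
  const hasPort = /:\d+$/.test(parts[1]);
  const address = hasPort ? parts[1] : parts[1] + ":" + DEFAULT_PORTS.get(scheme);
  return scheme + "://" + address;
}
```

For example, `pacToUri("SOCKS5 foo")` yields `socks5://foo:1080` under these assumptions.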

Most UI surfaces in Chrome (including command lines and policy) expect URI
formatted proxy server identifiers. However outside of Chrome, proxy servers
are generally identified less precisely by just an address -- the proxy
scheme is assumed based on context.

In Windows' proxy settings there are host and port fields for the
"HTTP", "Secure", "FTP", and "SOCKS" proxy. With the exception of "SOCKS",
those are all identifiers for insecure HTTP proxy servers (proxy scheme is
assumed as HTTP).

## Proxy resolution

Proxying in Chrome is done at the URL level.

When the browser is asked to fetch a URL, it needs to decide which IP endpoint
to send the request to. This can be either a proxy server, or the target host.

This is called proxy resolution. The input to proxy resolution is a URL, and
the output is an ordered list of [proxy server
identifiers](#Proxy-server-identifiers).

What proxies to use can be described using either:

* [Manual proxy settings](#Manual-proxy-settings) - proxy resolution is defined
  using a declarative set of rules. These rules are expressed as a mapping from
  URL scheme to proxy server identifier(s), and a list of proxy bypass rules for
  when to go DIRECT instead of using the mapped proxy.

* PAC script - proxy resolution is defined using a JavaScript program, that is
  invoked whenever fetching a URL to get the list of proxy server identifiers
  to use.

* Auto-detect - the WPAD protocol is used to probe the network (using DHCP/DNS)
  and possibly discover the URL of a PAC script.

## Proxy server schemes

When using an explicit proxy in the browser, multiple layers of the network
request are impacted, depending on the scheme that is used. Some implications
of the proxy scheme are:

* Is communication to the proxy done over a secure channel?
* Is name resolution (ex: DNS) done client side, or proxy side?
* What authentication schemes to the proxy server are supported?
* What network traffic can be sent through the proxy?

Chrome supports these proxy server schemes:

* [DIRECT](#DIRECT-proxy-scheme)
* [HTTP](#HTTP-proxy-scheme)
* [HTTPS](#HTTPS-proxy-scheme)
* [SOCKSv4](#SOCKSv4-proxy-scheme)
* [SOCKSv5](#SOCKSv5-proxy-scheme)
* [QUIC](#QUIC-proxy-scheme)

### DIRECT proxy scheme

* Default port: N/A (neither host nor port are applicable)
* Example identifier (PAC): `DIRECT`
* Example identifier (URI): `direct://`

This is a pseudo proxy scheme that indicates instead of using a proxy we are
sending the request directly to the target server.

It is imprecise to call this a "proxy server", but it is a convenient abstraction.

### HTTP proxy scheme

* Default port: 80
* Example identifiers (PAC): `PROXY proxy:8080`, `proxy` (non-standard; don't use)
* Example identifiers (URI): `http://proxy:8080`, `proxy:8080` (can omit scheme)

Generally when one refers to a "proxy server" or "web proxy", they are talking
about an HTTP proxy.

When using an HTTP proxy in Chrome, name resolution is always deferred to the
proxy. HTTP proxies can proxy `http://`, `https://`, `ws://` and `wss://` URLs.
(Chrome's FTP support is deprecated, and HTTP proxies cannot proxy `ftp://` anymore)

Communication to HTTP proxy servers is insecure, meaning proxied `http://`
requests are sent in the clear. When proxying `https://` requests through an
HTTP proxy, the TLS exchange is forwarded through the proxy using the `CONNECT`
method, so end-to-end encryption is not broken. However when establishing the
tunnel, the hostname of the target URL is sent to the proxy server in the
clear.

HTTP proxies in Chrome support the same HTTP authentication schemes as for
target servers: Basic, Digest, Negotiate, NTLM.

### HTTPS proxy scheme

* Default port: 443
* Example identifier (PAC): `HTTPS proxy:8080`
* Example identifier (URI): `https://proxy:8080`

This works like an [HTTP proxy](#HTTP-proxy-scheme), except the
communication to the proxy server is protected by TLS, and may negotiate
HTTP/2 (but not QUIC).

Because the connection to the proxy server is secure, http:// requests
sent through the proxy are not sent in the clear as with an HTTP proxy.
Similarly, since CONNECT requests are sent over a protected channel, the
hostnames for proxied https:// URLs are also not revealed.

In addition to the usual HTTP authentication methods, HTTPS proxies also
support client certificates.

HTTPS proxies using HTTP/2 can offer better performance in Chrome than a
regular HTTP proxy due to higher connection limits (HTTP/1.1 proxies in Chrome
are limited to 32 simultaneous connections across all domains).

Chrome, Firefox, and Opera support HTTPS proxies; however, most older HTTP
stacks do not.

Specifying an HTTPS proxy is generally not possible through system proxy
settings. Instead, one must use either a PAC script or a Chrome proxy setting
(command line, extension, or policy).

See the dev.chromium.org document on [secure web
proxies](http://dev.chromium.org/developers/design-documents/secure-web-proxy)
for tips on how to run and test against an HTTPS proxy.

### SOCKSv4 proxy scheme

* Default port: 1080
* Example identifiers (PAC): `SOCKS4 proxy:8080`, `SOCKS proxy:8080`
* Example identifier (URI): `socks4://proxy:8080`

SOCKSv4 is a simple transport layer proxy that wraps a TCP socket. Its use
is transparent to the rest of the protocol stack; after an initial
handshake when connecting the TCP socket (to the proxy), the rest of the
loading stack is unchanged.

No proxy authentication methods are supported for SOCKSv4.

When using a SOCKSv4 proxy, name resolution for target hosts is always done
client side, and moreover must resolve to an IPv4 address (SOCKSv4 encodes
target address as 4 octets, so IPv6 targets are not possible).

There are extensions to SOCKSv4 that allow for proxy side name resolution, and
IPv6, namely SOCKSv4a. However Chrome does not allow configuring, or falling
back to v4a.

A better alternative is to just use the newer version of the protocol, SOCKSv5
(which is still 20+ years old).

### SOCKSv5 proxy scheme

* Default port: 1080
* Example identifier (PAC): `SOCKS5 proxy:8080`
* Example identifiers (URI): `socks://proxy:8080`, `socks5://proxy:8080`

[SOCKSv5](https://tools.ietf.org/html/rfc1928) is a transport layer proxy that
wraps a TCP socket, and allows for name resolution to be deferred to the proxy.

In Chrome when a proxy's scheme is set to SOCKSv5, name resolution is always
done proxy side (even though the protocol allows for client side as well). In
Firefox client side vs proxy side name resolution can be configured with
`network.proxy.socks_remote_dns`; Chrome has no equivalent option and will
always use proxy side resolution.

No authentication methods are supported for SOCKSv5 in Chrome (although some do
exist for the protocol).

A handy way to create a SOCKSv5 proxy is with `ssh -D`, which can be used to
tunnel web traffic to a remote host over SSH.

In Chrome SOCKSv5 is only used to proxy TCP-based URL requests. It cannot be
used to relay UDP traffic.

### QUIC proxy scheme

* Default (UDP) port: 443
* Example identifier (PAC): `QUIC proxy:8080`
* Example identifier (URI): `quic://proxy:8080`

A QUIC proxy uses QUIC (UDP) as the underlying transport, but otherwise
behaves as an HTTP proxy. It has similar properties to an [HTTPS
proxy](#HTTPS-proxy-scheme), in that the connection to the proxy server
is secure, and connection limits are less restrictive.

Support for QUIC proxies in Chrome is currently experimental and not
ready for production use. In particular, sending https:// and wss://
URLs through a QUIC proxy is [disabled by
default](https://bugs.chromium.org/p/chromium/issues/detail?id=969859).

Another caveat is that QUIC does not currently support
client certificates since it does not use a TLS
handshake. This may change in future versions.

## Manual proxy settings

The simplest way to configure proxy resolution is by providing a static list of
rules comprised of:

1. A mapping of URL schemes to [proxy server identifiers](#Proxy-server-identifiers).
2. A list of [proxy bypass rules](#Proxy-bypass-rules)

We refer to this mode of configuration as "manual proxy settings".

Manual proxy settings can succinctly describe setups like:

* Use proxy `http://foo:8080` for all requests
* Use proxy `http://foo:8080` for all requests except those to a `google.com`
  subdomain.
* Use proxy `http://foo:8080` for all `https://` requests, and proxy
  `socks5://mysocks:90` for everything else

Although manual proxy settings are a ubiquitous way to configure proxies
across platforms, there is no standard representation or feature set.

Chrome's manual proxy settings most closely resemble those of WinInet. But they
also support idioms from other platforms -- for instance KDE's notion of
reversing the bypass list, or Gnome's interpretation of bypass patterns as
suffix matches.

When defining manual proxy settings in Chrome, we specify three (possibly
empty) lists of [proxy server identifiers](#Proxy-server-identifiers).

  * proxies for HTTP - A list of proxy server identifiers to use for `http://`
    requests, if non-empty.
  * proxies for HTTPS - A list of proxy server identifiers to use for
    `https://` requests, if non-empty.
  * other proxies - A list of proxy server identifiers to use for everything
    else (whatever isn't matched by the other two lists)

There are a lot of ways to end up with manual proxy settings in Chrome
(discussed in other sections).

The following examples use the command line method: launching Chrome with
`--proxy-server=XXX` (and optionally `--proxy-bypass-list=YYY`).

Example: To use proxy `http://foo:8080` for all requests we can launch
Chrome with `--proxy-server="http://foo:8080"`. This translates to:

  * proxies for HTTP - *empty*
  * proxies for HTTPS - *empty*
  * other proxies - `http://foo:8080`

With the above configuration, if the proxy server was unreachable all requests
would fail with `ERR_PROXY_CONNECTION_FAILED`. To address this we could add a
fallback to `DIRECT` by launching using
`--proxy-server="http://foo:8080,direct://"` (note the comma separated list).
This command line means:

  * proxies for HTTP - *empty*
  * proxies for HTTPS - *empty*
  * other proxies - `http://foo:8080`, `direct://`

If instead we wanted to proxy only `http://` URLs through the
HTTPS proxy `https://foo:443`, and have everything else use the SOCKSv5 proxy
`socks5://mysocks:1080` we could launch Chrome with
`--proxy-server="http=https://foo:443;socks=socks5://mysocks:1080"`. This now
expands to:

  * proxies for HTTP - `https://foo:443`
  * proxies for HTTPS - *empty*
  * other proxies - `socks5://mysocks:1080`

The command line above uses WinInet's proxy map format, with some additional
features:

* Instead of naming proxy servers by just a hostname:port, you can use Chrome's
  URI format for proxy server identifiers. In other words, you can prefix the
  proxy scheme so it doesn't default to HTTP.
* The `socks=` mapping is understood more broadly as "other proxies". The
  subsequent proxy list can include proxies of any scheme, however if the
  scheme is omitted it will be understood as SOCKSv4 rather than HTTP.
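The expansion of a `--proxy-server` value into the three lists can be sketched as follows. This is an illustration of the behavior described above, not Chrome's parser: the function name `parseProxyServerFlag` and the returned object shape are hypothetical, and the handling of malformed or unrecognized entries (they are skipped) is an assumption.

```javascript
// Hypothetical helper: splits a --proxy-server value into the three lists
// ("proxies for HTTP", "proxies for HTTPS", "other proxies").
function parseProxyServerFlag(value) {
  const result = { http: [], https: [], other: [] };

  // Add a scheme only when the identifier does not already carry one.
  const withDefault = (id, scheme) => (id.includes("://") ? id : scheme + "://" + id);
  const parseList = (list, defaultScheme) =>
    list.split(",")
        .map(s => s.trim())
        .filter(s => s.length > 0)
        .map(id => withDefault(id, defaultScheme));

  if (!value.includes("=")) {
    // No per-scheme mapping: a single list that applies to all requests.
    result.other = parseList(value, "http");
    return result;
  }

  for (const entry of value.split(";")) {
    const eq = entry.indexOf("=");
    if (eq < 0) continue;  // Assumption: skip entries without a "key=" prefix.
    const key = entry.slice(0, eq).trim().toLowerCase();
    const list = entry.slice(eq + 1);
    if (key === "http") result.http = parseList(list, "http");
    else if (key === "https") result.https = parseList(list, "http");
    // "socks=" means "other proxies"; a missing scheme defaults to SOCKSv4.
    else if (key === "socks") result.other = parseList(list, "socks4");
    // Any other key is ignored in this sketch.
  }
  return result;
}
```

With this sketch, `parseProxyServerFlag("http=https://foo:443;socks=socks5://mysocks:1080")` produces `https://foo:443` for HTTP, an empty HTTPS list, and `socks5://mysocks:1080` as the only "other" proxy, matching the expansion shown above.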

### Mapping WebSockets URLs to a proxy

[Manual proxy settings](#Manual-proxy-settings) don't have mappings for `ws://`
or `wss://` URLs.

Selecting a proxy for these URL schemes is a bit different from other URL
schemes. The algorithm that Chrome uses is:

* If "other proxies" is non-empty use it
* If "proxies for HTTPS" is non-empty use it
* Otherwise use "proxies for HTTP"

This is per the recommendation in section 4.1.3 of [RFC
6455](https://tools.ietf.org/html/rfc6455).
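The three-step selection above can be written as a small function. The `settings` object shape (`http`, `https`, `other` arrays) and the function name are hypothetical, chosen only for illustration.

```javascript
// Hypothetical helper: picks the proxy list used for ws:// and wss:// URLs.
// `settings` holds the three manual proxy lists as arrays of identifiers.
function proxiesForWebSocket(settings) {
  if (settings.other.length > 0) return settings.other;   // "other proxies"
  if (settings.https.length > 0) return settings.https;   // "proxies for HTTPS"
  return settings.http;                                   // "proxies for HTTP"
}
```

Note that the result may be an empty list when none of the three lists is populated.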

It is possible to route `ws://` and `wss://` separately using a PAC script.

### Proxy credentials in manual proxy settings

Most platforms' [manual proxy settings](#Manual-proxy-settings) allow
specifying a cleartext username/password for proxy sign in. Chrome does not
implement this, and will not use any credentials embedded in the proxy
settings.

Proxy authentication will instead go through the ordinary flow to find
credentials.

## Proxy bypass rules

In addition to specifying three lists of [proxy server
identifiers](#proxy-server-identifiers), Chrome's [manual proxy
settings](#Manual-proxy-settings) lets you specify a list of "proxy bypass
rules".

This ruleset determines whether a given URL should skip use of a proxy
altogether, even when a proxy is otherwise defined for it.

This concept is also known by names like "exception list", "exclusion list" or
"no proxy list".

Proxy bypass rules can be written as an ordered list of strings. Ordering
generally doesn't matter, but may when using subtractive rules.

When manual proxy settings are specified from the command line, the
`--proxy-bypass-list="RULES"` switch can be used, where `RULES` is a semicolon
or comma separated list of bypass rules.

Following are the string constructions for the bypass rules that Chrome
supports. They can be used when defining Chrome manual proxy settings from
command line flags, extensions, or policy.

When using system proxy settings, one should use the platform's rule format and
not Chrome's.

### Bypass rule: Hostname

```
[ URL_SCHEME "://" ] HOSTNAME_PATTERN [ ":" PORT ]
```

Matches a hostname using a wildcard pattern, and an optional scheme and port
restriction.

Examples:

* `foobar.com` - Matches URL of any scheme and port, whose normalized host is
  `foobar.com`
* `*foobar.com` - Matches URL of any scheme and port, whose normalized host
  ends with `foobar.com` (for instance `blahfoobar.com` and `foo.foobar.com`).
* `*.org:443` - Matches URLs of any scheme, using port 443 and whose top level
  domain is `.org`
* `https://x.*.y.com:99` - Matches https:// URLs on port 99 whose normalized
  hostname matches `x.*.y.com`

### Bypass rule: Subdomain

```
[ URL_SCHEME "://" ] "." HOSTNAME_SUFFIX_PATTERN [ ":" PORT ]
```

Hostname patterns that start with a dot are special cased to mean a subdomain
matches. `.foo.com` is effectively another way of writing `*.foo.com`.

Examples:

* `.google.com` - Matches `calendar.google.com` and `foo.bar.google.com`, but
  not `google.com`.
* `http://.google.com` - Matches only http:// URLs that are a subdomain of `google.com`.
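To make the hostname and subdomain rule forms concrete, here is a minimal matcher sketch. It is not Chrome's implementation: `matchesHostnameRule` and `wildcardToRegExp` are hypothetical names, host normalization is delegated to the standard `URL` class, and the default-port table is an assumption covering only the URL schemes mentioned in this document.

```javascript
// Default ports used when the URL does not spell one out (assumed table).
const DEFAULT_URL_PORTS = { "http:": "80", "https:": "443", "ws:": "80", "wss:": "443" };

// Turns a wildcard pattern into an anchored, case-insensitive RegExp.
function wildcardToRegExp(pattern) {
  // Escape every regex metacharacter except "*", then expand "*" to ".*".
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

// Hypothetical helper: does `url` match a hostname or subdomain bypass rule?
function matchesHostnameRule(rule, url) {
  let scheme = null;
  let port = null;
  let hostPattern = rule;

  const schemeEnd = hostPattern.indexOf("://");
  if (schemeEnd >= 0) {
    scheme = hostPattern.slice(0, schemeEnd).toLowerCase() + ":";
    hostPattern = hostPattern.slice(schemeEnd + 3);
  }
  const portMatch = /:(\d+)$/.exec(hostPattern);
  if (portMatch) {
    port = portMatch[1];
    hostPattern = hostPattern.slice(0, portMatch.index);
  }
  // A leading dot is shorthand for a "*." prefix (subdomain rule).
  if (hostPattern.startsWith(".")) hostPattern = "*" + hostPattern;

  const u = new URL(url);
  const urlPort = u.port || DEFAULT_URL_PORTS[u.protocol] || "";
  if (scheme !== null && scheme !== u.protocol) return false;
  if (port !== null && port !== urlPort) return false;
  return wildcardToRegExp(hostPattern).test(u.hostname);
}
```

For instance, `matchesHostnameRule(".google.com", "https://calendar.google.com/")` is true, while the same rule does not match `https://google.com/`.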

### Bypass rule: IP literal

```
[ SCHEME "://" ] IP_LITERAL [ ":" PORT ]
```

Matches URLs that are IP address literals, and optional scheme and port
restrictions. This is a special case of hostname matching that takes into
account IP literal canonicalization. For example the rules `[0:0:0::1]` and
`[::1]` are equivalent (both represent the same IPv6 address).

Examples:

* `127.0.0.1`
* `http://127.0.0.1`
* `[::1]` - Matches any URL to the IPv6 loopback address.
* `[0:0::1]` - Same as above
* `http://[::1]:99` - Matches any http:// URL to the IPv6 loopback on port 99

### Bypass rule: IPv4 address range

```
IPV4_LITERAL "/" PREFIX_LENGTH_IN_BITS
```

Matches any URL whose hostname is an IPv4 literal, and falls within the given
address range.

Note this [only applies to URLs that are IP
literals](#Meaning-of-IP-address-range-bypass-rules).

Examples:

* `192.168.1.1/16`
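A prefix-length match like the one above boils down to comparing the leading bits of two 32-bit numbers. The following sketch shows one way to do that. The function names are hypothetical, and the `host` argument is assumed to be the URL's host, so a non-literal hostname never matches.

```javascript
// Parses a dotted-quad IPv4 literal into an unsigned 32-bit number, or
// returns null if `text` is not an IPv4 literal.
function parseIPv4(text) {
  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(text);
  if (!m) return null;
  const octets = m.slice(1).map(Number);
  if (octets.some(o => o > 255)) return null;
  return ((octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]) >>> 0;
}

// Hypothetical helper: does `host` fall within an "IPV4_LITERAL/BITS" rule?
function matchesIPv4RangeRule(rule, host) {
  const [literal, bitsText] = rule.split("/");
  if (bitsText === undefined || !/^\d+$/.test(bitsText)) return false;
  const bits = Number(bitsText);
  const base = parseIPv4(literal);
  const addr = parseIPv4(host);
  if (base === null || addr === null || bits > 32) return false;
  // A 0-bit prefix matches everything; it is handled separately because
  // JavaScript shift counts are taken modulo 32.
  const mask = bits === 0 ? 0 : (0xffffffff << (32 - bits)) >>> 0;
  return ((base & mask) >>> 0) === ((addr & mask) >>> 0);
}
```

With the rule `192.168.1.1/16`, only the first 16 bits are compared, so the host `192.168.200.7` matches and the host `foo` does not.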

### Bypass rule: IPv6 address range

```
IPV6_LITERAL "/" PREFIX_LENGTH_IN_BITS
```

Matches any URL that is an IPv6 literal that falls within the given range.
Note that IPv6 literals must *not* be bracketed.

Note this [only applies to URLs that are IP
literals](#Meaning-of-IP-address-range-bypass-rules).

Examples:

* `fefe:13::abc/33`
* `[fefe::]/40` -- WRONG! IPv6 literals must not be bracketed.

### Bypass rule: Simple hostnames

```
<local>
```

Matches hostnames without a period in them, and that are not IP literals. This
is a naive string search -- meaning that periods appearing *anywhere* count
(including trailing dots!).

This rule corresponds to the "Exclude simple hostnames" checkbox on macOS and
the "Don't use proxy server for local (intranet) addresses" on Windows.

The rule name comes from WinInet, and can easily be confused with the concept
of localhost. However the two concepts are completely orthogonal. In practice
one wouldn't add rules to bypass localhost, as it is [already done
implicitly](#Implicit-bypass-rules).

### Bypass rule: Subtract implicit rules

```
<-loopback>
```

*Subtracts* the [implicit proxy bypass rules](#Implicit-bypass-rules)
(localhost and link local addresses). This is generally only needed for test
setups. Beware of the security implications of proxying localhost.

Whereas regular bypass rules instruct the browser about URLs that should *not*
use the proxy, this rule has the opposite effect and tells the browser to
instead *use* the proxy.

Ordering may matter when using a subtractive rule, as rules will be evaluated
in a left-to-right order. `<-loopback>;127.0.0.1` has a subtly different effect
than `127.0.0.1;<-loopback>`.

### Meaning of IP address range bypass rules

The IP address range bypass rules in manual proxy settings apply only to URLs
whose host is an IP literal. This is not what one would intuitively expect.

Example:

Say we have configured a proxy for all requests, but added a bypass rule
for `192.168.0.0/16`. If we now navigate to `http://foo` (which resolves
to `192.168.1.5` in our setup) will the browser connect directly (bypass proxy)
because we have indicated a bypass rule that includes this IP?

It will go through the proxy.

The bypass rule in this case is not applicable, since the browser never
actually does a name resolution for `foo`. Proxy resolution happens before
name resolution, and depending on what proxy scheme is subsequently chosen,
client side name resolution may never be performed.

The usefulness of IP range proxy bypass rules is rather limited, as they only
apply to requests whose URL was explicitly an IP literal.

If proxy decisions need to be made based on the resolved IP address(es) of a
URL's hostname, one must use a PAC script.
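A PAC script along the following lines could make that decision from the resolved address. This is only a sketch: `dnsResolve()` and `isInNet()` are the standard PAC helper functions provided by the browser's PAC environment, and the proxy `foo:8080` is a placeholder.

```javascript
function FindProxyForURL(url, host) {
  // dnsResolve() and isInNet() are standard PAC helpers supplied by the
  // PAC environment; they are not defined in this file.
  var ip = dnsResolve(host);
  if (ip && isInNet(ip, "192.168.0.0", "255.255.0.0")) {
    // The hostname resolved into 192.168.0.0/16: connect directly.
    return "DIRECT";
  }
  // "foo:8080" is a placeholder proxy used for everything else.
  return "PROXY foo:8080";
}
```

Unlike the bypass rule in the example above, this script goes `DIRECT` for `http://foo` when `foo` resolves to `192.168.1.5`, at the cost of a client side DNS lookup for each proxy resolution.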

## Implicit bypass rules

Requests to certain hosts will not be sent through a proxy, and will instead be
sent directly.

We call these the _implicit bypass rules_. The implicit bypass rules match URLs
whose host portion is either a localhost name or a link-local IP literal.
Essentially it matches:

```
localhost
*.localhost
[::1]
127.0.0.1/8
169.254/16
[FE80::]/10
```

The complete rules are slightly more complicated. For instance on
Windows we will also recognize `loopback`, and there is special casing of
`localhost6` and `localhost6.localdomain6` in Chrome's localhost matching.

This concept of implicit proxy bypass rules is consistent with the
platform-level proxy support on Windows and macOS (albeit with some differences
due to their implementation quirks - see compatibility notes in
`net::ProxyBypassRules::MatchesImplicitRules`).
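The simplified list above can be approximated with a few string and bit checks. The sketch below is not the logic in `net::ProxyBypassRules`: the function name is hypothetical, the platform-specific names mentioned above are left out, and IPv6 literals are assumed to be bracketed and already in canonical form.

```javascript
// Hypothetical helper: approximates the simplified implicit bypass list.
// `host` is a URL host such as "localhost", "127.0.0.1" or "[fe80::1]".
function matchesImplicitBypass(host) {
  const h = host.toLowerCase();
  if (h === "localhost" || h.endsWith(".localhost")) return true;

  // IPv4 literals: 127.0.0.0/8 (loopback) and 169.254.0.0/16 (link-local).
  const v4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(h);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    return a === 127 || (a === 169 && b === 254);
  }

  // Bracketed IPv6 literals: ::1 (loopback) and fe80::/10 (link-local).
  if (h.startsWith("[") && h.endsWith("]")) {
    const v6 = h.slice(1, -1);
    if (v6 === "::1") return true;
    // fe80::/10 covers first groups fe80 through febf.
    const firstGroup = parseInt(v6.split(":")[0], 16);
    return !Number.isNaN(firstGroup) && (firstGroup & 0xffc0) === 0xfe80;
  }
  return false;
}
```

Under these assumptions `matchesImplicitBypass("foo.localhost")` and `matchesImplicitBypass("[fe80::1]")` are both true, while `matchesImplicitBypass("10.0.0.1")` is false.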

Why apply implicit proxy bypass rules in the first place? Certainly there are
considerations around ergonomics and user expectation, but the bigger problem
is security. Since the web platform treats `localhost` as a secure origin, the
ability to proxy it grants extra powers. This is [especially
problematic](https://bugs.chromium.org/p/chromium/issues/detail?id=899126) when
proxy settings are externally controllable, as when using PAC scripts.

Historical support in Chrome:

* Prior to M71 there were no implicit proxy bypass rules (except if using
  `--winhttp-proxy-resolver`)
* In M71 Chrome applied implicit proxy bypass rules to PAC scripts
* In M72 Chrome generalized the implicit proxy bypass rules to manually
  configured proxies

### Overriding the implicit bypass rules

If you want traffic to `localhost` to be sent through a proxy despite the
security concerns, it can be done by adding the special proxy bypass rule
`<-loopback>`. This has the effect of _subtracting_ the implicit rules.

For instance, launch Chrome with the command line flag:

```
--proxy-bypass-list="<-loopback>"
```

Note that there currently is no mechanism to disable the implicit proxy bypass
rules when using a PAC script. Proxy bypass lists only apply to manual
settings, so the technique above cannot be used to let PAC scripts decide the
proxy for localhost URLs.

## Evaluating proxy lists (proxy fallback)

Proxy resolution results in a _list_ of [proxy server
identifiers](#Proxy-server-identifiers) to use for a
given request, not just a single proxy server identifier.

For instance, consider this PAC script:

```
function FindProxyForURL(url, host) {
    if (host == "www.example.com") {
        return "PROXY proxy1; HTTPS proxy2; SOCKS5 proxy3";
    }
    return "DIRECT";
}
```

What proxy will Chrome use for connections to `www.example.com`, given that
we have a choice of three separate proxy server identifiers to choose from
{`http://proxy1:80`, `https://proxy2:443`, `socks5://proxy3:1080`}?

Initially, Chrome will try the proxies in order. This means first attempting
the request through `http://proxy1:80`. If that "fails", the request is
next attempted through `https://proxy2:443`. Lastly if that fails, the
request is attempted through `socks5://proxy3:1080`.

This process is referred to as _proxy fallback_. What constitutes a
"failure" is described later.

Proxy fallback is stateful. The actual order of proxy attempts made by Chrome
is influenced by the past responsiveness of proxy servers.

Let's say we request `http://www.example.com/`. Per the PAC script this
resolves to a list of three proxy server identifiers:

{`http://proxy1:80`, `https://proxy2:443`, `socks5://proxy3:1080`}

Chrome will first attempt to issue the request through these proxies in the
left-to-right order.

Let's say that the attempt through `http://proxy1:80` fails, but then the
attempt through `https://proxy2:443` succeeds. Chrome will mark
`http://proxy1:80` as _bad_ for the next 5 minutes. Being marked as _bad_
means that `http://proxy1:80` is de-prioritized with respect to
other proxy server identifiers (including `direct://`) that are not marked as
bad.

That means the next time `http://www.example.com/` is requested, the effective
order for proxies to attempt will be:

{`https://proxy2:443`, `socks5://proxy3:1080`, `http://proxy1:80`}

Conceptually, _bad_ proxies are moved to the end of the list, rather than being
removed from consideration altogether.
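The reordering described above can be modeled with a small cache of expiry times. This is a conceptual sketch only: the class and method names are hypothetical, time is passed in explicitly as milliseconds to keep the example deterministic, and Chrome's real bookkeeping may differ.

```javascript
const BAD_PROXY_TTL_MS = 5 * 60 * 1000;  // a proxy stays "bad" for 5 minutes

class BadProxyCache {
  constructor() {
    // proxy identifier -> time (ms) at which it stops being considered bad
    this.expiry = new Map();
  }

  markBad(proxy, now) {
    this.expiry.set(proxy, now + BAD_PROXY_TTL_MS);
  }

  isBad(proxy, now) {
    const until = this.expiry.get(proxy);
    return until !== undefined && now < until;
  }

  // Returns the resolved list with bad proxies moved to the end. The
  // relative order within the good and bad groups is preserved.
  reorder(proxies, now) {
    const good = proxies.filter(p => !this.isBad(p, now));
    const bad = proxies.filter(p => this.isBad(p, now));
    return good.concat(bad);
  }
}
```

Marking `http://proxy1:80` as bad and then calling `reorder()` on the three-entry list from the example moves it to the end. Once the 5 minutes have elapsed, the original order is returned again.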

What constitutes a "failure" when it comes to triggering proxy fallback depends
on the proxy type. Generally speaking, only connection level failures
are deemed eligible for proxy fallback. This includes:

* Failure resolving the proxy server's DNS
* Failure connecting a TCP socket to the proxy server

(There are some caveats for how HTTPS and QUIC proxies count failures for
fallback)

Prior to M67, Chrome would consider failures establishing a
CONNECT tunnel as an error eligible for proxy fallback. This policy [resulted
in problems](https://bugs.chromium.org/p/chromium/issues/detail?id=680837) for
deployments whose HTTP proxies intentionally failed certain https:// requests,
since that necessitates inducing a failure during the CONNECT tunnel
establishment. The problem would occur when a working proxy fallback option
like DIRECT was given, since the failing proxy would then be marked as bad.

Currently there are no options to configure proxy fallback (including disabling
the caching of bad proxies). Future versions of Chrome may [remove caching
of bad proxies](https://bugs.chromium.org/p/chromium/issues/detail?id=936130)
to make fallback predictable.

To investigate issues relating to proxy fallback, one can [collect a NetLog
dump using
chrome://net-export/](https://dev.chromium.org/for-testers/providing-network-details).
These logs can then be loaded with the [NetLog
viewer](https://netlog-viewer.appspot.com/).

There are a few things of interest in the logs:

* The "Proxy" tab will show which proxies (if any) were marked as bad at the
  time the capture ended.
* The "Events" tab notes what the resolved proxy list was, and what the
  re-ordered proxy list was after taking into account bad proxies.
* The "Events" tab notes when a proxy is marked as bad and why (provided the
  event occurred while capturing was enabled).

When debugging issues with bad proxies, it is also useful to reset Chrome's
cache of bad proxies. This can be done by clicking the "Clear bad proxies"
button on
[chrome://net-internals/#proxy](chrome://net-internals/#proxy). Note the UI
will not give feedback that the bad proxies were cleared, however capturing a
new NetLog dump can confirm it was cleared.

## Arguments passed to FindProxyForURL() in PAC scripts

PAC scripts in Chrome are expected to define a JavaScript function
`FindProxyForURL`.

The historical signature for this function is:

```
function FindProxyForURL(url, host) {
  ...
}
```

Scripts can expect to be called with string arguments `url` and `host` such
that:

* `url` is a *sanitized* version of the request's URL
* `host` is the unbracketed host portion of the origin.

Sanitization of the URL means that the path, query, fragment, and identity
portions of the URL are stripped. Effectively `url` will be
limited to a `scheme://host:port/` style URL.

Examples of how `FindProxyForURL()` will be called:

```
// Actual URL:   https://www.google.com/Foo
FindProxyForURL('https://www.google.com/', 'www.google.com')

// Actual URL:   https://[dead::beef]/foo?bar
FindProxyForURL('https://[dead::beef]/', 'dead::beef')

// Actual URL:   https://www.example.com:8080#search
FindProxyForURL('https://www.example.com:8080/', 'www.example.com')

// Actual URL:   https://username:password@www.example.com
FindProxyForURL('https://www.example.com/', 'www.example.com')
```

Stripping the path and query from the `url` is a departure from the original
Netscape implementation of PAC. It was introduced in Chrome 52 for [security
reasons](https://bugs.chromium.org/p/chromium/issues/detail?id=593759).

There is currently no option to turn off sanitization of URLs passed to PAC
scripts (removed in Chrome 75).

The sanitization of http:// URLs currently has a different policy, and does not
strip query and path portions of the URL. That said, users are advised not to
depend on reading the query/path portion of any URL
type, since future versions of Chrome may [deprecate that
capability](https://bugs.chromium.org/p/chromium/issues/detail?id=882536) in
favor of a consistent policy.
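The sanitization rules can be approximated with the standard `URL` class. The sketch below is an illustration rather than Chrome's code: `pacArguments` is a hypothetical name, and the handling of http:// URLs (path and query kept, credentials and fragment dropped) is an assumption based on the note above. Every other scheme is fully stripped here.

```javascript
// Hypothetical helper: computes the [url, host] pair that a PAC script
// would receive for `requestUrl`, following the rules described above.
function pacArguments(requestUrl) {
  const u = new URL(requestUrl);

  // The URL class keeps brackets around IPv6 hostnames; PAC's `host` does not.
  const host = u.hostname.replace(/^\[|\]$/g, "");

  // u.host is "hostname[:port]" and never includes credentials.
  let url = u.protocol + "//" + u.host;
  if (u.protocol === "http:") {
    // Assumption: for http:// the path and query survive sanitization.
    url += u.pathname + u.search;
  } else {
    url += "/";
  }
  return [url, host];
}
```

For example, `pacArguments("https://[dead::beef]/foo?bar")` returns the pair `https://[dead::beef]/` and `dead::beef`.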

## Resolving client's IP address within a PAC script using myIpAddress()

PAC scripts can invoke `myIpAddress()` to obtain the client's IP address. This
function returns a single IP literal, or `"127.0.0.1"` on failure.

`myIpAddress()` is fundamentally broken for multi-homed hosts.

Consider what happens when a machine has multiple network interfaces, each with
its own IP address. Answering "what is my IP address" depends on what interface
the request is sent out on. Which in turn depends on what the destination IP
is. Which in turn depends on the result of proxy resolution + fallback, which
is what we are currently blocked in!

Chrome's algorithm uses these ordered steps to find an IP address
(short-circuiting when a candidate is found).

1. Select the IP of an interface that can route to public Internet:
    * Probe for route to `8.8.8.8`.
    * Probe for route to `2001:4860:4860::8888`.
2. Select an IP by doing a DNS resolve of the machine's hostname:
    * Select the first IPv4 result if there is one.
    * Select the first IP result if there is one.
3. Select the IP of an interface that can route to private IP space:
    * Probe for route to `10.0.0.0`.
    * Probe for route to `172.16.0.0`.
    * Probe for route to `192.168.0.0`.
    * Probe for route to `FC00::`.

When searching for candidate IP addresses, link-local and loopback addresses
are skipped over. A link-local or loopback address will only be returned as a
last resort when no other IP address was found by following these steps.

This sequence of steps explicitly favors IPv4 over IPv6 results.
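The ordered, short-circuiting steps can be sketched as below. Route probing and hostname resolution are not available to ordinary JavaScript, so they are passed in as hypothetical callbacks (`probeRoute`, `resolveHostname`). The loopback and link-local check is simplified, and the whole function is a model of the description above rather than Chrome's implementation.

```javascript
// Simplified check for loopback (127/8, ::1) and link-local
// (169.254/16, fe80::/10) address literals.
function isLoopbackOrLinkLocal(ip) {
  return /^127\./.test(ip) || /^169\.254\./.test(ip) ||
         ip === "::1" || /^fe[89ab][0-9a-f]:/i.test(ip);
}

// Hypothetical inputs:
//   probeRoute(dest)  -> local IP string used to reach `dest`, or null
//   resolveHostname() -> array of IP strings for the machine's hostname
function chooseMyIpAddress(probeRoute, resolveHostname) {
  let lastResort = null;

  // Returns `ip` if usable; remembers skipped addresses as a last resort.
  const usable = ip => {
    if (!ip) return null;
    if (isLoopbackOrLinkLocal(ip)) {
      if (lastResort === null) lastResort = ip;
      return null;
    }
    return ip;
  };
  // Probes destinations in order and stops at the first usable address.
  const firstRoute = dests => {
    for (const dest of dests) {
      const ip = usable(probeRoute(dest));
      if (ip) return ip;
    }
    return null;
  };

  // Step 1: interface that can route to the public Internet.
  let ip = firstRoute(["8.8.8.8", "2001:4860:4860::8888"]);
  if (ip) return ip;

  // Step 2: DNS resolution of the machine's hostname, IPv4 first.
  const resolved = resolveHostname().map(usable).filter(Boolean);
  ip = resolved.find(a => !a.includes(":")) || resolved[0];
  if (ip) return ip;

  // Step 3: interface that can route to private IP space.
  ip = firstRoute(["10.0.0.0", "172.16.0.0", "192.168.0.0", "FC00::"]);
  if (ip) return ip;

  // Nothing else was found: use a skipped address, else the failure value.
  return lastResort || "127.0.0.1";
}
```

Passing the probes in as callbacks keeps the ordering logic testable without touching the network.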

*Historical note*: Prior to M72, Chrome's implementation of `myIpAddress()` was
effectively just `getaddrinfo(gethostname)`. This is now step 2 of the heuristic.

### What about pacUseMultihomedDNS?

In Firefox, if you define a global variable named `pacUseMultihomedDNS` in your
PAC script, it causes `myIpAddress()` to report the IP address of the interface
that would (likely) have been used had we connected to it DIRECT.

In particular, it will do a DNS resolution of the target host (the hostname of
the URL that the proxy resolution is being done for), and then
connect a datagram socket to get the source address.

Chrome does not recognize the `pacUseMultihomedDNS` global as having special
meaning. A PAC script is free to define such a global, and it won't have
side-effects. Chrome has no APIs or settings to change `myIpAddress()`'s
algorithm.

772## Resolving client's IP address within a PAC script using myIpAddressEx()
773
774Chrome supports the [Microsoft PAC
775extension](https://docs.microsoft.com/en-us/windows/desktop/winhttp/myipaddressex)
776`myIpAddressEx()`.
777
778This is like `myIpAddress()`, but instead of returning a single IP address, it
can return multiple IP addresses. It returns a string containing a
semicolon-separated list of addresses. On failure it returns an empty string to
indicate no results (whereas `myIpAddress()` returns `127.0.0.1`).
782
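A PAC script might consume this return value roughly as follows. This is a minimal sketch: `parseMyIpAddressEx` is a hypothetical helper name, the `10.` prefix test is an illustrative policy rather than a recommendation, and `foo:2138` is the placeholder proxy from earlier in this document.

```javascript
// Splits the semicolon-separated result into an array of addresses.
// The empty string returned on failure becomes an empty array.
function parseMyIpAddressEx(result) {
  return result
    .split(";")
    .map((address) => address.trim())
    .filter((address) => address.length > 0);
}

function FindProxyForURL(url, host) {
  // myIpAddressEx() is provided by the PAC environment.
  const addresses = parseMyIpAddressEx(myIpAddressEx());
  // Hypothetical policy: use the proxy if any local address starts with "10.".
  const onPrivateNet = addresses.some((address) => address.startsWith("10."));
  return onPrivateNet ? "PROXY foo:2138" : "DIRECT";
}
```
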
783There are some differences with Chrome's implementation:
784
785* In Chrome the function is unconditionally defined, whereas in Internet
786  Explorer one must have used the `FindProxyForURLEx` entrypoint.
* Chrome does not enumerate all of the host's network interfaces.
788* Chrome does not return link-local or loopback addresses (except if no other
789  addresses were found).
790
791The algorithm that Chrome uses is nearly identical to that of `myIpAddress()`
792described earlier. The main difference is that we don't short-circuit
793after finding the first candidate IP, so multiple IPs may be returned.
794
7951. Select all the IPs of interfaces that can route to public Internet:
796    * Probe for route to `8.8.8.8`.
797    * Probe for route to `2001:4860:4860::8888`.
798    * If any IPs were found, return them, and finish.
7992. Select an IP by doing a DNS resolve of the machine's hostname:
800    * If any IPs were found, return them, and finish.
8013. Select the IP of an interface that can route to private IP space:
802    * Probe for route to `10.0.0.0`.
803    * Probe for route to `172.16.0.0`.
804    * Probe for route to `192.168.0.0`.
805    * Probe for route to `FC00::`.
806    * If any IPs were found, return them, and finish.
807
808Note that short-circuiting happens whenever steps 1-3 find a candidate IP. So
809for example if at least one IP address was discovered by checking routes to
810public Internet, only those IPs will be returned, and steps 2-3 will not run.
811
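The per-step collection can be sketched like this. It is an illustration under assumptions, not Chrome's code: `probeRoute` and `resolveHostname` are hypothetical callbacks for the route probe and the hostname lookup, and loopback or link-local detection is simplified to a prefix match.

```javascript
// probeRoute(dest)  -> source IP the OS would use to reach dest, or null (hypothetical)
// resolveHostname() -> array of IPs for the machine's own hostname (hypothetical)
function myIpAddressExSketch(probeRoute, resolveHostname) {
  // Loopback and link-local addresses are only used if nothing else is found.
  const isFallbackOnly = (ip) =>
    /^(127\.|169\.254\.|::1$|fe[89ab][0-9a-f]:)/i.test(ip);
  const lastResort = [];

  // Each step is evaluated lazily, so later steps never run after a hit.
  const steps = [
    () => ["8.8.8.8", "2001:4860:4860::8888"].map((d) => probeRoute(d)),
    () => resolveHostname(),
    () => ["10.0.0.0", "172.16.0.0", "192.168.0.0", "FC00::"].map((d) => probeRoute(d)),
  ];

  for (const step of steps) {
    const found = [];
    for (const ip of step()) {
      if (!ip) continue;
      const bucket = isFallbackOnly(ip) ? lastResort : found;
      if (!bucket.includes(ip)) bucket.push(ip); // drop duplicates
    }
    // Unlike myIpAddress(), return every candidate found in this step.
    if (found.length > 0) return found.join(";");
  }

  // Empty string when nothing at all was found.
  return lastResort.join(";");
}
```

With both public probes succeeding, the result holds two addresses and the hostname lookup is never performed.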
812## Android quirks
813
Proxy resolving via PAC works differently on Android than on desktop Chrome
platforms:

* Android Chrome uses the same Chromium PAC resolver, but does not run it
  out-of-process as desktop Chrome does. This architectural difference is
  due to the higher process cost on Android, and means Android Chrome is more
  susceptible to malicious PAC scripts. The other consequence is that Android
  Chrome can have distinct regressions from desktop Chrome as the service setup
  is quite different (and most `browser_tests` are not run on Android either).
823
824* [WebView does not use Chrome's PAC
825  resolver](https://bugs.chromium.org/p/chromium/issues/detail?id=989667).
826  Instead Android WebView uses the Android system's PAC resolver, which is less
827  optimized and uses an old build of V8. When the system is configured to use
828  PAC, Android WebView's net code will see the proxy settings as being a
829  single HTTP proxy on `localhost`. The system localhost proxy will in turn
830  evaluate the PAC script and forward the HTTP request on to the resolved
831  proxy. This translation has a number of effects, including what proxy
832  schemes are supported, the maximum connection limits, how proxy fallback
833  works, and overall performance (the current Android PAC evaluator blocks on
834  DNS).
835
836* Android system log messages for `PacProcessor` are not related to Chrome or
837  its PAC evaluator. Rather, these are log messages generated by the Android
838  system's PAC implementation. This confusion can arise when users add
839  `alert()` to debug PAC script logic, and then refer to output in `logcat` to
  try to diagnose a resolving issue in Android Chrome.
841
842## Downloading PAC scripts
843
844When a network context is configured to use a PAC script, proxy resolution will
845stall while downloading the PAC script.
846
847Fetches for PAC URLs are initiated by the network stack, and behave differently
848from ordinary web visible requests:
849
850* Must complete within 30 seconds.
851* Must complete with an HTTP response code of exactly 200.
852* Must have an uncompressed body smaller than 1 MB.
853* Do not follow ordinary HTTP caching semantics.
* Are never fetched through a proxy.
855* Are not visible to the WebRequest extension API, or to service workers.
856* Do not support HTTP authentication (ambient authentication may work, but
857  cannot prompt UI for credentials).
* Do not support client certificates (including `AutoSelectCertificateForUrls`).
* Do not support auxiliary certificate network fetches (will only use cached
  OCSP, AIA, and CRL responses during certificate verification).
861
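The three measurable limits in the list above can be expressed as a simple check. This is a sketch, not Chrome's code: the `result` record and its field names are hypothetical, and "1 MB" is read here as 1024 * 1024 bytes, which is an assumption.

```javascript
const PAC_FETCH_TIMEOUT_MS = 30 * 1000;
// Assumption: "1 MB" is interpreted as 1 MiB.
const PAC_MAX_BODY_BYTES = 1024 * 1024;

// `result` is a hypothetical record: { elapsedMs, statusCode, bodyBytes },
// where bodyBytes is the size of the uncompressed body.
// Returns null if the fetch is acceptable, otherwise a short reason string.
function checkPacFetch(result) {
  if (result.elapsedMs > PAC_FETCH_TIMEOUT_MS) return "timed out";
  if (result.statusCode !== 200) return "unexpected status " + result.statusCode;
  if (result.bodyBytes >= PAC_MAX_BODY_BYTES) return "body too large";
  return null;
}
```

Note that under these rules any status other than 200 is rejected, including other 2xx codes.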
862### Caching of successful PAC fetches
863
864PAC URLs are always fetched from the network, and never from the HTTP cache.
865After a PAC URL is successfully fetched, its contents (which are used to create
a long-lived JavaScript context) will be assumed to be fresh until either:
867
868* The network changes (IP address changes, DNS configuration changes)
869* The response becomes older than 12 hours
870* A user explicitly invalidates PAC through `chrome://net-internals#proxy`
871
872Once considered stale, the PAC URL will be re-fetched the next time proxy
873resolution is requested.
874
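The freshness rule can be summarized as a predicate. This is a minimal sketch: the `state` record and its field names are hypothetical and only mirror the three conditions listed above.

```javascript
const PAC_MAX_AGE_MS = 12 * 60 * 60 * 1000; // 12 hours

// `state` is a hypothetical record:
//   { fetchedAtMs, networkChanged, manuallyInvalidated }
// Returns true if the PAC script should be re-fetched on the next
// proxy resolution request.
function isPacStale(state, nowMs) {
  return Boolean(
    state.networkChanged ||
    state.manuallyInvalidated ||
    nowMs - state.fetchedAtMs > PAC_MAX_AGE_MS
  );
}
```

The check is only consulted when a proxy resolution is requested, which matches the lazy re-fetch described above.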
875### Fallback for failed PAC fetches
876
877When the proxy settings are configured to use a PAC URL, and that PAC URL
cannot be fetched, proxy resolution will fall back to the next option, which is
879often `DIRECT`:
880
881* If using system proxy settings, and the platform supports fallback to manual
882  proxy settings (e.g. Windows), the specified manual proxy servers will be
883  used after the PAC fetch fails.
884* If using Chrome's proxy settings, and the PAC script was marked as
885  [mandatory](https://developer.chrome.com/extensions/proxy), fallback to
886  `DIRECT` is not permitted. Subsequent network requests will fail proxy
887  resolution and complete with `ERR_MANDATORY_PROXY_CONFIGURATION_FAILED`.
888* Otherwise proxy resolution will silently fall back to `DIRECT`.
889
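The decision in the list above can be sketched as follows. The `config` record and its field names are hypothetical, and treating an empty manual proxy list as "fall through to `DIRECT`" is an assumption made for this example.

```javascript
// `config` is a hypothetical record:
//   { source: "system" | "chrome",
//     platformSupportsManualFallback: boolean,
//     manualProxies: string[],
//     pacMandatory: boolean }
// Returns either { proxies: [...] } or { error: "..." }.
function resolveAfterPacFetchFailure(config) {
  if (config.source === "system" &&
      config.platformSupportsManualFallback &&
      config.manualProxies.length > 0) {
    // System settings with manual proxies configured (e.g. on Windows).
    return { proxies: config.manualProxies };
  }
  if (config.source === "chrome" && config.pacMandatory) {
    // Mandatory PAC: falling back to DIRECT is not permitted.
    return { error: "ERR_MANDATORY_PROXY_CONFIGURATION_FAILED" };
  }
  // Otherwise silently fall back to DIRECT.
  return { proxies: ["DIRECT"] };
}
```
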
890### Recovering from failed PAC fetches
891
892When fetching an explicitly configured PAC URL fails, the browser will try to
893re-fetch it:
894
895* In exactly 8 seconds
896* 32 seconds after that
897* 2 minutes after that
898* Every 4 hours thereafter
899
900This background polling of the PAC URL is only initiated in response to an
901incoming proxy resolution request, so it will not trigger work when the browser
902is otherwise idle.
903
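The retry schedule can be written as a lookup from retry number to delay. This is a sketch of the listed intervals only; the function name and the 1-based `attempt` parameter are hypothetical.

```javascript
const SECOND_MS = 1000;
const MINUTE_MS = 60 * SECOND_MS;
const HOUR_MS = 60 * MINUTE_MS;

// Delay before the given retry, counted from the previous failed attempt.
// `attempt` is 1 for the first retry after the initial failure.
function pacRetryDelayMs(attempt) {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError("attempt must be an integer >= 1");
  }
  switch (attempt) {
    case 1: return 8 * SECOND_MS;
    case 2: return 32 * SECOND_MS;
    case 3: return 2 * MINUTE_MS;
    default: return 4 * HOUR_MS; // every 4 hours thereafter
  }
}
```
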
Similarly to successful fetches, the PAC URL will also be re-fetched
905whenever the network changes, the proxy settings change, or it was manually
906invalidated via `chrome://net-internals#proxy`.
907
908### Text encoding
909
910Note that UTF-8 is *not* the default interpretation of PAC response bodies.
911
912The priority for encoding is determined in this order:
913
9141. The `charset` property of the HTTP response's `Content-Type`
9152. Any BOM at the start of response body
9163. Otherwise defaults to ISO-8859-1.
917
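The priority order above can be sketched as a small selection function. This is an illustration, not Chrome's decoder: the function name is hypothetical, the `charset` parsing is simplified, and recognizing only UTF-8 and UTF-16 BOMs is an assumption.

```javascript
// contentType: value of the Content-Type header, or null if absent.
// body: Uint8Array holding the raw response body.
// Returns the lowercase name of the encoding to use.
function pickPacEncoding(contentType, body) {
  // 1. An explicit charset parameter on Content-Type wins.
  const match = /;\s*charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType || "");
  if (match) return match[1].toLowerCase();

  // 2. Otherwise look for a byte order mark (assumed set: UTF-8, UTF-16).
  if (body.length >= 3 && body[0] === 0xef && body[1] === 0xbb && body[2] === 0xbf) {
    return "utf-8";
  }
  if (body.length >= 2 && body[0] === 0xff && body[1] === 0xfe) return "utf-16le";
  if (body.length >= 2 && body[0] === 0xfe && body[1] === 0xff) return "utf-16be";

  // 3. No charset and no BOM: default to ISO-8859-1, not UTF-8.
  return "iso-8859-1";
}
```
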
When setting the `Content-Type`, servers should prefer using a MIME type of
`application/x-ns-proxy-autoconfig` or `application/x-javascript-config`.
However, in practice, Chrome does not enforce the MIME type.
921