# AppCache

AppCache is the well-known shorthand for `Application Cache`, the key mechanism
in the
[Offline Web applications specification](https://html.spec.whatwg.org/multipage/offline.html#offline).

*** promo
AppCache is deprecated and slated for removal from the Web Platform. Chrome's
implementation is in maintenance mode. We're only tackling critical bugs and
code health improvements that allow us to reason about bugs more easily.
Long-term efforts should be focused on Service Workers.
***


## Overview

AppCache is aimed at SPAs (single-page Web applications).

The application's HTML page (Document) points to a **manifest** in an `<html
manifest=...>` attribute. The manifest lists all the sub-resources (style
sheets, scripts, images, etc.) that the page needs to work offline. When a
user navigates to the HTML page for the first time, the browser caches all
the resources in the manifest. Future navigations use the cached resources,
so the application still works even if the network is down.

The simplified model above misses two critical pieces, which are responsible for
the bulk of AppCache's complexity. The sections below can be skimmed on a first
reading.

### Updates (Why AppCache is Hard, Part 1)

The ease of deploying updates is a key strength of Web applications. Browsers
automatically (barring misconfigured HTTP caching) load the latest version of
an application's resources when a user navigates to one of the application's
pages.

AppCache aims for comparable ease by automatically updating its locally cached
copy of the manifest and its resources whenever a page is visited. This comes
with some significant caveats:

1. AppCache bails early in the update process if the manifest hasn't changed
   (byte for byte). This behavior is intended to save network bandwidth.
   The downside is that developers must change their manifest whenever any of
   the sub-resources change.
2. The manifest does not have any versioning information in it. So, when a
   manifest changes, the browser must reload all the resources referenced by
   it.
3. The manifest is only checked for updates when a page is visited, to keep the
   Web ephemeral. The update check is performed concurrently with page loading,
   for performance reasons. Even if the manifest has changed, all the resources
   used by the page are still served from the outdated cache. This is necessary,
   because by the time the browser can detect a manifest update, the page has
   been partially loaded using the (now known to be outdated) cached resources.
   It's not reasonable to ask Web developers to support mixing resources from
   different application versions.
4. While the browser is downloading a page's cache (the manifest and its
   resources), the user could navigate a different tab to the same page. The
   second tab uses the result of the ongoing cache download, rather than
   updating the cache on its own. This removes many race conditions from the
   cache update process, at the cost of having the browser coordinate between
   all instances of a page that uses AppCache.
5. AppCache also supports application-driven updates. The support is aimed at
   applications that may be left open in the same tab for a long time, like
   e-mail and chat clients. This means browsers must support both
   navigation-driven cache updates and application-driven updates.

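The first two caveats above can be illustrated with a minimal Python sketch. It is not Chrome's implementation: `manifest_changed`, `plan_update`, and the sample manifest bytes are hypothetical names and data chosen for illustration. It shows that an unchanged manifest ends the update early, while any byte-level change (here, only a comment) causes every listed resource to be fetched again.

```python
def manifest_changed(cached: bytes, fetched: bytes) -> bool:
    """Hypothetical helper: the update proceeds only if the bytes differ."""
    # Byte-for-byte comparison, so even an edited comment counts as a change.
    return cached != fetched


def plan_update(cached: bytes, fetched: bytes, resource_urls: list[str]) -> list[str]:
    """Return the URLs that this update check would download again."""
    if not manifest_changed(cached, fetched):
        return []  # Bail early; no sub-resource is re-fetched.
    # The manifest carries no per-resource version information, so the
    # sketch re-fetches everything that the manifest lists.
    return list(resource_urls)


old = b"CACHE MANIFEST\n# v1\napp.js\nstyle.css\n"
new = b"CACHE MANIFEST\n# v2\napp.js\nstyle.css\n"
urls = ["app.js", "style.css"]

unchanged_plan = plan_update(old, old, urls)
changed_plan = plan_update(old, new, urls)
```

Bumping a version comment, as in the `# v2` line, is a common way for developers to force the manifest to differ when only a sub-resource changed.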
### Multi-Page Applications (Why AppCache is Hard, Part 2)

AppCache supports multi-page applications by allowing multiple pages to share
the same manifest, and therefore use the same cached resources.

Manifest sharing is particularly complex when combined with implicit caching.
An AppCache manifest is not required to list the HTML pages that refer to it
via an `<html manifest>` attribute. (Listing the pages is, however,
recommended.) This allowance introduces the following complexities:

1. When a browser encounters an HTML page that refers to a manifest it hasn't
   seen before, the browser creates an implicit resource entry for the HTML
   page. The HTML page is cached together with the other resources listed in
   the manifest, so it can be available for offline browsing.
2. When a browser encounters an HTML page that refers to a manifest it has
   already cached, the browser also creates an implicit resource entry for
   the HTML page. The existing cache must be changed to include the new
   implicit resource.
3. When a manifest changes, the browser must update all the implicit resources
   (HTML pages that refer to the manifest) as well as the resources explicitly
   mentioned in the manifest. If any of the HTML pages using the manifest are
   open, they must be notified that a manifest update is available.
4. When a browser encounters an HTML page that refers to a manifest whose
   resources are still being downloaded, it needs to ensure that the page's
   implicit resource eventually gets associated with the manifest. To avoid race
   conditions, the browser must add the HTML page to a list of pages that need
   updating. The manifest update logic must also process this list, after
   downloading the resources already associated with the manifest.

While the pages in multi-page applications can share a manifest, they are not
required to do so. In other words, an application's pages can use different
manifests. However, each manifest conceptually spawns its own resource cache,
which is updated independently from other manifests' caches. So, different pages
from the same application may use different versions of the same sub-resource,
if they are associated with different manifests.

A particularly complex case is loading an HTML page that is associated with a
cached manifest, discovering that the manifest has changed and requires an
update, updating the HTML page, and obtaining a new version of the HTML page
that refers to a different manifest. In this case, loading a single page ends up
downloading two manifests and all the resources associated with them.


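The deferred association of pages during an update (item 4 in the list above) can be sketched as follows. This is a simplified, single-threaded model under stated assumptions, not Chrome's code: `PendingPageTracker` and its fields are hypothetical names, and the URLs use a placeholder host.

```python
from dataclasses import dataclass, field


@dataclass
class PendingPageTracker:
    """Hypothetical bookkeeping for the pages associated with one manifest."""
    manifest_url: str
    implicit_entries: set[str] = field(default_factory=set)
    pending_pages: list[str] = field(default_factory=list)
    update_in_progress: bool = False

    def associate_page(self, page_url: str) -> None:
        if self.update_in_progress:
            # Defer: the running update processes this list when it finishes.
            self.pending_pages.append(page_url)
        else:
            self.implicit_entries.add(page_url)

    def finish_update(self) -> None:
        # Called after the resources already associated with the manifest
        # have been downloaded.
        self.implicit_entries.update(self.pending_pages)
        self.pending_pages.clear()
        self.update_in_progress = False


tracker = PendingPageTracker("https://example.test/app.manifest")
tracker.associate_page("https://example.test/index.html")
tracker.update_in_progress = True
tracker.associate_page("https://example.test/settings.html")
queued_during_update = list(tracker.pending_pages)
tracker.finish_update()
```

Queuing the page instead of modifying the cache directly keeps a single writer (the update logic) in charge while a download is in flight.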
## Data Model

AppCache uses the following terms:

* A **manifest** is a list of URLs to resources. The listed resources should be
  sufficient for the page to be used while offline.
* An **application cache** contains one version of a manifest and all the
  resources associated with it. This includes the resources explicitly listed in
  the manifest, and the implicitly cached HTML pages that refer to the manifest.
  The HTTP responses are stored in a disk_cache (//net term), and all other
  AppCache information is stored in a per-profile SQLite database that points
  into the disk_cache. The disk_cache scope is per-profile.
* A **response** represents the headers and body for a given server response.
  This response is first served by a server and may then be stored and retrieved
  in the disk_cache. The application cache in the SQLite database updates each
  entry to track the associated response id in the disk_cache for that entry.
* An **application cache group** is a collection of all the application caches
  that have the same manifest.
* A **cache host** is a name used to refer to a Document (HTML page) when the
  emphasis is on the connection between the page, the manifest it references,
  and the application cache / cache group associated with that manifest.

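As a rough illustration of how these terms relate, the sketch below models them with Python dataclasses. The class and field names are assumptions made for this example and do not mirror Chrome's classes or its database schema. A plain dictionary stands in for the disk_cache, keyed by response id.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    url: str
    response_id: int  # Points at the stored response (headers and body).


@dataclass
class ApplicationCache:
    """One version of a manifest plus the entries associated with it."""
    manifest_url: str
    entries: dict[str, Entry] = field(default_factory=dict)


@dataclass
class CacheGroup:
    """All application caches that share the same manifest URL."""
    manifest_url: str
    caches: list[ApplicationCache] = field(default_factory=list)

    def newest(self) -> ApplicationCache:
        return self.caches[-1]


# Stand-in for the disk_cache: response id -> stored body.
responses = {1: b"console.log('v1')", 2: b"console.log('v2')"}

manifest_url = "https://example.test/app.manifest"
script_url = "https://example.test/app.js"
v1 = ApplicationCache(manifest_url, {script_url: Entry(script_url, 1)})
v2 = ApplicationCache(manifest_url, {script_url: Entry(script_url, 2)})
group = CacheGroup(manifest_url, [v1, v2])

newest_body = responses[group.newest().entries[script_url].response_id]
```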
### Application Cache

An application cache has the following components:

1. **Entries** that identify resources to be cached.
2. **Namespaces** that direct the loading of sub-resource URLs for a page
   associated with the cache.
3. **Flags** that influence the cache's behavior.

All of these components are stored in and retrieved from a SQLite database.

Entries have the following types:

* **manifest** - the AppCache manifest; the absolute URL of this entry is used
  to identify the group that this application cache belongs to
* **master** - documents (HTML pages) whose `<html manifest>` attribute points
  to the cache's manifest; these are added to an application cache as they are
  discovered during user navigations
* **explicit** - listed in the manifest's explicit section (`CACHE:`)
* **fallback** - listed in the manifest's fallback section (`FALLBACK:`)

Explicit and fallback entries can also be marked as **foreign**. A foreign entry
indicates a document whose `<html manifest>` attribute does not point to this
cache's manifest.

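One way to picture entry types, including the foreign marker, is as combinable flags. The sketch below uses Python's `enum.Flag`. The `EntryType` name and the idea that one entry may carry several types at once are assumptions of this example, not a description of how Chrome stores entries.

```python
from enum import Flag, auto


class EntryType(Flag):
    """Hypothetical flag set for the entry types described above."""
    MANIFEST = auto()
    MASTER = auto()
    EXPLICIT = auto()
    FALLBACK = auto()
    FOREIGN = auto()


# A page listed in the CACHE: section that also points at this manifest.
listed_page = EntryType.EXPLICIT | EntryType.MASTER

# A fallback document whose <html manifest> points at some other manifest.
foreign_fallback = EntryType.FALLBACK | EntryType.FOREIGN

listed_page_is_master = EntryType.MASTER in listed_page
fallback_is_foreign = EntryType.FOREIGN in foreign_fallback
```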
Each entry can refer to its response, which allows AppCache to know where to
find a given entry's cached response data in its disk or memory cache.

Namespaces are conceptually patterns that match resource URLs. AppCache supports
the following namespaces:

* **fallback** - URLs matching the namespace are first fetched from the network.
  If the fetch fails, a cached fallback resource is used instead. Fallback
  namespaces are listed in the `FALLBACK:` manifest section.
* **online safelist** - URLs matching the namespace are always fetched from the
  network. Online safelist namespaces are listed in the `NETWORK:` manifest
  section.

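To make the `CACHE:`, `FALLBACK:`, and `NETWORK:` sections concrete, here is a simplified parser sketch in Python. It is an illustration under stated assumptions rather than the specification's full parsing algorithm: `parse_manifest` and the sample manifest are hypothetical, relative URLs are left unresolved, and any unrecognized section is skipped.

```python
def parse_manifest(text: str) -> dict[str, list]:
    """Split a manifest into explicit, fallback, and network lists."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "CACHE MANIFEST":
        raise ValueError("missing CACHE MANIFEST signature")
    result = {"explicit": [], "fallback": [], "network": []}
    headers = {"CACHE:": "explicit", "FALLBACK:": "fallback", "NETWORK:": "network"}
    section = "explicit"  # Lines before any header count as explicit entries.
    for raw in lines[1:]:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # Skip blank lines and comments.
        if line.endswith(":"):
            section = headers.get(line)  # None for sections this sketch ignores.
            continue
        if section is None:
            continue
        if section == "fallback":
            parts = line.split()
            if len(parts) == 2:  # Namespace followed by the fallback URL.
                result["fallback"].append((parts[0], parts[1]))
        else:
            result[section].append(line.split()[0])
    return result


EXAMPLE = """CACHE MANIFEST
# v1
app.js
CACHE:
style.css
FALLBACK:
/images/ /images/offline.png
NETWORK:
/api/
*
SETTINGS:
prefer-online
"""

parsed = parse_manifest(EXAMPLE)
```

In this sample, `parsed["explicit"]` holds `app.js` and `style.css`, the fallback list holds one namespace and fallback URL pair, and the network list holds `/api/` plus the `*` wildcard.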
*** promo
Chrome's AppCache implementation also supports **intercept** namespaces, listed
in the `CHROMIUM-INTERCEPT:` manifest section. URLs matching an intercept
namespace are loaded as if the fetch request encountered an HTTP redirect.
***

The AppCache specification supports specifying namespaces as URL prefixes. Given
a list of namespaces in an application cache, a resource URL matches the longest
namespace that is a prefix of the URL.

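The longest-prefix rule can be sketched in a few lines of Python. The function name `longest_prefix_match` and the sample URLs are illustrative assumptions. The sketch compares plain strings and does not perform any URL normalization.

```python
from typing import Optional


def longest_prefix_match(url: str, namespaces: list[str]) -> Optional[str]:
    """Return the longest namespace that is a prefix of url, if any."""
    matches = [ns for ns in namespaces if url.startswith(ns)]
    return max(matches, key=len, default=None)


namespaces = [
    "https://example.test/docs/",
    "https://example.test/docs/api/",
]

api_match = longest_prefix_match("https://example.test/docs/api/index.html", namespaces)
docs_match = longest_prefix_match("https://example.test/docs/intro.html", namespaces)
no_match = longest_prefix_match("https://example.test/blog/", namespaces)
```

Here `api_match` is the longer `/docs/api/` namespace even though `/docs/` also matches, and `no_match` is `None`.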
An application cache has the following flags:

* **completeness** - the application cache is *complete* when all the resources
  in the manifest have been fetched and cached, and *incomplete* otherwise
* **online safelist wildcard** - *blocking* by default, which means that all
  resources not listed in the manifest are considered unavailable; can be set
  to *open* by adding an `*` entry in the `NETWORK:` manifest section, causing
  all unlisted resources to be fetched from the network
* **cache mode** - not supported by Chrome, which does not implement the
  `SETTINGS:` manifest section

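The effect of the online safelist wildcard flag can be sketched as a small decision function. This is a simplified illustration: `resolve_source` is a hypothetical name, fallback and intercept namespaces are left out, and the order of checks is an assumption of this sketch.

```python
def resolve_source(url: str, cached_urls: set[str],
                   network_prefixes: list[str], wildcard_open: bool) -> str:
    """Decide where a sub-resource would come from in this simplified model."""
    if url in cached_urls:
        return "cache"
    if any(url.startswith(prefix) for prefix in network_prefixes):
        return "network"  # Matches an online safelist namespace.
    # Unlisted resource: the wildcard flag decides.
    return "network" if wildcard_open else "unavailable"


cached = {"https://example.test/app.js"}
safelist = ["https://example.test/api/"]
unlisted = "https://example.test/ads.js"

blocking_result = resolve_source(unlisted, cached, safelist, wildcard_open=False)
open_result = resolve_source(unlisted, cached, safelist, wildcard_open=True)
cached_result = resolve_source("https://example.test/app.js", cached, safelist, False)
api_result = resolve_source("https://example.test/api/users", cached, safelist, False)
```

With the default *blocking* setting the unlisted script is unavailable, while the *open* setting lets it load from the network.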
### Historical

Our AppCache implementation supported specifying namespaces as regular
expressions that match URLs. This extension was invoked by adding the
`isPattern` keyword after the namespace in the manifest. Support has
mostly been removed, but a column remains in the database and tests
validate correct parsing even when it is present.