# Architecture (as of July 29th 2016)
This document describes the browser-process implementation of the [Cache
Storage specification](
https://slightlyoff.github.io/ServiceWorker/spec/service_worker/index.html).

As of June 2018, Chrome components can use the Cache Storage interface via
`CacheStorageManager` to store Request/Response key-value pairs. The concept of
`CacheStorageOwner` was added to distinguish and isolate the different
components.

## Major Classes and Ownership
### Ownership
Where '=>' represents ownership, '->' is a reference, and '~>' is a weak
reference.

##### `CacheStorageContextImpl`->`CacheStorageManager`=>`CacheStorage`=>`CacheStorageCache`
* A `CacheStorageManager` can own multiple `CacheStorage` objects.
* A `CacheStorage` can own multiple `CacheStorageCache` objects.

##### `StoragePartitionImpl`->`CacheStorageContextImpl`
* `StoragePartitionImpl` effectively owns the `CacheStorageContextImpl` in the
  sense that it calls `CacheStorageContextImpl::Shutdown()` on deletion which
  resets its `CacheStorageManager`.

##### `RenderProcessHost`->`CacheStorageDispatcherHost`->`CacheStorageContextImpl`

##### `CacheStorageDispatcherHost`=>`CacheStorageCacheHandle`~>`CacheStorageCache`
* The `CacheStorageDispatcherHost` holds onto handles for:
  * JavaScript references to cache objects

##### `CacheStorageDispatcherHost`=>`CacheStorageHandle`~>`CacheStorage`
* The `CacheStorageDispatcherHost` holds onto handles for:
  * JavaScript references to caches

##### `CacheStorageCacheDataHandle`=>`CacheStorageCacheHandle`~>`CacheStorageCache`
* `CacheStorageCacheDataHandle` is the blob data handle for a response body
  and it holds a `CacheStorageCacheHandle`. It streams from the
  `disk_cache::Entry` response stream. It's necessary that the
  `disk_cache::Backend` (owned by `CacheStorageCache`) stays open so long as
  one of its `disk_cache::Entry`s is reachable. Otherwise, a new backend might
  open and clobber the entry.

##### `CacheStorageCache`=>`CacheStorageCacheHandle`~>`CacheStorageCache`
* The `CacheStorageCache` will hold a self-reference while executing an
  operation. This self-reference is dropped between subsequent operations,
  so shutdown is possible when there are no external references even if there
  are more operations in the scheduler queue.

### CacheStorageDispatcherHost
1. Receives IPC messages from a renderer process and creates the appropriate
   `CacheStorageManager` or `CacheStorageCache` operation.
2. For each operation, holds a `CacheStorageCacheHandle` to keep the cache
   alive since the operation is asynchronous.
3. For each cache reference held by the renderer process, holds a
   `CacheStorageCacheHandle`.
4. For each `CacheStorage` reference held by the renderer process, holds a
   `CacheStorageHandle`. This is used to inform the `CacheStorage` about
   whether it's externally used so it can keep warmed cache objects alive
   to mitigate rapid opening/closing/opening churn.

### CacheStorageManager
1. Forwards calls to the appropriate `CacheStorage` for a given origin-owner
   pair, loading `CacheStorage`s on demand.
2. Handles `QuotaManager` and `BrowsingData` calls.

### CacheStorage
1. Manages the caches for a single origin-owner pair.
2. Handles creation/deletion of caches and updates the index on disk
   accordingly.
3. Manages operations that span multiple caches (e.g., `CacheStorage::Match`).
4. Delegates backend-specific information to `CacheStorage::CacheLoader`.

### CacheStorageCache
1. Creates or opens a `disk_cache::Backend` (either `SimpleCache` or
   `MemoryCache`) on initialization.
2. Handles add/put/delete/match/keys calls.
3. Owned by `CacheStorage` and deleted either when `CacheStorage` deletes or
   when the last `CacheStorageCacheHandle` for the cache is gone.

### CacheStorageIndex
1. Manages an ordered collection of metadata
   (`CacheStorageIndex::CacheStorageMetadata`) for each `CacheStorageCache`
   owned by a given `CacheStorage` instance.
2. Is serialized by `CacheStorage::CacheLoader` (`WriteIndex`/`LoadIndex`) as a
   Protobuf file.

### CacheStorageCacheHandle
1. Holds a weak reference to a `CacheStorageCache`.
2. When the last `CacheStorageCacheHandle` to a `CacheStorageCache` is
   deleted, so too is the `CacheStorageCache`.
3. The `CacheStorageCache` may be deleted before the `CacheStorageCacheHandle`
   (on `CacheStorage` destruction), so it must be checked for validity before
   use.

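The three handle rules above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the Chromium implementation: `ToyStorage`, `ToyCache`, and `ToyCacheHandle` are hypothetical names, and `std::shared_ptr`/`std::weak_ptr` stand in for whatever ownership and weak-pointer types the real classes use.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

class ToyStorage;

// Hypothetical stand-in for CacheStorageCache: counts live handles.
class ToyCache {
 public:
  ToyCache(ToyStorage* owner, std::string name)
      : owner_(owner), name_(std::move(name)) {}
  void AddHandleRef() { ++handle_refs_; }
  void DropHandleRef();  // Defined after ToyStorage.

 private:
  ToyStorage* owner_;
  std::string name_;
  int handle_refs_ = 0;
};

// Hypothetical stand-in for CacheStorageCacheHandle: a weak reference that
// also contributes to the cache's handle count.
class ToyCacheHandle {
 public:
  explicit ToyCacheHandle(const std::shared_ptr<ToyCache>& cache)
      : cache_(cache) {
    cache->AddHandleRef();
  }
  ToyCacheHandle(const ToyCacheHandle&) = delete;
  ToyCacheHandle& operator=(const ToyCacheHandle&) = delete;
  ~ToyCacheHandle() {
    // The local shared_ptr keeps the cache alive until this scope ends.
    if (auto cache = cache_.lock())
      cache->DropHandleRef();
  }

  // Returns nullptr if the cache was already deleted (rule 3), so callers
  // must check before use.
  ToyCache* value() const { return cache_.lock().get(); }

 private:
  std::weak_ptr<ToyCache> cache_;
};

// Hypothetical stand-in for CacheStorage: owns the caches.
class ToyStorage {
 public:
  std::unique_ptr<ToyCacheHandle> OpenCache(const std::string& name) {
    auto& slot = caches_[name];
    if (!slot)
      slot = std::make_shared<ToyCache>(this, name);
    return std::make_unique<ToyCacheHandle>(slot);
  }
  void DropCache(const std::string& name) { caches_.erase(name); }
  bool HasCache(const std::string& name) const {
    return caches_.count(name) > 0;
  }

 private:
  std::map<std::string, std::shared_ptr<ToyCache>> caches_;
};

inline void ToyCache::DropHandleRef() {
  assert(handle_refs_ > 0);
  if (--handle_refs_ == 0) {
    const std::string name = name_;  // Copy: the owner is about to drop us.
    owner_->DropCache(name);         // Rule 2: last handle gone, cache gone.
  }
}
```

In this model, releasing the last handle removes the cache from its owner, while destroying the owner first leaves any outstanding handle returning `nullptr` from `value()`.
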
### CacheStorageHandle
1. Holds a weak reference to a `CacheStorage`.
2. When the last `CacheStorageHandle` to a `CacheStorage` is
   deleted, internal state is cleaned up. The `CacheStorage` object is not
   deleted, however.
3. The `CacheStorage` may be deleted before the `CacheStorageHandle`
   (on browser shutdown), so it must be checked for validity before use.

## Directory Structure
$PROFILE/Service Worker/CacheStorage/`origin`/`cache`/

Where `origin` is a hash of the origin and `cache` is a GUID generated at the
cache's creation time.

The reason a random directory is used for a cache is so that a cache can be
doomed and still used by old references while another cache with the same name
is created.

### Directory Contents
`CacheStorage` creates its own index file (index.txt), which contains a
mapping of cache names to their paths on disk. On `CacheStorage`
initialization, directories not in the index are deleted.

Each `CacheStorageCache` has a `disk_cache::Backend`, which writes in
the `CacheStorageCache`'s directory.

## Layout of the disk_cache::Backend
A cache is represented by a `disk_cache::Backend`. The Request/Response pairs
referred to in the specification are stored as `disk_cache::Entry`s. Each
`disk_cache::Entry` has three streams: one for storing a protobuf with the
request/response metadata (e.g., the headers, the request URL, and opacity
information), another for storing the response body, and a final stream for
storing any additional data (e.g., compiled JavaScript).

The entries are keyed by full URL. This has a few ramifications:
 1. Multiple vary responses for a single request URL are not supported.
 2. Operations that may require scanning multiple URLs (e.g., `ignoreSearch`)
    must scan every entry in the cache.

*The above could be fixed by changes to the backend or by introducing indirect
entries in the cache. The indirect entries would be keyed by the query-stripped
request URL. Each would point to the entries for each query request/response
pair and for each vary request/response pair.*

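The cost of keying entries by full URL can be sketched as follows. This is an illustration under assumptions, not the backend's API: `EntryMap`, `StripQuery`, `MatchExact`, and `MatchIgnoringSearch` are hypothetical names, and a `std::map` from URL to body stands in for the entries of one `disk_cache::Backend`.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one backend's entries: full URL -> response body.
using EntryMap = std::map<std::string, std::string>;

// Drops everything from the first '?' onward. Fragments are not handled in
// this sketch.
std::string StripQuery(const std::string& url) {
  return url.substr(0, url.find('?'));
}

// An exact match is a single keyed lookup.
const std::string* MatchExact(const EntryMap& entries,
                              const std::string& url) {
  auto it = entries.find(url);
  return it == entries.end() ? nullptr : &it->second;
}

// An ignoreSearch-style match cannot use the key directly, so it visits
// every entry and compares query-stripped URLs.
std::vector<std::string> MatchIgnoringSearch(const EntryMap& entries,
                                             const std::string& url) {
  const std::string target = StripQuery(url);
  std::vector<std::string> matched_urls;
  for (const auto& entry : entries) {
    if (StripQuery(entry.first) == target)
      matched_urls.push_back(entry.first);
  }
  return matched_urls;
}
```

The exact lookup touches one key, whereas the `ignoreSearch`-style lookup is linear in the number of entries, which is the scan that the indirect-entry idea above would avoid.
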
## Threads
* CacheStorage classes live on the IO thread. Exceptions include:
  * `CacheStorageContextImpl` which is created on UI but otherwise runs and is
    deleted on IO.
  * `CacheStorageDispatcherHost` which is created on UI but otherwise runs and
    is deleted on IO.
* Index file manipulation and directory creation/deletion occur on a
  `SequencedTaskRunner` assigned at `CacheStorageContextImpl` creation.
* The `disk_cache::Backend` lives on the IO thread and uses its own worker
  pool to implement async operations.

## Asynchronous Idioms in CacheStorage and CacheStorageCache
1. All async methods should asynchronously run their callbacks.
2. The async methods often include several asynchronous steps. Each step
   passes a continuation callback on to the next. The continuation includes
   all of the necessary state for the operation.
3. Callbacks are guaranteed to run so long as the object
   (`CacheStorageCache` or `CacheStorage`) is still alive. Once the
   object is deleted, the callbacks are dropped. We don't worry about dropped
   callbacks on shutdown. If deleting prior to shutdown, one should `Close()`
   a `CacheStorage` or `CacheStorageCache` to ensure that all operations have
   completed before deleting it.

### Scheduling Operations
Operations are scheduled in a sequential scheduler (`CacheStorageScheduler`).
Each `CacheStorage` and `CacheStorageCache` has its own scheduler. If an
operation freezes, then the scheduler is frozen. If a `CacheStorage` call winds
up calling something from every `CacheStorageCache` (e.g.,
`CacheStorage::Match`), then one frozen `CacheStorageCache` can freeze the
`CacheStorage` as well. This has happened in the past (`Cache::Put` called
`QuotaManager` to determine how much room was available, which in turn called
`Cache::Size`). Be careful to avoid situations in which one operation triggers
a dependency on another operation from the same scheduler.

At the end of an operation, the scheduler needs to be kicked to start the next
operation. The idiom for this in CacheStorage/ is to wrap the operation's
callback with a function that will run the callback as well as advance the
scheduler. So long as the operation runs its wrapped callback the scheduler
will advance.

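The wrap-the-callback idiom can be sketched with a toy sequential scheduler. The class `ToyScheduler` and its method names are assumptions made for this illustration and are not taken from `CacheStorageScheduler` itself. The sketch also runs operations synchronously for brevity, whereas real operations complete asynchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified sequential scheduler: at most one operation runs
// at a time, and the next one starts only after the running one completes.
class ToyScheduler {
 public:
  using Operation = std::function<void()>;
  using Callback = std::function<void(const std::string&)>;

  void ScheduleOperation(Operation op) {
    pending_.push_back(std::move(op));
    RunNextIfIdle();
  }

  // Returns a callback that first runs |callback| and then advances the
  // scheduler. An operation that never runs it leaves the scheduler frozen.
  Callback WrapCallbackToRunNext(Callback callback) {
    return [this, callback = std::move(callback)](const std::string& result) {
      callback(result);
      CompleteOperationAndRunNext();
    };
  }

  bool running() const { return running_; }
  std::size_t pending() const { return pending_.size(); }

 private:
  void CompleteOperationAndRunNext() {
    assert(running_);
    running_ = false;
    RunNextIfIdle();
  }

  void RunNextIfIdle() {
    if (running_ || pending_.empty())
      return;
    running_ = true;
    Operation op = std::move(pending_.front());
    pending_.pop_front();
    op();
  }

  bool running_ = false;
  std::deque<Operation> pending_;
};
```

If an operation holds on to its wrapped callback without running it, every later operation stays queued. That is the freeze described above, and it is why an operation should not wait on another operation from the same scheduler.
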
## Opaque Resource Size Obfuscation
Applications can cache cross-origin resources as per
[Cross-Origin Resources and CORS](https://www.w3.org/TR/service-workers-1/#cross-origin-resources).
Opaque responses are also cached, but in order to prevent "leaking" the size
of opaque responses their sizes are obfuscated. Random padding is added to the
actual size, making it difficult for an attacker to ascertain the actual
resource size via quota APIs.

When Chromium starts, a new random padding key is generated and used
for all new caches created. This key is used by each cache to calculate padding
for opaque resources. Each cache's key is persisted to disk in the cache index
file.

Each cache maintains the total padding for all opaque resources within the
cache. This padding is added to the actual resource size when reporting sizes
to the quota manager.

The padding algorithm version is also written to each cache, allowing for it
to be changed at a future date. CacheStorage will use the persisted key and
padding from the cache's index unless the padding algorithm has been changed,
or one of the values is missing or deemed to be incorrect. In this situation
the cache is enumerated and the padding is recalculated during open.

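The size accounting described above can be sketched as follows. Everything here is an assumption made for illustration: the names (`ToyResponse`, `PaddingFor`, `ReportedCacheSize`), the padding range `kToyPaddingRange`, and the use of a simple FNV-1a hash over the key and URL. The actual padding function, its inputs, and its range are not specified in this document, and a non-cryptographic hash like this one would not be suitable for real obfuscation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical upper bound on per-response padding, in bytes.
constexpr std::uint64_t kToyPaddingRange = 1024 * 1024;

// FNV-1a (64-bit) over key + url. A non-cryptographic stand-in used only to
// make the padding depend deterministically on the per-cache key.
std::uint64_t ToyKeyedHash(const std::string& key, const std::string& url) {
  std::uint64_t hash = 14695981039346656037ull;
  for (unsigned char c : key + url) {
    hash ^= c;
    hash *= 1099511628211ull;
  }
  return hash;
}

struct ToyResponse {
  std::string url;
  std::uint64_t actual_size;
  bool is_opaque;
};

// Only opaque responses are padded.
std::uint64_t PaddingFor(const std::string& padding_key,
                         const ToyResponse& response) {
  if (!response.is_opaque)
    return 0;
  return ToyKeyedHash(padding_key, response.url) % kToyPaddingRange;
}

// The size reported to quota is the actual size plus the total padding of
// all opaque responses in the cache.
std::uint64_t ReportedCacheSize(const std::string& padding_key,
                                const std::vector<ToyResponse>& responses) {
  std::uint64_t actual = 0;
  std::uint64_t padding = 0;
  for (const auto& response : responses) {
    actual += response.actual_size;
    padding += PaddingFor(padding_key, response);
  }
  return actual + padding;
}
```

Because the padding in this sketch is derived from a persisted key, reopening a cache with the same key reproduces the same total, which matches the idea of storing the key and total padding in the index and recalculating only when a value is missing or the algorithm version changes.
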