# Transport Security State Generator

This directory contains the code for the transport security state generator, a
tool that generates a C++ file based on preload data in
[transport_security_state_static.json](/net/http/transport_security_state_static.json).
This JSON file contains the domain security policy configurations for all
preloaded domains.

[TOC]

## Domain Security Policies

Website owners can set a number of security policies for their domains, usually
by sending configuration in an HTTP header. Chromium supports preloading for some
of these security policies so that users benefit from these policies regardless
of their browsing history. Website owners can request preloading for their
domains. Chromium supports preloading for the following domain security
policies:

* [HTTP Strict Transport Security (HSTS)](https://tools.ietf.org/html/rfc6797)
* [Public Key Pinning Extension for HTTP](https://tools.ietf.org/html/rfc7469)
* [Expect-CT Extension for HTTP](http://httpwg.org/http-extensions/expect-ct.html)

Chromium and most other browsers ship the preloaded configurations inside their
binary. Chromium uses a custom data structure for this.

### I want to preload a website

Please follow the instructions at [hstspreload.org](https://hstspreload.org/).

## I want to use the preload list for another project

Please contact [the list maintainers](https://hstspreload.org/#contact) before
you do.

## The Preload Generator

The transport security state generator is executed during the build process (it
may execute multiple times depending on the targets you're building) and
generates data structures that are compiled into the binary. You can find the
generated output in
`[build-folder]/gen/net/http/transport_security_state_static*.h`.

### Usage

Make sure you have built the `transport_security_state_generator` target.

`transport_security_state_generator <json-file> <pins-file> <template-file> <output-file> [--v=1]`

* **json-file**: JSON file containing all preload configurations (e.g.
  `net/http/transport_security_state_static.json`)
* **pins-file**: file containing the public key information for the pinsets
  referenced from **json-file** (e.g.
  `net/http/transport_security_state_static.pins`)
* **template-file**: contains the global structure of the header file with
  placeholders for the generated data (e.g.
  `net/http/transport_security_state_static.template`)
* **output-file**: file to write the output to
* **--v**: verbosity level

## The Preload Format

The preload data is stored in the Chromium binary as a trie encoded in a byte
array (`net::TransportSecurityStateSource::preloaded_data`). The hostnames are
stored in their canonicalized form and compressed using a Huffman coding. The
generic decoder for preloaded Huffman encoded trie data is `PreloadDecoder` and
lives in `net/extras/preload_data/decoder.cc`. The HSTS specific implementation
is `DecodeHSTSPreload` and lives in `net/http/transport_security_state.cc`.

### Huffman Coding

A Huffman coding is calculated for all characters used in the trie (characters
in hostnames and the `end of table` and `terminal` values). The Huffman tree
can be rebuilt from the `net::TransportSecurityStateSource::huffman_tree`
array.

The (internal) nodes of the tree are encoded as pairs of uint8_t values. The
last node in the array is the root of the tree. Each node is two uint8_t values:
the first is "left" and the second is "right". If a uint8_t value has the MSB
set, it is a leaf value and the 7 least significant bits represent an ASCII
character (from the range 0-127; the tree does not support extended ASCII). If
the MSB is not set, it is a pointer to the n'th node in the array.

For example, the following uint8_t array

`0xE1, 0xE2, 0xE3, 0x0, 0xE4, 0xE5, 0x1, 0x2`

represents 9 elements:

* the implicit root node (node 3)
* 3 internal nodes: 0x0 (node 0), 0x1 (node 1), and 0x2 (node 2)
* 5 leaf values: 0xE1, 0xE2, 0xE3, 0xE4, and 0xE5 (which all have the most
  significant bit set)

When decoded this results in the following Huffman tree:

```
             root (node 3)
            /             \
     node 1                 node 2
   /       \               /      \
0xE3 (c)    node 0     0xE4 (d)    0xE5 (e)
           /      \
       0xE1 (a)    0xE2 (b)
```
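
The decoding described above can be sketched in a few lines of Python. This is
an illustrative sketch, not the Chromium implementation: the function name
`build_huffman_codes` is hypothetical, and mapping bit `0` to "left" and bit `1`
to "right" is an assumption made for the example.

```python
def build_huffman_codes(tree: bytes) -> dict:
    """Map each leaf character to its bit string.

    Assumption for this sketch: "0" selects the left child and "1" the
    right child.
    """
    if len(tree) < 2 or len(tree) % 2 != 0:
        raise ValueError("tree must be a non-empty sequence of (left, right) pairs")

    codes = {}

    def walk(value: int, path: str) -> None:
        if value & 0x80:
            # MSB set: leaf value, the 7 low bits are the ASCII character.
            codes[chr(value & 0x7F)] = path
            return
        # MSB not set: `value` is the index of another node (a pair of bytes).
        left, right = tree[2 * value], tree[2 * value + 1]
        walk(left, path + "0")
        walk(right, path + "1")

    # The last pair in the array is the root of the tree.
    root_index = len(tree) // 2 - 1
    walk(root_index, "")
    return codes


# The example array from this section.
EXAMPLE_TREE = bytes([0xE1, 0xE2, 0xE3, 0x00, 0xE4, 0xE5, 0x01, 0x02])
CODES = build_huffman_codes(EXAMPLE_TREE)
```

Under these assumptions the example array yields the codes `c = 00`, `a = 010`,
`b = 011`, `d = 10`, and `e = 11`, which matches the shape of the tree above.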

### The Trie Encoding

The byte array containing the trie is made up of a set of nodes represented by
dispatch tables. Each dispatch table contains a (possibly empty) shared prefix,
a value, and zero or more pointers to child dispatch tables. The node value
is an encoded entry and the associated hostname can be found by going up the
trie.

The trie contains the hostnames in reverse and the hostnames are terminated by a
`terminal value`.

The dispatch table for the root node starts at bit position
`net::TransportSecurityStateSource::root_position`.

The binary format for the trie is defined by the following
[ABNF](https://tools.ietf.org/html/rfc5234).

```abnf
trie               = 1*dispatch-table

dispatch-table     = prefix-part         ; a common prefix for the node and its children
                     1*value-part        ; 1 or more values or pointers to children
                     end-of-table-value  ; signals the end of the table

prefix-part        = prefix-length       ; a prefix code encoding of the number
                                         ; of characters in the prefix
                     prefix-characters   ; the actual prefix characters
prefix-length      = 1*BIT               ; See net::extras::PreloadDecoder::DecodeSize for the format
value-part         = huffman-character node-value
                                         ; table with the node value and pointers to children

node-value         = node-entry          ; preload entry for the hostname at this node
                   / node-pointer        ; a bit offset pointing to another dispatch
                                         ; table

node-entry         = preloaded-entry     ; encoded preload configuration for one
                                         ; hostname (see section below)
node-pointer       = long-bit-offset
                   / short-bit-offset

long-bit-offset    = %b1                 ; 1 bit indicates long form will follow
                     4BIT                ; 4 bit number indicating bit length of the offset
                     8*22BIT             ; offset encoded as an n bit number (see above)
                                         ; where n is the offset length (see above) + 8
short-bit-offset   = %b0                 ; 0 bit indicates short form will follow
                     7BIT                ; offset as a 7 bit number

terminal-value     = huffman-character   ; ASCII value 0x00 encoded using Huffman
end-of-table-value = huffman-character   ; ASCII value 0x7F encoded using Huffman

prefix-characters  = *huffman-character
huffman-character  = 1*BIT
```
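
As an illustration of the `node-pointer` rule, the following Python sketch
parses a short or long bit offset from a byte array. It is not the Chromium
decoder: `BitReader` and `read_node_pointer` are hypothetical names, and reading
bits most significant bit first is an assumption of this example. The sketch
only extracts the offset field and does not interpret what the offset is
relative to.

```python
class BitReader:
    """Reads bits from a bytes object, MSB first (assumed bit order)."""

    def __init__(self, data: bytes, position: int = 0) -> None:
        self.data = data
        self.position = position  # absolute bit position in `data`

    def read(self, count: int) -> int:
        """Return the next `count` bits as an unsigned integer."""
        value = 0
        for _ in range(count):
            byte = self.data[self.position // 8]
            bit = (byte >> (7 - self.position % 8)) & 1
            value = (value << 1) | bit
            self.position += 1
        return value


def read_node_pointer(reader: BitReader) -> int:
    """Parse a node-pointer (short-bit-offset or long-bit-offset)."""
    if reader.read(1) == 0:
        # short-bit-offset: %b0 followed by a 7 bit offset
        return reader.read(7)
    # long-bit-offset: %b1, a 4 bit length, then an n bit offset
    # where n is the length value + 8
    offset_length = reader.read(4) + 8
    return reader.read(offset_length)


# Hand-built inputs for illustration only.
SHORT_FORM = bytes([0x05])       # bits: 0 0000101
LONG_FORM = bytes([0x94, 0x02])  # bits: 1 0010 1000000001, plus one padding bit
```

With these inputs, `read_node_pointer(BitReader(SHORT_FORM))` returns 5, and
`read_node_pointer(BitReader(LONG_FORM))` returns 513, read as a 10 bit number
because the 4 bit length field holds 2.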

### The Preloaded Entry Encoding

The entries are encoded using a variable length encoding. Each entry is made up
of a simple entry flag followed by up to 3 parts, one for each supported policy.
The length of these parts depends on the actual configuration; some fields are
omitted in some cases.

The binary format for an entry is defined by the following ABNF.

```abnf
preloaded-entry    = BIT                   ; simple entry flag
                     [hsts-part hpkp-part expect-ct-part]
                                           ; policy specific parts are only
                                           ; present when the simple entry flag
                                           ; is set to 0 and omitted otherwise

hsts-part          = include-subdomains    ; HSTS includeSubdomains flag
                     BIT                   ; whether to force HTTPS

hpkp-part          = BIT                   ; whether to enable pinning
                     [pinset-id]           ; only present when pinning is enabled
                     [include-subdomains]  ; HPKP includeSubdomains flag, only
                                           ; present when pinning is enabled and
                                           ; HSTS includeSubdomains is not used

pinset-id          = array-index

expect-ct-part     = BIT                   ; whether to enable Expect-CT
                     [report-uri-id]       ; only present when Expect-CT is enabled

report-uri-id      = array-index
include-subdomains = BIT
array-index        = 4BIT                  ; a 4 bit number
```

The **array-index** values are indices in the associated arrays:

* `net::TransportSecurityStateSource::pinsets` for **pinset-id**
* `net::TransportSecurityStateSource::expect_ct_report_uris` for Expect-CT's
  **report-uri-id**

#### Simple entries

The majority of entries on the preload list are submitted through
[hstspreload.org](https://hstspreload.org) and share the same policy
configuration (HSTS + includeSubdomains only). To save space, these entries
(called **simple entries**) use a shorter encoding where the first bit (simple
entry flag) is set to 1 and the rest of the configuration is omitted.

### Tests

The generator code has its own unit tests in the
`net/tools/transport_security_state_generator` folder.

The encoder and decoder for the preload format live in different places and are
tested by end-to-end tests (`TransportSecurityStateTest.DecodePreload*`) in
`net/http/transport_security_state_unittest.cc`. The tests use their own
preload lists; the data structures for these lists are generated in the same way
as for the official Chromium list.

All these tests are part of the `net_unittests` target.

#### Writing tests that depend on static transport security state

Tests in `net_unittests` (except for `TransportSecurityStateStaticTest`) should
not depend on the real preload list. If you are writing tests that require a
static transport security state use
`transport_security_state_static_unittest_default.json` instead. Tests can
override the active preload list by calling
`SetTransportSecurityStateSourceForTesting`.

## See also

* <https://hstspreload.org/>
* <https://www.chromium.org/hsts>
