# OCI Content Descriptors

* An OCI image consists of several different components, arranged in a [Merkle Directed Acyclic Graph (DAG)](https://en.wikipedia.org/wiki/Merkle_tree).
* References between components in the graph are expressed through _Content Descriptors_.
* A Content Descriptor (or simply _Descriptor_) describes the disposition of the targeted content.
* A Content Descriptor includes the type of the content, a content identifier (_digest_), and the byte-size of the raw content.
* Descriptors SHOULD be embedded in other formats to securely reference external content.
* Other formats SHOULD use descriptors to securely reference external content.

This section defines the `application/vnd.oci.descriptor.v1+json` [media type](media-types.md).

## Properties

A descriptor consists of a set of properties encapsulated in key-value fields.

The following fields contain the primary properties that constitute a Descriptor:

- **`mediaType`** *string*

  This REQUIRED property contains the media type of the referenced content.
  Values MUST comply with [RFC 6838][rfc6838], including the [naming requirements in its section 4.2][rfc6838-s4.2].

  The OCI image specification defines [several of its own MIME types](media-types.md) for resources defined in the specification.

- **`digest`** *string*

  This REQUIRED property is the _digest_ of the targeted content, conforming to the requirements outlined in [Digests](#digests).
  Retrieved content SHOULD be verified against this digest when consumed via untrusted sources.

- **`size`** *int64*

  This REQUIRED property specifies the size, in bytes, of the raw content.
  This property exists so that a client will have an expected size for the content before processing.
  If the length of the retrieved content does not match the specified length, the content SHOULD NOT be trusted.

- **`urls`** *array of strings*

  This OPTIONAL property specifies a list of URIs from which this object MAY be downloaded.
  Each entry MUST conform to [RFC 3986][rfc3986].
  Entries SHOULD use the `http` and `https` schemes, as defined in [RFC 7230][rfc7230-s2.7].

- **`annotations`** *string-string map*

  This OPTIONAL property contains arbitrary metadata for this descriptor.
  This OPTIONAL property MUST use the [annotation rules](annotations.md#rules).

Descriptors pointing to [`application/vnd.oci.image.manifest.v1+json`](manifest.md) SHOULD include the extended field `platform`.
See [Image Index Property Descriptions](image-index.md#image-index-property-descriptions) for details.

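The properties above map naturally onto a small record type.
The following Python sketch shows one possible in-memory representation; it is illustrative only and not part of this specification.
The names `Descriptor` and `descriptor_from_dict` are hypothetical, and the non-negative bound on `size` is an assumption of the sketch rather than a stated requirement.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical in-memory form of a descriptor (not an official API)."""

    media_type: str  # REQUIRED: "mediaType"
    digest: str      # REQUIRED: "digest"
    size: int        # REQUIRED: "size", an int64 byte count
    urls: List[str] = field(default_factory=list)             # OPTIONAL
    annotations: Dict[str, str] = field(default_factory=dict)  # OPTIONAL


def descriptor_from_dict(raw: Mapping[str, Any]) -> Descriptor:
    """Build a Descriptor, rejecting input that lacks a REQUIRED property."""
    for key in ("mediaType", "digest", "size"):
        if key not in raw:
            raise ValueError(f"missing REQUIRED property: {key}")
    size = raw["size"]
    # bool is a subclass of int in Python, so it is excluded explicitly.
    # Assumption: sizes are non-negative and fit in a signed 64-bit integer.
    if isinstance(size, bool) or not isinstance(size, int) or not 0 <= size < 2**63:
        raise ValueError("size must be a non-negative int64")
    return Descriptor(
        media_type=raw["mediaType"],
        digest=raw["digest"],
        size=size,
        urls=list(raw.get("urls", [])),
        annotations=dict(raw.get("annotations", {})),
    )
```

Omitted OPTIONAL properties default to empty containers here, which keeps callers from having to distinguish "absent" from "empty"; an implementation that needs that distinction would model them differently.
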
### Reserved

The following field keys are reserved and MUST NOT be used by other specifications.

- **`data`** *string*

  This key is RESERVED for future versions of the specification.

All other fields may be included in other OCI specifications.
Extended _Descriptor_ field additions proposed in other OCI specifications SHOULD first be considered for addition into this specification.

## Digests

The _digest_ property of a Descriptor acts as a content identifier, enabling [content addressability](http://en.wikipedia.org/wiki/Content-addressable_storage).
It uniquely identifies content by taking a [collision-resistant hash](https://en.wikipedia.org/wiki/Cryptographic_hash_function) of the bytes.
If the _digest_ can be communicated in a secure manner, one can verify content from an insecure source by recalculating the digest independently, ensuring the content has not been modified.

The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.

A digest string MUST match the following [grammar](considerations.md#ebnf):

```
digest                ::= algorithm ":" encoded
algorithm             ::= algorithm-component (algorithm-separator algorithm-component)*
algorithm-component   ::= [a-z0-9]+
algorithm-separator   ::= [+._-]
encoded               ::= [a-zA-Z0-9=_-]+
```

Note that _algorithm_ MAY impose algorithm-specific restrictions on the grammar of the _encoded_ portion.
See also [Registered Algorithms](#registered-algorithms).

Some example digest strings include the following:

| digest                                                                    | algorithm                   | Registered |
|---------------------------------------------------------------------------|-----------------------------|------------|
| `sha256:6c3c624b58dbbcd3c0dd82b4c53f04194d1247c6eebdaab7c610cf7d66709b3b` | [SHA-256](#sha-256)         | Yes        |
| `sha512:401b09eab3c013d4ca54922bb802bec8fd5318192b0a75f201d8b372742...`   | [SHA-512](#sha-512)         | Yes        |
| `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8`         | Multihash                   | No         |
| `sha256+b64u:LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564`                 | SHA-256 with urlsafe base64 | No         |

Please see [Registered Algorithms](#registered-algorithms) for a list of registered algorithms.

Implementations SHOULD allow digests with unrecognized algorithms to pass validation if they comply with the above grammar.
While `sha256` will only use hex encoded digests, separators in _algorithm_ and alphanumerics in _encoded_ are included to allow for extensions.
As an example, we can parameterize the encoding and algorithm as `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8`, which would be considered valid but unregistered by this specification.

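To illustrate the grammar, and the guidance that unrecognized algorithms should still pass validation, the following Python sketch transcribes the grammar into a single regular expression.
The names `DIGEST_RE` and `parse_digest` are hypothetical, and the sketch checks only the generic grammar, not any algorithm-specific restriction.

```python
import re
from typing import Tuple

# Transcription of the digest grammar:
#   algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
#   encoded   ::= [a-zA-Z0-9=_-]+
DIGEST_RE = re.compile(
    r"(?P<algorithm>[a-z0-9]+(?:[+._-][a-z0-9]+)*)"
    r":"
    r"(?P<encoded>[a-zA-Z0-9=_-]+)"
)


def parse_digest(digest: str) -> Tuple[str, str]:
    """Split a digest string into (algorithm, encoded) or raise ValueError."""
    # fullmatch requires the whole string to satisfy the grammar.
    match = DIGEST_RE.fullmatch(digest)
    if match is None:
        raise ValueError(f"digest does not match the grammar: {digest!r}")
    return match.group("algorithm"), match.group("encoded")
```

Under this sketch, `multihash+base58:QmRZxt2b1FVZPNqd8hsiykDL3TdBDeTSPX9Kv46HmX4Gx8` parses successfully even though its algorithm is unregistered, while a string such as `SHA256:abc` is rejected because uppercase letters are not allowed in the _algorithm_ portion.
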
### Verification

Before consuming content targeted by a descriptor from untrusted sources, the byte content SHOULD be verified against the digest string.
Before calculating the digest, the size of the content SHOULD be verified to reduce hash collision space.
Heavy processing before calculating a hash SHOULD be avoided.
Implementations MAY employ [canonicalization](canonicalization.md#canonicalization) of the underlying content to ensure stable content identifiers.

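The ordering described above (size first, then digest) can be sketched in Python as follows.
This is a minimal illustration, not a reference implementation: the function name `verify_content` is hypothetical, and the sketch assumes the content is already in memory and that only the hex-encoded `sha256` and `sha512` algorithms need to be handled.

```python
import hashlib

# Assumption: only these two hex-encoded algorithms are supported here.
_HASHERS = {"sha256": hashlib.sha256, "sha512": hashlib.sha512}


def verify_content(content: bytes, expected_size: int, expected_digest: str) -> bool:
    """Return True only if content matches both the size and the digest."""
    # Cheap check first: a size mismatch rejects the content without hashing.
    if len(content) != expected_size:
        return False
    algorithm, _, encoded = expected_digest.partition(":")
    hasher = _HASHERS.get(algorithm)
    if hasher is None:
        raise ValueError(f"unsupported algorithm in this sketch: {algorithm!r}")
    # No parsing or decompression happens before the hash is computed.
    return hasher(content).hexdigest() == encoded
```

The raw bytes are hashed exactly as received; any interpretation of the content is left until after the function returns `True`.
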
### Digest calculations

A _digest_ is calculated by the following pseudo-code, where `H` is the selected hash algorithm, identified by string `<alg>`:

```
let ID(C) = Descriptor.digest
let C = <bytes>
let D = '<alg>:' + Encode(H(C))
let verified = ID(C) == D
```

Above, we define the content identifier as `ID(C)`, extracted from the `Descriptor.digest` field.
Content `C` is a string of bytes.
Function `H` returns the hash of `C` in bytes; that result is passed to function `Encode` and prefixed with the algorithm to obtain the digest.
The result `verified` is true if `ID(C)` is equal to `D`, confirming that `C` is the content identified by `D`.
After verification, the following is true:

```
D == ID(C) == '<alg>:' + Encode(H(C))
```

The _digest_ is confirmed as the content identifier by independently calculating the _digest_.

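A minimal Python rendering of the pseudo-code above, assuming `Encode` is lowercase hexadecimal (the encoding used by `sha256` and `sha512`).
The function name `compute_digest` is hypothetical, and `hashlib` from the Python standard library stands in for `H`.

```python
import hashlib


def compute_digest(content: bytes, alg: str = "sha256") -> str:
    """Return '<alg>:' + Encode(H(C)), with lowercase hex as Encode."""
    if alg not in ("sha256", "sha512"):
        # Assumption: other algorithms may need a different Encode step.
        raise ValueError(f"unsupported algorithm in this sketch: {alg!r}")
    hasher = hashlib.new(alg)  # H
    hasher.update(content)     # H(C)
    return f"{alg}:{hasher.hexdigest()}"  # '<alg>:' + Encode(H(C))


# ID(C) is the value taken from Descriptor.digest; here, the digest of empty content.
id_c = "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
d = compute_digest(b"")
verified = id_c == d
```

Here `verified` is true because the independently computed `d` equals the digest carried by the descriptor.
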
### Registered algorithms

While the _algorithm_ component of the digest string allows the use of a variety of cryptographic algorithms, compliant implementations SHOULD use [SHA-256](#sha-256).

The following algorithm identifiers are currently defined by this specification:

| algorithm identifier | algorithm           |
|----------------------|---------------------|
| `sha256`             | [SHA-256](#sha-256) |
| `sha512`             | [SHA-512](#sha-512) |

If a useful algorithm is not included in the above table, it SHOULD be submitted to this specification for registration.

#### SHA-256

[SHA-256][rfc4634-s4.1] is a collision-resistant hash function, chosen for ubiquity, reasonable size and secure characteristics.
Implementations MUST implement SHA-256 digest verification for use in descriptors.

When the _algorithm identifier_ is `sha256`, the _encoded_ portion MUST match `/[a-f0-9]{64}/`.
Note that `[A-F]` MUST NOT be used here.

#### SHA-512

[SHA-512][rfc4634-s4.2] is a collision-resistant hash function which [may be more performant][sha256-vs-sha512] than [SHA-256](#sha-256) on some CPUs.
Implementations MAY implement SHA-512 digest verification for use in descriptors.

When the _algorithm identifier_ is `sha512`, the _encoded_ portion MUST match `/[a-f0-9]{128}/`.
Note that `[A-F]` MUST NOT be used here.

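The algorithm-specific rules for the _encoded_ portion can be sketched as a lookup table of patterns.
The names `ENCODED_PATTERNS` and `check_encoded` are hypothetical, and treating the patterns as anchored to the whole _encoded_ portion is an assumption of this sketch.

```python
import re

# Encoded-portion rules for the registered algorithms.
ENCODED_PATTERNS = {
    "sha256": re.compile(r"[a-f0-9]{64}"),
    "sha512": re.compile(r"[a-f0-9]{128}"),
}


def check_encoded(algorithm: str, encoded: str) -> bool:
    """Apply the algorithm-specific rule, if one is registered."""
    pattern = ENCODED_PATTERNS.get(algorithm)
    if pattern is None:
        # Unregistered algorithm: no extra restriction is known to this sketch.
        return True
    # Assumption: the pattern must cover the entire encoded portion.
    return pattern.fullmatch(encoded) is not None
```

Because the character class is `[a-f0-9]`, an uppercase hex digit such as `A` fails the check for both registered algorithms.
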
## Examples

The following example describes a [_Manifest_](manifest.md#image-manifest) with a content identifier of "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270" and a size of 7682 bytes:

```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "size": 7682,
  "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270"
}
```

In the following example, the descriptor indicates that the referenced manifest is retrievable from a particular URL:

```json,title=Content%20Descriptor&mediatype=application/vnd.oci.descriptor.v1%2Bjson
{
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "size": 7682,
  "digest": "sha256:5b0bcabd1ed22e9fb1310cf6c2dec7cdef19f0ad69efa1f392e94a4333501270",
  "urls": [
    "https://example.com/example-manifest"
  ]
}
```

[rfc3986]: https://tools.ietf.org/html/rfc3986
[rfc4634-s4.1]: https://tools.ietf.org/html/rfc4634#section-4.1
[rfc4634-s4.2]: https://tools.ietf.org/html/rfc4634#section-4.2
[rfc6838]: https://tools.ietf.org/html/rfc6838
[rfc6838-s4.2]: https://tools.ietf.org/html/rfc6838#section-4.2
[rfc7230-s2.7]: https://tools.ietf.org/html/rfc7230#section-2.7
[sha256-vs-sha512]: https://groups.google.com/a/opencontainers.org/forum/#!topic/dev/hsMw7cAwrZE
