# MMTF Specification
4
5*Version*: v1.0
6
The **m**acro**m**olecular **t**ransmission **f**ormat (MMTF) is a binary encoding of biological structures. It includes the coordinates, the topology and associated data. Specifically, a large subset of the data in mmCIF or PDB files can be represented. The main goals are a reduced file size, for efficient transmission over the Internet or from hard disk to memory, and fast decoding/parsing speed. Additionally, the format aims to be easily understood and implemented to facilitate its wide dissemination. A [test suite](test-suite/) is available for testing encoder and decoder implementations.
8
9
10## Table of contents
11
12* [Overview](#overview)
13* [Container](#container)
14* [Types](#types)
15* [Codecs](#codecs)
16    * [Header](#header)
17    * [Strategies](#strategies)
18* [Encodings](#encodings)
19* [Fields](#fields)
20    * [Format data](#format-data)
21    * [Structure data](#structure-data)
22    * [Model data](#model-data)
23    * [Chain data](#chain-data)
24    * [Group data](#group-data)
25    * [Atom data](#atom-data)
26* [Traversal](#traversal)
27
28
29## Overview
30
This specification describes a set of required and optional [fields](#fields) representing molecular structures and associated data. The fields are limited to six primitive [types](#types) for efficient serialization and deserialization using the binary [MessagePack](http://msgpack.org/) format. The [fields](#fields) in MMTF are stored in a binary [container](#container) format. The top level of the container contains the field names as keys and the field data as values. To describe the layout of data in MMTF we use [JSON](http://www.json.org/) notation throughout this document.
32
33The first step of decoding MMTF is decoding the MessagePack-encoded container. Many of the resulting MMTF fields do not need to be decoded any further. However, to allow for custom compression some fields are given as binary data and must be decoded using the [strategies](#encodings) described below. For maximal size savings the binary MMTF data can be compressed using general purpose algorithms like [gzip](https://www.gnu.org/software/gzip/) or [brotli](https://github.com/google/brotli).
34
The fields in the MMTF format group data of the same type together to create a flat data structure. For instance, the coordinates of all atoms are stored together instead of in atom objects with other atom-related data. This avoids imposing a deeply nested hierarchical structure on consuming programs, while still allowing efficient [traversal](#traversal) of models, chains, groups, and atoms.
36
37
38## Container
39
40In principle any serialization format that supports the [types](#types) described below can be used to store the above [fields](#fields). MMTF files (specifically files with the `.mmtf` extension) use the binary [MessagePack](http://msgpack.org/) serialization format.
41
42
43### MessagePack
44
45The MessagePack format (version 5) is used as the binary container format of MMTF. The MessagePack [specification](https://github.com/msgpack/msgpack/blob/master/spec.md) describes the data types and the data layout. Encoding and decoding libraries for MessagePack are available in many languages, see the MessagePack [website](http://msgpack.org/).
46
47
48### JSON
49
The test suite will additionally provide files representing the MMTF [fields](#fields) as [JSON](http://www.json.org/) to help validate implementations of this specification.
51
52
53## Types
54
55The following types are used for the fields in this specification.
56
* `String` A UTF-8 encoded string.
* `Float` A 32-bit floating-point number.
* `Integer` A 32-bit signed integer.
* `Map` A data structure of key-value pairs where each key is unique. Also known as "dictionary" or "hash".
* `Array` A sequence of elements that have the same type.
* `Binary` An array of unsigned 8-bit integer numbers representing binary data.
63
64The `Binary` type is used here to store encoded data as described in the [Codecs](#codecs) section. When the encoded data is to be interpreted as a multi-byte type (e.g. 32-bit integers) it must be represented in big-endian format.
65
Note that the MessagePack format limits the `String`, `Map`, `Array` and `Binary` types to (2^32)-1 entries per instance.
67
68
69## Codecs
70
71This section describes the binary layout of the header and the encoded data as well as the available en/decoding strategies.
72
73
74### Header
75
76* Bytes  0 to  3: 32-bit signed integer specifying the codec type
77* Bytes  4 to  7: 32-bit signed integer specifying the length of the resulting array
78* Bytes  8 to 11: 4 bytes containing codec-specific parameter data
79* Bytes 12 to  N: bytes containing the encoded array data
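
As a minimal sketch of reading this layout, the following Python function splits an encoded `Binary` field into its header values and payload. The function name `parse_header` and the returned tuple are illustrative assumptions, not part of the specification. Big-endian byte order follows the rule given in the [Types](#types) section.

```python
import struct

def parse_header(field_bytes):
    """Split an encoded Binary field into (codec, length, param, payload)."""
    if len(field_bytes) < 12:
        raise ValueError("encoded field is shorter than the 12-byte header")
    # ">ii" reads two big-endian 32-bit signed integers: codec type and array length
    codec, length = struct.unpack(">ii", field_bytes[:8])
    # bytes 8 to 11 are codec-specific and returned raw for the codec to interpret
    param = field_bytes[8:12]
    payload = field_bytes[12:]
    return codec, length, param, payload

# hypothetical field: codec type 4, two values, unused parameter, payload [7, -1]
example = struct.pack(">ii4s2i", 4, 2, b"\x00" * 4, 7, -1)
codec, length, param, payload = parse_header(example)
print(codec, length, struct.unpack(">%di" % length, payload))
```

Keeping the parameter as raw bytes leaves its interpretation to the individual strategy, since only some strategies define a parameter.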
80
81
82### Strategies
83
84#### Pass-through: 32-bit floating-point number array
85
86*Type* 1
87
88*Signature* `byte[] -> float32[]`
89
90*Description* Interpret bytes as array of 32-bit floating-point numbers.
91
92
93#### Pass-through: 8-bit signed integer array
94
95*Type* 2
96
97*Signature* `byte[] -> int8[]`
98
99*Description* Interpret bytes as array of 8-bit signed integers.
100
101
102#### Pass-through: 16-bit signed integer array
103
104*Type* 3
105
106*Signature* `byte[] -> int16[]`
107
108*Description* Interpret bytes as array of 16-bit signed integers.
109
110
111#### Pass-through: 32-bit signed integer array
112
113*Type* 4
114
115*Signature* `byte[] -> int32[]`
116
117*Description* Interpret bytes as array of 32-bit signed integers.
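
The four pass-through strategies (types 1 to 4) differ only in the element type, so they can be sketched with a single lookup table. This is an illustrative sketch using Python's `struct` module; the names `PASS_THROUGH_FORMATS` and `decode_pass_through` are hypothetical, and the payload is assumed to be the bytes that follow the 12-byte header.

```python
import struct

# struct format characters for the pass-through codec types 1 to 4
PASS_THROUGH_FORMATS = {1: "f", 2: "b", 3: "h", 4: "i"}

def decode_pass_through(codec, payload):
    """Interpret payload bytes as a big-endian array for codec types 1 to 4."""
    fmt = PASS_THROUGH_FORMATS[codec]
    count = len(payload) // struct.calcsize(">" + fmt)
    # struct.unpack raises an error if the payload size is not a whole
    # multiple of the element size
    return list(struct.unpack(">%d%s" % (count, fmt), payload))

print(decode_pass_through(3, b"\x00\x01\xff\xfe"))  # int16 values 1 and -2
print(decode_pass_through(1, struct.pack(">2f", 1.5, -0.25)))
```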
118
119
120#### UTF8/ASCII fixed-length string array
121
122*Type* 5
123
124*Parameter* `byte[4] -> int32` denoting the string length
125
126*Signature* `byte[] -> uint8[] -> string<length>[]`
127
128*Description* Interpret bytes as array of 8-bit unsigned integers, then iteratively consume `length` many bytes to form a string array.
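
A possible decoder for this strategy is sketched below. The function name `decode_fixed_length_strings` is hypothetical, and the stripping of trailing zero bytes is an assumption about how strings shorter than `length` are padded; the description above does not state it.

```python
import struct

def decode_fixed_length_strings(param, payload):
    """Decode codec type 5: split the payload into strings of equal byte length."""
    (length,) = struct.unpack(">i", param)  # parameter bytes hold the string length
    if length <= 0 or len(payload) % length != 0:
        raise ValueError("payload size does not match the string length")
    strings = []
    for start in range(0, len(payload), length):
        chunk = payload[start:start + length]
        # ASSUMPTION: shorter strings are padded with trailing zero bytes
        strings.append(chunk.rstrip(b"\x00").decode("utf-8"))
    return strings

print(decode_fixed_length_strings(struct.pack(">i", 4), b"A\x00\x00\x00DA\x00\x00"))
```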
129
130
131#### Run-length encoded character array
132
133*Type* 6
134
135*Signature* `byte[] -> int32[] -> char[]`
136
137*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of characters.
138
139
140#### Run-length encoded 32-bit signed integer array
141
142*Type* 7
143
144*Signature* `byte[] -> int32[] -> int32[]`
145
146*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of 32-bit signed integers.
147
148
149#### Delta & run-length encoded 32-bit signed integer array
150
151*Type* 8
152
153*Signature* `byte[] -> int32[] -> int32[] -> int32[]`
154
155*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of 32-bit signed integers, then delta decode into array of 32-bit signed integers.
156
157
158#### Integer & run-length encoded 32-bit floating-point number array
159
160*Type* 9
161
162*Parameter* `byte[4] -> int32` denoting the divisor
163
164*Signature* `byte[] -> int32[] -> int32[] -> float32[]`
165
166*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of 32-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.
167
168
169#### Integer & delta encoded & two-byte-packed 32-bit floating-point number array
170
171*Type* 10
172
173*Parameter* `byte[4] -> int32` denoting the divisor
174
175*Signature* `byte[] -> int16[] -> int32[] -> int32[] -> float32[]`
176
177*Description* Interpret bytes as array of 16-bit signed integers, then unpack into array of 32-bit integers, then delta decode into array of 32-bit integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.
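
The chain of steps in this description can be sketched end to end as follows. The function name `decode_type_10` is hypothetical, and the 16-bit limits 32767 and -32768 are assumed as the interval endpoints for unpacking. Python floats are 64-bit, so a real decoder would likely store the result in a 32-bit array instead.

```python
import struct

INT16_MAX, INT16_MIN = 32767, -32768

def decode_type_10(param, payload):
    """Decode codec type 10: int16 -> unpack -> delta decode -> divide."""
    (divisor,) = struct.unpack(">i", param)
    packed = struct.unpack(">%dh" % (len(payload) // 2), payload)

    # unpack: endpoint values accumulate into the next non-endpoint value
    deltas, carry = [], 0
    for value in packed:
        carry += value
        if value != INT16_MAX and value != INT16_MIN:
            deltas.append(carry)
            carry = 0

    # delta decode: running sum of the differences
    integers, total = [], 0
    for delta in deltas:
        total += delta
        integers.append(total)

    # integer decode: divide by the divisor to recover floating-point values
    return [integer / divisor for integer in integers]

payload = struct.pack(">4h", 32767, 2233, 100, -50)
print(decode_type_10(struct.pack(">i", 1000), payload))
```

In this made-up input the first two 16-bit values combine into the delta 35000, so the decoded coordinates are 35.0, 35.1 and 35.05.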
178
179
180#### Integer encoded 32-bit floating-point number array
181
182*Type* 11
183
184*Parameter* `byte[4] -> int32` denoting the divisor
185
186*Signature* `byte[] -> int16[] -> float32[]`
187
188*Description* Interpret bytes as array of 16-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.
189
190
191#### Integer & two-byte-packed 32-bit floating-point number array
192
193*Type* 12
194
195*Parameter* `byte[4] -> int32` denoting the divisor
196
197*Signature* `byte[] -> int16[] -> int32[] -> float32[]`
198
199*Description* Interpret bytes as array of 16-bit signed integers, then unpack into array of 32-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.
200
*Note* Useful for arrays where a small number of values are slightly too large to fit into two bytes. However, when many values exceed that size the packing becomes inefficient.
202
203
204#### Integer & one-byte-packed 32-bit floating-point number array
205
206*Type* 13
207
208*Parameter* `byte[4] -> int32` denoting the divisor
209
210*Signature* `byte[] -> int8[] -> int32[] -> float32[]`
211
*Description* Interpret bytes as array of 8-bit signed integers, then unpack into array of 32-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.

*Note* Useful for arrays where a small number of values are slightly too large to fit into one byte. However, when many values exceed that size the packing becomes inefficient.
215
216
217#### Two-byte-packed 32-bit signed integer array
218
219*Type* 14
220
221*Signature* `byte[] -> int16[] -> int32[]`
222
223*Description* Interpret bytes as array of 16-bit signed integers, then unpack into array of 32-bit signed integers.
224
*Note* Useful for arrays where a small number of values are slightly too large to fit into two bytes. However, when many values exceed that size the packing becomes inefficient.
226
227
228#### One-byte-packed 32-bit signed integer array
229
230*Type* 15
231
232*Signature* `byte[] -> int8[] -> int32[]`
233
234*Description* Interpret bytes as array of 8-bit signed integers, then unpack into array of 32-bit signed integers.
235
*Note* Useful for arrays where a small number of values are slightly too large to fit into one byte. However, when many values exceed that size the packing becomes inefficient.
237
238
239## Encodings
240
241The following general encoding strategies are used to compress the data contained in MMTF files.
242
243
244### Run-length encoding
245
246Run-length encoding can generally be used to compress arrays that contain stretches of equal values. Instead of storing each value itself, stretches of equal values are represented by the value itself and the occurrence count, that is a value/count pair.
247
248*Example*:
249
250Starting with the encoded array of value/count pairs. In the following example there are three pairs `1, 10`, `2, 1` and `1, 4`. The first entry in a pair is the value to be repeated and the second entry denotes how often the value must be repeated.
251
252```JSON
253[ 1, 10, 2, 1, 1, 4 ]
254```
255
256Applying run-length decoding by repeating, for each pair, the value as often as denoted by the count entry.
257
258```JSON
259[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ]
260```
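
The decoding step above can be sketched in Python as follows. The function name `run_length_decode` is illustrative, and the input is assumed to be a flat array of value/count pairs as in the example.

```python
def run_length_decode(pairs):
    """Expand a flat [value, count, value, count, ...] array."""
    if len(pairs) % 2 != 0:
        raise ValueError("run-length data must consist of value/count pairs")
    decoded = []
    for i in range(0, len(pairs), 2):
        value, count = pairs[i], pairs[i + 1]
        decoded.extend([value] * count)  # repeat the value `count` times
    return decoded

print(run_length_decode([1, 10, 2, 1, 1, 4]))
```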
261
262
263### Delta encoding
264
265Delta encoding is used to store an array of numbers. Instead of storing the numbers themselves, the differences (deltas) between the numbers are stored. When the values of the deltas are smaller than the numbers themselves they can be more efficiently packed to require less space.
266
267Note that arrays in which the values change by an identical amount for a range of consecutive values lend themselves to subsequent run-length encoding.
268
269*Example*:
270
271Starting with the encoded array of delta values:
272
273```JSON
274[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ]
275```
276
277Applying delta decoding. The first entry in the array is left as is, the second is calculated as the sum of the first and the second (not decoded) value, the third as the sum of the second (decoded) and third (not decoded) value and so forth.
278
279```JSON
280[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16 ]
281```
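
Delta decoding is a running sum, which the following sketch expresses with `itertools.accumulate` from the Python standard library. The function name `delta_decode` is illustrative.

```python
from itertools import accumulate

def delta_decode(deltas):
    """Turn an array of differences back into the original values."""
    # accumulate yields the running sum; the first entry is left as is
    return list(accumulate(deltas))

print(delta_decode([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1]))
```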
282
283
284### Packing/Recursive indexing encoding
285
Packing/Recursive indexing encodes values such that the encoded values lie within the open interval (MIN, MAX). This allows a more compact representation of a 32-bit signed integer array when the majority of values in the array fit into 16 bits (or 8 bits). To encode each value in the input array, the method stores the value itself if it lies within the open interval (MIN, MAX). Otherwise the MAX (or MIN if the number is negative) interval endpoint is stored and subtracted from the input value. This process of storing and subtracting is repeated recursively until the remainder lies within the interval.
287
288*Example*:
289
Starting with the array of 8-bit integer values, so the open interval is (-128, 127):
291
292```JSON
293[ 127, 41, 34, 1, 0, -50, -128, 0, 7, 127, 0, 127, 127, 14 ]
294```
295
Unpacking/Applying recursive indexing decoding. Values that lie within the interval are copied over to the output array. A value equal to an interval endpoint is not written to the output directly. Instead it is added to the values that follow, up to and including the first value that is not an interval endpoint, and the sum is written to the output. For example, the sequence `127, 127, 14` becomes `268`:
297
298```JSON
299[ 168, 34, 1, 0, -50, -128, 7, 127, 268 ]
300```
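
The packing and unpacking steps can be sketched as a pair of Python functions. The names `recursive_index_encode` and `recursive_index_decode` are illustrative, and the default limits are those of the 8-bit example above.

```python
def recursive_index_encode(values, max_value=127, min_value=-128):
    """Pack integers so that every stored value fits between the limits."""
    packed = []
    for value in values:
        # store an endpoint and subtract it until the remainder is inside the interval
        while value >= max_value:
            packed.append(max_value)
            value -= max_value
        while value <= min_value:
            packed.append(min_value)
            value -= min_value
        packed.append(value)
    return packed

def recursive_index_decode(packed, max_value=127, min_value=-128):
    """Sum runs of endpoint values into the next non-endpoint value."""
    decoded, carry = [], 0
    for value in packed:
        carry += value
        if value != max_value and value != min_value:
            decoded.append(carry)
            carry = 0
    return decoded

packed = [127, 41, 34, 1, 0, -50, -128, 0, 7, 127, 0, 127, 127, 14]
print(recursive_index_decode(packed))
```

Note that an input value equal to an endpoint, such as `127`, is stored as the endpoint followed by `0`, which is why the example array contains the pairs `-128, 0` and `127, 0`.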
301
302
303### Integer encoding
304
In integer encoding, floating-point numbers are converted to integer values by multiplying with a factor and discarding everything after the decimal point. Depending on the multiplication factor this can change the precision, but with a sufficiently large factor it is lossless. The integer values can then often be compressed with delta encoding, which is the main motivation for it.
306
307*Example*:
308
309Starting with the array of integer values:
310
311```JSON
312[ 100, 100, 100, 100, 50, 50 ]
313```
314
315Applying integer decoding with a divisor of `100`:
316
317```JSON
318[ 1.00, 1.00, 1.00, 1.00, 0.50, 0.50 ]
319```
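
Integer decoding amounts to one division per element, as in this small sketch. The function name `integer_decode` is illustrative, and the result holds ordinary Python floats rather than 32-bit values.

```python
def integer_decode(values, divisor):
    """Divide each stored integer by the divisor to recover the float values."""
    return [value / divisor for value in values]

print(integer_decode([100, 100, 100, 100, 50, 50], 100))
```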
320
321
322### Dictionary encoding
323
324For dictionary encoding an `Array` is created to store values. Indices as references to the values can then be used instead of repeating the values over and over again. Arrays of indices can afterwards be compressed with delta and run-length encoding.
325
326*Example*:
327
First create an `Array` to hold values that are referable by indices. In the following example there are two indices, `0` and `1`, each with some values associated.
329
330```JSON
331[
332    {
333        "groupName": "ASP",
334        "singleLetterCode": "D",
335        "chemCompType": "L-PEPTIDE LINKING",
336        "atomNameList": [ "N", "CA", "C", "O", "CB", "CG", "OD1", "OD2" ],
337        "elementList": [ "N", "C", "C", "O", "C", "C", "O", "O" ],
338        "formalChargeList": [ 0, 0, 0, 0, 0, 0, 0, 0 ],
339        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ],
340        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ]
341    },
342    {
343        "groupName": "SER",
344        "singleLetterCode": "S",
345        "chemCompType": "L-PEPTIDE LINKING",
346        "atomNameList": [ "N", "CA", "C", "O", "CB", "OG" ],
347        "elementList": [ "N", "C", "C", "O", "C", "O" ],
348        "formalChargeList": [ 0, 0, 0, 0, 0, 0 ],
349        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ],
350        "bondOrderList": [ 1, 1, 2, 1, 1 ]
351    }
352]
353```
354
355The indices can then be used to reference the values as often as needed:
356
357```JSON
358[ 0, 1, 1, 0, 1 ]
359```
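
Resolving the indices is a plain array lookup, sketched below. The variable names and the helper `resolve` are hypothetical, and the dictionary entries are trimmed to the `groupName` key for brevity.

```python
# trimmed copies of the two dictionary entries above (only the name is kept)
group_list = [{"groupName": "ASP"}, {"groupName": "SER"}]
# index array referencing the dictionary
group_indices = [0, 1, 1, 0, 1]

def resolve(indices, dictionary):
    """Replace each index by the dictionary entry it refers to."""
    return [dictionary[index] for index in indices]

names = [group["groupName"] for group in resolve(group_indices, group_list)]
print(names)
```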
360
361
362## Fields
363
The following table lists all top-level fields, including their [type](#types) and whether they are required or optional. The top-level fields themselves are stored as a `Map`.
365
366| Name                                        | Type                | Required |
367|---------------------------------------------|---------------------|:--------:|
368| [mmtfVersion](#mmtfversion)                 | [String](#types)    |    Y     |
369| [mmtfProducer](#mmtfproducer)               | [String](#types)    |    Y     |
370| [unitCell](#unitcell)                       | [Array](#types)     |          |
371| [spaceGroup](#spacegroup)                   | [String](#types)    |          |
372| [structureId](#structureid)                 | [String](#types)    |          |
373| [title](#title)                             | [String](#types)    |          |
374| [depositionDate](#depositiondate)           | [String](#types)    |          |
375| [releaseDate](#releasedate)                 | [String](#types)    |          |
376| [ncsOperatorList](#ncsoperatorlist)         | [Array](#types)     |          |
377| [bioAssemblyList](#bioassemblylist)         | [Array](#types)     |          |
378| [entityList](#entitylist)                   | [Array](#types)     |          |
379| [experimentalMethods](#experimentalmethods) | [Array](#types)     |          |
380| [resolution](#resolution)                   | [Float](#types)     |          |
381| [rFree](#rfree)                             | [Float](#types)     |          |
382| [rWork](#rwork)                             | [Float](#types)     |          |
383| [numBonds](#numbonds)                       | [Integer](#types)   |    Y     |
384| [numAtoms](#numatoms)                       | [Integer](#types)   |    Y     |
385| [numGroups](#numgroups)                     | [Integer](#types)   |    Y     |
386| [numChains](#numchains)                     | [Integer](#types)   |    Y     |
387| [numModels](#nummodels)                     | [Integer](#types)   |    Y     |
388| [groupList](#grouplist)                     | [Array](#types)     |    Y     |
389| [bondAtomList](#bondatomlist)               | [Binary](#types)    |          |
390| [bondOrderList](#bondorderlist)             | [Binary](#types)    |          |
391| [xCoordList](#xcoordlist)                   | [Binary](#types)    |    Y     |
392| [yCoordList](#ycoordlist)                   | [Binary](#types)    |    Y     |
393| [zCoordList](#zcoordlist)                   | [Binary](#types)    |    Y     |
394| [bFactorList](#bfactorlist)                 | [Binary](#types)    |          |
395| [atomIdList](#atomidlist)                   | [Binary](#types)    |          |
396| [altLocList](#altloclist)                   | [Binary](#types)    |          |
397| [occupancyList](#occupancylist)             | [Binary](#types)    |          |
398| [groupIdList](#groupidlist)                 | [Binary](#types)    |    Y     |
399| [groupTypeList](#grouptypelist)             | [Binary](#types)    |    Y     |
400| [secStructList](#secstructlist)             | [Binary](#types)    |          |
401| [insCodeList](#inscodelist)                 | [Binary](#types)    |          |
402| [sequenceIndexList](#sequenceindexlist)     | [Binary](#types)    |          |
403| [chainIdList](#chainidlist)                 | [Binary](#types)    |    Y     |
404| [chainNameList](#chainnamelist)             | [Binary](#types)    |          |
405| [groupsPerChain](#groupsperchain)           | [Array](#types)     |    Y     |
406| [chainsPerModel](#chainspermodel)           | [Array](#types)     |    Y     |
407
408
409### Format data
410
411#### mmtfVersion
412
413*Required field*
414
415*Type*: [String](#types).
416
417*Description*: The version number of the specification the file adheres to. The specification follows a [semantic versioning](http://semver.org/) scheme. In a version number `MAJOR.MINOR`, the `MAJOR` part is incremented when specification changes are incompatible with previous versions. The `MINOR` part is changed for additions to the specification that are backwards compatible.
418
419*Examples*:
420
An unreleased, in-development version of the specification:
422
423```JSON
424"0.1"
425```
426
427A future version with additions backwards compatible to versions "1.0" and "1.1":
428
429```JSON
430"1.2"
431```
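
One way a decoder might act on this scheme is sketched below. The helper names `parse_version` and `is_compatible` are hypothetical. The policy shown (same `MAJOR` part, and a file `MINOR` part no newer than the decoder's) is an assumption drawn from the description above, not a rule stated by the specification.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR' string into a tuple of two integers."""
    major, minor = version.split(".")
    return int(major), int(minor)

def is_compatible(file_version, decoder_version):
    """Assumed policy: equal MAJOR part and file MINOR not newer than the decoder's."""
    file_major, file_minor = parse_version(file_version)
    decoder_major, decoder_minor = parse_version(decoder_version)
    return file_major == decoder_major and file_minor <= decoder_minor

print(is_compatible("1.0", "1.2"))  # True
print(is_compatible("0.1", "1.2"))  # False
```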
432
433
434#### mmtfProducer
435
436*Required field*
437
438*Type*: [String](#types).
439
440*Description*: The name and version of the software used to produce the file. For development versions it can be useful to also include the checksum of the commit. The main purpose of this field is to identify the software that has written a file, for instance because it has format errors.
441
442*Examples*:
443
444A software name and the checksum of a commit:
445
446```JSON
447"RCSB PDB mmtf-java-encoder---version: 6b8635f8d319beea9cd7cc7f5dd2649578ac01a0"
448```
449
450Another software name and its version number:
451
452```JSON
453"NGL mmtf exporter v1.2"
454```
455
456
457### Structure data
458
459#### title
460
461*Optional field*
462
463*Type*: [String](#types).
464
465*Description*: A short description of the structural data included in the file.
466
467*Example*:
468
469```JSON
470"CRAMBIN"
471```
472
473
474#### structureId
475
476*Optional field*
477
478*Type*: [String](#types).
479
480*Description*: An ID for the structure, for example the PDB ID if applicable. If not in conflict with the format of the ID, it must be given in uppercase.
481
482*Example*:
483
484```JSON
485"1CRN"
486```
487
488
489#### depositionDate
490
491*Optional field*
492
493*Type*: [String](#types) with the format `YYYY-MM-DD`, where `YYYY` stands for the year in the Gregorian calendar, `MM` is the month of the year between 01 (January) and 12 (December), and `DD` is the day of the month between 01 and 31.
494
495*Description*: A date that relates to the deposition of the structure in a database, e.g. the wwPDB archive.
496
497*Example*:
498
499For example, the second day of October in the year 2005 is written as:
500
501```JSON
502"2005-10-02"
503```
504
505
506#### releaseDate
507
508*Optional field*
509
510*Type*: [String](#types) with the format `YYYY-MM-DD`, where `YYYY` stands for the year in the Gregorian calendar, `MM` is the month of the year between 01 (January) and 12 (December), and `DD` is the day of the month between 01 and 31.
511
512*Description*: A date that relates to the release of the structure in a database, e.g. the wwPDB archive.
513
514*Example*:
515
516For example, the third day of December in the year 2013 is written as:
517
518```JSON
519"2013-12-03"
520```
521
522
523#### numBonds
524
525*Required field*
526
527*Type*: [Integer](#types).
528
529*Description*: The overall number of bonds. This number must reflect both the bonds given in `bondAtomList` and the bonds given in the `groupType` entries in `groupList`.
530
531*Example*:
532
533```JSON
5341142
535```
536
537
538#### numAtoms
539
540*Required field*
541
542*Type*: [Integer](#types).
543
544*Description*: The overall number of atoms in the structure. This also includes atoms at alternate locations.
545
546*Example*:
547
548```JSON
5491023
550```
551
552
553#### numGroups
554
555*Required field*
556
557*Type*: [Integer](#types).
558
559*Description*: The overall number of groups in the structure. This also includes extra groups due to micro-heterogeneity.
560
561*Example*:
562
563```JSON
564302
565```
566
567
568#### numChains
569
570*Required field*
571
572*Type*: [Integer](#types).
573
574*Description*: The overall number of chains in the structure.
575
576*Example*:
577
578```JSON
5794
580```
581
582
583#### numModels
584
585*Required field*
586
587*Type*: [Integer](#types).
588
589*Description*: The overall number of models in the structure.
590
591*Example*:
592
593```JSON
5941
595```
596
597
598#### spaceGroup
599
600*Optional field*
601
602*Type*: [String](#types).
603
604*Description*: The Hermann-Mauguin space-group symbol.
605
606*Example*:
607
608```JSON
609"P 1 21 1"
610```
611
612
613#### unitCell
614
615*Optional field*
616
617*Type*: [Array](#types) of six [Float](#types) values.
618
*Description*: Array of six values defining the unit cell. The first three entries are the lengths of the sides `a`, `b`, and `c` in Å. The last three entries are the angles `alpha`, `beta`, and `gamma` in degrees.
620
621*Example*:
622
623```JSON
624[ 80.37, 96.12, 57.67, 90.00, 90.00, 90.00 ]
625```
626
627
628#### ncsOperatorList
629
630*Optional field*
631
632*Type*: [Array](#types) of [Array](#types)s of 16 [Float](#types) values.
633
634*Description*: Array of arrays representing 4x4 transformation matrices that are stored linearly in row major order. Thus, the translational component comprises the 4th, 8th, and 12th element. The transformation matrices describe noncrystallographic symmetry operations needed to create all molecules in the unit cell.
635
636*Example*:
637
638```JSON
639[
640    [
641         0.5,   -0.809, -0.309,  128.875,
642         0.809,  0.309,  0.5,   -208.524,
643        -0.309, -0.5,    0.809,   79.649,
644         0.0,    0.0,    0.0,      1.0
645    ],
646    [
647        -0.5,    0.809, -0.309,  386.625,
648         0.809,  0.309, -0.5,   -208.524,
649        -0.309, -0.5,   -0.809,   79.649,
650         0.0,    0.0,    0.0,      1.0
651    ]
652]
653```
654
655
656#### bioAssemblyList
657
658*Optional field*
659
660*Type*: `Array` of assembly objects with the following fields:
661
| Name             | Type             | Description                       |
|------------------|------------------|-----------------------------------|
| transformList    | [Array](#types)  | Array of transform objects        |
| name             | [String](#types) | Name of the biological assembly   |
666
667Fields in a `transform` object:
668
669| Name             | Type             | Description                                          |
670|------------------|------------------|------------------------------------------------------|
671| chainIndexList   | [Array](#types)  | Pointers into chain data fields, [Integers](#types)  |
672| matrix           | [Array](#types)  | 4x4 transformation matrix, [Floats](#types)          |
673
674The entries of `chainIndexList` are indices into the [chainIdList](#chainidlist) and [chainNameList](#chainnamelist) fields.
675
676The elements of the 4x4 transformation `matrix` are stored linearly in row major order. Thus, the translational component comprises the 4th, 8th, and 12th element.
677
678*Description*: Array of instructions on how to transform coordinates for an array of chains to create (biological) assemblies. The translational component is given in Å.
679
680*Example*:
681
The following example shows two assembly objects, each containing one transform object, from PDB ID [4OPJ](http://www.rcsb.org/pdb/explore.do?structureId=4OPJ). The transformation matrix of the first transform performs no rotation and a translation of 42.387 Å in dimension x. The second one translates -42.387 Å in dimension x.
683
684```JSON
685[
686    {
687        "transformList": [
688            {
689                "chainIndexList": [ 0, 4, 6 ],
690                "matrix": [
691                    1.0, 0.0, 0.0,  42.387,
692                    0.0, 1.0, 0.0,   0.000,
693                    0.0, 0.0, 1.0,   0.000,
694                    0.0, 0.0, 0.0,   1.000
695                ]
696            }
697        ]
698    },
699    {
700        "transformList": [
701            {
702                "chainIndexList": [ 0, 4, 6 ],
703                "matrix": [
704                    1.0, 0.0, 0.0, -42.387,
705                    0.0, 1.0, 0.0,   0.000,
706                    0.0, 0.0, 1.0,   0.000,
707                    0.0, 0.0, 0.0,   1.000
708                ]
709            }
710        ]
711    }
712]
713```
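
To illustrate the row-major layout, the following sketch applies such a matrix to a single coordinate. The function name `apply_transform` is hypothetical, and the code assumes an affine transformation whose last row is `0, 0, 0, 1`, as in the examples above.

```python
def apply_transform(matrix, point):
    """Apply a row-major 4x4 matrix (16 values) to an (x, y, z) point."""
    x, y, z = point
    transformed = []
    for row in range(3):  # ASSUMPTION: last row is 0, 0, 0, 1 and can be skipped
        r = matrix[4 * row:4 * row + 4]
        # r[3] is the translational component (4th, 8th or 12th element)
        transformed.append(r[0] * x + r[1] * y + r[2] * z + r[3])
    return tuple(transformed)

matrix = [
    1.0, 0.0, 0.0, 42.387,
    0.0, 1.0, 0.0,  0.000,
    0.0, 0.0, 1.0,  0.000,
    0.0, 0.0, 0.0,  1.000,
]
print(apply_transform(matrix, (1.0, 2.0, 3.0)))
```

With the first matrix of the example, only the x coordinate changes, shifted by 42.387 Å.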
714
715
716#### entityList
717
718*Optional field*
719
720*Type*: [Array](#types) of entity objects with the following fields:
721
| Name             | Type               | Description                                          |
|------------------|--------------------|------------------------------------------------------|
| chainIndexList   | [Array](#types)    | Pointers into chain data fields, [Integers](#types)  |
| description      | [String](#types)   | Description of the entity                            |
| type             | [String](#types)   | Name of the entity type                              |
| sequence         | [String](#types)   | Sequence of the full construct in one-letter-code    |
728
729The entries of `chainIndexList` are indices into the [chainIdList](#chainidlist) and [chainNameList](#chainnamelist) fields.
730
731The `sequence` string contains the full construct, not just the resolved residues. Its characters are referenced by the entries of the [sequenceIndexList](#sequenceindexlist) field. Further, characters follow the IUPAC single letter code for [protein](https://dx.doi.org/10.1111/j.1432-1033.1984.tb07877.x) or [DNA/RNA](https://dx.doi.org/10.1093/nar/13.9.3021) residues, otherwise the character 'X'.
732
733*Description*: Array of unique molecular entities within the structure. Each entry in `chainIndexList` represents an instance of that entity in the structure.
734
735*Vocabulary*: Known values for the entity field `type` from the [mmCIF dictionary](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_entity.type.html) are `macrolide`, `non-polymer`, `polymer`, `water`.
736
737*Example*:
738
739```JSON
740[
741    {
742        "description": "BROMODOMAIN ADJACENT TO ZINC FINGER DOMAIN PROTEIN 2B",
743        "type": "polymer",
744        "chainIndexList": [ 0 ],
745        "sequence": "SMSVKKPKRDDSKDLALCSMILTEMETHEDAWPFLLPVNLKLVPGYKKVIKKPMDFSTIREKLSSGQYPNLETFALDVRLVFDNCETFNEDDSDIGRAGHNMRKYFEKKWTDTFKVS"
746    },
747    {
748        "description": "4-FLUOROBENZAMIDOXIME",
749        "type": "non-polymer",
750        "chainIndexList": [ 1 ],
751        "sequence": ""
752    },
753    {
754        "description": "METHANOL",
755        "type": "non-polymer",
756        "chainIndexList": [ 2, 3, 4 ],
757        "sequence": ""
758    },
759    {
760        "description": "water",
761        "type": "water",
762        "chainIndexList": [ 5 ],
763        "sequence": ""
764    }
765]
766```
767
768
769#### resolution
770
771*Optional field*
772
773*Type*: [Float](#types).
774
*Description*: The experimental resolution in Å. If not applicable the field must be omitted.

*Example*:
778
779```JSON
7802.3
781```
782
783
784#### rFree
785
786*Optional field*
787
788*Type*: [Float](#types).
789
790*Description*: The R-free value. If not applicable the field must be omitted.
791
*Example*:
793
794```JSON
7950.203
796```
797
798
799#### rWork
800
801*Optional field*
802
803*Type*: [Float](#types).
804
805*Description*: The R-work value. If not applicable the field must be omitted.
806
*Example*:
808
809```JSON
8100.176
811```
812
813
814#### experimentalMethods
815
816*Optional field*
817
818*Type*: [Array](#types) of [String](#types)s.
819
820*Description*: The array of experimental methods employed for structure determination.
821
822*Vocabulary*: Known values from the [mmCIF dictionary](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_exptl.method.html) are `ELECTRON CRYSTALLOGRAPHY`, `ELECTRON MICROSCOPY`, `EPR`, `FIBER DIFFRACTION`, `FLUORESCENCE TRANSFER`, `INFRARED SPECTROSCOPY`, `NEUTRON DIFFRACTION`, `POWDER DIFFRACTION`, `SOLID-STATE NMR`, `SOLUTION NMR`, `SOLUTION SCATTERING`, `THEORETICAL MODEL`, `X-RAY DIFFRACTION`.
823
824*Example*:
825
826```JSON
827[ "X-RAY DIFFRACTION" ]
828```
829
830
831#### bondAtomList
832
833*Optional field*
834
835*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.
836
837*Description*: Pairs of values represent indices of covalently bonded atoms. The indices point to the [Atom data](#atom-data) arrays. Only covalent bonds may be given.
838
839*Example*:
840
841Using the 'Pass-through: 32-bit signed integer array' encoding strategy (type 4).
842
843In the following example there are three bonds, one between the atoms with the indices 0 and 61, one between the atoms with the indices 2 and 4, as well as one between the atoms with the indices 6 and 12.
844
845```JSON
846[ 0, 61, 2, 4, 6, 12 ]
847```
848
849
850#### bondOrderList
851
*Optional field* If it exists, [bondAtomList](#bondatomlist) must also be present. However, `bondAtomList` may exist without `bondOrderList`.
853
854*Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers.
855
856*Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 1 and 4, defining single, double, triple, and quadruple bonds.
857
858*Example*:
859
860Using the 'Pass-through: 8-bit signed integer array' encoding strategy (type 2).
861
862In the following example there are bond orders given for three bonds. The first and third bond have a bond order of 1 while the second bond has a bond order of 2.
863
864```JSON
865[ 1, 2, 1 ]
866```
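
The relationship between `bondAtomList` and `bondOrderList` can be sketched as follows. This is a minimal illustration that assumes both fields have already been decoded into plain integer lists; the helper name `pair_bonds` is hypothetical and not part of the format.

```python
def pair_bonds(bond_atom_list, bond_order_list=None):
    """Group a flat list of atom indices into (atom1, atom2, order) tuples.

    The order is None when no bondOrderList is available, since that
    field may be absent even if bondAtomList is present.
    """
    if len(bond_atom_list) % 2 != 0:
        raise ValueError("bondAtomList must hold an even number of indices")
    # consecutive entries (0,1), (2,3), ... form one bond each
    pairs = list(zip(bond_atom_list[0::2], bond_atom_list[1::2]))
    if bond_order_list is None:
        return [(a, b, None) for a, b in pairs]
    if len(bond_order_list) != len(pairs):
        raise ValueError("bondOrderList must hold one entry per bond")
    return [(a, b, order) for (a, b), order in zip(pairs, bond_order_list)]


# values taken from the two examples above
bonds = pair_bonds([0, 61, 2, 4, 6, 12], [1, 2, 1])
print(bonds)  # [(0, 61, 1), (2, 4, 2), (6, 12, 1)]
```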


### Model data

The number of models in a structure is equal to the length of the [chainsPerModel](#chainspermodel) field. The `chainsPerModel` field also defines which chains belong to each model.


#### chainsPerModel

*Required field*

*Type*: [Array](#types) of [Integer](#types) numbers.

*Description*: Array of the number of chains in each model. The number of models is thus equal to the length of the `chainsPerModel` field. The array allows looping over all models:

```Python
# example value; normally taken from the decoded MMTF container
chainsPerModel = [ 5, 8 ]

# initialize index counter
modelIndex = 0

# traverse models
for modelChainCount in chainsPerModel:
    print(modelIndex)
    modelIndex += 1
```

*Examples*:

In the following example there are 2 models. The first model has 5 chains and the second model has 8 chains. This also means that the chains with indices 0 to 4 belong to the first model and that the chains with indices 5 to 12 belong to the second model.

```JSON
[ 5, 8 ]
```

For structures with homogeneous models the number of chains per model is identical for all models. In the following example there are five models, each with four chains.

```JSON
[ 4, 4, 4, 4, 4 ]
```
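
The index ranges mentioned in the first example follow from a running sum over the counts. The sketch below assumes the field is already available as a list of integers; the helper name `index_ranges` is hypothetical.

```python
from itertools import accumulate


def index_ranges(counts):
    """Return one (start, stop) pair per entry, with stop exclusive."""
    stops = list(accumulate(counts))   # running totals: [5, 13] for [5, 8]
    starts = [0] + stops[:-1]          # each range starts where the previous ended
    return list(zip(starts, stops))


# chains 0 to 4 belong to the first model, chains 5 to 12 to the second
print(index_ranges([5, 8]))  # [(0, 5), (5, 13)]
```

The same running sum applies to `groupsPerChain`, where it yields the group indices that belong to each chain.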


### Chain data

The number of chains in a structure is equal to the length of the [groupsPerChain](#groupsperchain) field. The `groupsPerChain` field also defines which groups belong to each chain.


#### groupsPerChain

*Required field*

*Type*: [Array](#types) of [Integer](#types) numbers.

*Description*: Array of the number of groups (aka residues) in each chain. The number of chains is thus equal to the length of the `groupsPerChain` field. In conjunction with `chainsPerModel`, the array allows looping over all chains:

```Python
# example values; normally taken from the decoded MMTF container
chainsPerModel = [ 2 ]
chainIdList = [ "A", "B" ]
chainNameList = [ "A", "DA" ]

# initialize index counters
modelIndex = 0
chainIndex = 0

# traverse models
for modelChainCount in chainsPerModel:
    print(modelIndex)
    # traverse chains
    for _ in range(modelChainCount):
        print(chainIndex)
        # the decoded string arrays hold one entry per chain
        print(chainIdList[ chainIndex ])
        print(chainNameList[ chainIndex ])
        chainIndex += 1
    modelIndex += 1
```

*Example*:

In the following example there are 3 chains. The first chain has 73 groups, the second 59 and the third 1. This also means that the groups with indices 0 to 72 belong to the first chain, groups with indices 73 to 131 to the second chain and the group with index 132 to the third chain.

```JSON
[ 73, 59, 1 ]
```


#### chainIdList

*Required field*

*Type*: [Binary](#types) data that decodes into an array of 4-character strings.

*Description*: Array of chain IDs. For storing data from mmCIF files the `chainIdList` field should contain the value from the `label_asym_id` mmCIF data item and the `chainNameList` the `auth_asym_id` mmCIF data item. In PDB files there is only a single name/identifier for chains that corresponds to the `auth_asym_id` item. When there is only a single chain identifier available it must be stored in the `chainIdList` field.

*Note*: The character strings must be left aligned and unused characters must be represented by 0 bytes.

*Example*:

Using the 'UTF8/ASCII fixed-length string array' encoding strategy (type 5).

Starting with the array of 8-bit unsigned integers:

```JSON
[ 65, 0, 0, 0, 66, 0, 0, 0, 67, 0, 0, 0 ]
```

Decoding the ASCII characters:

```JSON
[ "A", "", "", "", "B", "", "", "", "C", "", "", "" ]
```

Creating the array of chain IDs:

```JSON
[ "A", "B", "C" ]
```
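
The three steps above can be sketched in a few lines. This is only an illustration: it assumes the codec header has already been removed so that `raw` holds just the character bytes, and the function name `decode_fixed_length_strings` is hypothetical.

```python
def decode_fixed_length_strings(raw, length=4):
    """Split raw bytes into fixed-length, left-aligned strings.

    Unused trailing positions are 0 bytes and are stripped.
    """
    if len(raw) % length != 0:
        raise ValueError("byte count must be a multiple of the string length")
    return [
        raw[i:i + length].rstrip(b"\x00").decode("utf-8")
        for i in range(0, len(raw), length)
    ]


chain_ids = decode_fixed_length_strings(bytes([65, 0, 0, 0, 66, 0, 0, 0, 67, 0, 0, 0]))
print(chain_ids)  # ['A', 'B', 'C']
```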


#### chainNameList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 4-character strings.

*Description*: Array of chain names. This field allows specifying an additional set of labels/names for chains. For example, it can be used to store both the `label_asym_id` (in `chainIdList`) and the `auth_asym_id` (in `chainNameList`) from mmCIF files.

*Example*:

Using the 'UTF8/ASCII fixed-length string array' encoding strategy (type 5).

Starting with the array of 8-bit unsigned integers:

```JSON
[ 65, 0, 0, 0, 68, 65, 0, 0 ]
```

Decoding the ASCII characters:

```JSON
[ "A", "", "", "", "D", "A", "", "" ]
```

Creating the array of chain names:

```JSON
[ "A", "DA" ]
```


### Group data

The fields in the following sections hold group-related data.

The mmCIF format allows for so-called micro-heterogeneity on the group-level. For groups (residues) with micro-heterogeneity there are two or more entries given that have the same [sequence index](#sequenceindexlist), [group id](#groupidlist) (and [insertion code](#inscodelist)) but are of a different [group type](#grouptypelist). The defining property is their identical sequence index.


#### groupList

*Required field*

*Type*: [Array](#types) of `groupType` objects with the following fields:

| Name             | Type              | Description                                                 |
|------------------|-------------------|-------------------------------------------------------------|
| formalChargeList | [Array](#types)   | Array of formal charges as [Integers](#types)               |
| atomNameList     | [Array](#types)   | Array of atom names, 0 to 5 character [Strings](#types)     |
| elementList      | [Array](#types)   | Array of elements, 0 to 3 character [Strings](#types)       |
| bondAtomList     | [Array](#types)   | Array of bonded atom indices, [Integers](#types)            |
| bondOrderList    | [Array](#types)   | Array of bond orders as [Integers](#types) between 1 and 4  |
| groupName        | [String](#types)  | The name of the group, 0 to 5 characters                    |
| singleLetterCode | [String](#types)  | The single letter code, 1 character                         |
| chemCompType     | [String](#types)  | The chemical component type                                 |


The element name must follow the IUPAC [standard](http://dx.doi.org/10.1515/ci.2014.36.4.25) where only the first character is capitalized and the remaining ones are lower case, for instance `Cd` for Cadmium.

Each pair of consecutive entries in `bondAtomList` represents the indices of two covalently bound atoms. The indices point into the `formalChargeList`, `atomNameList`, and `elementList` fields.

The `singleLetterCode` is the IUPAC single letter code for [protein](https://dx.doi.org/10.1111/j.1432-1033.1984.tb07877.x) or [DNA/RNA](https://dx.doi.org/10.1093/nar/13.9.3021) residues, otherwise the character 'X' for polymer groups or '?' for non-polymer groups.

*Description*: Common group (residue) data that is referenced via the `groupType` key by group entries.

*Vocabulary*: Known values for the groupType field `chemCompType` from the [mmCIF dictionary](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_chem_comp.type.html) are `D-beta-peptide, C-gamma linking`, `D-gamma-peptide, C-delta linking`, `D-peptide COOH carboxy terminus`, `D-peptide NH3 amino terminus`, `D-peptide linking`, `D-saccharide`, `D-saccharide 1,4 and 1,4 linking`, `D-saccharide 1,4 and 1,6 linking`, `DNA OH 3 prime terminus`, `DNA OH 5 prime terminus`, `DNA linking`, `L-DNA linking`, `L-RNA linking`, `L-beta-peptide, C-gamma linking`, `L-gamma-peptide, C-delta linking`, `L-peptide COOH carboxy terminus`, `L-peptide NH3 amino terminus`, `L-peptide linking`, `L-saccharide`, `L-saccharide 1,4 and 1,4 linking`, `L-saccharide 1,4 and 1,6 linking`, `RNA OH 3 prime terminus`, `RNA OH 5 prime terminus`, `RNA linking`, `non-polymer`, `other`, `peptide linking`, `peptide-like`, `saccharide`.

*Example*:

```JSON
[
    {
        "groupName": "GLY",
        "singleLetterCode": "G",
        "chemCompType": "PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O" ],
        "elementList": [ "N", "C", "C", "O" ],
        "formalChargeList": [ 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2 ],
        "bondOrderList": [ 1, 1, 2 ]
    },
    {
        "groupName": "ASP",
        "singleLetterCode": "D",
        "chemCompType": "L-PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O", "CB", "CG", "OD1", "OD2" ],
        "elementList": [ "N", "C", "C", "O", "C", "C", "O", "O" ],
        "formalChargeList": [ 0, 0, 0, 0, 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ],
        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ]
    },
    {
        "groupName": "SER",
        "singleLetterCode": "S",
        "chemCompType": "L-PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O", "CB", "OG" ],
        "elementList": [ "N", "C", "C", "O", "C", "O" ],
        "formalChargeList": [ 0, 0, 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ],
        "bondOrderList": [ 1, 1, 2, 1, 1 ]
    }
]
```


#### groupTypeList

*Required field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of indices that point to `groupType` entries in `groupList`. There is one entry for each group (residue), so the number of groups is equal to the length of the `groupTypeList` field.

*Example*:

Using the 'Pass-through: 32-bit signed integer array' encoding strategy (type 4).

In the following example there are 5 groups. The 1st, 4th and 5th reference the `groupType` with index `2`, the 2nd references index `0` and the 3rd references index `1`. Using the data from the `groupList` example, this describes the polymer `SER-GLY-ASP-SER-SER`.

```JSON
[ 2, 0, 1, 2, 2 ]
```
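
The lookup described above amounts to indexing `groupList` with each entry of `groupTypeList`. A minimal sketch, assuming both fields are decoded; the `groupType` objects are abbreviated to their `groupName` for brevity, and the variable names are illustrative:

```python
# abbreviated groupList from the example above (only groupName is kept)
group_list = [
    {"groupName": "GLY"},
    {"groupName": "ASP"},
    {"groupName": "SER"},
]
group_type_list = [2, 0, 1, 2, 2]

# resolve every group to the name of the groupType it points to
group_names = [group_list[type_index]["groupName"] for type_index in group_type_list]
print("-".join(group_names))  # SER-GLY-ASP-SER-SER
```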


#### groupIdList

*Required field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of group (residue) numbers. One entry for each group/residue.

*Example*:

Using the 'Delta & run-length encoded 32-bit signed integer array' encoding strategy (type 8).

Starting with the array of 32-bit signed integers:

```JSON
[ 1, 10, -10, 1, 1, 4 ]
```

Applying run-length decoding:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -10, 1, 1, 1, 1 ]
```

Applying delta decoding:

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4 ]
```
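
The two decoding steps can be sketched as follows. This is an illustration rather than a reference implementation: it assumes the binary field has already been unpacked into a list of integers, and the function names are hypothetical.

```python
from itertools import accumulate


def run_length_decode(encoded):
    """Expand alternating (value, count) entries into a flat list."""
    decoded = []
    for value, count in zip(encoded[0::2], encoded[1::2]):
        decoded.extend([value] * count)
    return decoded


def delta_decode(deltas):
    """Turn differences between neighbours back into absolute values."""
    return list(accumulate(deltas))


runs = run_length_decode([1, 10, -10, 1, 1, 4])
group_ids = delta_decode(runs)
print(group_ids)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4]
```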


#### secStructList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers.

*Description*: Array of secondary structure assignments coded according to the following table, which shows the eight different types of secondary structure the [DSSP](https://dx.doi.org/10.1002%2Fbip.360221211) algorithm distinguishes. If the field is included there must be an entry for each group (residue) either in all models or only in the first model.

| Code | Name         |
|-----:|--------------|
|    0 | pi helix     |
|    1 | bend         |
|    2 | alpha helix  |
|    3 | extended     |
|    4 | 3-10 helix   |
|    5 | bridge       |
|    6 | turn         |
|    7 | coil         |
|   -1 | undefined    |

*Example*:

Using the 'Pass-through: 8-bit signed integer array' encoding strategy (type 2).

Starting with the array of 8-bit signed integers:

```JSON
[ 7, 7, 2, 2, 2, 2, 2, 2, 2, 7 ]
```
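
For display purposes the codes can be translated with a lookup table built from the table above. The constant and function names below are illustrative only:

```python
# code-to-name mapping copied from the table above
SEC_STRUCT_NAMES = {
    0: "pi helix",
    1: "bend",
    2: "alpha helix",
    3: "extended",
    4: "3-10 helix",
    5: "bridge",
    6: "turn",
    7: "coil",
    -1: "undefined",
}


def sec_struct_names(codes):
    """Translate decoded secStructList codes into their names."""
    return [SEC_STRUCT_NAMES[code] for code in codes]


names = sec_struct_names([7, 7, 2, 2, 2, 2, 2, 2, 2, 7])
print(names[0], names[2])  # coil alpha helix
```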


#### insCodeList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of characters.

*Description*: Array of insertion codes, one for each group (residue). The lack of an insertion code must be denoted by a 0 byte.

*Example*:

Using the 'Run-length encoded character array' encoding strategy (type 6).

Starting with the array of 32-bit signed integers:

```JSON
[ 0, 5, 65, 3, 66, 2 ]
```

Applying run-length decoding:

```JSON
[ 0, 0, 0, 0, 0, 65, 65, 65, 66, 66 ]
```

If needed the ASCII codes can be converted to an `Array` of `String`s with the zeros as zero-length `String`s:

```JSON
[ "", "", "", "", "", "A", "A", "A", "B", "B" ]
```
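
A compact sketch that combines the run-length decoding with the optional conversion to strings is shown below. It assumes the field has already been unpacked into a list of integers; the function name `decode_char_runs` is hypothetical.

```python
def decode_char_runs(encoded):
    """Expand (character code, count) pairs into a list of strings.

    A code of 0 means "no insertion code" and becomes an empty string.
    """
    chars = []
    for code, count in zip(encoded[0::2], encoded[1::2]):
        chars.extend(["" if code == 0 else chr(code)] * count)
    return chars


ins_codes = decode_char_runs([0, 5, 65, 3, 66, 2])
print(ins_codes)  # ['', '', '', '', '', 'A', 'A', 'A', 'B', 'B']
```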


#### sequenceIndexList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of indices that point into the `sequence` property of an entity object in the [entityList](#entitylist) field that is associated with the chain the group belongs to (i.e. the index of the chain is included in the `chainIndexList` of the entity). There is one entry for each group (residue). It must be set to `-1` when a group entry has no associated entity (and thus no sequence), for example water molecules.

*Example*:

Using the 'Delta & run-length encoded 32-bit signed integer array' encoding strategy (type 8).

Starting with the array of 32-bit signed integers:

```JSON
[ 1, 10, -10, 1, 1, 4 ]
```

Applying run-length decoding:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -10, 1, 1, 1, 1 ]
```

Applying delta decoding:

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4 ]
```


### Atom data

The fields in the following sections hold atom-related data.

The mmCIF format allows for alternate locations of atoms. Such atoms have multiple entries in the atom-level fields (including the fields in the [groupList](#grouplist) entries). They can be identified and distinguished by their distinct values in the [altLocList](#altloclist) field.


#### atomIdList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of atom serial numbers. One entry for each atom.

*Example*:

Using the 'Delta & run-length encoded 32-bit signed integer array' encoding strategy (type 8).

Starting with the array of 32-bit signed integers:

```JSON
[ 1, 7, 2, 1 ]
```

Applying run-length decoding:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 2 ]
```

Applying delta decoding:

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 9 ]
```


#### altLocList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of characters.

*Description*: Array of alternate location labels, one for each atom. The lack of an alternate location label must be denoted by a 0 byte.

*Example*:

Using the 'Run-length encoded character array' encoding strategy (type 6).

Starting with the array of 32-bit signed integers:

```JSON
[ 0, 5, 65, 3, 66, 2 ]
```

Applying run-length decoding:

```JSON
[ 0, 0, 0, 0, 0, 65, 65, 65, 66, 66 ]
```

If needed the ASCII codes can be converted to an `Array` of `String`s with the zeros as zero-length `String`s:

```JSON
[ "", "", "", "", "", "A", "A", "A", "B", "B" ]
```


#### bFactorList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit floating-point numbers.

*Description*: Array of atom B-factors in Å^2. One entry for each atom.

*Example*:

Using the 'Integer & delta encoded & two-byte-packed 32-bit floating-point number array' encoding strategy (type 10) with a divisor of 100.

Starting with the packed array of 16-bit signed integers:

```JSON
[ 18200, 0, 2, -1, 100, -3, 5 ]
```

Unpacking/applying recursive indexing decoding to create an array of 32-bit signed integers (note that only the array type changes here, as all values fit into 16-bit signed integers):

```JSON
[ 18200, 0, 2, -1, 100, -3, 5 ]
```

Applying delta decoding to create an array of 32-bit signed integers:

```JSON
[ 18200, 18200, 18202, 18201, 18301, 18298, 18303 ]
```

Applying integer decoding with a divisor of `100` to create an array of 32-bit floating-point numbers:

```JSON
[ 182.00, 182.00, 182.02, 182.01, 183.01, 182.98, 183.03 ]
```


#### xCoordList
#### yCoordList
#### zCoordList

*Required fields*

*Type*: [Binary](#types) data that decodes into an array of 32-bit floating-point numbers.

*Description*: Array of x, y, and z atom coordinates, respectively, in Å. One entry for each atom and coordinate.

*Note*: To clarify, the data for each coordinate is stored in a separate array.

*Example*:

Using the 'Integer & delta encoded & two-byte-packed 32-bit floating-point number array' encoding strategy (type 10) with a divisor of 1000.

Starting with the packed array of 16-bit signed integers:

```JSON
[ 32767, 32767, 32767, 6899, 0, 2, -1, 100, -3, 5 ]
```

Unpacking/applying recursive indexing decoding to create an array of 32-bit signed integers:

```JSON
[ 105200, 0, 2, -1, 100, -3, 5 ]
```

Applying delta decoding to create an array of 32-bit signed integers:

```JSON
[ 105200, 105200, 105202, 105201, 105301, 105298, 105303 ]
```

Applying integer decoding with a divisor of `1000` to create an array of 32-bit floating-point values:

```JSON
[ 105.200, 105.200, 105.202, 105.201, 105.301, 105.298, 105.303 ]
```
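
The three decoding steps can be chained as in the sketch below. It is meant as an illustration under a few assumptions: the packed 16-bit values are already available as a list of integers, the function names are hypothetical, and Python floats are 64-bit, so the results only approximate the 32-bit values a real decoder would produce.

```python
from itertools import accumulate

INT16_MAX = 32767
INT16_MIN = -32768


def recursive_index_decode(packed):
    """Sum up runs of limit values until a non-limit value closes the entry."""
    values = []
    total = 0
    for item in packed:
        total += item
        # a limit value signals that the entry continues in the next item
        if item not in (INT16_MAX, INT16_MIN):
            values.append(total)
            total = 0
    return values


def decode_packed_floats(packed, divisor):
    """Apply recursive indexing, delta and integer decoding in that order."""
    integers = list(accumulate(recursive_index_decode(packed)))
    return [value / divisor for value in integers]


coords = decode_packed_floats([32767, 32767, 32767, 6899, 0, 2, -1, 100, -3, 5], 1000)
print(coords)  # [105.2, 105.2, 105.202, 105.201, 105.301, 105.298, 105.303]
```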


#### occupancyList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit floating-point numbers.

*Description*: Array of atom occupancies, one for each atom.

*Example*:

Using the 'Integer & run-length encoded 32-bit floating-point number array' encoding strategy (type 9) with a divisor of 100.

Starting with the array of 32-bit signed integers:

```JSON
[ 100, 4, 50, 2 ]
```

Applying run-length decoding:

```JSON
[ 100, 100, 100, 100, 50, 50 ]
```

Applying integer decoding with a divisor of `100` to create an array of 32-bit floating-point values:

```JSON
[ 1.00, 1.00, 1.00, 1.00, 0.50, 0.50 ]
```


## Traversal

The following traversal code assumes that all fields, including the optional ones, have been decoded and are available in a dictionary `mmtf` keyed by field name. The dictionary and the function name `traverse` are illustrative and not part of the format.

```Python
def traverse(mmtf):
    # initialize index counters
    modelIndex = 0
    chainIndex = 0
    groupIndex = 0
    atomIndex = 0

    # traverse models
    for modelChainCount in mmtf["chainsPerModel"]:
        print(modelIndex)
        # traverse chains
        for _ in range(modelChainCount):
            print(chainIndex)
            # the decoded string arrays hold one entry per chain
            print(mmtf["chainIdList"][chainIndex])
            print(mmtf["chainNameList"][chainIndex])
            chainGroupCount = mmtf["groupsPerChain"][chainIndex]
            # traverse groups
            for _ in range(chainGroupCount):
                print(groupIndex)
                print(mmtf["groupIdList"][groupIndex])
                print(mmtf["insCodeList"][groupIndex])
                print(mmtf["secStructList"][groupIndex])
                print(mmtf["sequenceIndexList"][groupIndex])
                print(mmtf["groupTypeList"][groupIndex])
                group = mmtf["groupList"][mmtf["groupTypeList"][groupIndex]]
                print(group["groupName"])
                print(group["singleLetterCode"])
                print(group["chemCompType"])
                # traverse intra-group bonds; their indices are relative to
                # the first atom of the group, hence the offset
                atomOffset = atomIndex
                groupBondCount = len(group["bondAtomList"]) // 2
                for i in range(groupBondCount):
                    print(atomOffset + group["bondAtomList"][i * 2])      # atomIndex1
                    print(atomOffset + group["bondAtomList"][i * 2 + 1])  # atomIndex2
                    print(group["bondOrderList"][i])
                groupAtomCount = len(group["atomNameList"])
                # traverse atoms
                for i in range(groupAtomCount):
                    print(atomIndex)
                    print(mmtf["xCoordList"][atomIndex])
                    print(mmtf["yCoordList"][atomIndex])
                    print(mmtf["zCoordList"][atomIndex])
                    print(mmtf["bFactorList"][atomIndex])
                    print(mmtf["atomIdList"][atomIndex])
                    print(mmtf["altLocList"][atomIndex])
                    print(mmtf["occupancyList"][atomIndex])
                    print(group["formalChargeList"][i])
                    print(group["atomNameList"][i])
                    print(group["elementList"][i])
                    atomIndex += 1
                groupIndex += 1
            chainIndex += 1
        modelIndex += 1

    # traverse inter-group bonds
    for i in range(len(mmtf["bondAtomList"]) // 2):
        print(mmtf["bondAtomList"][i * 2])      # atomIndex1
        print(mmtf["bondAtomList"][i * 2 + 1])  # atomIndex2
        print(mmtf["bondOrderList"][i])
```
