# MMTF Specification

*Version*: v1.0

The **m**acro**m**olecular **t**ransmission **f**ormat (MMTF) is a binary encoding of biological structures. It includes the coordinates, the topology and associated data. Specifically, a large subset of the data in mmCIF or PDB files can be represented. Its main goals are a reduced file size for efficient transmission over the Internet or from hard disk to memory and fast decoding/parsing speed. Additionally, the format aims to be easily understood and implemented to facilitate its wide dissemination. For testing encoder and decoder implementations a [test suite](test-suite/) is available.


## Table of contents

* [Overview](#overview)
* [Container](#container)
* [Types](#types)
* [Codecs](#codecs)
    * [Header](#header)
    * [Strategies](#strategies)
* [Encodings](#encodings)
* [Fields](#fields)
    * [Format data](#format-data)
    * [Structure data](#structure-data)
    * [Model data](#model-data)
    * [Chain data](#chain-data)
    * [Group data](#group-data)
    * [Atom data](#atom-data)
* [Traversal](#traversal)


## Overview

This specification describes a set of required and optional [fields](#fields) representing molecular structures and associated data. The fields are limited to six primitive [types](#types) for efficient serialization and deserialization using the binary [MessagePack](http://msgpack.org/) format. The [fields](#fields) in MMTF are stored in a binary [container](#container) format. The top level of the container contains the field names as keys and field data as values. To describe the layout of data in MMTF we use the [JSON](http://www.json.org/) notation throughout this document.

The first step of decoding MMTF is decoding the MessagePack-encoded container. Many of the resulting MMTF fields do not need to be decoded any further.
However, to allow for custom compression some fields are given as binary data and must be decoded using the [strategies](#strategies) described below. For maximal size savings the binary MMTF data can be compressed using general purpose algorithms like [gzip](https://www.gnu.org/software/gzip/) or [brotli](https://github.com/google/brotli).

The fields in the MMTF format group data of the same type together to create a flat data structure. For instance, the coordinates of all atoms are stored together, instead of in atom objects with other atom-related data. This avoids imposing a deeply-nested hierarchical structure on consuming programs, while still allowing efficient [traversal](#traversal) of models, chains, groups, and atoms.


## Container

In principle any serialization format that supports the [types](#types) described below can be used to store the [fields](#fields). MMTF files (specifically files with the `.mmtf` extension) use the binary [MessagePack](http://msgpack.org/) serialization format.


### MessagePack

The MessagePack format (version 5) is used as the binary container format of MMTF. The MessagePack [specification](https://github.com/msgpack/msgpack/blob/master/spec.md) describes the data types and the data layout. Encoding and decoding libraries for MessagePack are available in many languages, see the MessagePack [website](http://msgpack.org/).


### JSON

The test suite will additionally provide files representing the MMTF [fields](#fields) as [JSON](http://www.json.org/) to help validate implementations of this specification.


## Types

The following types are used for the fields in this specification.

* `String` A UTF-8 encoded string.
* `Float` A 32-bit floating-point number.
* `Integer` A 32-bit signed integer.
* `Map` A data structure of key-value pairs where each key is unique. Also known as "dictionary" or "hash".
* `Array` A sequence of elements that have the same type.
* `Binary` An array of unsigned 8-bit integer numbers representing binary data.

The `Binary` type is used here to store encoded data as described in the [Codecs](#codecs) section. When the encoded data is to be interpreted as a multi-byte type (e.g. 32-bit integers) it must be represented in big-endian format.

Note that the MessagePack format limits the `String`, `Map`, `Array` and `Binary` type to (2^32)-1 entries per instance.


## Codecs

This section describes the binary layout of the header and the encoded data as well as the available en/decoding strategies.


### Header

* Bytes 0 to 3: 32-bit signed integer specifying the codec type
* Bytes 4 to 7: 32-bit signed integer specifying the length of the resulting array
* Bytes 8 to 11: 4 bytes containing codec-specific parameter data
* Bytes 12 to N: bytes containing the encoded array data


### Strategies

#### Pass-through: 32-bit floating-point number array

*Type* 1

*Signature* `byte[] -> float32[]`

*Description* Interpret bytes as array of 32-bit floating-point numbers.


#### Pass-through: 8-bit signed integer array

*Type* 2

*Signature* `byte[] -> int8[]`

*Description* Interpret bytes as array of 8-bit signed integers.


#### Pass-through: 16-bit signed integer array

*Type* 3

*Signature* `byte[] -> int16[]`

*Description* Interpret bytes as array of 16-bit signed integers.


#### Pass-through: 32-bit signed integer array

*Type* 4

*Signature* `byte[] -> int32[]`

*Description* Interpret bytes as array of 32-bit signed integers.


#### UTF8/ASCII fixed-length string array

*Type* 5

*Parameter* `byte[4] -> int32` denoting the string length

*Signature* `byte[] -> uint8[] -> string<length>[]`

*Description* Interpret bytes as array of 8-bit unsigned integers, then iteratively consume `length` many bytes to form a string array.
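
As an illustration of the header layout and the fixed-length string strategy (type 5), the following is a minimal sketch in Python. The function names `parse_header` and `decode_fixed_length_strings` are hypothetical and not part of this specification. The sketch assumes big-endian header integers, as required for multi-byte types, and assumes that unused trailing characters are padded with 0 bytes, which it strips.

```python
import struct


def parse_header(data: bytes):
    """Split an encoded field into codec type, array length,
    raw parameter bytes and payload (hypothetical helper)."""
    if len(data) < 12:
        raise ValueError("an encoded field needs at least 12 header bytes")
    # bytes 0-3: codec type, bytes 4-7: length of the resulting array
    codec, length = struct.unpack(">ii", data[:8])
    # bytes 8-11: codec-specific parameter, bytes 12-N: encoded data
    return codec, length, data[8:12], data[12:]


def decode_fixed_length_strings(data: bytes):
    """Decode a type 5 field into a list of strings."""
    codec, length, param, payload = parse_header(data)
    if codec != 5:
        raise ValueError("expected codec type 5, got %d" % codec)
    (string_length,) = struct.unpack(">i", param)
    strings = []
    for i in range(length):
        chunk = payload[i * string_length:(i + 1) * string_length]
        # assumption: unused characters are 0 bytes and are dropped
        strings.append(chunk.rstrip(b"\x00").decode("utf-8"))
    return strings


# header: codec 5, three strings, four bytes each; then the payload
field = struct.pack(">iii", 5, 3, 4) + b"A\x00\x00\x00B\x00\x00\x00C\x00\x00\x00"
chain_ids = decode_fixed_length_strings(field)
print(chain_ids)
```

The same `parse_header` step applies to every strategy; only the interpretation of the parameter bytes and the payload differs per codec type.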


#### Run-length encoded character array

*Type* 6

*Signature* `byte[] -> int32[] -> char[]`

*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of characters.


#### Run-length encoded 32-bit signed integer array

*Type* 7

*Signature* `byte[] -> int32[] -> int32[]`

*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of 32-bit signed integers.


#### Delta & run-length encoded 32-bit signed integer array

*Type* 8

*Signature* `byte[] -> int32[] -> int32[] -> int32[]`

*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of 32-bit signed integers, then delta decode into array of 32-bit signed integers.


#### Integer & run-length encoded 32-bit floating-point number array

*Type* 9

*Parameter* `byte[4] -> int32` denoting the divisor

*Signature* `byte[] -> int32[] -> int32[] -> float32[]`

*Description* Interpret bytes as array of 32-bit signed integers, then run-length decode into array of 32-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.


#### Integer & delta encoded & two-byte-packed 32-bit floating-point number array

*Type* 10

*Parameter* `byte[4] -> int32` denoting the divisor

*Signature* `byte[] -> int16[] -> int32[] -> int32[] -> float32[]`

*Description* Interpret bytes as array of 16-bit signed integers, then unpack into array of 32-bit integers, then delta decode into array of 32-bit integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.
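
The chain of steps for type 10 can be sketched as follows. This is an illustrative Python sketch, not a reference implementation: the function names are hypothetical, the unpacking step follows the recursive indexing scheme described in the [Encodings](#encodings) section, and Python floats are 64-bit, so a real decoder would store the result as 32-bit floats.

```python
import struct

INT16_MAX, INT16_MIN = 32767, -32768


def unpack_recursive_index(values, max_value, min_value):
    """Sum runs of interval endpoints into the value that ends the run."""
    out, acc = [], 0
    for v in values:
        acc += v
        if v != max_value and v != min_value:
            out.append(acc)
            acc = 0
    return out


def delta_decode(deltas):
    """Turn differences back into the running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def decode_type_10(data: bytes):
    """byte[] -> int16[] -> int32[] -> int32[] -> float[] (hypothetical helper)."""
    codec, length, divisor = struct.unpack(">iii", data[:12])
    if codec != 10:
        raise ValueError("expected codec type 10, got %d" % codec)
    payload = data[12:]
    packed = struct.unpack(">%dh" % (len(payload) // 2), payload)
    ints = delta_decode(unpack_recursive_index(packed, INT16_MAX, INT16_MIN))
    if len(ints) != length:
        raise ValueError("decoded length does not match the header")
    return [i / divisor for i in ints]


# coordinates 10.0, 10.5, 50.0 with divisor 1000:
# integers 10000, 10500, 50000 -> deltas 10000, 500, 39500
# -> packed 10000, 500, 32767, 6733 (39500 = 32767 + 6733)
field = struct.pack(">iii", 10, 3, 1000) + struct.pack(">4h", 10000, 500, 32767, 6733)
coords = decode_type_10(field)
print(coords)
```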


#### Integer encoded 32-bit floating-point number array

*Type* 11

*Parameter* `byte[4] -> int32` denoting the divisor

*Signature* `byte[] -> int16[] -> float32[]`

*Description* Interpret bytes as array of 16-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.


#### Integer & two-byte-packed 32-bit floating-point number array

*Type* 12

*Parameter* `byte[4] -> int32` denoting the divisor

*Signature* `byte[] -> int16[] -> int32[] -> float32[]`

*Description* Interpret bytes as array of 16-bit signed integers, then unpack into array of 32-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.

*Note* Useful for arrays where a small number of values may be slightly larger than two bytes. However, note that with many values larger than that the packing becomes inefficient.


#### Integer & one-byte-packed 32-bit floating-point number array

*Type* 13

*Parameter* `byte[4] -> int32` denoting the divisor

*Signature* `byte[] -> int8[] -> int32[] -> float32[]`

*Description* Interpret bytes as array of 8-bit signed integers, then unpack into array of 32-bit signed integers, then integer decode into array of 32-bit floating-point numbers using the `divisor` parameter.

*Note* Useful for arrays where a small number of values may be slightly larger than one byte. However, note that with many values larger than that the packing becomes inefficient.


#### Two-byte-packed 32-bit signed integer array

*Type* 14

*Signature* `byte[] -> int16[] -> int32[]`

*Description* Interpret bytes as array of 16-bit signed integers, then unpack into array of 32-bit signed integers.

*Note* Useful for arrays where a small number of values may be slightly larger than two bytes.
However, note that with many values larger than that the packing becomes inefficient.


#### One-byte-packed 32-bit signed integer array

*Type* 15

*Signature* `byte[] -> int8[] -> int32[]`

*Description* Interpret bytes as array of 8-bit signed integers, then unpack into array of 32-bit signed integers.

*Note* Useful for arrays where a small number of values may be slightly larger than one byte. However, note that with many values larger than that the packing becomes inefficient.


## Encodings

The following general encoding strategies are used to compress the data contained in MMTF files.


### Run-length encoding

Run-length encoding can generally be used to compress arrays that contain stretches of equal values. Instead of storing each value itself, stretches of equal values are represented by the value itself and the occurrence count, that is a value/count pair.

*Example*:

Starting with the encoded array of value/count pairs. In the following example there are three pairs `1, 10`, `2, 1` and `1, 4`. The first entry in a pair is the value to be repeated and the second entry denotes how often the value must be repeated.

```JSON
[ 1, 10, 2, 1, 1, 4 ]
```

Applying run-length decoding by repeating, for each pair, the value as often as denoted by the count entry.

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ]
```


### Delta encoding

Delta encoding is used to store an array of numbers. Instead of storing the numbers themselves, the differences (deltas) between the numbers are stored. When the values of the deltas are smaller than the numbers themselves they can be more efficiently packed to require less space.

Note that arrays in which the values change by an identical amount for a range of consecutive values lend themselves to subsequent run-length encoding.
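
The combination mentioned in the note, delta encoding followed by run-length encoding, is what codec type 8 undoes in reverse order. A minimal Python sketch of the decoding direction is given here; the function names are illustrative and not defined by this specification.

```python
from itertools import accumulate


def run_length_decode(pairs):
    """Expand value/count pairs into the repeated values."""
    if len(pairs) % 2 != 0:
        raise ValueError("run-length data must consist of value/count pairs")
    out = []
    for i in range(0, len(pairs), 2):
        value, count = pairs[i], pairs[i + 1]
        out.extend([value] * count)
    return out


def delta_decode(deltas):
    """Replace each delta by the running sum up to and including it."""
    return list(accumulate(deltas))


# value/count pairs -> deltas -> original numbers
pairs = [1, 10, 2, 1, 1, 4]
deltas = run_length_decode(pairs)
numbers = delta_decode(deltas)
print(numbers)
```

Six stored integers expand to fifteen values here, which shows why a mostly consecutive array such as atom serial numbers compresses well under this scheme.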

*Example*:

Starting with the encoded array of delta values:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1 ]
```

Applying delta decoding. The first entry in the array is left as is, the second is calculated as the sum of the first and the second (not decoded) value, the third as the sum of the second (decoded) and third (not decoded) value and so forth.

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16 ]
```


### Packing/Recursive indexing encoding

Packing/Recursive indexing encodes values such that the encoded values lie within the open interval (MIN, MAX). This allows creating a more compact representation of a 32-bit signed integer array when the majority of values in the array fit into 16-bit (or 8-bit). To encode each value in the input array the method stores the value itself if it lies within the open interval (MIN, MAX), otherwise the MAX (or MIN if the number is negative) interval endpoint is stored and subtracted from the input value. This process of storing and subtracting is repeated recursively until the remainder lies within the interval.

*Example*:

Starting with the array of 8-bit integer values, so the open interval is (-128, 127):

```JSON
[ 127, 41, 34, 1, 0, -50, -128, 0, 7, 127, 0, 127, 127, 14 ]
```

Unpacking/Applying recursive indexing decoding. Values that lie within the interval are copied over to the output array. Values that are equal to an interval endpoint are added to the subsequent value while the subsequent value is equal to an interval endpoint, e.g. the sequence `127, 127, 14` becomes `268`:

```JSON
[ 168, 34, 1, 0, -50, -128, 7, 127, 268 ]
```


### Integer encoding

In integer encoding, floating point numbers are converted to integer values by multiplying with a factor and discarding everything after the decimal point.
Depending on the multiplication factor this can change the precision but with a sufficiently large factor it is lossless. The integer values can then often be compressed with delta encoding which is the main motivation for it.

*Example*:

Starting with the array of integer values:

```JSON
[ 100, 100, 100, 100, 50, 50 ]
```

Applying integer decoding with a divisor of `100`:

```JSON
[ 1.00, 1.00, 1.00, 1.00, 0.50, 0.50 ]
```


### Dictionary encoding

For dictionary encoding an `Array` is created to store values. Indices as references to the values can then be used instead of repeating the values over and over again. Arrays of indices can afterwards be compressed with delta and run-length encoding.

*Example*:

First create an `Array` to hold values that are referable by indices. In the following example there are two indices, `0` and `1`, with some values associated.

```JSON
[
    {
        "groupName": "ASP",
        "singleLetterCode": "D",
        "chemCompType": "L-PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O", "CB", "CG", "OD1", "OD2" ],
        "elementList": [ "N", "C", "C", "O", "C", "C", "O", "O" ],
        "formalChargeList": [ 0, 0, 0, 0, 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ],
        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ]
    },
    {
        "groupName": "SER",
        "singleLetterCode": "S",
        "chemCompType": "L-PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O", "CB", "OG" ],
        "elementList": [ "N", "C", "C", "O", "C", "O" ],
        "formalChargeList": [ 0, 0, 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ],
        "bondOrderList": [ 1, 1, 2, 1, 1 ]
    }
]
```

The indices can then be used to reference the values as often as needed:

```JSON
[ 0, 1, 1, 0, 1 ]
```


## Fields

The following table lists all top level fields, including their [type](#types) and whether they are required
or optional. The top-level fields themselves are stored as a `Map`.

| Name                                        | Type              | Required |
|---------------------------------------------|-------------------|:--------:|
| [mmtfVersion](#mmtfversion)                 | [String](#types)  |    Y     |
| [mmtfProducer](#mmtfproducer)               | [String](#types)  |    Y     |
| [unitCell](#unitcell)                       | [Array](#types)   |          |
| [spaceGroup](#spacegroup)                   | [String](#types)  |          |
| [structureId](#structureid)                 | [String](#types)  |          |
| [title](#title)                             | [String](#types)  |          |
| [depositionDate](#depositiondate)           | [String](#types)  |          |
| [releaseDate](#releasedate)                 | [String](#types)  |          |
| [ncsOperatorList](#ncsoperatorlist)         | [Array](#types)   |          |
| [bioAssemblyList](#bioassemblylist)         | [Array](#types)   |          |
| [entityList](#entitylist)                   | [Array](#types)   |          |
| [experimentalMethods](#experimentalmethods) | [Array](#types)   |          |
| [resolution](#resolution)                   | [Float](#types)   |          |
| [rFree](#rfree)                             | [Float](#types)   |          |
| [rWork](#rwork)                             | [Float](#types)   |          |
| [numBonds](#numbonds)                       | [Integer](#types) |    Y     |
| [numAtoms](#numatoms)                       | [Integer](#types) |    Y     |
| [numGroups](#numgroups)                     | [Integer](#types) |    Y     |
| [numChains](#numchains)                     | [Integer](#types) |    Y     |
| [numModels](#nummodels)                     | [Integer](#types) |    Y     |
| [groupList](#grouplist)                     | [Array](#types)   |    Y     |
| [bondAtomList](#bondatomlist)               | [Binary](#types)  |          |
| [bondOrderList](#bondorderlist)             | [Binary](#types)  |          |
| [xCoordList](#xcoordlist)                   | [Binary](#types)  |    Y     |
| [yCoordList](#ycoordlist)                   | [Binary](#types)  |    Y     |
| [zCoordList](#zcoordlist)                   | [Binary](#types)  |    Y     |
| [bFactorList](#bfactorlist)                 | [Binary](#types)  |          |
| [atomIdList](#atomidlist)                   | [Binary](#types)  |          |
| [altLocList](#altloclist)                   | [Binary](#types)  |          |
| [occupancyList](#occupancylist)             | [Binary](#types)  |          |
| [groupIdList](#groupidlist)                 | [Binary](#types)  |    Y     |
| [groupTypeList](#grouptypelist)             | [Binary](#types)  |    Y     |
| [secStructList](#secstructlist)             | [Binary](#types)  |          |
| [insCodeList](#inscodelist)                 | [Binary](#types)  |          |
| [sequenceIndexList](#sequenceindexlist)     | [Binary](#types)  |          |
| [chainIdList](#chainidlist)                 | [Binary](#types)  |    Y     |
| [chainNameList](#chainnamelist)             | [Binary](#types)  |          |
| [groupsPerChain](#groupsperchain)           | [Array](#types)   |    Y     |
| [chainsPerModel](#chainspermodel)           | [Array](#types)   |    Y     |


### Format data

#### mmtfVersion

*Required field*

*Type*: [String](#types).

*Description*: The version number of the specification the file adheres to. The specification follows a [semantic versioning](http://semver.org/) scheme. In a version number `MAJOR.MINOR`, the `MAJOR` part is incremented when specification changes are incompatible with previous versions. The `MINOR` part is changed for additions to the specification that are backwards compatible.

*Examples*:

The current, unreleased, in development specification:

```JSON
"0.1"
```

A future version with additions backwards compatible to versions "1.0" and "1.1":

```JSON
"1.2"
```


#### mmtfProducer

*Required field*

*Type*: [String](#types).

*Description*: The name and version of the software used to produce the file. For development versions it can be useful to also include the checksum of the commit. The main purpose of this field is to identify the software that has written a file, for instance because it has format errors.

*Examples*:

A software name and the checksum of a commit:

```JSON
"RCSB PDB mmtf-java-encoder---version: 6b8635f8d319beea9cd7cc7f5dd2649578ac01a0"
```

Another software name and its version number:

```JSON
"NGL mmtf exporter v1.2"
```


### Structure data

#### title

*Optional field*

*Type*: [String](#types).

*Description*: A short description of the structural data included in the file.

*Example*:

```JSON
"CRAMBIN"
```


#### structureId

*Optional field*

*Type*: [String](#types).

*Description*: An ID for the structure, for example the PDB ID if applicable. If not in conflict with the format of the ID, it must be given in uppercase.

*Example*:

```JSON
"1CRN"
```


#### depositionDate

*Optional field*

*Type*: [String](#types) with the format `YYYY-MM-DD`, where `YYYY` stands for the year in the Gregorian calendar, `MM` is the month of the year between 01 (January) and 12 (December), and `DD` is the day of the month between 01 and 31.

*Description*: A date that relates to the deposition of the structure in a database, e.g. the wwPDB archive.

*Example*:

For example, the second day of October in the year 2005 is written as:

```JSON
"2005-10-02"
```


#### releaseDate

*Optional field*

*Type*: [String](#types) with the format `YYYY-MM-DD`, where `YYYY` stands for the year in the Gregorian calendar, `MM` is the month of the year between 01 (January) and 12 (December), and `DD` is the day of the month between 01 and 31.

*Description*: A date that relates to the release of the structure in a database, e.g. the wwPDB archive.

*Example*:

For example, the third day of December in the year 2013 is written as:

```JSON
"2013-12-03"
```


#### numBonds

*Required field*

*Type*: [Integer](#types).

*Description*: The overall number of bonds. This number must reflect both the bonds given in `bondAtomList` and the bonds given in the `groupType` entries in `groupList`.

*Example*:

```JSON
1142
```


#### numAtoms

*Required field*

*Type*: [Integer](#types).

*Description*: The overall number of atoms in the structure.
This also includes atoms at alternate locations.

*Example*:

```JSON
1023
```


#### numGroups

*Required field*

*Type*: [Integer](#types).

*Description*: The overall number of groups in the structure. This also includes extra groups due to micro-heterogeneity.

*Example*:

```JSON
302
```


#### numChains

*Required field*

*Type*: [Integer](#types).

*Description*: The overall number of chains in the structure.

*Example*:

```JSON
4
```


#### numModels

*Required field*

*Type*: [Integer](#types).

*Description*: The overall number of models in the structure.

*Example*:

```JSON
1
```


#### spaceGroup

*Optional field*

*Type*: [String](#types).

*Description*: The Hermann-Mauguin space-group symbol.

*Example*:

```JSON
"P 1 21 1"
```


#### unitCell

*Optional field*

*Type*: [Array](#types) of six [Float](#types) values.

*Description*: Array of six values defining the unit cell. The first three entries are the lengths of the sides `a`, `b`, and `c` in Å. The last three entries are the `alpha`, `beta`, and `gamma` angles in degrees.

*Example*:

```JSON
[ 80.37, 96.12, 57.67, 90.00, 90.00, 90.00 ]
```


#### ncsOperatorList

*Optional field*

*Type*: [Array](#types) of [Array](#types)s of 16 [Float](#types) values.

*Description*: Array of arrays representing 4x4 transformation matrices that are stored linearly in row major order. Thus, the translational component comprises the 4th, 8th, and 12th element. The transformation matrices describe noncrystallographic symmetry operations needed to create all molecules in the unit cell.

*Example*:

```JSON
[
    [
         0.5,   -0.809, -0.309,  128.875,
         0.809,  0.309,  0.5,   -208.524,
        -0.309, -0.5,    0.809,   79.649,
         0.0,    0.0,    0.0,      1.0
    ],
    [
        -0.5,    0.809, -0.309,  386.625,
         0.809,  0.309, -0.5,   -208.524,
        -0.309, -0.5,   -0.809,   79.649,
         0.0,    0.0,    0.0,      1.0
    ]
]
```


#### bioAssemblyList

*Optional field*

*Type*: [Array](#types) of assembly objects with the following fields:

| Name          | Type             | Description                     |
|---------------|------------------|---------------------------------|
| transformList | [Array](#types)  | Array of transform objects      |
| name          | [String](#types) | Name of the biological assembly |

Fields in a `transform` object:

| Name           | Type            | Description                                         |
|----------------|-----------------|-----------------------------------------------------|
| chainIndexList | [Array](#types) | Pointers into chain data fields, [Integers](#types) |
| matrix         | [Array](#types) | 4x4 transformation matrix, [Floats](#types)         |

The entries of `chainIndexList` are indices into the [chainIdList](#chainidlist) and [chainNameList](#chainnamelist) fields.

The elements of the 4x4 transformation `matrix` are stored linearly in row major order. Thus, the translational component comprises the 4th, 8th, and 12th element.

*Description*: Array of instructions on how to transform coordinates for an array of chains to create (biological) assemblies. The translational component is given in Å.

*Example*:

The following example shows two transform objects from PDB ID [4OPJ](http://www.rcsb.org/pdb/explore.do?structureId=4OPJ). The transformation matrix of the first object performs no rotation and a translation of 42.387 Å in dimension x. The second one translates -42.387 Å in dimension x.

```JSON
[
    {
        "transformList": [
            {
                "chainIndexList": [ 0, 4, 6 ],
                "matrix": [
                    1.0, 0.0, 0.0, 42.387,
                    0.0, 1.0, 0.0,  0.000,
                    0.0, 0.0, 1.0,  0.000,
                    0.0, 0.0, 0.0,  1.000
                ]
            }
        ]
    },
    {
        "transformList": [
            {
                "chainIndexList": [ 0, 4, 6 ],
                "matrix": [
                    1.0, 0.0, 0.0, -42.387,
                    0.0, 1.0, 0.0,   0.000,
                    0.0, 0.0, 1.0,   0.000,
                    0.0, 0.0, 0.0,   1.000
                ]
            }
        ]
    }
]
```


#### entityList

*Optional field*

*Type*: [Array](#types) of entity objects with the following fields:

| Name           | Type             | Description                                         |
|----------------|------------------|-----------------------------------------------------|
| chainIndexList | [Array](#types)  | Pointers into chain data fields, [Integers](#types) |
| description    | [String](#types) | Description of the entity                           |
| type           | [String](#types) | Name of the entity type                             |
| sequence       | [String](#types) | Sequence of the full construct in one-letter-code   |

The entries of `chainIndexList` are indices into the [chainIdList](#chainidlist) and [chainNameList](#chainnamelist) fields.

The `sequence` string contains the full construct, not just the resolved residues. Its characters are referenced by the entries of the [sequenceIndexList](#sequenceindexlist) field. Further, characters follow the IUPAC single letter code for [protein](https://dx.doi.org/10.1111/j.1432-1033.1984.tb07877.x) or [DNA/RNA](https://dx.doi.org/10.1093/nar/13.9.3021) residues, otherwise the character 'X'.

*Description*: Array of unique molecular entities within the structure. Each entry in `chainIndexList` represents an instance of that entity in the structure.

*Vocabulary*: Known values for the entity field `type` from the [mmCIF dictionary](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx.dic/Items/_entity.type.html) are `macrolide`, `non-polymer`, `polymer`, `water`.

*Example*:

```JSON
[
    {
        "description": "BROMODOMAIN ADJACENT TO ZINC FINGER DOMAIN PROTEIN 2B",
        "type": "polymer",
        "chainIndexList": [ 0 ],
        "sequence": "SMSVKKPKRDDSKDLALCSMILTEMETHEDAWPFLLPVNLKLVPGYKKVIKKPMDFSTIREKLSSGQYPNLETFALDVRLVFDNCETFNEDDSDIGRAGHNMRKYFEKKWTDTFKVS"
    },
    {
        "description": "4-FLUOROBENZAMIDOXIME",
        "type": "non-polymer",
        "chainIndexList": [ 1 ],
        "sequence": ""
    },
    {
        "description": "METHANOL",
        "type": "non-polymer",
        "chainIndexList": [ 2, 3, 4 ],
        "sequence": ""
    },
    {
        "description": "water",
        "type": "water",
        "chainIndexList": [ 5 ],
        "sequence": ""
    }
]
```


#### resolution

*Optional field*

*Type*: [Float](#types).

*Description*: The experimental resolution in Å. If not applicable the field must be omitted.

*Example*:

```JSON
2.3
```


#### rFree

*Optional field*

*Type*: [Float](#types).

*Description*: The R-free value. If not applicable the field must be omitted.

*Example*:

```JSON
0.203
```


#### rWork

*Optional field*

*Type*: [Float](#types).

*Description*: The R-work value. If not applicable the field must be omitted.

*Example*:

```JSON
0.176
```


#### experimentalMethods

*Optional field*

*Type*: [Array](#types) of [String](#types)s.

*Description*: The array of experimental methods employed for structure determination.

*Vocabulary*: Known values from the [mmCIF dictionary](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_exptl.method.html) are `ELECTRON CRYSTALLOGRAPHY`, `ELECTRON MICROSCOPY`, `EPR`, `FIBER DIFFRACTION`, `FLUORESCENCE TRANSFER`, `INFRARED SPECTROSCOPY`, `NEUTRON DIFFRACTION`, `POWDER DIFFRACTION`, `SOLID-STATE NMR`, `SOLUTION NMR`, `SOLUTION SCATTERING`, `THEORETICAL MODEL`, `X-RAY DIFFRACTION`.

*Example*:

```JSON
[ "X-RAY DIFFRACTION" ]
```


#### bondAtomList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Pairs of values represent indices of covalently bonded atoms. The indices point to the [Atom data](#atom-data) arrays. Only covalent bonds may be given.

*Example*:

Using the 'Pass-through: 32-bit signed integer array' encoding strategy (type 4).

In the following example there are three bonds, one between the atoms with the indices 0 and 61, one between the atoms with the indices 2 and 4, as well as one between the atoms with the indices 6 and 12.

```JSON
[ 0, 61, 2, 4, 6, 12 ]
```


#### bondOrderList

*Optional field*. If it exists, [bondAtomList](#bondatomlist) must also be present. However, `bondAtomList` may exist without `bondOrderList`.

*Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers.

*Description*: Array of bond orders for bonds in `bondAtomList`. Must be values between 1 and 4, defining single, double, triple, and quadruple bonds.

*Example*:

Using the 'Pass-through: 8-bit signed integer array' encoding strategy (type 2).

In the following example there are bond orders given for three bonds. The first and third bond have a bond order of 1 while the second bond has a bond order of 2.

```JSON
[ 1, 2, 1 ]
```


### Model data

The number of models in a structure is equal to the length of the [chainsPerModel](#chainspermodel) field. The `chainsPerModel` field also defines which chains belong to each model.


#### chainsPerModel

*Required field*

*Type*: [Array](#types) of [Integer](#types) numbers.

*Description*: Array of the number of chains in each model. The number of models is thus equal to the length of the `chainsPerModel` field.
The array allows looping over all models:

```Python
# example value, for illustration only
chainsPerModel = [ 5, 8 ]

# initialize index counter
modelIndex = 0

# traverse models
for modelChainCount in chainsPerModel:
    print( modelIndex )
    modelIndex += 1
```

*Examples*:

In the following example there are 2 models. The first model has 5 chains and the second model has 8 chains. This also means that the chains with indices 0 to 4 belong to the first model and that the chains with indices 5 to 12 belong to the second model.

```JSON
[ 5, 8 ]
```

For structures with homogeneous models the number of chains per model is identical for all models. In the following example there are five models, each with four chains.

```JSON
[ 4, 4, 4, 4, 4 ]
```


### Chain data

The number of chains in a structure is equal to the length of the [groupsPerChain](#groupsperchain) field. The `groupsPerChain` field also defines which groups belong to each chain.


#### groupsPerChain

*Required field*

*Type*: [Array](#types) of [Integer](#types) numbers.

*Description*: Array of the number of groups (aka residues) in each chain. The number of chains is thus equal to the length of the `groupsPerChain` field. In conjunction with `chainsPerModel`, the array allows looping over all chains:

```Python
# example values, for illustration only; chainIdList and chainNameList
# hold the encoded string data (4 bytes per chain, padded with 0 bytes)
chainsPerModel = [ 2, 1 ]
chainIdList = b"A\x00\x00\x00B\x00\x00\x00A\x00\x00\x00"
chainNameList = chainIdList

# initialize index counters
modelIndex = 0
chainIndex = 0

# traverse models
for modelChainCount in chainsPerModel:
    print( modelIndex )
    # traverse chains
    for _ in range( modelChainCount ):
        print( chainIndex )
        offset = chainIndex * 4
        print( chainIdList[ offset : offset + 4 ] )
        print( chainNameList[ offset : offset + 4 ] )
        chainIndex += 1
    modelIndex += 1
```

*Example*:

In the following example there are 3 chains. The first chain has 73 groups, the second 59 and the third 1.
This also means that the groups with indices 0 to 72 belong to the first chain, groups with indices 73 to 131 to the second chain and the group with index 132 to the third chain.

```JSON
[ 73, 59, 1 ]
```


#### chainIdList

*Required field*

*Type*: [Binary](#types) data that decodes into an array of 4-character strings.

*Description*: Array of chain IDs. For storing data from mmCIF files the `chainIdList` field should contain the value from the `label_asym_id` mmCIF data item and the `chainNameList` the `auth_asym_id` mmCIF data item. In PDB files there is only a single name/identifier for chains that corresponds to the `auth_asym_id` item. When there is only a single chain identifier available it must be stored in the `chainIdList` field.

*Note*: The character strings must be left aligned and unused characters must be represented by 0 bytes.

*Example*:

Using the 'UTF8/ASCII fixed-length string array' encoding strategy (type 5).

Starting with the array of 8-bit unsigned integers:

```JSON
[ 65, 0, 0, 0, 66, 0, 0, 0, 67, 0, 0, 0 ]
```

Decoding the ASCII characters:

```JSON
[ "A", "", "", "", "B", "", "", "", "C", "", "", "" ]
```

Creating the array of chain IDs:

```JSON
[ "A", "B", "C" ]
```


#### chainNameList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 4-character strings.

*Description*: Array of chain names. This field allows specifying an additional set of labels/names for chains. For example, it can be used to store both the `label_asym_id` (in `chainIdList`) and the `auth_asym_id` (in `chainNameList`) from mmCIF files.

*Example*:

Using the 'UTF8/ASCII fixed-length string array' encoding strategy (type 5).

Starting with the array of 8-bit unsigned integers:

```JSON
[ 65, 0, 0, 0, 68, 65, 0, 0 ]
```

Decoding the ASCII characters:

```JSON
[ "A", "", "", "", "DA", "", "", "" ]
```

Creating the array of chain names:

```JSON
[ "A", "DA" ]
```


### Group data

The fields in the following sections hold group-related data.

The mmCIF format allows for so-called micro-heterogeneity on the group-level. For groups (residues) with micro-heterogeneity there are two or more entries given that have the same [sequence index](#sequenceindexlist), [group id](#groupidlist) (and [insertion code](#inscodelist)) but are of a different [group type](#grouptypelist). The defining property is their identical sequence index.


#### groupList

*Required field*

*Type*: [Array](#types) of `groupType` objects with the following fields:

| Name             | Type              | Description                                                 |
|------------------|-------------------|-------------------------------------------------------------|
| formalChargeList | [Array](#types)   | Array of formal charges as [Integers](#types)               |
| atomNameList     | [Array](#types)   | Array of atom names, 0 to 5 character [Strings](#types)     |
| elementList      | [Array](#types)   | Array of elements, 0 to 3 character [Strings](#types)       |
| bondAtomList     | [Array](#types)   | Array of bonded atom indices, [Integers](#types)            |
| bondOrderList    | [Array](#types)   | Array of bond orders as [Integers](#types) between 1 and 4  |
| groupName        | [String](#types)  | The name of the group, 0 to 5 characters                    |
| singleLetterCode | [String](#types)  | The single letter code, 1 character                         |
| chemCompType     | [String](#types)  | The chemical component type                                 |


The element name must follow the IUPAC [standard](http://dx.doi.org/10.1515/ci.2014.36.4.25) where only the first character is capitalized and the remaining ones are lower case, for
instance `Cd` for Cadmium.

Each pair of consecutive entries in `bondAtomList` represents the indices of two covalently bound atoms. The indices point into the `formalChargeList`, `atomNameList`, and `elementList` fields.

The `singleLetterCode` is the IUPAC single letter code for [protein](https://dx.doi.org/10.1111/j.1432-1033.1984.tb07877.x) or [DNA/RNA](https://dx.doi.org/10.1093/nar/13.9.3021) residues, otherwise the character 'X' for polymer groups or '?' for non-polymer groups.

*Description*: Common group (residue) data that is referenced via the `groupType` key by group entries.

*Vocabulary*: Known values for the groupType field `chemCompType` from the [mmCIF dictionary](http://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v40.dic/Items/_chem_comp.type.html) are `D-beta-peptide, C-gamma linking`, `D-gamma-peptide, C-delta linking`, `D-peptide COOH carboxy terminus`, `D-peptide NH3 amino terminus`, `D-peptide linking`, `D-saccharide`, `D-saccharide 1,4 and 1,4 linking`, `D-saccharide 1,4 and 1,6 linking`, `DNA OH 3 prime terminus`, `DNA OH 5 prime terminus`, `DNA linking`, `L-DNA linking`, `L-RNA linking`, `L-beta-peptide, C-gamma linking`, `L-gamma-peptide, C-delta linking`, `L-peptide COOH carboxy terminus`, `L-peptide NH3 amino terminus`, `L-peptide linking`, `L-saccharide`, `L-saccharide 1,4 and 1,4 linking`, `L-saccharide 1,4 and 1,6 linking`, `RNA OH 3 prime terminus`, `RNA OH 5 prime terminus`, `RNA linking`, `non-polymer`, `other`, `peptide linking`, `peptide-like`, `saccharide`.
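
The constraints stated for `groupType` objects can be checked mechanically. The following is a minimal sketch, not part of the specification: the function name `check_group_type` and its messages are hypothetical, and it assumes that the three atom-level arrays of a group have the same length, which the text implies (the bond indices point into all three) but does not state outright.

```python
import re
from typing import Any, Dict, List

# One capital letter followed by up to two lower-case letters, e.g. "C", "Cd".
ELEMENT_RE = re.compile(r"^[A-Z][a-z]{0,2}$")


def check_group_type(group: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one decoded groupType object."""
    problems = []
    atom_count = len(group["atomNameList"])

    # Assumption: atom-level arrays are parallel and therefore equally long.
    if (len(group["elementList"]) != atom_count
            or len(group["formalChargeList"]) != atom_count):
        problems.append("atom-level arrays differ in length")

    # Two entries in bondAtomList per bond, one entry in bondOrderList per bond.
    if len(group["bondAtomList"]) != 2 * len(group["bondOrderList"]):
        problems.append("bondAtomList is not twice as long as bondOrderList")

    # Bond indices are zero-based and point into the atom-level arrays.
    if any(not 0 <= index < atom_count for index in group["bondAtomList"]):
        problems.append("bond atom index out of range")

    if any(not 1 <= order <= 4 for order in group["bondOrderList"]):
        problems.append("bond order outside 1 to 4")

    # Empty element strings are allowed (0 to 3 characters), so skip them.
    if any(e and not ELEMENT_RE.match(e) for e in group["elementList"]):
        problems.append("element name not capitalized as required")

    if len(group["singleLetterCode"]) != 1:
        problems.append("singleLetterCode is not a single character")

    return problems


gly = {
    "groupName": "GLY",
    "singleLetterCode": "G",
    "chemCompType": "PEPTIDE LINKING",
    "atomNameList": ["N", "CA", "C", "O"],
    "elementList": ["N", "C", "C", "O"],
    "formalChargeList": [0, 0, 0, 0],
    "bondAtomList": [1, 0, 2, 1, 3, 2],
    "bondOrderList": [1, 1, 2],
}
print(check_group_type(gly))
```

The check returns a list of messages instead of raising an error, so a caller can report all problems of a group at once.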

*Example*:

```JSON
[
    {
        "groupName": "GLY",
        "singleLetterCode": "G",
        "chemCompType": "PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O" ],
        "elementList": [ "N", "C", "C", "O" ],
        "formalChargeList": [ 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2 ],
        "bondOrderList": [ 1, 1, 2 ]
    },
    {
        "groupName": "ASP",
        "singleLetterCode": "D",
        "chemCompType": "L-PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O", "CB", "CG", "OD1", "OD2" ],
        "elementList": [ "N", "C", "C", "O", "C", "C", "O", "O" ],
        "formalChargeList": [ 0, 0, 0, 0, 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4, 6, 5, 7, 5 ],
        "bondOrderList": [ 1, 1, 2, 1, 1, 2, 1 ]
    },
    {
        "groupName": "SER",
        "singleLetterCode": "S",
        "chemCompType": "L-PEPTIDE LINKING",
        "atomNameList": [ "N", "CA", "C", "O", "CB", "OG" ],
        "elementList": [ "N", "C", "C", "O", "C", "O" ],
        "formalChargeList": [ 0, 0, 0, 0, 0, 0 ],
        "bondAtomList": [ 1, 0, 2, 1, 3, 2, 4, 1, 5, 4 ],
        "bondOrderList": [ 1, 1, 2, 1, 1 ]
    }
]
```


#### groupTypeList

*Required field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of pointers to `groupType` entries in `groupList` by their indices. One entry for each residue, thus the number of residues is equal to the length of the `groupTypeList` field.

*Example*:

Using the 'Pass-through: 32-bit signed integer array' encoding strategy (type 4).

In the following example there are 5 groups. The 1st, 4th and 5th reference the `groupType` with index `2`, the 2nd references index `0` and the 3rd references index `1`. So using the data from the `groupList` example this describes the polymer `SER-GLY-ASP-SER-SER`.

```JSON
[ 2, 0, 1, 2, 2 ]
```


#### groupIdList

*Required field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of group (residue) numbers. One entry for each group/residue.

*Example*:

Using the 'Delta & run-length encoded 32-bit signed integer array' encoding strategy (type 8).

Starting with the array of 32-bit signed integers:

```JSON
[ 1, 10, -10, 1, 1, 4 ]
```

Applying run-length decoding:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -10, 1, 1, 1, 1 ]
```

Applying delta decoding:

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4 ]
```


#### secStructList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 8-bit signed integers.

*Description*: Array of secondary structure assignments coded according to the following table, which shows the eight different types of secondary structure the [DSSP](https://dx.doi.org/10.1002%2Fbip.360221211) algorithm distinguishes. If the field is included there must be an entry for each group (residue) either in all models or only in the first model.

| Code | Name         |
|-----:|--------------|
|    0 | pi helix     |
|    1 | bend         |
|    2 | alpha helix  |
|    3 | extended     |
|    4 | 3-10 helix   |
|    5 | bridge       |
|    6 | turn         |
|    7 | coil         |
|   -1 | undefined    |

*Example*:

Using the 'Pass-through: 8-bit signed integer array' encoding strategy (type 2).

Starting with the array of 8-bit signed integers:

```JSON
[ 7, 7, 2, 2, 2, 2, 2, 2, 2, 7 ]
```


#### insCodeList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of characters.

*Description*: Array of insertion codes, one for each group (residue).
The lack of an insertion code must be denoted by a 0 byte.

*Example*:

Using the 'Run-length encoded character array' encoding strategy (type 6).

Starting with the array of 32-bit signed integers:

```JSON
[ 0, 5, 65, 3, 66, 2 ]
```

Applying run-length decoding:

```JSON
[ 0, 0, 0, 0, 0, 65, 65, 65, 66, 66 ]
```

If needed the ASCII codes can be converted to an `Array` of `String`s with the zeros as zero-length `String`s:

```JSON
[ "", "", "", "", "", "A", "A", "A", "B", "B" ]
```


#### sequenceIndexList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of indices that point into the `sequence` property of an entity object in the [entityList](#entitylist) field that is associated with the chain the group belongs to (i.e. the index of the chain is included in the `chainIndexList` of the entity). There is one entry for each group (residue). It must be set to `-1` when a group entry has no associated entity (and thus no sequence), for example water molecules.

*Example*:

Using the 'Delta & run-length encoded 32-bit signed integer array' encoding strategy (type 8).

Starting with the array of 32-bit signed integers:

```JSON
[ 1, 10, -10, 1, 1, 4 ]
```

Applying run-length decoding:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, -10, 1, 1, 1, 1 ]
```

Applying delta decoding:

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4 ]
```


### Atom data

The fields in the following sections hold atom-related data.

The mmCIF format allows for alternate locations of atoms. Such atoms have multiple entries in the atom-level fields (including the fields in the [groupList](#grouplist) entries).
They can be identified and distinguished by their distinct values in the [altLocList](#altloclist) field.


#### atomIdList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit signed integers.

*Description*: Array of atom serial numbers. One entry for each atom.

*Example*:

Using the 'Delta & run-length encoded 32-bit signed integer array' encoding strategy (type 8).

Starting with the array of 32-bit signed integers:

```JSON
[ 1, 7, 2, 1 ]
```

Applying run-length decoding:

```JSON
[ 1, 1, 1, 1, 1, 1, 1, 2 ]
```

Applying delta decoding:

```JSON
[ 1, 2, 3, 4, 5, 6, 7, 9 ]
```


#### altLocList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of characters.

*Description*: Array of alternate location labels, one for each atom. The lack of an alternate location label must be denoted by a 0 byte.

*Example*:

Using the 'Run-length encoded character array' encoding strategy (type 6).

Starting with the array of 32-bit signed integers:

```JSON
[ 0, 5, 65, 3, 66, 2 ]
```

Applying run-length decoding:

```JSON
[ 0, 0, 0, 0, 0, 65, 65, 65, 66, 66 ]
```

If needed the ASCII codes can be converted to an `Array` of `String`s with the zeros as zero-length `String`s:

```JSON
[ "", "", "", "", "", "A", "A", "A", "B", "B" ]
```


#### bFactorList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit floating-point numbers.

*Description*: Array of atom B-factors in Å^2. One entry for each atom.

*Example*:

Using the 'Integer & delta encoded & two-byte-packed 32-bit floating-point number array' encoding strategy (type 10) with a divisor of 100.

Starting with the packed array of 16-bit signed integers:

```JSON
[ 18200, 0, 2, -1, 100, -3, 5 ]
```

Unpacking/applying recursive indexing decoding to create an array of 32-bit signed integers (note, only the array type changed as the values all fitted into 16-bit signed integers):

```JSON
[ 18200, 0, 2, -1, 100, -3, 5 ]
```

Applying delta decoding to create an array of 32-bit signed integers:

```JSON
[ 18200, 18200, 18202, 18201, 18301, 18298, 18303 ]
```

Applying integer decoding with a divisor of `100` to create an array of 32-bit floating-point numbers:

```JSON
[ 182.00, 182.00, 182.02, 182.01, 183.01, 182.98, 183.03 ]
```


#### xCoordList
#### yCoordList
#### zCoordList

*Required fields*

*Type*: [Binary](#types) data that decodes into an array of 32-bit floating-point numbers.

*Description*: Array of x, y, and z atom coordinates, respectively, in Å. One entry for each atom and coordinate.

*Note*: To clarify, the data for each coordinate is stored in a separate array.

*Example*:

Using the 'Integer & delta encoded & two-byte-packed 32-bit floating-point number array' encoding strategy (type 10) with a divisor of 1000.

Starting with the packed array of 16-bit signed integers:

```JSON
[ 32767, 32767, 32767, 6899, 0, 2, -1, 100, -3, 5 ]
```

Unpacking/applying recursive indexing decoding to create an array of 32-bit signed integers:

```JSON
[ 105200, 0, 2, -1, 100, -3, 5 ]
```

Applying delta decoding to create an array of 32-bit signed integers:

```JSON
[ 105200, 105200, 105202, 105201, 105301, 105298, 105303 ]
```

Applying integer decoding with a divisor of `1000` to create an array of 32-bit floating-point values:

```JSON
[ 105.200, 105.200, 105.202, 105.201, 105.301, 105.298, 105.303 ]
```


#### occupancyList

*Optional field*

*Type*: [Binary](#types) data that decodes into an array of 32-bit floating-point numbers.

*Description*: Array of atom occupancies, one for each atom.

*Example*:

Using the 'Integer & run-length encoded 32-bit floating-point number array' encoding strategy (type 9) with a divisor of 100.

Starting with the array of 32-bit signed integers:

```JSON
[ 100, 4, 50, 2 ]
```

Applying run-length decoding:

```JSON
[ 100, 100, 100, 100, 50, 50 ]
```

Applying integer decoding with a divisor of `100` to create an array of 32-bit floating-point values:

```JSON
[ 1.00, 1.00, 1.00, 1.00, 0.50, 0.50 ]
```


## Traversal

The following traversal pseudo code assumes that all fields have been decoded.

```Python
# initialize index counters
set modelIndex to 0
set chainIndex to 0
set groupIndex to 0
set atomIndex to 0

# traverse models
for modelChainCount in chainsPerModel
    print modelIndex
    # traverse chains
    for 1 to modelChainCount
        print chainIndex
        set offset to chainIndex * 4
        print chainIdList[ offset : offset + 4 ]
        print chainNameList[ offset : offset + 4 ]
        set chainGroupCount to groupsPerChain[ chainIndex ]
        # traverse groups
        for 1 to chainGroupCount
            print groupIndex
            print groupIdList[ groupIndex ]
            print insCodeList[ groupIndex ]
            print secStructList[ groupIndex ]
            print sequenceIndexList[ groupIndex ]
            print groupTypeList[ groupIndex ]
            set group to groupList[ groupTypeList[ groupIndex ] ]
            print group.groupName
            print group.singleLetterCode
            print group.chemCompType
            set atomOffset to atomIndex
            set groupBondCount to group.bondAtomList.length / 2
            # traverse intra-group bonds (array indices are zero-based)
            for i in 0 to groupBondCount - 1
                print atomOffset + group.bondAtomList[ i * 2 ]      # atomIndex1
                print atomOffset + group.bondAtomList[ i * 2 + 1 ]  # atomIndex2
                print group.bondOrderList[ i ]
            set groupAtomCount to group.atomNameList.length
            # traverse atoms
            for i in 0 to groupAtomCount - 1
                print atomIndex
                print xCoordList[ atomIndex ]
                print yCoordList[ atomIndex ]
                print zCoordList[ atomIndex ]
                print bFactorList[ atomIndex ]
                print atomIdList[ atomIndex ]
                print altLocList[ atomIndex ]
                print occupancyList[ atomIndex ]
                print group.formalChargeList[ i ]
                print group.atomNameList[ i ]
                print group.elementList[ i ]
                increment atomIndex by 1
            increment groupIndex by 1
        increment chainIndex by 1
    increment modelIndex by 1

# traverse inter-group bonds
for i in 0 to bondAtomList.length / 2 - 1
    print bondAtomList[ i * 2 ]      # atomIndex1
    print bondAtomList[ i * 2 + 1 ]  # atomIndex2
    print bondOrderList[ i ]
```
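
For readers who prefer runnable code, the traversal can be sketched in Python. This is an illustration under stated assumptions, not a reference implementation: it assumes the MMTF fields have already been decoded into a plain dictionary of Python lists, the function names `iter_atoms` and `iter_group_bonds` are hypothetical, and the `toy` structure is made-up data built from the `GLY` and `SER` entries of the `groupList` example. Only the fields needed for the index bookkeeping are used.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_atoms(s: Dict[str, Any]) -> Iterator[Tuple[int, int, int, int, str, str]]:
    """Yield (modelIndex, chainIndex, groupIndex, atomIndex, groupName, atomName)."""
    chain_index = group_index = atom_index = 0
    for model_index, model_chain_count in enumerate(s["chainsPerModel"]):
        for _ in range(model_chain_count):
            for _ in range(s["groupsPerChain"][chain_index]):
                # groupTypeList holds an index into groupList for every group
                group = s["groupList"][s["groupTypeList"][group_index]]
                for atom_name in group["atomNameList"]:
                    yield (model_index, chain_index, group_index,
                           atom_index, group["groupName"], atom_name)
                    atom_index += 1
                group_index += 1
            chain_index += 1


def iter_group_bonds(s: Dict[str, Any]) -> Iterator[Tuple[int, int, int]]:
    """Yield (atomIndex1, atomIndex2, bondOrder) for all intra-group bonds."""
    atom_offset = 0
    for group_type in s["groupTypeList"]:
        group = s["groupList"][group_type]
        pairs = group["bondAtomList"]
        for i, order in enumerate(group["bondOrderList"]):
            # group-local indices become global by adding the group's offset
            yield (atom_offset + pairs[2 * i], atom_offset + pairs[2 * i + 1], order)
        atom_offset += len(group["atomNameList"])


# Hypothetical toy structure: one model, one chain, the groups SER and GLY.
toy = {
    "chainsPerModel": [1],
    "groupsPerChain": [2],
    "groupTypeList": [1, 0],
    "groupList": [
        {"groupName": "GLY",
         "atomNameList": ["N", "CA", "C", "O"],
         "bondAtomList": [1, 0, 2, 1, 3, 2],
         "bondOrderList": [1, 1, 2]},
        {"groupName": "SER",
         "atomNameList": ["N", "CA", "C", "O", "CB", "OG"],
         "bondAtomList": [1, 0, 2, 1, 3, 2, 4, 1, 5, 4],
         "bondOrderList": [1, 1, 2, 1, 1]},
    ],
}

for record in iter_atoms(toy):
    print(record)
for bond in iter_group_bonds(toy):
    print(bond)
```

Per-atom fields such as `xCoordList` or `occupancyList` can be read inside the atom loop with the yielded `atomIndex`, and per-group fields such as `groupIdList` with the yielded `groupIndex`.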