1# Index Disk Format 2 3The following describes the format of the `index` file found in each block directory. 4It is terminated by a table of contents which serves as an entry point into the index. 5 6``` 7┌────────────────────────────┬─────────────────────┐ 8│ magic(0xBAAAD700) <4b> │ version(1) <1 byte> │ 9├────────────────────────────┴─────────────────────┤ 10│ ┌──────────────────────────────────────────────┐ │ 11│ │ Symbol Table │ │ 12│ ├──────────────────────────────────────────────┤ │ 13│ │ Series │ │ 14│ ├──────────────────────────────────────────────┤ │ 15│ │ Label Index 1 │ │ 16│ ├──────────────────────────────────────────────┤ │ 17│ │ ... │ │ 18│ ├──────────────────────────────────────────────┤ │ 19│ │ Label Index N │ │ 20│ ├──────────────────────────────────────────────┤ │ 21│ │ Postings 1 │ │ 22│ ├──────────────────────────────────────────────┤ │ 23│ │ ... │ │ 24│ ├──────────────────────────────────────────────┤ │ 25│ │ Postings N │ │ 26│ ├──────────────────────────────────────────────┤ │ 27│ │ Label Index Table │ │ 28│ ├──────────────────────────────────────────────┤ │ 29│ │ Postings Table │ │ 30│ ├──────────────────────────────────────────────┤ │ 31│ │ TOC │ │ 32│ └──────────────────────────────────────────────┘ │ 33└──────────────────────────────────────────────────┘ 34``` 35 36When the index is written, an arbitrary number of padding bytes may be added between the lined out main sections above. When sequentially scanning through the file, any zero bytes after a section's specified length must be skipped. 37 38Most of the sections described below start with a `len` field. It always specifies the number of bytes just before the trailing CRC32 checksum. The checksum is always calculated over those `len` bytes. 39 40 41### Symbol Table 42 43The symbol table holds a sorted list of deduplicated strings that occurred in label pairs of the stored series. They can be referenced from subsequent sections and significantly reduce the total index size. 44 45The section contains a sequence of the string entries, each prefixed with the string's length in raw bytes. All strings are utf-8 encoded. 46Strings are referenced by sequential indexing. The strings are sorted in lexicographically ascending order. 47 48``` 49┌────────────────────┬─────────────────────┐ 50│ len <4b> │ #symbols <4b> │ 51├────────────────────┴─────────────────────┤ 52│ ┌──────────────────────┬───────────────┐ │ 53│ │ len(str_1) <uvarint> │ str_1 <bytes> │ │ 54│ ├──────────────────────┴───────────────┤ │ 55│ │ . . . │ │ 56│ ├──────────────────────┬───────────────┤ │ 57│ │ len(str_n) <uvarint> │ str_n <bytes> │ │ 58│ └──────────────────────┴───────────────┘ │ 59├──────────────────────────────────────────┤ 60│ CRC32 <4b> │ 61└──────────────────────────────────────────┘ 62``` 63 64 65### Series 66 67The section contains a sequence of series that hold the label set of the series as well as its chunks within the block. The series are sorted lexicographically by their label sets. 68Each series section is aligned to 16 bytes. The ID for a series is the `offset/16`. This serves as the series' ID in all subsequent references. Thereby, a sorted list of series IDs implies a lexicographically sorted list of series label sets. 69 70``` 71┌───────────────────────────────────────┐ 72│ ┌───────────────────────────────────┐ │ 73│ │ series_1 │ │ 74│ ├───────────────────────────────────┤ │ 75│ │ . . . │ │ 76│ ├───────────────────────────────────┤ │ 77│ │ series_n │ │ 78│ └───────────────────────────────────┘ │ 79└───────────────────────────────────────┘ 80``` 81 82Every series entry first holds its number of labels, followed by tuples of symbol table references that contain the label name and value. The label pairs are lexicographically sorted. 83After the labels, the number of indexed chunks is encoded, followed by a sequence of metadata entries containing the chunks minimum (`mint`) and maximum (`maxt`) timestamp and a reference to its position in the chunk file. The `mint` is the time of the first sample and `maxt` is the time of the last sample in the chunk. Holding the time range data in the index allows dropping chunks irrelevant to queried time ranges without accessing them directly. 84 85`mint` of the first chunk is stored, it's `maxt` is stored as a delta and the `mint` and `maxt` are encoded as deltas to the previous time for subsequent chunks. Similarly, the reference of the first chunk is stored and the next ref is stored as a delta to the previous one. 86 87``` 88┌──────────────────────────────────────────────────────────────────────────┐ 89│ len <uvarint> │ 90├──────────────────────────────────────────────────────────────────────────┤ 91│ ┌──────────────────────────────────────────────────────────────────────┐ │ 92│ │ labels count <uvarint64> │ │ 93│ ├──────────────────────────────────────────────────────────────────────┤ │ 94│ │ ┌────────────────────────────────────────────┐ │ │ 95│ │ │ ref(l_i.name) <uvarint32> │ │ │ 96│ │ ├────────────────────────────────────────────┤ │ │ 97│ │ │ ref(l_i.value) <uvarint32> │ │ │ 98│ │ └────────────────────────────────────────────┘ │ │ 99│ │ ... │ │ 100│ ├──────────────────────────────────────────────────────────────────────┤ │ 101│ │ chunks count <uvarint64> │ │ 102│ ├──────────────────────────────────────────────────────────────────────┤ │ 103│ │ ┌────────────────────────────────────────────┐ │ │ 104│ │ │ c_0.mint <varint64> │ │ │ 105│ │ ├────────────────────────────────────────────┤ │ │ 106│ │ │ c_0.maxt - c_0.mint <uvarint64> │ │ │ 107│ │ ├────────────────────────────────────────────┤ │ │ 108│ │ │ ref(c_0.data) <uvarint64> │ │ │ 109│ │ └────────────────────────────────────────────┘ │ │ 110│ │ ┌────────────────────────────────────────────┐ │ │ 111│ │ │ c_i.mint - c_i-1.maxt <uvarint64> │ │ │ 112│ │ ├────────────────────────────────────────────┤ │ │ 113│ │ │ c_i.maxt - c_i.mint <uvarint64> │ │ │ 114│ │ ├────────────────────────────────────────────┤ │ │ 115│ │ │ ref(c_i.data) - ref(c_i-1.data) <varint64> │ │ │ 116│ │ └────────────────────────────────────────────┘ │ │ 117│ │ ... │ │ 118│ └──────────────────────────────────────────────────────────────────────┘ │ 119├──────────────────────────────────────────────────────────────────────────┤ 120│ CRC32 <4b> │ 121└──────────────────────────────────────────────────────────────────────────┘ 122``` 123 124 125 126### Label Index 127 128A label index section indexes the existing (combined) values for one or more label names. 129The `#names` field determines the number of indexed label names, followed by the total number of entries in the `#entries` field. The body holds #entries / #names tuples of symbol table references, each tuple being of #names length. The value tuples are sorted in lexicographically increasing order. 130 131``` 132┌───────────────┬────────────────┬────────────────┐ 133│ len <4b> │ #names <4b> │ #entries <4b> │ 134├───────────────┴────────────────┴────────────────┤ 135│ ┌─────────────────────────────────────────────┐ │ 136│ │ ref(value_0) <4b> │ │ 137│ ├─────────────────────────────────────────────┤ │ 138│ │ ... │ │ 139│ ├─────────────────────────────────────────────┤ │ 140│ │ ref(value_n) <4b> │ │ 141│ └─────────────────────────────────────────────┘ │ 142│ . . . │ 143├─────────────────────────────────────────────────┤ 144│ CRC32 <4b> │ 145└─────────────────────────────────────────────────┘ 146``` 147 148For instance, a single label name with 4 different values will be encoded as: 149 150``` 151┌────┬───┬───┬──────────────┬──────────────┬──────────────┬──────────────┬───────┐ 152│ 24 │ 1 │ 4 │ ref(value_0) | ref(value_1) | ref(value_2) | ref(value_3) | CRC32 | 153└────┴───┴───┴──────────────┴──────────────┴──────────────┴──────────────┴───────┘ 154``` 155 156The sequence of label index sections is finalized by an [offset table](#offset-table) pointing to the beginning of each label index section for a given set of label names. 157 158### Postings 159 160Postings sections store monotonically increasing lists of series references that contain a given label pair associated with the list. 161 162``` 163┌────────────────────┬────────────────────┐ 164│ len <4b> │ #entries <4b> │ 165├────────────────────┴────────────────────┤ 166│ ┌─────────────────────────────────────┐ │ 167│ │ ref(series_1) <4b> │ │ 168│ ├─────────────────────────────────────┤ │ 169│ │ ... │ │ 170│ ├─────────────────────────────────────┤ │ 171│ │ ref(series_n) <4b> │ │ 172│ └─────────────────────────────────────┘ │ 173├─────────────────────────────────────────┤ 174│ CRC32 <4b> │ 175└─────────────────────────────────────────┘ 176``` 177 178The sequence of postings sections is finalized by an [offset table](#offset-table) pointing to the beginning of each postings section for a given set of label names. 179 180### Offset Table 181 182An offset table stores a sequence of entries that maps a list of strings to an offset. They are used to track label index and postings sections. They are read into memory when an index file is loaded. 183 184``` 185┌─────────────────────┬──────────────────────┐ 186│ len <4b> │ #entries <4b> │ 187├─────────────────────┴──────────────────────┤ 188│ ┌────────────────────────────────────────┐ │ 189│ │ n = #strs <uvarint> │ │ 190│ ├──────────────────────┬─────────────────┤ │ 191│ │ len(str_1) <uvarint> │ str_1 <bytes> │ │ 192│ ├──────────────────────┴─────────────────┤ │ 193│ │ ... │ │ 194│ ├──────────────────────┬─────────────────┤ │ 195│ │ len(str_n) <uvarint> │ str_n <bytes> │ │ 196│ ├──────────────────────┴─────────────────┤ │ 197│ │ offset <uvarint64> │ │ 198│ └────────────────────────────────────────┘ │ 199│ . . . │ 200├────────────────────────────────────────────┤ 201│ CRC32 <4b> │ 202└────────────────────────────────────────────┘ 203``` 204 205 206### TOC 207 208The table of contents serves as an entry point to the entire index and points to various sections in the file. 209If a reference is zero, it indicates the respective section does not exist and empty results should be returned upon lookup. 210 211``` 212┌─────────────────────────────────────────┐ 213│ ref(symbols) <8b> │ 214├─────────────────────────────────────────┤ 215│ ref(series) <8b> │ 216├─────────────────────────────────────────┤ 217│ ref(label indices start) <8b> │ 218├─────────────────────────────────────────┤ 219│ ref(label indices table) <8b> │ 220├─────────────────────────────────────────┤ 221│ ref(postings start) <8b> │ 222├─────────────────────────────────────────┤ 223│ ref(postings table) <8b> │ 224├─────────────────────────────────────────┤ 225│ CRC32 <4b> │ 226└─────────────────────────────────────────┘ 227``` 228