# WAL Disk Format

The write ahead log operates in segments that are numbered and sequential,
e.g. `000000`, `000001`, `000002`, etc., and are limited to 128MB by default.
A segment is written to in pages of 32KB. Only the last page of the most recent segment
may be partial. A WAL record is an opaque byte slice that gets split up into sub-records
should it exceed the remaining space of the current page. Records are never split across
segment boundaries. If a single record exceeds the default segment size, a segment with
a larger size will be created.
The encoding of pages is largely borrowed from [LevelDB's/RocksDB's write ahead log](https://github.com/facebook/rocksdb/wiki/Write-Ahead-Log-File-Format).

A notable deviation is that the record fragment is encoded as:

```
┌───────────┬──────────┬────────────┬──────────────┐
│ type <1b> │ len <2b> │ CRC32 <4b> │ data <bytes> │
└───────────┴──────────┴────────────┴──────────────┘
```

The type flag has the following states:

* `0`: rest of page will be empty
* `1`: a full record encoded in a single fragment
* `2`: first fragment of a record
* `3`: middle fragment of a record
* `4`: final fragment of a record

## Record encoding

The records written to the write ahead log are encoded as follows:

### Series records

Series records encode the labels that identify a series and its unique ID.

```
┌────────────────────────────────────────────┐
│ type = 1 <1b>                              │
├────────────────────────────────────────────┤
│ ┌─────────┬──────────────────────────────┐ │
│ │ id <8b> │ n = len(labels) <uvarint>    │ │
│ ├─────────┴────────────┬─────────────────┤ │
│ │ len(str_1) <uvarint> │ str_1 <bytes>   │ │
│ ├──────────────────────┴─────────────────┤ │
│ │ ...                                    │ │
│ ├───────────────────────┬────────────────┤ │
│ │ len(str_2n) <uvarint> │ str_2n <bytes> │ │
│ └───────────────────────┴────────────────┘ │
│                  . . .                     │
└────────────────────────────────────────────┘
```

### Sample records

Sample records encode samples as a list of triples `(series_id, timestamp, value)`.
Series reference and timestamp are encoded as deltas relative to the first sample.
The first row stores the starting id and the starting timestamp.
The first sample record begins at the second row.

```
┌──────────────────────────────────────────────────────────────────┐
│ type = 2 <1b>                                                    │
├──────────────────────────────────────────────────────────────────┤
│ ┌────────────────────┬───────────────────────────┐               │
│ │ id <8b>            │ timestamp <8b>            │               │
│ └────────────────────┴───────────────────────────┘               │
│ ┌────────────────────┬───────────────────────────┬─────────────┐ │
│ │ id_delta <uvarint> │ timestamp_delta <uvarint> │ value <8b>  │ │
│ └────────────────────┴───────────────────────────┴─────────────┘ │
│                              . . .                               │
└──────────────────────────────────────────────────────────────────┘
```

### Tombstone records

Tombstone records encode tombstones as a list of triples `(series_id, min_time, max_time)`
and specify an interval for which samples of a series got deleted.

```
┌─────────────────────────────────────────────────────┐
│ type = 3 <1b>                                       │
├─────────────────────────────────────────────────────┤
│ ┌─────────┬───────────────────┬───────────────────┐ │
│ │ id <8b> │ min_time <varint> │ max_time <varint> │ │
│ └─────────┴───────────────────┴───────────────────┘ │
│                        . . .                        │
└─────────────────────────────────────────────────────┘
```
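
To illustrate the tombstone layout, here is a minimal Python sketch that encodes and decodes one tombstone record, taken as the reassembled record payload with any fragment headers already stripped. It rests on two assumptions that this document does not state: `<8b>` is read as a big-endian unsigned 64-bit integer, and `<varint>` is read as a zigzag LEB128 integer in the style of Go's `encoding/binary`. All function names are hypothetical and exist only for this example.

```python
import struct

TOMBSTONE_TYPE = 3  # record type byte for tombstone records


def put_varint(n):
    """Encode a signed 64-bit integer as a zigzag LEB128 varint (assumed encoding)."""
    u = (n << 1) ^ (n >> 63)  # zigzag: map signed values onto unsigned ones
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        u >>= 7
    out.append(u)
    return bytes(out)


def get_varint(buf, pos):
    """Decode a zigzag LEB128 varint at buf[pos]; return (value, next_pos)."""
    u, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        u |= (b & 0x7F) << shift
        if b < 0x80:  # continuation bit clear: last byte of this varint
            break
        shift += 7
    return (u >> 1) ^ -(u & 1), pos  # undo the zigzag mapping


def encode_tombstones(stones):
    """Encode an iterable of (series_id, min_time, max_time) triples."""
    out = bytearray([TOMBSTONE_TYPE])
    for series_id, min_t, max_t in stones:
        out += struct.pack(">Q", series_id)  # big-endian is an assumption
        out += put_varint(min_t)
        out += put_varint(max_t)
    return bytes(out)


def decode_tombstones(rec):
    """Decode a tombstone record into a list of (series_id, min_time, max_time)."""
    if not rec or rec[0] != TOMBSTONE_TYPE:
        raise ValueError("not a tombstone record")
    stones, pos = [], 1
    while pos < len(rec):
        (series_id,) = struct.unpack_from(">Q", rec, pos)
        pos += 8
        min_t, pos = get_varint(rec, pos)
        max_t, pos = get_varint(rec, pos)
        stones.append((series_id, min_t, max_t))
    return stones
```

A round trip such as `decode_tombstones(encode_tombstones([(42, -5, 300)]))` should return the original triples. The entries carry no count prefix, so the decoder simply reads triples until the record is exhausted.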