• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

.clang-tidyH A D02-Dec-2021259 98

README.mdH A D02-Dec-20213.3 KiB6947

array.cH A D02-Dec-202121.2 KiB706532

array.hH A D02-Dec-20213.8 KiB8052

compress_utils.cH A D02-Dec-202121 KiB624477

compress_utils.hH A D02-Dec-2021662 1910

compression.cH A D02-Dec-202153.7 KiB1,6821,250

compression.hH A D02-Dec-20216 KiB165106

create.cH A D02-Dec-202136.8 KiB1,095857

create.hH A D02-Dec-20211.2 KiB2918

datum_serialize.cH A D02-Dec-202111 KiB422327

datum_serialize.hH A D02-Dec-20211.8 KiB5127

deltadelta.cH A D02-Dec-202120.7 KiB707572

deltadelta.hH A D02-Dec-20213 KiB6438

dictionary.cH A D02-Dec-202119.3 KiB636501

dictionary.hH A D02-Dec-20212.9 KiB6138

dictionary_hash.hH A D02-Dec-20213.1 KiB11172

gorilla.cH A D02-Dec-202127 KiB828643

gorilla.hH A D02-Dec-20215.3 KiB10837

segment_meta.cH A D02-Dec-20213.5 KiB147122

segment_meta.hH A D02-Dec-20211,003 2514

simple8b_rle.hH A D02-Dec-202127.7 KiB870651

utils.hH A D02-Dec-20211.5 KiB5236

README.md

1# Compression Algorithms
2
3This is a collection of compression algorithms that are used to compress data of different types.
4The algorithms are optimized for time-series use-cases; many of them assume that adjacent rows will have "similar" values.
5
6## API
7
8Each compression algorithm the API is divided into two parts: a _compressor_ and a _decompression iterator_. The compressor
9is used to compress new data.
10
11- `<algorithm name>_compressor_alloc` - creates the compressor
12- `<algorithm_name>_compressor_append_null` - appends a null
13- `<algorithm_name>_compressor_append_value` - appends a non-null value
14- `<agorithm_name>_compressor_finish` - finalizes the compression and returns the compressed data
15
16Data can be read back out using the decompression iterator. An iterator can operate backwards or forwards.
17There is no random access. The api is
18
19- `<algorithm_name>_decompression_iterator_from_datum_<forward|reverse>` - create a new DatumIterator in the forward or reverse direction.
20- a DatumIterator has a function pointer called `try_next` that returns the next `DecompressResult`.
21
22A `DecompressResult` can either be a decompressed value datum, null, or a done marker to indicate that the iterator is done.
23
24Each decompression algorithm also contains send and recv function to get the external binary representations.
25
26`CompressionAlgorithmDefinition` is a structure that defines function pointers to get forward and reverse iterators
27as well as send and recv functions. The `definitions` array in  `compression.c` contains a `CompressionAlgorithmDefinition`
28for each compression algorithm.
29
30## Base algorithms
31
32The `simple8b rle` algorithm is a building block for many of the compression algorithms.
33It compresses a series of `uint64` values. It compresses the data by packing the values into the least
34amount of bits necessary for the magnitude of the int values, using run-length-encoding for large numbers of repeated values,
35A complete description is in the header file. Note that this is a header-only implementation as performance
36is paramount here as it is used a primitive in all the other compression algorithms.
37
38## Compression Algorithms
39
40### DeltaDelta
41
42for each integer, it takes the delta-of-deltas with the pervious integer,
43zigzag encodes this deltadelta, then finally simple8b_rle encodes this
44zigzagged result. This algorithm performs very well when the magnitude of the
45delta between adjacent values tends not to vary much, and is optimal for
46fixed rate-of-change.
47
48
49### Gorilla
50
51`gorilla` encodes floats using the Facebook gorilla algorithm. It stores the
52compressed xors of adjacent values. It is one of the few simple algorithms
53that compresses floating point numbers reasonably well.
54
55### Dictionary
56
57The dictionary mechanism stores data in two parts: a "dictionary" storing
58each unique value in the dataset (stored as an array, see below) and
59simple8b_rle compressed list of indexes into the dictionary, ordered by row.
60This scheme can store any type of data, but will only be a space improvement
61if the data set is of relatively low cardinality.
62
63### Array
64
65The array "compression" method simply stores the data in an array-like
66structure and does not actually compress it (though TOAST-based compression
67can be applied on top). It is the compression mechanism used when no other
68compression mechanism works. It can store any type of data.
69