README.md
1# Huff0 entropy compression
2
3This package provides Huff0 encoding and decoding as used in zstd.
4
5[Huff0](https://github.com/Cyan4973/FiniteStateEntropy#new-generation-entropy-coders),
6a Huffman codec designed for modern CPU, featuring OoO (Out of Order) operations on multiple ALU
7(Arithmetic Logic Unit), achieving extremely fast compression and decompression speeds.
8
9This can be used for compressing input with a lot of similar input values to the smallest number of bytes.
10This does not perform any multi-byte [dictionary coding](https://en.wikipedia.org/wiki/Dictionary_coder) as LZ coders,
11but it can be used as a secondary step to compressors (like Snappy) that does not do entropy encoding.
12
13* [Godoc documentation](https://godoc.org/github.com/klauspost/compress/huff0)
14
15THIS PACKAGE IS NOT CONSIDERED STABLE AND API OR ENCODING MAY CHANGE IN THE FUTURE.
16
17## News
18
19 * Mar 2018: First implementation released. Consider this beta software for now.
20
21# Usage
22
23This package provides a low level interface that allows to compress single independent blocks.
24
25Each block is separate, and there is no built in integrity checks.
26This means that the caller should keep track of block sizes and also do checksums if needed.
27
28Compressing a block is done via the [`Compress1X`](https://godoc.org/github.com/klauspost/compress/huff0#Compress1X) and
29[`Compress4X`](https://godoc.org/github.com/klauspost/compress/huff0#Compress4X) functions.
30You must provide input and will receive the output and maybe an error.
31
32These error values can be returned:
33
34| Error | Description |
35|---------------------|-----------------------------------------------------------------------------|
36| `<nil>` | Everything ok, output is returned |
37| `ErrIncompressible` | Returned when input is judged to be too hard to compress |
38| `ErrUseRLE` | Returned from the compressor when the input is a single byte value repeated |
39| `ErrTooBig` | Returned if the input block exceeds the maximum allowed size (128 Kib) |
40| `(error)` | An internal error occurred. |
41
42
43As can be seen above some of there are errors that will be returned even under normal operation so it is important to handle these.
44
45To reduce allocations you can provide a [`Scratch`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch) object
46that can be re-used for successive calls. Both compression and decompression accepts a `Scratch` object, and the same
47object can be used for both.
48
49Be aware, that when re-using a `Scratch` object that the *output* buffer is also re-used, so if you are still using this
50you must set the `Out` field in the scratch to nil. The same buffer is used for compression and decompression output.
51
52The `Scratch` object will retain state that allows to re-use previous tables for encoding and decoding.
53
54## Tables and re-use
55
56Huff0 allows for reusing tables from the previous block to save space if that is expected to give better/faster results.
57
58The Scratch object allows you to set a [`ReusePolicy`](https://godoc.org/github.com/klauspost/compress/huff0#ReusePolicy)
59that controls this behaviour. See the documentation for details. This can be altered between each block.
60
61Do however note that this information is *not* stored in the output block and it is up to the users of the package to
62record whether [`ReadTable`](https://godoc.org/github.com/klauspost/compress/huff0#ReadTable) should be called,
63based on the boolean reported back from the CompressXX call.
64
65If you want to store the table separate from the data, you can access them as `OutData` and `OutTable` on the
66[`Scratch`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch) object.
67
68## Decompressing
69
70The first part of decoding is to initialize the decoding table through [`ReadTable`](https://godoc.org/github.com/klauspost/compress/huff0#ReadTable).
71This will initialize the decoding tables.
72You can supply the complete block to `ReadTable` and it will return the data part of the block
73which can be given to the decompressor.
74
75Decompressing is done by calling the [`Decompress1X`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch.Decompress1X)
76or [`Decompress4X`](https://godoc.org/github.com/klauspost/compress/huff0#Scratch.Decompress4X) function.
77
78You must provide the output from the compression stage, at exactly the size you got back. If you receive an error back
79your input was likely corrupted.
80
81It is important to note that a successful decoding does *not* mean your output matches your original input.
82There are no integrity checks, so relying on errors from the decompressor does not assure your data is valid.
83
84# Contributing
85
86Contributions are always welcome. Be aware that adding public functions will require good justification and breaking
87changes will likely not be accepted. If in doubt open an issue before writing the PR.