| Name | Date | Size | #Lines | LOC |
|------|------|------|--------|-----|
| .. | 15-Oct-2021 | - | | |
| internal/xxhash/ | 15-Oct-2021 | - | 617 | 433 |
| README.md | 15-Oct-2021 | 22 KiB | 432 | 316 |
| bitreader.go | 15-Oct-2021 | 3.3 KiB | 137 | 100 |
| bitwriter.go | 15-Oct-2021 | 4.5 KiB | 170 | 131 |
| blockdec.go | 15-Oct-2021 | 18.6 KiB | 737 | 616 |
| blockenc.go | 15-Oct-2021 | 22.3 KiB | 872 | 727 |
| blocktype_string.go | 15-Oct-2021 | 2.7 KiB | 86 | 61 |
| bytebuf.go | 15-Oct-2021 | 2.4 KiB | 131 | 101 |
| bytereader.go | 15-Oct-2021 | 2 KiB | 89 | 59 |
| decodeheader.go | 15-Oct-2021 | 5.3 KiB | 203 | 135 |
| decoder.go | 15-Oct-2021 | 13.2 KiB | 555 | 408 |
| decoder_options.go | 15-Oct-2021 | 3 KiB | 103 | 69 |
| dict.go | 15-Oct-2021 | 2.8 KiB | 123 | 101 |
| enc_base.go | 15-Oct-2021 | 4.3 KiB | 179 | 140 |
| enc_best.go | 15-Oct-2021 | 13.9 KiB | 502 | 392 |
| enc_better.go | 15-Oct-2021 | 33.1 KiB | 1,236 | 967 |
| enc_dfast.go | 15-Oct-2021 | 29.2 KiB | 1,122 | 841 |
| enc_fast.go | 15-Oct-2021 | 25.3 KiB | 1,019 | 798 |
| encoder.go | 15-Oct-2021 | 13.6 KiB | 577 | 476 |
| encoder_options.go | 15-Oct-2021 | 9.9 KiB | 313 | 213 |
| framedec.go | 15-Oct-2021 | 12.4 KiB | 522 | 423 |
| frameenc.go | 15-Oct-2021 | 3.3 KiB | 138 | 116 |
| fse_decoder.go | 15-Oct-2021 | 10.6 KiB | 386 | 293 |
| fse_encoder.go | 15-Oct-2021 | 19 KiB | 726 | 599 |
| fse_predefined.go | 15-Oct-2021 | 5.1 KiB | 159 | 112 |
| hash.go | 15-Oct-2021 | 2.6 KiB | 78 | 44 |
| history.go | 15-Oct-2021 | 2.3 KiB | 90 | 65 |
| seqdec.go | 15-Oct-2021 | 13.2 KiB | 493 | 374 |
| seqenc.go | 15-Oct-2021 | 3.2 KiB | 115 | 81 |
| snappy.go | 15-Oct-2021 | 12.7 KiB | 436 | 345 |
| zip.go | 15-Oct-2021 | 2.8 KiB | 122 | 94 |
| zstd.go | 15-Oct-2021 | 4.4 KiB | 153 | 82 |

README.md

# zstd

[Zstandard](https://facebook.github.io/zstd/) is a real-time compression algorithm, providing high compression ratios.
It offers a very wide range of compression / speed trade-offs, while being backed by a very fast decoder.
This package implements a high performance compression algorithm, for now focused on speed.

This package provides [compression](#Compressor) to and [decompression](#Decompressor) of Zstandard content.

This package is pure Go and does not use "unsafe".

The `zstd` package is provided as open source software using a Go standard license.

Currently the package is heavily optimized for 64 bit processors and will be significantly slower on 32 bit processors.

## Installation

Install using `go get -u github.com/klauspost/compress`. The package is located in `github.com/klauspost/compress/zstd`.

[![Go Reference](https://pkg.go.dev/badge/github.com/klauspost/compress/zstd.svg)](https://pkg.go.dev/github.com/klauspost/compress/zstd)

## Compressor

### Status:

STABLE - there may always be subtle bugs, but a wide variety of content has been tested and the library is actively
used by several projects. This library is being [fuzz-tested](https://github.com/klauspost/compress-fuzz) for all updates.

There may still be specific combinations of data types/size/settings that could lead to edge cases,
so as always, testing is recommended.

The following compression levels have been implemented:

* The "Fastest" compression ratio is roughly equivalent to zstd level 1.
* The "Default" compression ratio is roughly equivalent to zstd level 3 (default).
* The "Better" compression ratio is roughly equivalent to zstd level 7.
* The "Best" compression ratio is roughly equivalent to zstd level 11.

In terms of speed, it is typically 2x as fast as the stdlib deflate/gzip in its fastest mode.
The compression ratio is comparable to stdlib at around level 3, but it is usually 3x as fast.


### Usage

An Encoder can be used for either compressing a stream via the
`io.WriteCloser` interface supported by the Encoder or as multiple independent
tasks via the `EncodeAll` function.
Smaller encodes are encouraged to use the EncodeAll function.
Use `NewWriter` to create a new instance that can be used for both.

To create a writer with default options:

```Go
import (
    "io"

    "github.com/klauspost/compress/zstd"
)

// Compress input to output.
func Compress(in io.Reader, out io.Writer) error {
    enc, err := zstd.NewWriter(out)
    if err != nil {
        return err
    }
    _, err = io.Copy(enc, in)
    if err != nil {
        // Close even on failure to release resources.
        enc.Close()
        return err
    }
    // Close flushes the remaining data and finishes the stream.
    return enc.Close()
}
```

Now you can encode by writing data to `enc`. The output will be finished writing when `Close()` is called.
Even if your encode fails, you should still call `Close()` to release any resources that may be held up.

The above is fine for big encodes. However, whenever possible try to *reuse* the writer.

To reuse the encoder, you can use the `Reset(io.Writer)` function to change to another output.
This will allow the encoder to reuse all resources and avoid wasteful allocations.

Currently stream encoding has 'light' concurrency, meaning up to 2 goroutines can be working on part
of a stream. This is independent of the `WithEncoderConcurrency(n)` option, but that is likely to change
in the future. So if you want to limit concurrency for future updates, specify the concurrency
you would like.

You can specify your desired compression level using the `WithEncoderLevel()` option. Currently only pre-defined
compression settings can be specified.

#### Future Compatibility Guarantees

This will be an evolving project. When using this package it is important to note that both the compression efficiency and speed may change.

The goal will be to keep the default efficiency at the default zstd (level 3).
However the encoding should never be assumed to remain the same,
and you should not use hashes of compressed output for similarity checks.

The Encoder can be assumed to produce the same output from the exact same code version.
However, there may be modes in the future that break this,
although they will not be enabled without an explicit option.

This encoder is not designed to (and will probably never) output the exact same bitstream as the reference encoder.

Also note that the cgo decompressor currently does not [report all errors on invalid input](https://github.com/DataDog/zstd/issues/59),
[omits error checks](https://github.com/DataDog/zstd/issues/61), [ignores checksums](https://github.com/DataDog/zstd/issues/43)
and seems to ignore concatenated streams, even though [it is part of the spec](https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#frames).

#### Blocks

For compressing small blocks, the returned encoder has a function called `EncodeAll(src, dst []byte) []byte`.

`EncodeAll` will encode all input in src and append it to dst.
This function can be called concurrently, but each call will only run on a single goroutine.

Encoded blocks can be concatenated and the result will be the combined input stream.
Data compressed with EncodeAll can be decoded with the Decoder, using either a stream or `DecodeAll`.

Especially when encoding blocks you should take special care to reuse the encoder.
This will effectively make it run without allocations after a warmup period.
To make it run completely without allocations, supply a destination buffer with space for all content.

```Go
import "github.com/klauspost/compress/zstd"

// Create a writer that caches compressors.
// For this operation type we supply a nil Writer.
var encoder, _ = zstd.NewWriter(nil)

// Compress a buffer.
// If you have a destination buffer, the allocation in the call can also be eliminated.
func Compress(src []byte) []byte {
    return encoder.EncodeAll(src, make([]byte, 0, len(src)))
}
```

You can control the maximum number of concurrent encodes using the `WithEncoderConcurrency(n)`
option when creating the writer.

Using the Encoder for both a stream and individual blocks concurrently is safe.

### Performance

I have collected some speed examples to compare speed and compression against other compressors.

* `file` is the input file.
* `out` is the compressor used. `zskp` is this package. `zstd` is the Datadog cgo library. `gzstd` is the standard library gzip and `gzkp` is the gzip from this library.
* `level` is the compression level used. For `zskp` level 1 is "fastest", level 2 is "default"; 3 is "better", 4 is "best".
* `insize`/`outsize` is the input/output size.
* `millis` is the number of milliseconds used for compression.
* `mb/s` is megabytes (2^20 bytes) per second.
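
To make the `mb/s` column concrete, here is a small sketch that recomputes it from `insize` and `millis`. The function name `mibPerSec` is made up for this illustration. For the first Silesia row (211947520 bytes in 643 ms) it gives roughly 314, slightly above the listed 313.87; the gap is presumably because the `millis` column is rounded, which is an assumption here.

```Go
package main

import "fmt"

// mibPerSec converts an input size in bytes and a duration in milliseconds
// to megabytes (2^20 bytes) per second.
func mibPerSec(insize, millis int64) float64 {
    return float64(insize) / (1 << 20) / (float64(millis) / 1000)
}

func main() {
    // First row of the Silesia table: 211947520 bytes in 643 ms.
    fmt.Printf("%.2f\n", mibPerSec(211947520, 643))
}
```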

```
Silesia Corpus:
http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip

This package:
file    out     level   insize      outsize     millis  mb/s
silesia.tar zskp    1   211947520   73101992    643     313.87
silesia.tar zskp    2   211947520   67504318    969     208.38
silesia.tar zskp    3   211947520   64595893    2007    100.68
silesia.tar zskp    4   211947520   60995370    7691    26.28

cgo zstd:
silesia.tar zstd    1   211947520   73605392    543     371.56
silesia.tar zstd    3   211947520   66793289    864     233.68
silesia.tar zstd    6   211947520   62916450    1913    105.66
silesia.tar zstd    9   211947520   60212393    5063    39.92

gzip, stdlib/this package:
silesia.tar gzstd   1   211947520   80007735    1654    122.21
silesia.tar gzkp    1   211947520   80369488    1168    173.06

GOB stream of binary data. Highly compressible.
https://files.klauspost.com/compress/gob-stream.7z

file        out     level   insize  outsize     millis  mb/s
gob-stream  zskp    1   1911399616  235022249   3088    590.30
gob-stream  zskp    2   1911399616  205669791   3786    481.34
gob-stream  zskp    3   1911399616  175034659   9636    189.17
gob-stream  zskp    4   1911399616  167273881   29337   62.13
gob-stream  zstd    1   1911399616  249810424   2637    691.26
gob-stream  zstd    3   1911399616  208192146   3490    522.31
gob-stream  zstd    6   1911399616  193632038   6687    272.56
gob-stream  zstd    9   1911399616  177620386   16175   112.70
gob-stream  gzstd   1   1911399616  357382641   10251   177.82
gob-stream  gzkp    1   1911399616  362156523   5695    320.08

The test data for the Large Text Compression Benchmark is the first
10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.
http://mattmahoney.net/dc/textdata.html

file    out level   insize      outsize     millis  mb/s
enwik9  zskp    1   1000000000  343848582   3609    264.18
enwik9  zskp    2   1000000000  317276632   5746    165.97
enwik9  zskp    3   1000000000  292243069   12162   78.41
enwik9  zskp    4   1000000000  275241169   36430   26.18
enwik9  zstd    1   1000000000  358072021   3110    306.65
enwik9  zstd    3   1000000000  313734672   4784    199.35
enwik9  zstd    6   1000000000  295138875   10290   92.68
enwik9  zstd    9   1000000000  278348700   28549   33.40
enwik9  gzstd   1   1000000000  382578136   9604    99.30
enwik9  gzkp    1   1000000000  383825945   6544    145.73

Highly compressible JSON file.
https://files.klauspost.com/compress/github-june-2days-2019.json.zst

file                        out level   insize      outsize     millis  mb/s
github-june-2days-2019.json zskp    1   6273951764  699045015   10620   563.40
github-june-2days-2019.json zskp    2   6273951764  617881763   11687   511.96
github-june-2days-2019.json zskp    3   6273951764  524340691   34043   175.75
github-june-2days-2019.json zskp    4   6273951764  503314661   93811   63.78
github-june-2days-2019.json zstd    1   6273951764  766284037   8450    708.00
github-june-2days-2019.json zstd    3   6273951764  661889476   10927   547.57
github-june-2days-2019.json zstd    6   6273951764  642756859   22996   260.18
github-june-2days-2019.json zstd    9   6273951764  601974523   52413   114.16
github-june-2days-2019.json gzstd   1   6273951764  1164400847  29948   199.79
github-june-2days-2019.json gzkp    1   6273951764  1128755542  19236   311.03

VM Image, Linux mint with a few installed applications:
https://files.klauspost.com/compress/rawstudio-mint14.7z

file                    out level   insize      outsize     millis  mb/s
rawstudio-mint14.tar    zskp    1   8558382592  3667489370  20210   403.84
rawstudio-mint14.tar    zskp    2   8558382592  3364592300  31873   256.07
rawstudio-mint14.tar    zskp    3   8558382592  3158085214  77675   105.08
rawstudio-mint14.tar    zskp    4   8558382592  3020370044  404956  20.16
rawstudio-mint14.tar    zstd    1   8558382592  3609250104  17136   476.27
rawstudio-mint14.tar    zstd    3   8558382592  3341679997  29262   278.92
rawstudio-mint14.tar    zstd    6   8558382592  3235846406  77904   104.77
rawstudio-mint14.tar    zstd    9   8558382592  3160778861  140946  57.91
rawstudio-mint14.tar    gzstd   1   8558382592  3926257486  57722   141.40
rawstudio-mint14.tar    gzkp    1   8558382592  3970463184  41749   195.49

CSV data:
https://files.klauspost.com/compress/nyc-taxi-data-10M.csv.zst

file                    out level   insize      outsize     millis  mb/s
nyc-taxi-data-10M.csv   zskp    1   3325605752  641339945   8925    355.35
nyc-taxi-data-10M.csv   zskp    2   3325605752  591748091   11268   281.44
nyc-taxi-data-10M.csv   zskp    3   3325605752  530289687   25239   125.66
nyc-taxi-data-10M.csv   zskp    4   3325605752  490907191   65939   48.10
nyc-taxi-data-10M.csv   zstd    1   3325605752  687399637   8233    385.18
nyc-taxi-data-10M.csv   zstd    3   3325605752  598514411   10065   315.07
nyc-taxi-data-10M.csv   zstd    6   3325605752  570522953   20038   158.27
nyc-taxi-data-10M.csv   zstd    9   3325605752  517554797   64565   49.12
nyc-taxi-data-10M.csv   gzstd   1   3325605752  928656485   23876   132.83
nyc-taxi-data-10M.csv   gzkp    1   3325605752  924718719   16388   193.53
```

## Decompressor

Status: STABLE - there may still be subtle bugs, but a wide variety of content has been tested.

This library is being continuously [fuzz-tested](https://github.com/klauspost/compress-fuzz),
kindly supplied by [fuzzit.dev](https://fuzzit.dev/).
The main purpose of the fuzz testing is to ensure that it is not possible to crash the decoder,
or run it past its limits with ANY input provided.

### Usage

The package has been designed for two main usages: big streams of data and smaller in-memory buffers.
Both of them are accessed by creating a `Decoder`.

For streaming use a simple setup could look like this:

```Go
import (
    "io"

    "github.com/klauspost/compress/zstd"
)

func Decompress(in io.Reader, out io.Writer) error {
    d, err := zstd.NewReader(in)
    if err != nil {
        return err
    }
    defer d.Close()

    // Copy content...
    _, err = io.Copy(out, d)
    return err
}
```

When you no longer need the Reader, it is important to call the "Close" function to stop running goroutines.
See "Allocation-less operation" below.

For decoding buffers, it could look something like this:

```Go
import "github.com/klauspost/compress/zstd"

// Create a reader that caches decompressors.
// For this operation type we supply a nil Reader.
var decoder, _ = zstd.NewReader(nil)

// Decompress a buffer. We don't supply a destination buffer,
// so it will be allocated by the decoder.
func Decompress(src []byte) ([]byte, error) {
    return decoder.DecodeAll(src, nil)
}
```

Both of these cases should provide the functionality needed.
The decoder can be used for *concurrent* decompression of multiple buffers.
It will only allow a certain number of concurrent operations to run.
To tweak that yourself use the `WithDecoderConcurrency(n)` option when creating the decoder.

### Dictionaries

Data compressed with [dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression) can be decompressed.

Dictionaries are added individually to Decoders.
Dictionaries are generated by the `zstd --train` command and contain an initial state for the decoder.
To add a dictionary use the `WithDecoderDicts(dicts ...[]byte)` option with the dictionary data.
Several dictionaries can be added at once.

Dictionaries will be used automatically for the data that specifies them.
A re-used Decoder will still contain the dictionaries registered.

When registering multiple dictionaries with the same ID, the last one will be used.

It is possible to use dictionaries when compressing data.

To enable a dictionary use `WithEncoderDict(dict []byte)`. Here only one dictionary will be used
and it will likely be used even if it doesn't improve compression.

The same dictionary must be used to decompress the content.

For any real gains, the dictionary should be built with similar data.
If an unsuitable dictionary is used the output may be slightly larger than using no dictionary.
Use the [zstd commandline tool](https://github.com/facebook/zstd/releases) to build a dictionary from sample data.
For information see [zstd dictionary information](https://github.com/facebook/zstd#the-case-for-small-data-compression).

For now there is a fixed startup performance penalty for compressing content with dictionaries.
This will likely be improved over time. Be sure to test performance when implementing.

### Allocation-less operation

The decoder has been designed to operate without allocations after a warmup.

This means that you should *store* the decoder for best performance.
To re-use a stream decoder, use the `Reset(r io.Reader) error` to switch to another stream.
A decoder can safely be re-used even if the previous stream failed.

To release the resources, you must call the `Close()` function on a decoder.
After this it can *no longer be reused*, but all running goroutines will be stopped.
So you *must* use this if you will no longer need the Reader.

For decompressing smaller buffers a single decoder can be used.
When decoding buffers, you can supply a destination slice with length 0 and your expected capacity.
In this case no unneeded allocations should be made.

### Concurrency

The buffer decoder does everything on the same goroutine and does nothing concurrently.
It can however decode several buffers concurrently. Use `WithDecoderConcurrency(n)` to limit that.

The stream decoder operates on several goroutines:

* One goroutine reads input and splits the input to several block decoders.
* A number of decoders will decode blocks.
* A goroutine coordinates these blocks and sends history from one to the next.

So effectively this also means the decoder will "read ahead" and prepare data to always be available for output.

Since "blocks" are quite dependent on the output of the previous block, stream decoding will only have limited concurrency.

In practice this means that concurrency is often limited to utilizing about 2 cores effectively.


### Benchmarks

These are some examples of performance compared to [datadog cgo library](https://github.com/DataDog/zstd).

The first two are streaming decodes and the last are smaller inputs.

```
BenchmarkDecoderSilesia-8                          3     385000067 ns/op     550.51 MB/s        5498 B/op          8 allocs/op
BenchmarkDecoderSilesiaCgo-8                       6     197666567 ns/op    1072.25 MB/s      270672 B/op          8 allocs/op

BenchmarkDecoderEnwik9-8                           1    2027001600 ns/op     493.34 MB/s       10496 B/op         18 allocs/op
BenchmarkDecoderEnwik9Cgo-8                        2     979499200 ns/op    1020.93 MB/s      270672 B/op          8 allocs/op

Concurrent performance:

BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                28915         42469 ns/op    4340.07 MB/s         114 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16           116505          9965 ns/op    11900.16 MB/s         16 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16              8952        134272 ns/op    3588.70 MB/s         915 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16               11820        102538 ns/op    4161.90 MB/s         594 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16             34782         34184 ns/op    3661.88 MB/s          60 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16              27712         43447 ns/op    3500.58 MB/s          99 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                 62826         18750 ns/op    21845.10 MB/s        104 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16          631545          1794 ns/op    57078.74 MB/s          2 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16         1690140           712 ns/op    172938.13 MB/s         1 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                 10432        113593 ns/op    6180.73 MB/s        1143 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html.zst-16                    113206         10671 ns/op    9596.27 MB/s          15 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16          1530615           779 ns/op    5229.49 MB/s           0 B/op          0 allocs/op

BenchmarkDecoder_DecodeAllParallelCgo/kppkn.gtb.zst-16             65217         16192 ns/op    11383.34 MB/s         46 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/geo.protodata.zst-16        292671          4039 ns/op    29363.19 MB/s          6 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/plrabn12.txt.zst-16          26314         46021 ns/op    10470.43 MB/s        293 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/lcet10.txt.zst-16            33897         34900 ns/op    12227.96 MB/s        205 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/asyoulik.txt.zst-16         104348         11433 ns/op    10949.01 MB/s         20 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/alice29.txt.zst-16           75949         15510 ns/op    9805.60 MB/s          32 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html_x_4.zst-16             173910          6756 ns/op    60624.29 MB/s         37 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/paper-100k.pdf.zst-16       923076          1339 ns/op    76474.87 MB/s          1 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/fireworks.jpeg.zst-16       922920          1351 ns/op    91102.57 MB/s          2 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/urls.10K.zst-16              27649         43618 ns/op    16096.19 MB/s        407 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html.zst-16                 279073          4160 ns/op    24614.18 MB/s          6 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/comp-data.bin.zst-16        749938          1579 ns/op    2581.71 MB/s           0 B/op          0 allocs/op
```

This reflects the performance around May 2020, but this may be out of date.

## Zstd inside ZIP files

It is possible to use zstandard to compress individual files inside zip archives.
While this isn't widely supported it can be useful for internal files.

To support the compression and decompression of these files you must register a compressor and decompressor.

It is highly recommended to register the (de)compressors on individual zip Reader/Writer and NOT
to use the global registration functions. The main reason for this is that 2 registrations from
different packages will result in a panic.

It is a good idea to only have a single compressor and decompressor, since they can be used for multiple zip
files concurrently, and using a single instance will allow reusing some resources.

See [this example](https://pkg.go.dev/github.com/klauspost/compress/zstd#example-ZipCompressor) for
how to compress and decompress files inside zip archives.

# Contributions

Contributions are always welcome.
For new features/fixes, remember to add tests and for performance enhancements include benchmarks.

For general feedback and experience reports, feel free to open an issue or write me on [Twitter](https://twitter.com/sh0dan).

This package includes the excellent [`github.com/cespare/xxhash`](https://github.com/cespare/xxhash) package Copyright (c) 2016 Caleb Spare.