# zstd

[Zstandard](https://facebook.github.io/zstd/) is a real-time compression algorithm, providing high compression ratios.
It offers a very wide range of compression/speed trade-offs, while being backed by a very fast decoder.
A high-performance compression algorithm is implemented. For now, it is focused on speed.

This package provides [compression](#Compressor) to and [decompression](#Decompressor) of Zstandard content.
Note that custom dictionaries are only supported for decompression.

This package is pure Go and does not use "unsafe".

The `zstd` package is provided as open source software under a standard Go license.

Currently, the package is heavily optimized for 64-bit processors and will be significantly slower on 32-bit processors.

## Installation

Install using `go get -u github.com/klauspost/compress`. The package is located in `github.com/klauspost/compress/zstd`.

Godoc Documentation: https://godoc.org/github.com/klauspost/compress/zstd

## Compressor

### Status:

STABLE - there may always be subtle bugs, but a wide variety of content has been tested and the library is actively
used by several projects. This library is being continuously [fuzz-tested](https://github.com/klauspost/compress-fuzz),
kindly supplied by [fuzzit.dev](https://fuzzit.dev/).

There may still be specific combinations of data types/size/settings that could lead to edge cases,
so as always, testing is recommended.

For now, a high-speed (fastest) and a medium-fast (default) compressor have been implemented.

The "Fastest" compression ratio is roughly equivalent to zstd level 1.
The "Default" compression ratio is roughly equivalent to zstd level 3 (default).

In terms of speed, it is typically 2x as fast as the stdlib deflate/gzip in its fastest mode.
The compression ratio compared to stdlib is around level 3, but usually 3x as fast.

Compared to cgo zstd, the speed is around level 3 (default), but compression is slightly worse, between levels 1 and 2.

### Usage

An Encoder can be used either for compressing a stream via the
`io.WriteCloser` interface supported by the Encoder, or for multiple independent
tasks via the `EncodeAll` function.
For smaller encodes, the `EncodeAll` function is recommended.
Use `NewWriter` to create a new instance that can be used for both.

To create a writer with default options, do this:

```Go
import (
    "io"

    "github.com/klauspost/compress/zstd"
)

// Compress input to output.
func Compress(in io.Reader, out io.Writer) error {
    enc, err := zstd.NewWriter(out)
    if err != nil {
        return err
    }
    _, err = io.Copy(enc, in)
    if err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}
```

Now you can encode by writing data to `enc`. The output is finished when `Close()` is called.
Even if your encode fails, you should still call `Close()` to release any resources that may be held up.

The above is fine for big encodes. However, whenever possible, try to *reuse* the writer.

To reuse the encoder, you can use the `Reset(io.Writer)` function to change to another output.
This allows the encoder to reuse all resources and avoid wasteful allocations.

Currently, stream encoding has 'light' concurrency, meaning up to 2 goroutines can be working on parts
of a stream. This is independent of the `WithEncoderConcurrency(n)` option, but that is likely to change
in the future. So if you want to limit concurrency for future updates, specify the concurrency
you would like.

You can specify your desired compression level using the `WithEncoderLevel()` option. Currently, only pre-defined
compression settings can be specified.

#### Future Compatibility Guarantees

This will be an evolving project. When using this package, it is important to note that both the compression efficiency and speed may change.

The goal will be to keep the default efficiency at the default zstd (level 3).
However, the encoding should never be assumed to remain the same,
and you should not use hashes of compressed output for similarity checks.

The Encoder can be assumed to produce the same output from the exact same code version.
However, there may be modes in the future that break this,
although they will not be enabled without an explicit option.

This encoder is not designed to (and will probably never) output the exact same bitstream as the reference encoder.

Also note that the cgo decompressor currently does not [report all errors on invalid input](https://github.com/DataDog/zstd/issues/59),
[omits error checks](https://github.com/DataDog/zstd/issues/61), [ignores checksums](https://github.com/DataDog/zstd/issues/43),
and seems to ignore concatenated streams, even though [it is part of the spec](https://github.com/facebook/zstd/blob/dev/doc/zstd_compression_format.md#frames).

#### Blocks

For compressing small blocks, the returned encoder has a function called `EncodeAll(src, dst []byte) []byte`.

`EncodeAll` will encode all input in src and append it to dst.
This function can be called concurrently, but each call will only run on a single goroutine.

Encoded blocks can be concatenated and the result will be the combined input stream.
Data compressed with `EncodeAll` can be decoded with the Decoder, using either a stream or `DecodeAll`.

Especially when encoding blocks, you should take special care to reuse the encoder.
This will effectively make it run without allocations after a warmup period.
To make it run completely without allocations, supply a destination buffer with space for all content.

```Go
import "github.com/klauspost/compress/zstd"

// Create a writer that caches compressors.
// For this operation type we supply a nil Writer.
var encoder, _ = zstd.NewWriter(nil)

// Compress a buffer.
// If you have a destination buffer, the allocation in the call can also be eliminated.
func Compress(src []byte) []byte {
    return encoder.EncodeAll(src, make([]byte, 0, len(src)))
}
```

You can control the maximum number of concurrent encodes using the `WithEncoderConcurrency(n)`
option when creating the writer.

Using the Encoder for both a stream and individual blocks concurrently is safe.

### Performance

I have collected some speed examples to compare speed and compression against other compressors.

* `file` is the input file.
* `out` is the compressor used. `zskp` is this package. `zstd` is the Datadog cgo library. `gzstd`/`gzkp` are gzip from the standard library and from this library, respectively.
* `level` is the compression level used. For `zskp`, level 1 is "fastest" and level 2 is "default".
* `insize`/`outsize` are the input/output sizes in bytes.
* `millis` is the number of milliseconds used for compression.
* `mb/s` is megabytes (2^20 bytes) per second.
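
The `mb/s` column follows from the other columns: input bytes divided by 2^20, divided by the elapsed seconds. The helpers below (their names are ours, not part of the package) spell out that arithmetic and also compute the compression ratio, which the tables do not list.

```Go
package main

// throughput is a hypothetical helper: speed in MB/s, with MB = 2^20 bytes
// as in the tables, from the input size in bytes and the time in milliseconds.
func throughput(insize, millis int64) float64 {
    const mb = 1 << 20
    return float64(insize) / mb / (float64(millis) / 1000)
}

// ratio is a hypothetical helper: insize/outsize, where higher means better compression.
func ratio(insize, outsize int64) float64 {
    return float64(insize) / float64(outsize)
}
```

For the `gob-stream zskp 1` row, `throughput(1911399616, 3088)` gives about 590.30, matching the table, and `ratio(1911399616, 235022249)` is about 8.13.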

```
Silesia Corpus:
http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip

This package:
file    out     level   insize      outsize     millis  mb/s
silesia.tar zskp    1   211947520   73101992    643     313.87
silesia.tar zskp    2   211947520   67504318    969     208.38
silesia.tar zskp    3   211947520   65177448    1899    106.44

cgo zstd:
silesia.tar zstd    1   211947520   73605392    543     371.56
silesia.tar zstd    3   211947520   66793289    864     233.68
silesia.tar zstd    6   211947520   62916450    1913    105.66

gzip, stdlib/this package:
silesia.tar gzstd   1   211947520   80007735    1654    122.21
silesia.tar gzkp    1   211947520   80369488    1168    173.06

GOB stream of binary data. Highly compressible.
https://files.klauspost.com/compress/gob-stream.7z

file        out     level   insize  outsize     millis  mb/s
gob-stream  zskp    1   1911399616  235022249   3088    590.30
gob-stream  zskp    2   1911399616  205669791   3786    481.34
gob-stream  zskp    3   1911399616  185792019   9324    195.48
gob-stream  zstd    1   1911399616  249810424   2637    691.26
gob-stream  zstd    3   1911399616  208192146   3490    522.31
gob-stream  zstd    6   1911399616  193632038   6687    272.56
gob-stream  gzstd   1   1911399616  357382641   10251   177.82
gob-stream  gzkp    1   1911399616  362156523   5695    320.08

The test data for the Large Text Compression Benchmark is the first
10^9 bytes of the English Wikipedia dump on Mar. 3, 2006.
http://mattmahoney.net/dc/textdata.html

file    out level   insize      outsize     millis  mb/s
enwik9  zskp    1   1000000000  343848582   3609    264.18
enwik9  zskp    2   1000000000  317276632   5746    165.97
enwik9  zskp    3   1000000000  294540704   11725   81.34
enwik9  zstd    1   1000000000  358072021   3110    306.65
enwik9  zstd    3   1000000000  313734672   4784    199.35
enwik9  zstd    6   1000000000  295138875   10290   92.68
enwik9  gzstd   1   1000000000  382578136   9604    99.30
enwik9  gzkp    1   1000000000  383825945   6544    145.73

Highly compressible JSON file.
https://files.klauspost.com/compress/github-june-2days-2019.json.zst

file                        out level   insize      outsize     millis  mb/s
github-june-2days-2019.json zskp    1   6273951764  699045015   10620   563.40
github-june-2days-2019.json zskp    2   6273951764  617881763   11687   511.96
github-june-2days-2019.json zskp    3   6273951764  537511906   29252   204.54
github-june-2days-2019.json zstd    1   6273951764  766284037   8450    708.00
github-june-2days-2019.json zstd    3   6273951764  661889476   10927   547.57
github-june-2days-2019.json zstd    6   6273951764  642756859   22996   260.18
github-june-2days-2019.json gzstd   1   6273951764  1164400847  29948   199.79
github-june-2days-2019.json gzkp    1   6273951764  1128755542  19236   311.03

VM Image, Linux mint with a few installed applications:
https://files.klauspost.com/compress/rawstudio-mint14.7z

file                    out level   insize      outsize     millis  mb/s
rawstudio-mint14.tar    zskp    1   8558382592  3667489370  20210   403.84
rawstudio-mint14.tar    zskp    2   8558382592  3364592300  31873   256.07
rawstudio-mint14.tar    zskp    3   8558382592  3224594213  71751   113.75
rawstudio-mint14.tar    zstd    1   8558382592  3609250104  17136   476.27
rawstudio-mint14.tar    zstd    3   8558382592  3341679997  29262   278.92
rawstudio-mint14.tar    zstd    6   8558382592  3235846406  77904   104.77
rawstudio-mint14.tar    gzstd   1   8558382592  3926257486  57722   141.40
rawstudio-mint14.tar    gzkp    1   8558382592  3970463184  41749   195.49

CSV data:
https://files.klauspost.com/compress/nyc-taxi-data-10M.csv.zst

file                    out level   insize      outsize     millis  mb/s
nyc-taxi-data-10M.csv   zskp    1   3325605752  641339945   8925    355.35
nyc-taxi-data-10M.csv   zskp    2   3325605752  591748091   11268   281.44
nyc-taxi-data-10M.csv   zskp    3   3325605752  538490114   19880   159.53
nyc-taxi-data-10M.csv   zstd    1   3325605752  687399637   8233    385.18
nyc-taxi-data-10M.csv   zstd    3   3325605752  598514411   10065   315.07
nyc-taxi-data-10M.csv   zstd    6   3325605752  570522953   20038   158.27
nyc-taxi-data-10M.csv   gzstd   1   3325605752  928656485   23876   132.83
nyc-taxi-data-10M.csv   gzkp    1   3325605752  924718719   16388   193.53
```

### Converters

As part of the development process, a *Snappy* -> *Zstandard* converter was also built.

This can convert a *framed* [Snappy Stream](https://godoc.org/github.com/golang/snappy#Writer) to a zstd stream.
Note that a single block is not framed.

Conversion is done by converting the stream directly from Snappy without intermediate full decoding.
Therefore, the compression ratio is much lower than what can be achieved by a full decompression
and recompression, and a faulty Snappy stream may lead to a faulty Zstandard stream without
any errors being generated.
No CRC value is generated, and not all CRC values of the Snappy stream are checked.
However, it provides really fast re-compression of Snappy streams.

```
BenchmarkSnappy_ConvertSilesia-8           1  1156001600 ns/op   183.35 MB/s
Snappy len 103008711 -> zstd len 82687318

BenchmarkSnappy_Enwik9-8           1  6472998400 ns/op   154.49 MB/s
Snappy len 508028601 -> zstd len 390921079
```

```Go
s := zstd.SnappyConverter{}
n, err := s.Convert(input, output)
if err != nil {
    return err
}
fmt.Println("Re-compressed stream to", n, "bytes")
```

The converter `s` can be reused to avoid allocations, even after errors.

## Decompressor

Status: STABLE - there may still be subtle bugs, but a wide variety of content has been tested.

This library is being continuously [fuzz-tested](https://github.com/klauspost/compress-fuzz),
kindly supplied by [fuzzit.dev](https://fuzzit.dev/).
The main purpose of the fuzz testing is to ensure that it is not possible to crash the decoder,
or run it past its limits with ANY input provided.

### Usage

The package has been designed for two main usages: big streams of data and smaller in-memory buffers.
Both are accessed by creating a `Decoder`.

For streaming use, a simple setup could look like this:

```Go
import (
    "io"

    "github.com/klauspost/compress/zstd"
)

func Decompress(in io.Reader, out io.Writer) error {
    d, err := zstd.NewReader(in)
    if err != nil {
        return err
    }
    defer d.Close()

    // Copy content...
    _, err = io.Copy(out, d)
    return err
}
```

It is important to call `Close()` when you no longer need the Reader, so that its running goroutines are stopped.
See "Allocation-less operation" below.

For decoding buffers, it could look something like this:

```Go
import "github.com/klauspost/compress/zstd"

// Create a reader that caches decompressors.
// For this operation type we supply a nil Reader.
var decoder, _ = zstd.NewReader(nil)

// Decompress a buffer. We don't supply a destination buffer,
// so it will be allocated by the decoder.
func Decompress(src []byte) ([]byte, error) {
    return decoder.DecodeAll(src, nil)
}
```

Both of these cases should provide the functionality needed.
The decoder can be used for *concurrent* decompression of multiple buffers.
It will only allow a certain number of concurrent operations to run.
To tweak that yourself, use the `WithDecoderConcurrency(n)` option when creating the decoder.

### Dictionaries

Data compressed with [dictionaries](https://github.com/facebook/zstd#the-case-for-small-data-compression) can be decompressed.

Dictionaries are added individually to Decoders.
Dictionaries are generated by the `zstd --train` command and contain an initial state for the decoder.
To add a dictionary, use the `WithDecoderDicts(dicts ...[]byte)` option with the dictionary data.
Several dictionaries can be added at once.

The dictionary will be used automatically for data that specifies it.
A re-used Decoder will still contain the dictionaries registered.

When registering multiple dictionaries with the same ID, the last one will be used.

### Allocation-less operation

The decoder has been designed to operate without allocations after a warmup.

This means that you should *store* the decoder for best performance.
To re-use a stream decoder, use the `Reset(r io.Reader) error` function to switch to another stream.
A decoder can safely be re-used even if the previous stream failed.

To release the resources, you must call the `Close()` function on a decoder.
After this it can *no longer be reused*, but all running goroutines will be stopped.
So you *must* use this if you will no longer need the Reader.

For decompressing smaller buffers, a single decoder can be used.
When decoding buffers, you can supply a destination slice with length 0 and your expected capacity.
In this case, no unneeded allocations should be made.

### Concurrency

The buffer decoder does everything on the same goroutine and does nothing concurrently.
It can, however, decode several buffers concurrently. Use `WithDecoderConcurrency(n)` to limit that.

The stream decoder operates as follows:

* One goroutine reads input and splits it among several block decoders.
* A number of decoders decode the blocks.
* A goroutine coordinates these blocks and sends history from one to the next.

Effectively, this also means the decoder will "read ahead" and prepare data so it is always available for output.

Since blocks are quite dependent on the output of the previous block, stream decoding will only have limited concurrency.

In practice, this means that concurrency is often limited to utilizing about 2 cores effectively.
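
The three roles listed above form an ordered pipeline. The generic sketch below shows only that shape; `pipeline`, `block` and the `work` callback are hypothetical names, and this is not the package's implementation. In particular, it leaves out the history hand-off between blocks, which is the dependency that limits real stream-decoding concurrency, so here every block can be processed independently.

```Go
package main

import "sync"

// block is a hypothetical unit of work: its position in the input plus its data.
type block struct {
    seq  int
    data []byte
}

// pipeline is an illustrative reader -> workers -> coordinator pipeline.
// work stands in for per-block processing; it is NOT the zstd block decoder.
// workers must be at least 1.
func pipeline(chunks [][]byte, workers int, work func([]byte) []byte) [][]byte {
    in := make(chan block)
    done := make(chan block)

    // Reader: splits the input into numbered blocks.
    go func() {
        for i, c := range chunks {
            in <- block{seq: i, data: c}
        }
        close(in)
    }()

    // Block workers: several goroutines process blocks in parallel.
    var wg sync.WaitGroup
    for w := 0; w < workers; w++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for b := range in {
                done <- block{seq: b.seq, data: work(b.data)}
            }
        }()
    }
    go func() {
        wg.Wait()
        close(done)
    }()

    // Coordinator: buffers out-of-order results and emits them in input order.
    out := make([][]byte, 0, len(chunks))
    pending := make(map[int][]byte)
    next := 0
    for b := range done {
        pending[b.seq] = b.data
        for {
            d, ok := pending[next]
            if !ok {
                break
            }
            out = append(out, d)
            delete(pending, next)
            next++
        }
    }
    return out
}
```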

### Benchmarks

These are some examples of performance compared to the [Datadog cgo library](https://github.com/DataDog/zstd).

The first two benchmarks (Silesia and Enwik9, each paired with its cgo counterpart) are streaming decodes; the rest use smaller inputs, decoded concurrently.

```
BenchmarkDecoderSilesia-8                          3     385000067 ns/op     550.51 MB/s        5498 B/op          8 allocs/op
BenchmarkDecoderSilesiaCgo-8                       6     197666567 ns/op    1072.25 MB/s      270672 B/op          8 allocs/op

BenchmarkDecoderEnwik9-8                           1    2027001600 ns/op     493.34 MB/s       10496 B/op         18 allocs/op
BenchmarkDecoderEnwik9Cgo-8                        2     979499200 ns/op    1020.93 MB/s      270672 B/op          8 allocs/op

Concurrent performance:

BenchmarkDecoder_DecodeAllParallel/kppkn.gtb.zst-16                28915         42469 ns/op    4340.07 MB/s         114 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/geo.protodata.zst-16           116505          9965 ns/op    11900.16 MB/s         16 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/plrabn12.txt.zst-16              8952        134272 ns/op    3588.70 MB/s         915 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/lcet10.txt.zst-16               11820        102538 ns/op    4161.90 MB/s         594 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/asyoulik.txt.zst-16             34782         34184 ns/op    3661.88 MB/s          60 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/alice29.txt.zst-16              27712         43447 ns/op    3500.58 MB/s          99 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html_x_4.zst-16                 62826         18750 ns/op    21845.10 MB/s        104 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/paper-100k.pdf.zst-16          631545          1794 ns/op    57078.74 MB/s          2 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/fireworks.jpeg.zst-16         1690140           712 ns/op    172938.13 MB/s         1 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/urls.10K.zst-16                 10432        113593 ns/op    6180.73 MB/s        1143 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/html.zst-16                    113206         10671 ns/op    9596.27 MB/s          15 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallel/comp-data.bin.zst-16          1530615           779 ns/op    5229.49 MB/s           0 B/op          0 allocs/op

BenchmarkDecoder_DecodeAllParallelCgo/kppkn.gtb.zst-16             65217         16192 ns/op    11383.34 MB/s         46 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/geo.protodata.zst-16        292671          4039 ns/op    29363.19 MB/s          6 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/plrabn12.txt.zst-16          26314         46021 ns/op    10470.43 MB/s        293 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/lcet10.txt.zst-16            33897         34900 ns/op    12227.96 MB/s        205 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/asyoulik.txt.zst-16         104348         11433 ns/op    10949.01 MB/s         20 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/alice29.txt.zst-16           75949         15510 ns/op    9805.60 MB/s          32 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html_x_4.zst-16             173910          6756 ns/op    60624.29 MB/s         37 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/paper-100k.pdf.zst-16       923076          1339 ns/op    76474.87 MB/s          1 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/fireworks.jpeg.zst-16       922920          1351 ns/op    91102.57 MB/s          2 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/urls.10K.zst-16              27649         43618 ns/op    16096.19 MB/s        407 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/html.zst-16                 279073          4160 ns/op    24614.18 MB/s          6 B/op          0 allocs/op
BenchmarkDecoder_DecodeAllParallelCgo/comp-data.bin.zst-16        749938          1579 ns/op    2581.71 MB/s           0 B/op          0 allocs/op
```

This reflects the performance around May 2020, but this may be out of date.

# Contributions

Contributions are always welcome.
For new features/fixes, remember to add tests, and for performance enhancements, include benchmarks.

To share files for reproducing errors, use a service like [goobox](https://goobox.io/#/upload) or similar.

For general feedback and experience reports, feel free to open an issue or write me on [Twitter](https://twitter.com/sh0dan).

This package includes the excellent [`github.com/cespare/xxhash`](https://github.com/cespare/xxhash) package Copyright (c) 2016 Caleb Spare.