# S2 Compression

S2 is an extension of [Snappy](https://github.com/google/snappy).

S2 is aimed at high throughput, which is why it features concurrent compression for bigger payloads.

Decoding is compatible with Snappy compressed content, but content compressed with S2 cannot be decompressed by Snappy.
This means that S2 can seamlessly replace Snappy without converting compressed content.

S2 can produce Snappy compatible output, faster and better than Snappy.
If you want the full benefit of the changes, you should use S2 without Snappy compatibility.

S2 is designed to have high throughput on content that cannot be compressed.
This is important, so you don't have to worry about spending CPU cycles on already compressed data.
## Benefits over Snappy

* Better compression
* Adjustable compression (3 levels)
* Concurrent stream compression
* Faster decompression, even for Snappy compatible content
* Ability to quickly skip forward in compressed stream
* Compatible with reading Snappy compressed content
* Smaller block size overhead on incompressible blocks
* Block concatenation
* Uncompressed stream mode
* Automatic stream size padding
* Snappy compatible block compression

## Drawbacks over Snappy

* Not optimized for 32 bit systems.
* Streams use slightly more memory due to larger blocks and concurrency (configurable).

# Usage

Installation: `go get -u github.com/klauspost/compress/s2`

Full package documentation:

[![godoc][1]][2]

[1]: https://godoc.org/github.com/klauspost/compress?status.svg
[2]: https://godoc.org/github.com/klauspost/compress/s2

## Compression

```Go
func EncodeStream(src io.Reader, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    _, err := io.Copy(enc, src)
    if err != nil {
        enc.Close()
        return err
    }
    // Blocks until compression is done.
    return enc.Close()
}
```

You should always call `enc.Close()`, otherwise you will leak resources and your encode will be incomplete.

For the best throughput, you should attempt to reuse the `Writer` using the `Reset()` method.

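A minimal sketch of reusing one `Writer` for several outputs via `Reset`; `outputs` and `payloads` are placeholders:

```Go
func encodeMany(outputs []io.Writer, payloads []io.Reader) error {
    enc := s2.NewWriter(nil)
    for i, w := range outputs {
        // Point the reused Writer at the next destination.
        enc.Reset(w)
        if _, err := io.Copy(enc, payloads[i]); err != nil {
            enc.Close()
            return err
        }
        // Close finishes this stream; Reset re-arms the Writer for reuse.
        if err := enc.Close(); err != nil {
            return err
        }
    }
    return nil
}
```
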
The Writer in S2 is always buffered, therefore `NewBufferedWriter` in Snappy can be replaced with `NewWriter` in S2.
It is possible to flush any buffered data using the `Flush()` method.
This will block until all data sent to the encoder has been written to the output.

S2 also supports the `io.ReaderFrom` interface, which will consume all input from a reader.

As a final method to compress data, if you have a single block of data you would like to have encoded as a stream,
it is slightly more efficient to use the `EncodeBuffer` method.
This will take ownership of the buffer until the stream is closed.

```Go
func EncodeStream(src []byte, dst io.Writer) error {
    enc := s2.NewWriter(dst)
    // The encoder owns the buffer until Flush or Close is called.
    err := enc.EncodeBuffer(src)
    if err != nil {
        enc.Close()
        return err
    }
    // Blocks until compression is done.
    return enc.Close()
}
```

Each call to `EncodeBuffer` will result in discrete blocks being created without buffering,
so it should only be used a single time per stream.
If you need to write several blocks, you should use the regular io.Writer interface.

## Decompression

```Go
func DecodeStream(src io.Reader, dst io.Writer) error {
    dec := s2.NewReader(src)
    _, err := io.Copy(dst, dec)
    return err
}
```

Similar to the Writer, a Reader can be reused using the `Reset` method.

For the best possible throughput, there is an `EncodeBuffer(buf []byte)` function available.
However, it requires that the provided buffer isn't used after it is handed over to S2 and until the stream is flushed or closed.

For smaller data blocks, there is also a non-streaming interface: `Encode()`, `EncodeBetter()` and `Decode()`.
Do however note that these functions (similar to Snappy) do not provide validation of data,
so data corruption may be undetected. Stream encoding provides CRC checks of data.

It is possible to efficiently skip forward in a compressed stream using the `Skip()` method.
For big skips the decompressor is able to skip blocks without decompressing them.

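For illustration, a minimal sketch of skipping the first part of a stream before reading the remainder; the 1GB figure is just an example:

```Go
func tail(src io.Reader, dst io.Writer) error {
    dec := s2.NewReader(src)
    // Skip the first 1GB of decompressed data.
    // Complete blocks can be skipped without being decompressed.
    if err := dec.Skip(1 << 30); err != nil {
        return err
    }
    _, err := io.Copy(dst, dec)
    return err
}
```
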
## Single Blocks

Similar to Snappy, S2 offers single block compression.
Blocks do not offer the same flexibility and safety as streams,
but may be preferable for very small payloads, less than 100K.

Using a simple `dst := s2.Encode(nil, src)` will compress `src` and return the compressed result.
It is possible to provide a destination buffer.
If the buffer has a capacity of `s2.MaxEncodedLen(len(src))` it will be used.
If not, a new one will be allocated.

Alternatively, `EncodeBetter`/`EncodeBest` can also be used for better, but slightly slower compression.

Similarly, to decompress a block you can use `dst, err := s2.Decode(nil, src)`.
Again an optional destination buffer can be supplied.
`s2.DecodedLen(src)` can be used to get the minimum capacity needed.
If that is not satisfied a new buffer will be allocated.

Block functions always operate on a single goroutine since they should only be used for small payloads.

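As a hedged sketch, a round trip where destination buffers are reused between calls; `encBuf`/`decBuf` are illustrative:

```Go
func roundTrip(src, encBuf, decBuf []byte) ([]byte, error) {
    // encBuf is used if it has capacity s2.MaxEncodedLen(len(src)),
    // otherwise Encode allocates a new destination.
    compressed := s2.Encode(encBuf, src)

    // decBuf is used if it satisfies s2.DecodedLen(compressed),
    // otherwise Decode allocates a new destination.
    return s2.Decode(decBuf, compressed)
}
```
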
# Commandline tools

Some very simple commandline tools are provided; `s2c` for compression and `s2d` for decompression.

Binaries can be downloaded on the [Releases Page](https://github.com/klauspost/compress/releases).

Installing them requires Go to be installed. To install them, use:

`go install github.com/klauspost/compress/s2/cmd/s2c && go install github.com/klauspost/compress/s2/cmd/s2d`

To build binaries to the current folder use:

`go build github.com/klauspost/compress/s2/cmd/s2c && go build github.com/klauspost/compress/s2/cmd/s2d`

## s2c

```
Usage: s2c [options] file1 file2

Compresses all files supplied as input separately.
Output files are written as 'filename.ext.s2' or 'filename.ext.snappy'.
By default output files will be overwritten.
Use - as the only file name to read from stdin and write to stdout.

Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt

File names beginning with 'http://' and 'https://' will be downloaded and compressed.
Only http response code 200 is accepted.

Options:
  -bench int
        Run benchmark n times. No output will be written
  -blocksize string
        Max block size. Examples: 64K, 256K, 1M, 4M. Must be power of two and <= 4MB (default "4M")
  -c    Write all output to stdout. Multiple input files will be concatenated
  -cpu int
        Compress using this amount of threads (default 32)
  -faster
        Compress faster, but with a minor compression loss
  -help
        Display help
  -o string
        Write output to another file. Single input file only
  -pad string
        Pad size to a multiple of this value, Examples: 500, 64K, 256K, 1M, 4M, etc (default "1")
  -q    Don't write any output to terminal, except errors
  -rm
        Delete source file(s) after successful compression
  -safe
        Do not overwrite output files
  -slower
        Compress more, but a lot slower
  -snappy
        Generate Snappy compatible output stream
  -verify
        Verify written files
```

## s2d

```
Usage: s2d [options] file1 file2

Decompresses all files supplied as input. Input files must end with '.s2' or '.snappy'.
Output file names have the extension removed. By default output files will be overwritten.
Use - as the only file name to read from stdin and write to stdout.

Wildcards are accepted: testdir/*.txt will decompress all files in testdir ending with .txt
Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt

File names beginning with 'http://' and 'https://' will be downloaded and decompressed.
Extensions on downloaded files are ignored. Only http response code 200 is accepted.

Options:
  -bench int
        Run benchmark n times. No output will be written
  -c    Write all output to stdout. Multiple input files will be concatenated
  -help
        Display help
  -o string
        Write output to another file. Single input file only
  -q    Don't write any output to terminal, except errors
  -rm
        Delete source file(s) after successful decompression
  -safe
        Do not overwrite output files
  -verify
        Verify files, but do not write output
```

## s2sx: self-extracting archives

s2sx allows creating self-extracting archives with no dependencies.

By default, executables are created for the same platform as the host OS,
but this can be overridden with the `-os` and `-arch` parameters.

Extracted files have 0666 permissions, except when the untar option is used.

```
Usage: s2sx [options] file1 file2

Compresses all files supplied as input separately.
If files have '.s2' extension they are assumed to be compressed already.
Output files are written as 'filename.s2sx' and with '.exe' for windows targets.
If output is big, an additional file with ".more" is written. This must be included as well.
By default output files will be overwritten.

Wildcards are accepted: testdir/*.txt will compress all files in testdir ending with .txt
Directories can be wildcards as well. testdir/*/*.txt will match testdir/subdir/b.txt

Options:
  -arch string
        Destination architecture (default "amd64")
  -c    Write all output to stdout. Multiple input files will be concatenated
  -cpu int
        Compress using this amount of threads (default 32)
  -help
        Display help
  -max string
        Maximum executable size. Rest will be written to another file. (default "1G")
  -os string
        Destination operating system (default "windows")
  -q    Don't write any output to terminal, except errors
  -rm
        Delete source file(s) after successful compression
  -safe
        Do not overwrite output files
  -untar
        Untar on destination
```

Available platforms are:

 * darwin-amd64
 * darwin-arm64
 * linux-amd64
 * linux-arm
 * linux-arm64
 * linux-mips64
 * linux-ppc64le
 * windows-386
 * windows-amd64

By default, there is a size limit of 1GB for the output executable.

When this is exceeded, the remaining file content is written to a file called
output+`.more`. This file must be included and placed alongside the executable
for a successful extraction.

This file *must* have the same base name as the executable, so if the executable is renamed,
the `.more` file must be renamed accordingly.

This functionality is disabled with stdin/stdout.

### Self-extracting TAR files

If you wrap a TAR file you can specify `-untar` to make it untar on the destination host.

Files are extracted to the current folder with the path specified in the tar file.

Note that tar files are not validated before they are wrapped.

For security reasons, files that would be extracted outside of the destination folder are not allowed.

# Performance

This section will focus on comparisons to Snappy.
This package is solely aimed at replacing Snappy as a high speed compression package.
If you are mainly looking for better compression, [zstandard](https://github.com/klauspost/compress/tree/master/zstd#zstd)
gives better compression, but typically at speeds slightly below "better" mode in this package.

Compression is mostly 5-20% better than Snappy, and single-threaded throughput is typically 25-40% higher than the Snappy Go implementation.

Streams are concurrently compressed. The stream will be distributed among all available CPU cores for the best possible throughput.

A "better" compression mode is also available. This allows trading a bit of speed for a minor compression gain.
The content compressed in this mode is fully compatible with the standard decoder.

Snappy vs S2 **compression** speed on a 16 core (32 thread) machine, using all threads and a single thread (1 CPU):

| File                                                                                                | S2 speed | S2 Throughput | S2 % smaller | S2 "better" | "better" throughput | "better" % smaller |
|-----------------------------------------------------------------------------------------------------|----------|---------------|--------------|-------------|---------------------|--------------------|
| [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                    | 12.70x   | 10556 MB/s    | 7.35%        | 4.15x       | 3455 MB/s           | 12.79%             |
| (1 CPU)                                                                                             | 1.14x    | 948 MB/s      | -            | 0.42x       | 349 MB/s            | -                  |
| [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst) | 17.13x   | 14484 MB/s    | 31.60%       | 10.09x      | 8533 MB/s           | 37.71%             |
| (1 CPU)                                                                                             | 1.33x    | 1127 MB/s     | -            | 0.70x       | 589 MB/s            | -                  |
| [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)         | 15.14x   | 12000 MB/s    | -5.79%       | 6.59x       | 5223 MB/s           | 5.80%              |
| (1 CPU)                                                                                             | 1.11x    | 877 MB/s      | -            | 0.47x       | 370 MB/s            | -                  |
| [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                     | 14.62x   | 12116 MB/s    | 15.90%       | 5.35x       | 4430 MB/s           | 16.08%             |
| (1 CPU)                                                                                             | 1.38x    | 1146 MB/s     | -            | 0.38x       | 312 MB/s            | -                  |
| [adresser.json](https://files.klauspost.com/compress/adresser.json.zst)                             | 8.83x    | 17579 MB/s    | 43.86%       | 6.54x       | 13011 MB/s          | 47.23%             |
| (1 CPU)                                                                                             | 1.14x    | 2259 MB/s     | -            | 0.74x       | 1475 MB/s           | -                  |
| [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                    | 16.72x   | 14019 MB/s    | 24.02%       | 10.11x      | 8477 MB/s           | 30.48%             |
| (1 CPU)                                                                                             | 1.24x    | 1043 MB/s     | -            | 0.70x       | 586 MB/s            | -                  |
| [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                     | 13.33x   | 9254 MB/s     | 1.84%        | 6.75x       | 4686 MB/s           | 6.72%              |
| (1 CPU)                                                                                             | 0.97x    | 672 MB/s      | -            | 0.53x       | 366 MB/s            | -                  |
| sharnd.out.2gb                                                                                      | 2.11x    | 12639 MB/s    | 0.01%        | 1.98x       | 11833 MB/s          | 0.01%              |
| (1 CPU)                                                                                             | 0.93x    | 5594 MB/s     | -            | 1.34x       | 8030 MB/s           | -                  |
| [enwik9](http://mattmahoney.net/dc/textdata.html)                                                   | 19.34x   | 8220 MB/s     | 3.98%        | 7.87x       | 3345 MB/s           | 15.82%             |
| (1 CPU)                                                                                             | 1.06x    | 452 MB/s      | -            | 0.50x       | 213 MB/s            | -                  |
| [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                    | 10.48x   | 6124 MB/s     | 5.67%        | 3.76x       | 2197 MB/s           | 12.60%             |
| (1 CPU)                                                                                             | 0.97x    | 568 MB/s      | -            | 0.46x       | 271 MB/s            | -                  |
| [enwik10](https://encode.su/threads/3315-enwik10-benchmark-results)                                 | 21.07x   | 9020 MB/s     | 6.36%        | 6.91x       | 2959 MB/s           | 16.95%             |
| (1 CPU)                                                                                             | 1.07x    | 460 MB/s      | -            | 0.51x       | 220 MB/s            | -                  |

### Legend

* `S2 speed`: Speed of S2 compared to Snappy, using 16 cores and 1 core.
* `S2 Throughput`: Throughput of S2 in MB/s.
* `S2 % smaller`: How many percent smaller the S2 output is compared to Snappy.
* `S2 "better"`: Speed when enabling "better" compression mode in S2 compared to Snappy.
* `"better" throughput`: Throughput of S2 "better" mode in MB/s.
* `"better" % smaller`: How many percent smaller the output is when using "better" compression, compared to Snappy.

There is a good speedup across the board when using a single thread and a significant speedup when using multiple threads.

Machine generated data gets by far the biggest compression boost, with size being reduced by up to 45% compared to Snappy.

The "better" compression mode sees a good improvement in all cases, but usually at a performance cost.

Incompressible content (`sharnd.out.2gb`, 2GB random data) sees the smallest speedup.
This is likely dominated by synchronization overhead, which is confirmed by the fact that single threaded performance is higher (see above).

## Decompression

S2 attempts to create content that is also fast to decompress, except in "better" mode where the smallest representation is used.

S2 vs Snappy **decompression** speed. Both operating on a single core:

| File                                                                                                | S2 Throughput | vs. Snappy | Better Throughput | vs. Snappy |
|-----------------------------------------------------------------------------------------------------|---------------|------------|-------------------|------------|
| [rawstudio-mint14.tar](https://files.klauspost.com/compress/rawstudio-mint14.7z)                    | 2117 MB/s     | 1.14x      | 1738 MB/s         | 0.94x      |
| [github-june-2days-2019.json](https://files.klauspost.com/compress/github-june-2days-2019.json.zst) | 2401 MB/s     | 1.25x      | 2307 MB/s         | 1.20x      |
| [github-ranks-backup.bin](https://files.klauspost.com/compress/github-ranks-backup.bin.zst)         | 2075 MB/s     | 0.98x      | 1764 MB/s         | 0.83x      |
| [consensus.db.10gb](https://files.klauspost.com/compress/consensus.db.10gb.zst)                     | 2967 MB/s     | 1.05x      | 2885 MB/s         | 1.02x      |
| [adresser.json](https://files.klauspost.com/compress/adresser.json.zst)                             | 4141 MB/s     | 1.07x      | 4184 MB/s         | 1.08x      |
| [gob-stream](https://files.klauspost.com/compress/gob-stream.7z)                                    | 2264 MB/s     | 1.12x      | 2185 MB/s         | 1.08x      |
| [10gb.tar](http://mattmahoney.net/dc/10gb.html)                                                     | 1525 MB/s     | 1.03x      | 1347 MB/s         | 0.91x      |
| sharnd.out.2gb                                                                                      | 3813 MB/s     | 0.79x      | 3900 MB/s         | 0.81x      |
| [enwik9](http://mattmahoney.net/dc/textdata.html)                                                   | 1246 MB/s     | 1.29x      | 967 MB/s          | 1.00x      |
| [silesia.tar](http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip)                                    | 1433 MB/s     | 1.12x      | 1203 MB/s         | 0.94x      |
| [enwik10](https://encode.su/threads/3315-enwik10-benchmark-results)                                 | 1284 MB/s     | 1.32x      | 1010 MB/s         | 1.04x      |

### Legend

* `S2 Throughput`: Decompression speed of S2 encoded content.
* `Better Throughput`: Decompression speed of S2 "better" encoded content.
* `vs. Snappy`: Decompression speed compared to Snappy decompressing the same content.

While the decompression code hasn't changed, there is a significant speedup in decompression speed.
S2 prefers longer matches and will typically only find matches that are 6 bytes or longer.
While this reduces compression a bit, it improves decompression speed.

The "better" compression mode will actively look for shorter matches, which is why it has a decompression speed quite similar to Snappy.

Decompression is also very fast without assembly. Single goroutine decompression speed, no assembly:

| File                           | vs. Snappy   | S2 Throughput |
|--------------------------------|--------------|---------------|
| consensus.db.10gb.s2           | 1.84x        | 2289.8 MB/s   |
| 10gb.tar.s2                    | 1.30x        | 867.07 MB/s   |
| rawstudio-mint14.tar.s2        | 1.66x        | 1329.65 MB/s  |
| github-june-2days-2019.json.s2 | 2.36x        | 1831.59 MB/s  |
| github-ranks-backup.bin.s2     | 1.73x        | 1390.7 MB/s   |
| enwik9.s2                      | 1.67x        | 681.53 MB/s   |
| adresser.json.s2               | 3.41x        | 4230.53 MB/s  |
| silesia.tar.s2                 | 1.52x        | 811.58 MB/s   |

Even though S2 typically compresses better than Snappy, decompression speed is always better.

## Block compression

When compressing blocks, no concurrent compression is performed, just as with Snappy.
This is because blocks are for smaller payloads and generally will not benefit from concurrent compression.

An important change is that incompressible blocks will be at most 10 bytes bigger than the input.
In rare, worst-case scenarios Snappy blocks could be significantly bigger than the input.

### Mixed content blocks

The most reliable benchmark is a wide dataset.
For this we use [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
53927 files, total input size: 4,014,735,833 bytes. Single goroutine used.

| *                 | Input      | Output     | Reduction | MB/s   |
|-------------------|------------|------------|-----------|--------|
| S2                | 4014735833 | 1059723369 | 73.60%    | **934.34** |
| S2 Better         | 4014735833 | 969670507  | 75.85%    | 532.70 |
| S2 Best           | 4014735833 | 906625668  | **77.85%** | 46.84 |
| Snappy            | 4014735833 | 1128706759 | 71.89%    | 762.59 |
| S2, Snappy Output | 4014735833 | 1093821420 | 72.75%    | 908.60 |
| LZ4               | 4014735833 | 1079259294 | 73.12%    | 526.94 |

S2 delivers both the best single threaded throughput with regular mode and the best compression rate with "best".
"Better" mode provides the same compression speed as LZ4 with a better compression ratio.

When producing Snappy compatible output, it still delivers better throughput (150 MB/s more) and better compression.

As can be seen from the other benchmarks, decompression should also be easier on the S2 generated output.

Though they cannot be compared due to different decompression speeds, here are the speed/size comparisons for
other Go compressors:

| *                 | Input      | Output     | Reduction | MB/s   |
|-------------------|------------|------------|-----------|--------|
| Zstd Fastest (Go) | 4014735833 | 794608518  | 80.21%    | 236.04 |
| Zstd Best (Go)    | 4014735833 | 704603356  | 82.45%    | 35.63  |
| Deflate (Go) l1   | 4014735833 | 871294239  | 78.30%    | 214.04 |
| Deflate (Go) l9   | 4014735833 | 730389060  | 81.81%    | 41.17  |

### Standard block compression

Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns.
So individual benchmarks should only be seen as a guideline and the overall picture is more important.

These micro-benchmarks are with data in cache and trained branch predictors. For a more realistic benchmark see the mixed content above.

Block compression. Parallel benchmark running on 16 cores, 16 goroutines.

AMD64 assembly is used for both S2 and Snappy.

| Absolute Perf         | Snappy size | S2 Size | Snappy Speed | S2 Speed    | Snappy dec  | S2 dec      |
|-----------------------|-------------|---------|--------------|-------------|-------------|-------------|
| html                  | 22843       | 21111   | 16246 MB/s   | 17438 MB/s  | 40972 MB/s  | 49263 MB/s  |
| urls.10K              | 335492      | 287326  | 7943 MB/s    | 9693 MB/s   | 22523 MB/s  | 26484 MB/s  |
| fireworks.jpeg        | 123034      | 123100  | 349544 MB/s  | 273889 MB/s | 718321 MB/s | 827552 MB/s |
| fireworks.jpeg (200B) | 146         | 155     | 8869 MB/s    | 17773 MB/s  | 33691 MB/s  | 52421 MB/s  |
| paper-100k.pdf        | 85304       | 84459   | 167546 MB/s  | 101263 MB/s | 326905 MB/s | 291944 MB/s |
| html_x_4              | 92234       | 21113   | 15194 MB/s   | 50670 MB/s  | 30843 MB/s  | 32217 MB/s  |
| alice29.txt           | 88034       | 85975   | 5936 MB/s    | 6139 MB/s   | 12882 MB/s  | 20044 MB/s  |
| asyoulik.txt          | 77503       | 79650   | 5517 MB/s    | 6366 MB/s   | 12735 MB/s  | 22806 MB/s  |
| lcet10.txt            | 234661      | 220670  | 6235 MB/s    | 6067 MB/s   | 14519 MB/s  | 18697 MB/s  |
| plrabn12.txt          | 319267      | 317985  | 5159 MB/s    | 5726 MB/s   | 11923 MB/s  | 19901 MB/s  |
| geo.protodata         | 23335       | 18690   | 21220 MB/s   | 26529 MB/s  | 56271 MB/s  | 62540 MB/s  |
| kppkn.gtb             | 69526       | 65312   | 9732 MB/s    | 8559 MB/s   | 18491 MB/s  | 18969 MB/s  |
| alice29.txt (128B)    | 80          | 82      | 6691 MB/s    | 15489 MB/s  | 31883 MB/s  | 38874 MB/s  |
| alice29.txt (1000B)   | 774         | 774     | 12204 MB/s   | 13000 MB/s  | 48056 MB/s  | 52341 MB/s  |
| alice29.txt (10000B)  | 6648        | 6933    | 10044 MB/s   | 12806 MB/s  | 32378 MB/s  | 46322 MB/s  |
| alice29.txt (20000B)  | 12686       | 13574   | 7733 MB/s    | 11210 MB/s  | 30566 MB/s  | 58969 MB/s  |


| Relative Perf         | Snappy size | S2 size improved | S2 Speed | S2 Dec Speed |
|-----------------------|-------------|------------------|----------|--------------|
| html                  | 22.31%      | 7.58%            | 1.07x    | 1.20x        |
| urls.10K              | 47.78%      | 14.36%           | 1.22x    | 1.18x        |
| fireworks.jpeg        | 99.95%      | -0.05%           | 0.78x    | 1.15x        |
| fireworks.jpeg (200B) | 73.00%      | -6.16%           | 2.00x    | 1.56x        |
| paper-100k.pdf        | 83.30%      | 0.99%            | 0.60x    | 0.89x        |
| html_x_4              | 22.52%      | 77.11%           | 3.33x    | 1.04x        |
| alice29.txt           | 57.88%      | 2.34%            | 1.03x    | 1.56x        |
| asyoulik.txt          | 61.91%      | -2.77%           | 1.15x    | 1.79x        |
| lcet10.txt            | 54.99%      | 5.96%            | 0.97x    | 1.29x        |
| plrabn12.txt          | 66.26%      | 0.40%            | 1.11x    | 1.67x        |
| geo.protodata         | 19.68%      | 19.91%           | 1.25x    | 1.11x        |
| kppkn.gtb             | 37.72%      | 6.06%            | 0.88x    | 1.03x        |
| alice29.txt (128B)    | 62.50%      | -2.50%           | 2.31x    | 1.22x        |
| alice29.txt (1000B)   | 77.40%      | 0.00%            | 1.07x    | 1.09x        |
| alice29.txt (10000B)  | 66.48%      | -4.29%           | 1.27x    | 1.43x        |
| alice29.txt (20000B)  | 63.43%      | -7.00%           | 1.45x    | 1.93x        |

Speed is generally at or above Snappy. Small blocks get a significant speedup, although at the expense of size.

Decompression speed is better than Snappy, except in one case.

Since payloads are very small, the variance in terms of size is rather big, so they should only be seen as a general guideline.

Size is on average around Snappy, but varies by content type.
In cases where compression is worse, it is usually compensated by a speed boost.


### Better compression

Benchmarking single block performance is subject to a lot more variation since it only tests a limited number of file patterns.
So individual benchmarks should only be seen as a guideline and the overall picture is more important.

| Absolute Perf         | Snappy size | Better Size | Snappy Speed | Better Speed | Snappy dec  | Better dec  |
|-----------------------|-------------|-------------|--------------|--------------|-------------|-------------|
| html                  | 22843       | 19833       | 16246 MB/s   | 7731 MB/s    | 40972 MB/s  | 40292 MB/s  |
| urls.10K              | 335492      | 253529      | 7943 MB/s    | 3980 MB/s    | 22523 MB/s  | 20981 MB/s  |
| fireworks.jpeg        | 123034      | 123100      | 349544 MB/s  | 9760 MB/s    | 718321 MB/s | 823698 MB/s |
| fireworks.jpeg (200B) | 146         | 142         | 8869 MB/s    | 594 MB/s     | 33691 MB/s  | 30101 MB/s  |
| paper-100k.pdf        | 85304       | 82915       | 167546 MB/s  | 7470 MB/s    | 326905 MB/s | 198869 MB/s |
| html_x_4              | 92234       | 19841       | 15194 MB/s   | 23403 MB/s   | 30843 MB/s  | 30937 MB/s  |
| alice29.txt           | 88034       | 73218       | 5936 MB/s    | 2945 MB/s    | 12882 MB/s  | 16611 MB/s  |
| asyoulik.txt          | 77503       | 66844       | 5517 MB/s    | 2739 MB/s    | 12735 MB/s  | 14975 MB/s  |
| lcet10.txt            | 234661      | 190589      | 6235 MB/s    | 3099 MB/s    | 14519 MB/s  | 16634 MB/s  |
| plrabn12.txt          | 319267      | 270828      | 5159 MB/s    | 2600 MB/s    | 11923 MB/s  | 13382 MB/s  |
| geo.protodata         | 23335       | 18278       | 21220 MB/s   | 11208 MB/s   | 56271 MB/s  | 57961 MB/s  |
| kppkn.gtb             | 69526       | 61851       | 9732 MB/s    | 4556 MB/s    | 18491 MB/s  | 16524 MB/s  |
| alice29.txt (128B)    | 80          | 81          | 6691 MB/s    | 529 MB/s     | 31883 MB/s  | 34225 MB/s  |
| alice29.txt (1000B)   | 774         | 748         | 12204 MB/s   | 1943 MB/s    | 48056 MB/s  | 42068 MB/s  |
| alice29.txt (10000B)  | 6648        | 6234        | 10044 MB/s   | 2949 MB/s    | 32378 MB/s  | 28813 MB/s  |
| alice29.txt (20000B)  | 12686       | 11584       | 7733 MB/s    | 2822 MB/s    | 30566 MB/s  | 27315 MB/s  |


| Relative Perf         | Snappy size | Better size | Better Speed | Better dec |
|-----------------------|-------------|-------------|--------------|------------|
| html                  | 22.31%      | 13.18%      | 0.48x        | 0.98x      |
| urls.10K              | 47.78%      | 24.43%      | 0.50x        | 0.93x      |
| fireworks.jpeg        | 99.95%      | -0.05%      | 0.03x        | 1.15x      |
| fireworks.jpeg (200B) | 73.00%      | 2.74%       | 0.07x        | 0.89x      |
| paper-100k.pdf        | 83.30%      | 2.80%       | 0.07x        | 0.61x      |
| html_x_4              | 22.52%      | 78.49%      | 0.04x        | 1.00x      |
| alice29.txt           | 57.88%      | 16.83%      | 1.54x        | 1.29x      |
| asyoulik.txt          | 61.91%      | 13.75%      | 0.50x        | 1.18x      |
| lcet10.txt            | 54.99%      | 18.78%      | 0.50x        | 1.15x      |
| plrabn12.txt          | 66.26%      | 15.17%      | 0.50x        | 1.12x      |
| geo.protodata         | 19.68%      | 21.67%      | 0.50x        | 1.03x      |
| kppkn.gtb             | 37.72%      | 11.04%      | 0.53x        | 0.89x      |
| alice29.txt (128B)    | 62.50%      | -1.25%      | 0.47x        | 1.07x      |
| alice29.txt (1000B)   | 77.40%      | 3.36%       | 0.08x        | 0.88x      |
| alice29.txt (10000B)  | 66.48%      | 6.23%       | 0.16x        | 0.89x      |
| alice29.txt (20000B)  | 63.43%      | 8.69%       | 0.29x        | 0.89x      |

Except for the mostly incompressible JPEG image, compression is better, usually by
double-digit percentages relative to Snappy.

The PDF sample shows a significant slowdown compared to Snappy, as this mode tries harder
to compress the data. Very small blocks are also not favorable for better compression, so throughput is way down.

This mode aims to provide better compression at the expense of performance and achieves that
without a huge performance penalty, except on very small blocks.

Decompression speed suffers a little compared to the regular S2 mode,
but still manages to be close to Snappy in spite of increased compression.

# Best compression mode

S2 offers a "best" compression mode.

This will compress as much as possible with little regard to CPU usage.

It is mainly intended for offline compression, but decompression speed should still
be high, and the output remains compatible with other S2 compressed data.

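A minimal sketch of both block and stream forms; `data` and `w` are placeholders:

```Go
// Block form: the smallest S2 output, at a high CPU cost.
compressed := s2.EncodeBest(nil, data)

// Stream form: select "best" mode with a writer option.
enc := s2.NewWriter(w, s2.WriterBestCompression())
```
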
Some examples compared on a 16 core CPU, amd64 assembly used:

```
* enwik10
Default... 10000000000 -> 4761467548 [47.61%]; 1.098s, 8685.6MB/s
Better...  10000000000 -> 4219438251 [42.19%]; 1.925s, 4954.2MB/s
Best...    10000000000 -> 3627364337 [36.27%]; 43.051s, 221.5MB/s

* github-june-2days-2019.json
Default... 6273951764 -> 1043196283 [16.63%]; 431ms, 13882.3MB/s
Better...  6273951764 -> 949146808 [15.13%]; 547ms, 10938.4MB/s
Best...    6273951764 -> 832855506 [13.27%]; 9.455s, 632.8MB/s

* nyc-taxi-data-10M.csv
Default... 3325605752 -> 1095998837 [32.96%]; 324ms, 9788.7MB/s
Better...  3325605752 -> 954776589 [28.71%]; 491ms, 6459.4MB/s
Best...    3325605752 -> 779098746 [23.43%]; 8.29s, 382.6MB/s

* 10gb.tar
Default... 10065157632 -> 5916578242 [58.78%]; 1.028s, 9337.4MB/s
Better...  10065157632 -> 5649207485 [56.13%]; 1.597s, 6010.6MB/s
Best...    10065157632 -> 5208719802 [51.75%]; 32.78s, 292.8MB/s

* consensus.db.10gb
Default... 10737418240 -> 4562648848 [42.49%]; 882ms, 11610.0MB/s
Better...  10737418240 -> 4542428129 [42.30%]; 1.533s, 6679.7MB/s
Best...    10737418240 -> 4244773384 [39.53%]; 42.96s, 238.4MB/s
```

Decompression speed should be around the same as using the 'better' compression mode.

# Snappy Compatibility

S2 now offers full compatibility with Snappy.

This means that the efficient encoders of S2 can be used to generate fully Snappy compatible output.

There is a [snappy](https://github.com/klauspost/compress/tree/master/snappy) package that can be used by
simply changing imports from `github.com/golang/snappy` to `github.com/klauspost/compress/snappy`.
This uses "better" mode for all operations.
If you would like more control, you can use the s2 package as described below:

## Blocks

Snappy compatible blocks can be generated with the S2 encoder.
Compression and speed are typically a bit better, and `MaxEncodedLen` is also smaller, reducing memory usage. Replace:

| Snappy                     | S2 replacement          |
|----------------------------|-------------------------|
| snappy.Encode(...)         | s2.EncodeSnappy(...)    |
| snappy.MaxEncodedLen(...)  | s2.MaxEncodedLen(...)   |

`s2.EncodeSnappy` can be replaced with `s2.EncodeSnappyBetter` or `s2.EncodeSnappyBest` to get more efficiently compressed snappy compatible output.

`s2.ConcatBlocks` is compatible with snappy blocks.

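A small sketch of producing a Snappy compatible block with S2 and reading it back with the stock snappy package:

```Go
import (
    "github.com/golang/snappy"

    "github.com/klauspost/compress/s2"
)

func snappyRoundTrip(data []byte) ([]byte, error) {
    // Compressed with S2, but any Snappy decoder can read the block.
    block := s2.EncodeSnappy(nil, data)
    return snappy.Decode(nil, block)
}
```
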
Comparison of [`webdevdata.org-2015-01-07-subset`](https://files.klauspost.com/compress/webdevdata.org-2015-01-07-4GB-subset.7z),
53927 files, total input size: 4,014,735,833 bytes. amd64, single goroutine used:

| Encoder               | Size       | MB/s   | Reduction |
|-----------------------|------------|--------|-----------|
| snappy.Encode         | 1128706759 | 725.59 | 71.89%    |
| s2.EncodeSnappy       | 1093823291 | 899.16 | 72.75%    |
| s2.EncodeSnappyBetter | 1001158548 | 578.49 | 75.06%    |
| s2.EncodeSnappyBest   | 944507998  | 66.00  | 76.47%    |

## Streams

For streams, replace `enc = snappy.NewBufferedWriter(w)` with `enc = s2.NewWriter(w, s2.WriterSnappyCompat())`.
All other options are available, but note that the block size limit is different for snappy.

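Options can be combined; for example, a sketch of a Snappy compatible stream compressed with S2's "better" mode:

```Go
func encodeSnappyCompat(src io.Reader, w io.Writer) error {
    // Snappy compatible framing, S2 "better" compression.
    enc := s2.NewWriter(w, s2.WriterSnappyCompat(), s2.WriterBetterCompression())
    if _, err := io.Copy(enc, src); err != nil {
        enc.Close()
        return err
    }
    return enc.Close()
}
```
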
Comparison of different streams, AMD Ryzen 3950x, 16 cores. Size and throughput:

| File                        | snappy.NewWriter         | S2 Snappy                 | S2 Snappy, Better        | S2 Snappy, Best         |
|-----------------------------|--------------------------|---------------------------|--------------------------|-------------------------|
| nyc-taxi-data-10M.csv       | 1316042016 - 517.54MB/s  | 1307003093 - 8406.29MB/s  | 1174534014 - 4984.35MB/s | 1115904679 - 177.81MB/s |
| enwik10                     | 5088294643 - 433.45MB/s  | 5175840939 - 8454.52MB/s  | 4560784526 - 4403.10MB/s | 4340299103 - 159.71MB/s |
| 10gb.tar                    | 6056946612 - 703.25MB/s  | 6208571995 - 9035.75MB/s  | 5741646126 - 2402.08MB/s | 5548973895 - 171.17MB/s |
| github-june-2days-2019.json | 1525176492 - 908.11MB/s  | 1476519054 - 12625.93MB/s | 1400547532 - 6163.61MB/s | 1321887137 - 200.71MB/s |
| consensus.db.10gb           | 5412897703 - 1054.38MB/s | 5354073487 - 12634.82MB/s | 5335069899 - 2472.23MB/s | 5201000954 - 166.32MB/s |

## Decompression

All decompression functions map directly to equivalent s2 functions.

| Snappy                 | S2 replacement     |
|------------------------|--------------------|
| snappy.Decode(...)     | s2.Decode(...)     |
| snappy.DecodedLen(...) | s2.DecodedLen(...) |
| snappy.NewReader(...)  | s2.NewReader(...)  |

Features like [quick forward skipping without decompression](https://pkg.go.dev/github.com/klauspost/compress/s2#Reader.Skip)
are also available for Snappy streams.

If you know you are only decompressing snappy streams, setting [`ReaderMaxBlockSize(64<<10)`](https://pkg.go.dev/github.com/klauspost/compress/s2#ReaderMaxBlockSize)
on your Reader will reduce memory consumption.

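For example, a one-line sketch (Snappy blocks are at most 64KB, so larger buffers are never needed):

```Go
dec := s2.NewReader(r, s2.ReaderMaxBlockSize(64<<10))
```
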
# Concatenating blocks and streams

Concatenating streams will concatenate the output of both without recompressing them.
While this is inefficient in terms of compression, it might be usable in certain scenarios.
The 10 byte 'stream identifier' of the second stream can optionally be stripped, but it is not a requirement.

Blocks can be concatenated using the `ConcatBlocks` function.

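A minimal sketch, assuming `a` and `b` are valid S2 (or Snappy) blocks:

```Go
// joined decodes to the concatenation of the two decompressed payloads.
joined, err := s2.ConcatBlocks(nil, a, b)
```
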
Snappy blocks/streams can safely be concatenated with S2 blocks and streams.

# Format Extensions

* Frame [Stream identifier](https://github.com/google/snappy/blob/master/framing_format.txt#L68) changed from `sNaPpY` to `S2sTwO`.
* [Framed compressed blocks](https://github.com/google/snappy/blob/master/format_description.txt) can be up to 4MB (up from 64KB).
* Compressed blocks can have an offset of `0`, which indicates to repeat the last seen offset.

Repeat offsets must be encoded as a [2.2.1. Copy with 1-byte offset (01)](https://github.com/google/snappy/blob/master/format_description.txt#L89), where the offset is 0.

The length is specified by reading the 3-bit length in the tag and decoding it using this table:

| Length | Actual Length        |
|--------|----------------------|
| 0      | 4                    |
| 1      | 5                    |
| 2      | 6                    |
| 3      | 7                    |
| 4      | 8                    |
| 5      | 8 + read 1 byte      |
| 6      | 260 + read 2 bytes   |
| 7      | 65540 + read 3 bytes |

This allows any repeat offset + length to be represented by 2 to 5 bytes.

Lengths are stored as little endian values.

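As an illustrative sketch of the table above, where `extra` holds the bytes following the tag (little endian, as noted):

```Go
// repeatLength decodes the actual length from the 3-bit value in the tag.
func repeatLength(lengthBits uint8, extra []byte) int {
    switch lengthBits {
    case 5:
        return 8 + int(extra[0])
    case 6:
        return 260 + int(extra[0]) + int(extra[1])<<8
    case 7:
        return 65540 + int(extra[0]) + int(extra[1])<<8 + int(extra[2])<<16
    default: // 0-4 map directly to lengths 4-8.
        return int(lengthBits) + 4
    }
}
```
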
The first copy of a block cannot be a repeat offset and the offset is not carried across blocks in streams.

Default streaming block size is 1MB.

# LICENSE

This code is based on the [Snappy-Go](https://github.com/golang/snappy) implementation.

Use of this source code is governed by a BSD-style license that can be found in the LICENSE file.