Zstandard wrapper for zlib
================================

The main objective of creating a zstd wrapper for [zlib](http://zlib.net/) is to allow a quick and smooth transition to zstd for projects already using zlib.

#### Required files

To build the zstd wrapper for zlib the following files are required:
- zlib.h
- a static or dynamic zlib library
- zlibWrapper/zstd_zlibwrapper.h
- zlibWrapper/zstd_zlibwrapper.c
- zlibWrapper/gz*.c files (gzclose.c, gzlib.c, gzread.c, gzwrite.c)
- zlibWrapper/gz*.h files (gzcompatibility.h, gzguts.h)
- a static or dynamic zstd library

The first two items are required by all projects using zlib and are not included with the zstd distribution.
The remaining files are supplied with the zstd distribution.


#### Embedding the zstd wrapper within your project

Let's assume that your project, which uses zlib, is linked with:
```gcc project.o -lz```

To compile the zstd wrapper with your project you have to do the following:
- change every `#include "zlib.h"` to `#include "zstd_zlibwrapper.h"`
- compile your project with `zstd_zlibwrapper.c`, `gz*.c` and a static or dynamic zstd library

The linking should be changed to:
```gcc project.o zstd_zlibwrapper.o gz*.c -lz -lzstd```


#### Enabling zstd compression within your project

After embedding the zstd wrapper within your project, zstd compression is turned off by default.
Your project should work as before with zlib. There are two options to enable zstd compression:
- compilation with `-DZWRAP_USE_ZSTD=1` (or using `#define ZWRAP_USE_ZSTD 1` before `#include "zstd_zlibwrapper.h"`)
- using the `void ZWRAP_useZSTDcompression(int turn_on)` function (declared in `zstd_zlibwrapper.h`)

During decompression, zlib and zstd streams are automatically detected and decompressed using the appropriate library.
This behavior can be changed using `ZWRAP_setDecompressionType(ZWRAP_FORCE_ZLIB)`, which makes zlib decompression slightly faster.


#### Example
We took the file `test/example.c` from [the zlib library distribution](http://zlib.net/) and copied it to [zlibWrapper/examples/example.c](examples/example.c).
After compilation and execution it shows the following results:
```
zlib version 1.2.8 = 0x1280, compile flags = 0x65
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek:  hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
```
Then we changed `#include "zlib.h"` to `#include "zstd_zlibwrapper.h"`, compiled the [example.c](examples/example.c) file
with `-DZWRAP_USE_ZSTD=1` and linked it with the additional `zstd_zlibwrapper.o gz*.c -lzstd`.
We had to turn off the functions `test_flush` and `test_sync`, which use currently unsupported features.
After running it shows the following results:
```
zlib version 1.2.8 = 0x1280, compile flags = 0x65
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek:  hello!
inflate(): hello, hello!
large_inflate(): OK
inflate with dictionary: hello, hello!
```
The script used for compilation can be found at [zlibWrapper/Makefile](Makefile).


#### Measuring the performance of the Zstandard wrapper for zlib

The zstd distribution contains a tool called `zwrapbench` which can measure the speed and ratio of zlib, zstd, and the wrapper.
The benchmark is conducted using the given filenames, or synthetic data if no filenames are provided.
The files are read into memory and processed independently.
This makes the benchmark more precise because it eliminates I/O overhead.
Multiple filenames can be supplied as separate parameters or with wildcards; directory names can be used as parameters with the `-r` option.
One can select compression levels starting from `-b` and ending with `-e`. The `-i` parameter selects the minimal time used for each tested level.
With the `-B` option, bigger files can be divided into smaller, independently compressed blocks.
The benchmark tool can be compiled with `make zwrapbench` using [zlibWrapper/Makefile](Makefile).


#### Improving speed of streaming compression

During streaming compression the compressor never knows how much data it will have to compress.
Zstandard compression can be improved by providing the size of the source data to the compressor. By default the streaming compressor assumes that the data is bigger than 256 KB, which can hurt compression speed on smaller data.
The zstd wrapper provides the `ZWRAP_setPledgedSrcSize()` function, which changes the pledged source size for a given compression stream.
The function changes the zstd compression parameters, which may improve compression speed and/or ratio.
It should be called just after `deflateInit()` or `deflateReset()` and before `deflate()` or `deflateSetDictionary()`. The function is only helpful when data is compressed in blocks. There will be no change in the case of `deflateInit()` or `deflateReset()` immediately followed by `deflate(strm, Z_FINISH)`,
as this case is automatically detected.


#### Reusing contexts

The ordinary zlib compression of two files/streams allocates two contexts:
- for the 1st file, it calls `deflateInit`, `deflate`, `...`, `deflate`, `deflateEnd`
- for the 2nd file, it calls `deflateInit`, `deflate`, `...`, `deflate`, `deflateEnd`

The speed of compression can be improved by reusing a single context with the following steps:
- initialize the context with `deflateInit`
- for the 1st file call `deflate`, `...`, `deflate`
- for the 2nd file call `deflateReset`, `deflate`, `...`, `deflate`
- free the context with `deflateEnd`

To check the difference we ran experiments using `zwrapbench -ri6b6` with zstd and zlib compression (both at level 6).
The input data was the decompressed git repository downloaded from https://github.com/git/git/archive/master.zip, which contains 2979 files.
The table below shows that reusing contexts has a minor influence on zlib but gives an improvement for zstd.
In our example (the last two rows) it gives 4% better compression speed and 5% better decompression speed.

| Compression type                                  | Compression | Decompression | Compressed size | Ratio |
| ------------------------------------------------- | ----------- | ------------- | --------------- | ----- |
| zlib 1.2.8                                        |  30.51 MB/s |    219.3 MB/s |         6819783 | 3.459 |
| zlib 1.2.8 not reusing a context                  |  30.22 MB/s |    218.1 MB/s |         6819783 | 3.459 |
| zlib 1.2.8 with zlibWrapper and reusing a context |  30.40 MB/s |    218.9 MB/s |         6819783 | 3.459 |
| zlib 1.2.8 with zlibWrapper not reusing a context |  30.28 MB/s |    218.1 MB/s |         6819783 | 3.459 |
| zstd 1.1.0 using ZSTD_CCtx                        |  68.35 MB/s |    430.9 MB/s |         6868521 | 3.435 |
| zstd 1.1.0 using ZSTD_CStream                     |  66.63 MB/s |    422.3 MB/s |         6868521 | 3.435 |
| zstd 1.1.0 with zlibWrapper and reusing a context |  54.01 MB/s |    403.2 MB/s |         6763482 | 3.488 |
| zstd 1.1.0 with zlibWrapper not reusing a context |  51.59 MB/s |    383.7 MB/s |         6763482 | 3.488 |


#### Compatibility issues
After enabling zstd compression, not all native zlib functions are supported. When called, unsupported methods put an error message into `strm->msg` and return `Z_STREAM_ERROR`.

Supported methods:
- deflateInit
- deflate (with the exception of Z_FULL_FLUSH, Z_BLOCK, and Z_TREES)
- deflateSetDictionary
- deflateEnd
- deflateReset
- deflateBound
- inflateInit
- inflate
- inflateSetDictionary
- inflateReset
- inflateReset2
- compress
- compress2
- compressBound
- uncompress
- gzip file access functions

Ignored methods (they do nothing):
- deflateParams

Unsupported methods:
- deflateCopy
- deflateTune
- deflatePending
- deflatePrime
- deflateSetHeader
- inflateGetDictionary
- inflateCopy
- inflateSync
- inflatePrime
- inflateMark
- inflateGetHeader
- inflateBackInit
- inflateBack
- inflateBackEnd