# Overview

libdeflate is a library for fast, whole-buffer DEFLATE-based compression and
decompression.

The supported formats are:

- DEFLATE (raw)
- zlib (a.k.a. DEFLATE with a zlib wrapper)
- gzip (a.k.a. DEFLATE with a gzip wrapper)

libdeflate is heavily optimized. It is significantly faster than the zlib
library, both for compression and decompression, and especially on x86
processors. In addition, libdeflate provides optional high compression modes
that provide a better compression ratio than zlib's "level 9".

libdeflate itself is a library, but the following command-line programs which
use this library are also provided:

* gzip (or gunzip), a program which mostly behaves like the standard equivalent,
  except that it does not yet have good streaming support and therefore does not
  yet support very large files
* benchmark, a program for benchmarking in-memory compression and decompression

# Building

## For UNIX

Just run `make`. You need GNU Make and either GCC or Clang. GCC is recommended
because it builds slightly faster binaries. There is no `make install` yet;
just copy the file(s) to where you want.

By default, all targets are built, including the library and programs, with the
exception of the `benchmark` program. `make help` shows the available targets.
There are also several options which can be set on the `make` command line. See
the Makefile for details.

## For Windows

MinGW (GCC) is the recommended compiler to use when building binaries for
Windows. MinGW can be used on either Windows or Linux. On Windows, you'll need
the compiler as well as GNU Make and basic UNIX tools such as `sh`. This is
most easily set up with Cygwin, but some standalone MinGW distributions for
Windows also work. Or, on Linux, you'll need to install the `mingw-w64-gcc` or
similarly-named package.
Once ready, do the build using a command like:

    $ make CC=x86_64-w64-mingw32-gcc

Some MinGW distributions for Windows may require `CC=gcc` instead.

Windows binaries prebuilt with MinGW may also be downloaded from
https://github.com/ebiggers/libdeflate/releases.

Alternatively, a separate Makefile, `Makefile.msc`, is provided for the tools
that come with Visual Studio, for those who strongly prefer that toolchain.

As usual, 64-bit binaries are faster than 32-bit binaries and should be
preferred whenever possible.

# API

libdeflate has a simple API that is not zlib-compatible. You can create
compressors and decompressors and use them to compress or decompress buffers.
See libdeflate.h for details.

There is currently no support for streaming. This has been considered, but it
always significantly increases complexity and slows down fast paths.
Unfortunately, at this point it remains a future TODO. So: if your application
compresses data in "chunks", say, less than 1 MB in size, then libdeflate is a
great choice for you; that's what it's designed to do. This is perfect for
certain use cases such as transparent filesystem compression. But if your
application compresses large files as a single compressed stream, similarly to
the `gzip` program, then libdeflate isn't for you.

Note that with chunk-based compression, you generally should have the
uncompressed size of each chunk stored outside of the compressed data itself.
This enables you to allocate an output buffer of the correct size without
guessing. However, libdeflate's decompression routines do optionally provide
the actual number of output bytes in case you need it.

# DEFLATE vs. zlib vs. gzip

The DEFLATE format ([rfc1951](https://www.ietf.org/rfc/rfc1951.txt)), the zlib
format ([rfc1950](https://www.ietf.org/rfc/rfc1950.txt)), and the gzip format
([rfc1952](https://www.ietf.org/rfc/rfc1952.txt)) are commonly confused with
each other as well as with the [zlib software library](http://zlib.net), which
actually supports all three formats. libdeflate (this library) also supports
all three formats.

Briefly, DEFLATE is a raw compressed stream, whereas zlib and gzip are different
wrappers for this stream. Both zlib and gzip include checksums, but gzip can
include extra information such as the original filename. Generally, you should
choose a format as follows:

- If you are compressing whole files with no subdivisions, similar to the `gzip`
  program, you probably should use the gzip format.
- Otherwise, if you don't need the features of the gzip header and footer but do
  still want a checksum for corruption detection, you probably should use the
  zlib format.
- Otherwise, you probably should use raw DEFLATE. This is ideal if you don't
  need checksums, e.g. because they're simply not needed for your use case or
  because you already compute your own checksums that are stored separately from
  the compressed stream.

Note that gzip and zlib streams can be distinguished from each other based on
their starting bytes, but this is not necessarily true of raw DEFLATE streams.

# Compression levels

An often-underappreciated fact of compression formats such as DEFLATE is that
there are an enormous number of different ways that a given input could be
compressed. Different algorithms and different amounts of computation time will
result in different compression ratios, while remaining equally compatible with
the decompressor.

For this reason, the commonly used zlib library provides nine compression
levels.
Level 1 is the fastest but provides the worst compression; level 9
provides the best compression but is the slowest. zlib defaults to level 6.
libdeflate uses this same design but is designed to improve on both zlib's
performance *and* compression ratio at every compression level. In addition,
libdeflate's levels go [up to 12](https://xkcd.com/670/) to make room for a
minimum-cost-path based algorithm (sometimes called "optimal parsing") that can
significantly improve on zlib's compression ratio.

If you are using DEFLATE (or zlib, or gzip) in your application, you should test
different levels to see which works best for your application.

# Motivation

Despite DEFLATE's widespread use mainly through the zlib library, in the
compression community this format from the early 1990s is often considered
obsolete. And in a few significant ways, it is.

So why implement DEFLATE at all, instead of focusing entirely on
bzip2/LZMA/xz/LZ4/LZX/ZSTD/Brotli/LZHAM/LZFSE/[insert cool new format here]?

To do something better, you need to understand what came before. And it turns
out that most ideas from DEFLATE are still relevant. Many of the newer formats
share a structure similar to DEFLATE's, with different tweaks. The effects of
trivial but very useful tweaks, such as increasing the sliding window size, are
often confused with the effects of nontrivial but less useful tweaks. And
actually, many of these formats are similar enough that common algorithms and
optimizations (e.g. those dealing with LZ77 matchfinding) can be reused.

In addition, comparing compressors fairly is difficult because the performance
of a compressor depends heavily on optimizations which are not intrinsic to the
compression format itself. In this respect, the zlib library sometimes compares
poorly to certain newer code because zlib is not well optimized for modern
processors.
libdeflate addresses this by providing an optimized DEFLATE
implementation which can be used for benchmarking purposes. And, of course,
real applications can use it as well.

That being said, I have also started [a separate
project](https://github.com/ebiggers/xpack) for an experimental, more modern
compression format.

# License

libdeflate is [MIT-licensed](COPYING).

Additional notes (informational only):

- I am not aware of any patents covering libdeflate.

- Old versions of libdeflate were public domain; I only started copyrighting
  changes in newer versions. Portions of the source code that have not been
  changed since being released in a public domain version can theoretically
  still be used as public domain if you want to. But for practical purposes, it
  probably would be easier to just take the MIT license option, which is nearly
  the same anyway.
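
# Example: telling the three formats apart

The points made under "DEFLATE vs. zlib vs. gzip" and "Compression levels" can be sketched in a few lines. This is a minimal illustration only, and it does not use libdeflate: it relies on Python's standard `zlib` module (a binding to the zlib library) as a stand-in, selecting the wrapper through that module's `wbits` argument. The helper names `compress`, `decompress`, and `sniff` are hypothetical and exist only for this example; see libdeflate.h for libdeflate's actual API.

```python
import zlib

# wbits values understood by Python's zlib module. This is an assumption of
# the sketch: it shows the three container formats, not libdeflate's API.
WBITS = {"deflate": -15, "zlib": 15, "gzip": 31}


def compress(data: bytes, fmt: str, level: int = 6) -> bytes:
    """Compress a whole buffer into the given container format."""
    c = zlib.compressobj(level, zlib.DEFLATED, WBITS[fmt])
    return c.compress(data) + c.flush()


def decompress(buf: bytes, fmt: str) -> bytes:
    """Decompress a whole buffer whose container format is already known."""
    return zlib.decompress(buf, WBITS[fmt])


def sniff(buf: bytes) -> str:
    """Guess the container from the first two bytes (hypothetical helper)."""
    if buf[:2] == b"\x1f\x8b":  # gzip magic number (RFC 1952)
        return "gzip"
    # zlib header (RFC 1950): the low nibble of the first byte is 8 (DEFLATE)
    # and the two header bytes, read big-endian, are a multiple of 31.
    if len(buf) >= 2 and buf[0] & 0x0F == 8 and (buf[0] << 8 | buf[1]) % 31 == 0:
        return "zlib"
    # Raw DEFLATE has no header, so this is only a fallback guess,
    # not a positive identification.
    return "deflate (or unknown)"


if __name__ == "__main__":
    data = b"example input " * 1000
    for fmt in WBITS:
        for level in (1, 6, 9):
            out = compress(data, fmt, level)
            # Every level produces output the same decompressor can read.
            assert decompress(out, fmt) == data
            print(fmt, level, len(out), sniff(out))
```

Running the script prints the compressed size for each format and level, which is a quick way to see the wrapper overhead and the effect of the level on your own data. Because a raw DEFLATE stream carries no header, an application that stores raw streams should record the format out of band rather than rely on a check like `sniff`.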