# Overview

libdeflate is a library for fast, whole-buffer DEFLATE-based compression and
decompression.

The supported formats are:

- DEFLATE (raw)
- zlib (a.k.a. DEFLATE with a zlib wrapper)
- gzip (a.k.a. DEFLATE with a gzip wrapper)

libdeflate is heavily optimized.  It is significantly faster than the zlib
library, both for compression and decompression, and especially on x86
processors.  In addition, libdeflate provides optional high compression modes
that achieve a better compression ratio than zlib's "level 9".

libdeflate itself is a library, but the following command-line programs which
use this library are also provided:

* gzip (or gunzip), a program which mostly behaves like the standard equivalent,
  except that it does not yet have good streaming support and therefore does not
  yet support very large files
* benchmark, a program for benchmarking in-memory compression and decompression

# Building

## For UNIX

Just run `make`.  You need GNU Make and either GCC or Clang.  GCC is recommended
because it builds slightly faster binaries.  There is no `make install` yet;
just copy the file(s) to where you want.

By default, all targets are built, including the library and programs, with the
exception of the `benchmark` program.  `make help` shows the available targets.
There are also several options which can be set on the `make` command line.  See
the Makefile for details.

## For Windows

MinGW (GCC) is the recommended compiler to use when building binaries for
Windows.  MinGW can be used on either Windows or Linux.  On Windows, you'll need
the compiler as well as GNU Make and basic UNIX tools such as `sh`.  This is
most easily set up with Cygwin, but some standalone MinGW distributions for
Windows also work.  Or, on Linux, you'll need to install the `mingw-w64-gcc` or
similarly-named package.  Once ready, do the build using a command like:

    $ make CC=x86_64-w64-mingw32-gcc

Some MinGW distributions for Windows may require `CC=gcc` instead.

Windows binaries prebuilt with MinGW may also be downloaded from
https://github.com/ebiggers/libdeflate/releases.

Alternatively, a separate Makefile, `Makefile.msc`, is provided for the tools
that come with Visual Studio, for those who strongly prefer that toolchain.

As usual, 64-bit binaries are faster than 32-bit binaries and should be
preferred whenever possible.

# API

libdeflate has a simple API that is not zlib-compatible.  You can create
compressors and decompressors and use them to compress or decompress buffers.
See libdeflate.h for details.

There is currently no support for streaming.  This has been considered, but it
always significantly increases complexity and slows down fast paths.
Unfortunately, at this point it remains a future TODO.  So: if your application
compresses data in "chunks", say, less than 1 MB in size, then libdeflate is a
great choice for you; that's what it's designed to do.  This is perfect for
certain use cases such as transparent filesystem compression.  But if your
application compresses large files as a single compressed stream, similarly to
the `gzip` program, then libdeflate isn't for you.

Note that with chunk-based compression, you generally should have the
uncompressed size of each chunk stored outside of the compressed data itself.
This enables you to allocate an output buffer of the correct size without
guessing.  However, libdeflate's decompression routines do optionally provide
the actual number of output bytes in case you need it.
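
One way to keep each chunk's uncompressed size outside the compressed data is a small fixed header in front of every chunk. The layout below (two little-endian 32-bit fields: uncompressed size, then compressed size) and all names in it are hypothetical, not a format defined by libdeflate. A reader would parse the header, allocate exactly `uncompressed_size` bytes, and pass that as the output buffer size to one of libdeflate's decompression functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk header: uncompressed size, then compressed size, both
 * 32-bit little-endian. The compressed bytes follow immediately after. */
#define CHUNK_HEADER_SIZE 8

struct chunk_header {
    uint32_t uncompressed_size;
    uint32_t compressed_size;
};

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize `h` into the first CHUNK_HEADER_SIZE bytes of `out`. */
static void write_chunk_header(uint8_t *out, const struct chunk_header *h)
{
    put_le32(out, h->uncompressed_size);
    put_le32(out + 4, h->compressed_size);
}

/* Returns 1 if `in` (of `avail` bytes) holds a complete header plus the
 * compressed payload it announces, 0 otherwise. */
static int read_chunk_header(const uint8_t *in, size_t avail,
                             struct chunk_header *h)
{
    if (avail < CHUNK_HEADER_SIZE)
        return 0;
    h->uncompressed_size = get_le32(in);
    h->compressed_size = get_le32(in + 4);
    return h->compressed_size <= avail - CHUNK_HEADER_SIZE;
}
```

32-bit size fields cap a chunk at 4 GiB, which is far more than the chunk sizes libdeflate is aimed at.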

# DEFLATE vs. zlib vs. gzip

The DEFLATE format ([rfc1951](https://www.ietf.org/rfc/rfc1951.txt)), the zlib
format ([rfc1950](https://www.ietf.org/rfc/rfc1950.txt)), and the gzip format
([rfc1952](https://www.ietf.org/rfc/rfc1952.txt)) are commonly confused with
each other as well as with the [zlib software library](http://zlib.net), which
actually supports all three formats.  libdeflate (this library) also supports
all three formats.

Briefly, DEFLATE is a raw compressed stream, whereas zlib and gzip are different
wrappers for this stream.  Both zlib and gzip include checksums, but gzip can
include extra information such as the original filename.  Generally, you should
choose a format as follows:

- If you are compressing whole files with no subdivisions, similar to the `gzip`
  program, you probably should use the gzip format.
- Otherwise, if you don't need the features of the gzip header and footer but do
  still want a checksum for corruption detection, you probably should use the
  zlib format.
- Otherwise, you probably should use raw DEFLATE.  This is ideal if you don't
  need checksums, e.g. because they're simply not needed for your use case or
  because you already compute your own checksums that are stored separately from
  the compressed stream.
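
To make the "wrapper" relationship concrete, here is a sketch of a zlib stream built around raw DEFLATE data, following RFC 1950. The layout is a 2-byte header, then the unchanged DEFLATE stream, then a 4-byte Adler-32 checksum of the uncompressed data. This is an independent illustration, not libdeflate's code (libdeflate's zlib functions produce the wrapper for you). The function names and the fixed header bytes are choices made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adler-32 (RFC 1950): sum A starts at 1, sum B accumulates A; both are
 * kept modulo 65521 and the result is B in the high 16 bits, A in the low. */
static uint32_t adler32(const uint8_t *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521;
        b = (b + a) % 65521;
    }
    return (b << 16) | a;
}

/* Wrap a raw DEFLATE stream in a zlib container. `uncompressed` is the
 * original data (needed only for the checksum). `out` must have room for
 * deflate_len + 6 bytes. Returns the number of bytes written. */
static size_t zlib_wrap(const uint8_t *deflate_data, size_t deflate_len,
                        const uint8_t *uncompressed, size_t uncompressed_len,
                        uint8_t *out)
{
    uint32_t check = adler32(uncompressed, uncompressed_len);
    size_t n = 0;

    /* CMF 0x78: method 8 (DEFLATE) with a 32 KiB window.
     * FLG 0x9C: level hint 2 ("default"), no preset dictionary, and check
     * bits chosen so that (CMF * 256 + FLG) is a multiple of 31. */
    out[n++] = 0x78;
    out[n++] = 0x9C;
    memcpy(out + n, deflate_data, deflate_len);
    n += deflate_len;
    /* Adler-32 trailer, stored most significant byte first. */
    out[n++] = (uint8_t)(check >> 24);
    out[n++] = (uint8_t)(check >> 16);
    out[n++] = (uint8_t)(check >> 8);
    out[n++] = (uint8_t)check;
    return n;
}
```

For example, the two bytes `0x03 0x00` form a raw DEFLATE stream for empty input, and wrapping them this way yields `78 9C 03 00 00 00 00 01` (the Adler-32 of empty input is 1). A gzip wrapper follows the same idea with a longer header, CRC-32 instead of Adler-32, and the uncompressed size (modulo 2^32) in its trailer.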

Note that gzip and zlib streams can be distinguished from each other based on
their starting bytes, but this is not necessarily true of raw DEFLATE streams.
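
A sketch of such a check follows, based on the header rules in RFC 1952 and RFC 1950. gzip streams begin with the magic bytes `0x1F 0x8B`. A zlib header names compression method 8 with a window of at most 32 KiB, and its first two bytes, read as a big-endian number, are divisible by 31. The enum and function names are hypothetical. Raw DEFLATE has no signature, so "unknown" is the only answer a check like this can give for it, and a raw stream can occasionally pass one of these tests by coincidence.

```c
#include <stddef.h>
#include <stdint.h>

enum stream_format { FORMAT_UNKNOWN, FORMAT_GZIP, FORMAT_ZLIB };

/* Guess the container format from the first bytes of `buf`.
 * FORMAT_UNKNOWN may mean raw DEFLATE, or not compressed data at all. */
static enum stream_format detect_format(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return FORMAT_UNKNOWN;

    /* gzip: ID1 = 0x1F, ID2 = 0x8B. */
    if (buf[0] == 0x1F && buf[1] == 0x8B)
        return FORMAT_GZIP;

    /* zlib: CMF low nibble = 8 (DEFLATE), high nibble (CINFO) <= 7, and the
     * 16-bit big-endian value CMF:FLG must be a multiple of 31. */
    if ((buf[0] & 0x0F) == 8 && (buf[0] >> 4) <= 7 &&
        (((unsigned)buf[0] << 8) | buf[1]) % 31 == 0)
        return FORMAT_ZLIB;

    return FORMAT_UNKNOWN;
}
```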

# Compression levels

An often-underappreciated fact of compression formats such as DEFLATE is that
there are an enormous number of different ways that a given input could be
compressed.  Different algorithms and different amounts of computation time will
result in different compression ratios, while remaining equally compatible with
the decompressor.
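
A toy illustration of that point follows. It uses a simplified LZ77 token model rather than the actual DEFLATE bitstream: a decoder for literals and (length, distance) back-references, plus two different token sequences for the same input that decode identically. The token type and function names are hypothetical. Real DEFLATE also limits lengths to 3..258, which this sketch does not enforce, and Huffman-codes the tokens. A compressor's job is to choose among such parses.

```c
#include <stddef.h>
#include <string.h>

/* Simplified LZ77 token: a literal byte when length == 0, otherwise a
 * back-reference copying `length` bytes from `distance` bytes earlier. */
struct token {
    unsigned length;
    unsigned distance;
    char literal;
};

/* Decode `ntokens` tokens into `out` (capacity `cap`). Returns the number of
 * bytes produced, or (size_t)-1 on an invalid reference or overflow. */
static size_t lz77_decode(const struct token *t, size_t ntokens,
                          char *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < ntokens; i++) {
        if (t[i].length == 0) {
            if (n >= cap)
                return (size_t)-1;
            out[n++] = t[i].literal;
            continue;
        }
        if (t[i].distance == 0 || t[i].distance > n || t[i].length > cap - n)
            return (size_t)-1;
        /* Copy byte by byte so a reference may overlap the bytes it is
         * producing (e.g. distance 1 repeats the previous byte). */
        for (unsigned k = 0; k < t[i].length; k++, n++)
            out[n] = out[n - t[i].distance];
    }
    return n;
}

/* Two different parses of "abcabcabcabc" (12 bytes). */
static const struct token parse_a[] = {
    {0, 0, 'a'}, {0, 0, 'b'}, {0, 0, 'c'},
    {9, 3, 0},   /* one overlapping copy of 9 bytes from 3 back */
};
static const struct token parse_b[] = {
    {0, 0, 'a'}, {0, 0, 'b'}, {0, 0, 'c'},
    {0, 0, 'a'}, {0, 0, 'b'}, {0, 0, 'c'},
    {6, 6, 0},   /* six literals, then a copy of 6 bytes from 6 back */
};

/* Returns 1 if both parses decode to the same, expected string. */
static int parses_agree(void)
{
    char a[16], b[16];
    size_t na = lz77_decode(parse_a, sizeof parse_a / sizeof parse_a[0],
                            a, sizeof a);
    size_t nb = lz77_decode(parse_b, sizeof parse_b / sizeof parse_b[0],
                            b, sizeof b);

    return na == 12 && nb == 12 &&
           memcmp(a, "abcabcabcabc", 12) == 0 && memcmp(a, b, 12) == 0;
}
```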

For this reason, the commonly used zlib library provides nine compression
levels.  Level 1 is the fastest but provides the worst compression; level 9
provides the best compression but is the slowest.  zlib defaults to level 6.
libdeflate uses the same system of numbered levels but is designed to improve on
both zlib's performance *and* compression ratio at every compression level.  In
addition, libdeflate's levels go [up to 12](https://xkcd.com/670/) to make room
for a minimum-cost-path based algorithm (sometimes called "optimal parsing") that
can significantly improve on zlib's compression ratio.
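
The "minimum-cost path" view can be sketched as a shortest-path computation over input positions. Each position is a node, a literal is an edge of length 1, each available match is an edge spanning its length, and the parse with the smallest total cost wins. The sketch below is not libdeflate's algorithm. It uses made-up flat costs (9 bits per literal, 20 bits per match), a brute-force quadratic match search, and hypothetical names. A practical implementation would derive costs from the actual Huffman codes and use a proper matchfinder.

```c
#include <stddef.h>
#include <stdlib.h>

#define LITERAL_COST 9u    /* hypothetical flat cost of one literal, in bits */
#define MATCH_COST   20u   /* hypothetical flat cost of any match, in bits */
#define MIN_MATCH    3     /* DEFLATE's shortest allowed match */
#define MAX_MATCH    258   /* DEFLATE's longest allowed match */
#define WINDOW_SIZE  32768 /* DEFLATE's maximum match distance */

/* Minimum total cost to encode s[0..n) as literals and matches, found by
 * relaxing edges in position order (positions form a DAG, so one forward
 * pass suffices). Returns (unsigned)-1 if memory allocation fails. */
static unsigned min_cost_parse(const char *s, size_t n)
{
    unsigned *cost = malloc((n + 1) * sizeof *cost);
    unsigned result;

    if (cost == NULL)
        return (unsigned)-1;
    cost[0] = 0;
    for (size_t i = 1; i <= n; i++)
        cost[i] = (unsigned)-1;  /* "infinity" until some edge reaches i */

    for (size_t i = 0; i < n; i++) {
        /* cost[i] is final here: every edge into i starts before i. */
        /* Edge i -> i + 1: emit s[i] as a literal. */
        if (cost[i] + LITERAL_COST < cost[i + 1])
            cost[i + 1] = cost[i] + LITERAL_COST;

        /* Edges i -> i + l: matches against every earlier start j within
         * the window (brute force, for clarity only). */
        for (size_t j = (i > WINDOW_SIZE ? i - WINDOW_SIZE : 0); j < i; j++) {
            size_t len = 0;
            while (i + len < n && len < MAX_MATCH && s[j + len] == s[i + len])
                len++;
            for (size_t l = MIN_MATCH; l <= len; l++)
                if (cost[i] + MATCH_COST < cost[i + l])
                    cost[i + l] = cost[i] + MATCH_COST;
        }
    }
    result = cost[n];
    free(cost);
    return result;
}
```

With these costs, "abcabcabcabc" comes out at 47 bits: three literals followed by one 9-byte match.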

If you are using DEFLATE (or zlib, or gzip) in your application, you should test
different levels to see which works best for your application.

# Motivation

Despite DEFLATE's widespread use mainly through the zlib library, in the
compression community this format from the early 1990s is often considered
obsolete.  And in a few significant ways, it is.

So why implement DEFLATE at all, instead of focusing entirely on
bzip2/LZMA/xz/LZ4/LZX/ZSTD/Brotli/LZHAM/LZFSE/[insert cool new format here]?

To do something better, you need to understand what came before.  And it turns
out that most ideas from DEFLATE are still relevant.  Many of the newer formats
share a structure similar to DEFLATE's, with different tweaks.  The effects of
trivial but very useful tweaks, such as increasing the sliding window size, are
often confused with the effects of nontrivial but less useful tweaks.  And
actually, many of these formats are similar enough that common algorithms and
optimizations (e.g. those dealing with LZ77 matchfinding) can be reused.

In addition, comparing compressors fairly is difficult because the performance
of a compressor depends heavily on optimizations which are not intrinsic to the
compression format itself.  In this respect, the zlib library sometimes compares
poorly to certain newer code because zlib is not well optimized for modern
processors.  libdeflate addresses this by providing an optimized DEFLATE
implementation which can be used for benchmarking purposes.  And, of course,
real applications can use it as well.

That being said, I have also started [a separate
project](https://github.com/ebiggers/xpack) for an experimental, more modern
compression format.

# License

libdeflate is [MIT-licensed](COPYING).

Additional notes (informational only):

- I am not aware of any patents covering libdeflate.

- Old versions of libdeflate were public domain; I only started copyrighting
  changes in newer versions.  Portions of the source code that have not been
  changed since being released in a public domain version can theoretically
  still be used as public domain if you want to.  But for practical purposes, it
  probably would be easier to just take the MIT license option, which is nearly
  the same anyway.
171