xref: /freebsd/sys/contrib/zstd/programs/zstd.1.md (revision 780fb4a2)
zstd(1) -- zstd, zstdmt, unzstd, zstdcat - Compress or decompress .zst files
============================================================================

SYNOPSIS
--------

`zstd` [*OPTIONS*] [-|_INPUT-FILE_] [-o _OUTPUT-FILE_]

`zstdmt` is equivalent to `zstd -T0`

`unzstd` is equivalent to `zstd -d`

`zstdcat` is equivalent to `zstd -dcf`


DESCRIPTION
-----------
`zstd` is a fast lossless compression algorithm and data compression tool,
with command line syntax similar to `gzip(1)` and `xz(1)`.
It is based on the **LZ77** family, with further FSE & huff0 entropy stages.
`zstd` offers highly configurable compression speed,
with fast modes at > 200 MB/s per core,
and strong modes nearing lzma compression ratios.
It also features a very fast decoder, with speeds > 500 MB/s per core.

`zstd` command line syntax is generally similar to gzip,
but features the following differences:

  - Source files are preserved by default.
    It's possible to remove them automatically by using the `--rm` option.
  - When compressing a single file, `zstd` displays progress notifications
    and result summary by default.
    Use `-q` to turn them off.
  - `zstd` does not accept input from the console,
    but it properly accepts `stdin` when it's not the console.
  - `zstd` displays a short help page when the command line is invalid.
    Use `-q` to turn it off.

`zstd` compresses or decompresses each _file_ according to the selected
operation mode.
If no _files_ are given or _file_ is `-`, `zstd` reads from standard input
and writes the processed data to standard output.
`zstd` will refuse to write compressed data to standard output
if it is a terminal: it will display an error message and skip the _file_.
Similarly, `zstd` will refuse to read compressed data from standard input
if it is a terminal.

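The stream behavior above can be sketched as follows (assuming the `zstd` binary is on `PATH`; the temporary file path is illustrative):

```shell
# Compress from a pipe: '-' reads stdin; -o names the output file.
out=$(mktemp)
printf 'hello zstd\n' | zstd -q -f - -o "$out"
# Writing decompressed data to a pipe is fine; only a terminal is refused.
zstd -q -d -c "$out"     # prints: hello zstd
rm -f "$out"
```
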
Unless `--stdout` or `-o` is specified, _files_ are written to a new file
whose name is derived from the source _file_ name:

* When compressing, the suffix `.zst` is appended to the source filename to
  get the target filename.
* When decompressing, the `.zst` suffix is removed from the source filename to
  get the target filename.

### Concatenation with .zst files
It is possible to concatenate `.zst` files as is.
`zstd` will decompress such files as if they were a single `.zst` file.

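A minimal sketch of frame concatenation (assuming `zstd` is installed; file names are invented for illustration):

```shell
tmp=$(mktemp -d)
printf 'part one\n' > "$tmp/a"
printf 'part two\n' > "$tmp/b"
zstd -q "$tmp/a" "$tmp/b"                 # produces a.zst and b.zst
cat "$tmp/a.zst" "$tmp/b.zst" > "$tmp/ab.zst"
zstd -q -d -c "$tmp/ab.zst"               # prints both parts, in order
rm -r "$tmp"
```
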
OPTIONS
-------

### Integer suffixes and special values
In most places where an integer argument is expected,
an optional suffix is supported to easily indicate large integers.
There must be no space between the integer and the suffix.

* `KiB`:
    Multiply the integer by 1,024 (2\^10).
    `Ki`, `K`, and `KB` are accepted as synonyms for `KiB`.
* `MiB`:
    Multiply the integer by 1,048,576 (2\^20).
    `Mi`, `M`, and `MB` are accepted as synonyms for `MiB`.

### Operation mode
If multiple operation mode options are given,
the last one takes effect.

* `-z`, `--compress`:
    Compress.
    This is the default operation mode when no operation mode option is specified
    and no other operation mode is implied from the command name
    (for example, `unzstd` implies `--decompress`).
* `-d`, `--decompress`, `--uncompress`:
    Decompress.
* `-t`, `--test`:
    Test the integrity of compressed _files_.
    This option is equivalent to `--decompress --stdout` except that the
    decompressed data is discarded instead of being written to standard output.
    No files are created or removed.
* `-b#`:
    Benchmark file(s) using compression level #
* `--train FILEs`:
    Use FILEs as a training set to create a dictionary.
    The training set should contain a lot of small files (> 100).
* `-l`, `--list`:
    Display information related to a zstd compressed file, such as size, ratio, and checksum.
    Some of these fields may not be available.
    This command can be augmented with the `-v` modifier.

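For instance, a file can be compressed, verified with `-t`, and inspected with `-l` (a sketch assuming `zstd` is installed; paths are temporary):

```shell
f=$(mktemp)
printf 'some data worth keeping\n' > "$f"
zstd -q -f "$f"                   # writes "$f.zst", keeps "$f" (default behavior)
zstd -q -t "$f.zst" && echo integrity-ok    # prints integrity-ok on success
zstd -l "$f.zst"                  # shows sizes, ratio, and checksum presence
rm -f "$f" "$f.zst"
```
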
### Operation modifiers

* `-#`:
    `#` compression level \[1-19] (default: 3)
* `--ultra`:
    unlocks high compression levels 20+ (maximum 22), using a lot more memory.
    Note that decompression will also require more memory when using these levels.
* `--long[=#]`:
    enables long distance matching with `#` `windowLog`; if `#` is not
    present, it defaults to `27`.
    This increases the window size (`windowLog`) and memory usage for both the
    compressor and decompressor.
    This setting is designed to improve the compression ratio for files with
    long matches at a large distance.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.
* `--fast[=#]`:
    switch to ultra-fast compression levels.
    If `=#` is not present, it defaults to `1`.
    The higher the value, the faster the compression speed,
    at the cost of some compression ratio.
    This setting overrides the compression level if one was set previously.
    Similarly, if a compression level is set after `--fast`, it overrides it.

* `-T#`, `--threads=#`:
    Compress using `#` working threads (default: 1).
    If `#` is 0, attempt to detect and use the number of physical CPU cores.
    In all cases, the number of threads is capped to ZSTDMT_NBTHREADS_MAX==200.
    This modifier does nothing if `zstd` is compiled without multithread support.
* `--single-thread`:
    Does not spawn a thread for compression; uses the caller thread instead.
    This is the only available mode when multithread support is disabled.
    In this mode, compression is serialized with I/O.
    (This is different from `-T1`, which spawns 1 compression thread in parallel with I/O).
    Single-thread mode also features lower memory usage.
* `-D file`:
    use `file` as a dictionary to compress or decompress FILE(s)
* `--nodictID`:
    do not store dictionary ID within frame header (dictionary compression).
    The decoder will have to rely on implicit knowledge about which dictionary to use,
    and it won't be able to check if it's the correct one.
* `-o file`:
    save result into `file` (only possible with a single _INPUT-FILE_)
* `-f`, `--force`:
    overwrite output without prompting, and (de)compress symbolic links
* `-c`, `--stdout`:
    force write to standard output, even if it is the console
* `--[no-]sparse`:
    enable / disable sparse FS support,
    to make files with many zeroes smaller on disk.
    Creating sparse files may save disk space and speed up decompression by
    reducing the amount of disk I/O.
    default: enabled when output is into a file,
    and disabled when output is stdout.
    This setting overrides the default and can force sparse mode over stdout.
* `--rm`:
    remove source file(s) after successful compression or decompression
* `-k`, `--keep`:
    keep source file(s) after successful compression or decompression.
    This is the default behavior.
* `-r`:
    operate recursively on directories
* `--format=FORMAT`:
    compress and decompress in other formats. If compiled with
    support, zstd can compress to or decompress from other compression algorithm
    formats. Possibly available options are `gzip`, `xz`, `lzma`, and `lz4`.
* `-h`/`-H`, `--help`:
    display help/long help and exit
* `-V`, `--version`:
    display version number and exit.
    Advanced: `-vV` also displays supported formats.
    `-vvV` also displays POSIX support.
* `-v`:
    verbose mode
* `-q`, `--quiet`:
    suppress warnings, interactivity, and notifications.
    Specify twice to suppress errors too.
* `-C`, `--[no-]check`:
    add integrity check computed from uncompressed data (default: enabled)
* `--`:
    All arguments after `--` are treated as files

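Several of the modifiers above can be combined in one invocation; the following sketch assumes a multithreaded `zstd` build on `PATH` and generates its sample file on the fly. Since `windowLog` 28 exceeds 27, `--long` (or `--memory`) is also passed to the decompressor:

```shell
f=$(mktemp)
head -c 1048576 /dev/urandom > "$f"          # 1 MiB sample input
# Multithreaded compression with long-distance matching and a checksum.
zstd -q -f -T0 --long=28 -C "$f" -o "$f.zst"
# Decompress, passing --long to allow the large window.
zstd -q -d -f --long=28 "$f.zst" -o "$f.out"
cmp "$f" "$f.out" && echo round-trip-ok
rm -f "$f" "$f.zst" "$f.out"
```
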

DICTIONARY BUILDER
------------------
`zstd` offers _dictionary_ compression,
which greatly improves efficiency on small files and messages.
It's possible to train `zstd` with a set of samples,
the result of which is saved into a file called a `dictionary`.
Then, during compression and decompression, reference the same dictionary
using the `-D dictionaryFileName` option.
Compression of small files similar to the sample set will be greatly improved.

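The train-then-reference workflow can be sketched as follows (assuming `zstd` is installed; the sample files, record format, and the `--maxdict` value are invented for illustration):

```shell
dir=$(mktemp -d)
# Generate ~200 small, structurally similar sample files.
i=0
while [ "$i" -lt 200 ]; do
  j=0
  while [ "$j" -lt 60 ]; do
    printf 'user=%s action=login status=ok session=%s\n' "$i" "$j"
    j=$((j + 1))
  done > "$dir/sample$i"
  i=$((i + 1))
done
zstd -q --train --maxdict=16KiB "$dir"/sample* -o "$dir/dict"   # build the dictionary
printf 'user=42 action=login status=ok session=7\n' > "$dir/msg"
zstd -q -f -D "$dir/dict" "$dir/msg"             # compress with the dictionary
zstd -q -d -c -D "$dir/dict" "$dir/msg.zst"      # decompress with the same dictionary
rm -r "$dir"
```
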
* `--train FILEs`:
    Use FILEs as a training set to create a dictionary.
    The training set should contain a lot of small files (> 100),
    and weigh typically 100x the target dictionary size
    (for example, 10 MB for a 100 KB dictionary).

    Supports multithreading if `zstd` is compiled with threading support.
    Additional parameters can be specified with `--train-cover`.
    The legacy dictionary builder can be accessed with `--train-legacy`.
    `--train` is equivalent to `--train-cover=d=8,steps=4`.
* `-o file`:
    Dictionary saved into `file` (default name: dictionary).
* `--maxdict=#`:
    Limit dictionary to specified size (default: 112640).
* `-#`:
    Use `#` compression level during training (optional).
    Will generate statistics more tuned for selected compression level,
    resulting in a _small_ compression ratio improvement for this level.
* `-B#`:
    Split input files in blocks of size # (default: no split)
* `--dictID=#`:
    A dictionary ID is a locally unique ID that a decoder can use to verify it is
    using the right dictionary.
    By default, zstd will create a 4-byte random number ID.
    It's possible to give a precise number instead.
    Short numbers have an advantage: an ID < 256 will only need 1 byte in the
    compressed frame header, and an ID < 65536 will only need 2 bytes.
    This compares favorably to the 4-byte default.
    However, it's up to the dictionary manager not to assign the same ID to
    two different dictionaries.
* `--train-cover[=k=#,d=#,steps=#]`:
    Select parameters for the default dictionary builder algorithm named cover.
    If _d_ is not specified, then it tries _d_ = 6 and _d_ = 8.
    If _k_ is not specified, then it tries _steps_ values in the range [50, 2000].
    If _steps_ is not specified, then the default value of 40 is used.
    Requires that _d_ <= _k_.

    Selects segments of size _k_ with highest score to put in the dictionary.
    The score of a segment is computed by the sum of the frequencies of all the
    subsegments of size _d_.
    Generally _d_ should be in the range [6, 8], occasionally up to 16, but the
    algorithm will run faster with _d_ <= 8.
    Good values for _k_ vary widely based on the input data, but a safe range is
    [2 * _d_, 2000].
    Supports multithreading if `zstd` is compiled with threading support.

    Examples:

    `zstd --train-cover FILEs`

    `zstd --train-cover=k=50,d=8 FILEs`

    `zstd --train-cover=d=8,steps=500 FILEs`

    `zstd --train-cover=k=50 FILEs`

* `--train-legacy[=selectivity=#]`:
    Use legacy dictionary builder algorithm with the given dictionary
    _selectivity_ (default: 9).
    The smaller the _selectivity_ value, the denser the dictionary,
    improving its efficiency but reducing its possible maximum size.
    `--train-legacy=s=#` is also accepted.

    Examples:

    `zstd --train-legacy FILEs`

    `zstd --train-legacy=selectivity=8 FILEs`


BENCHMARK
---------

* `-b#`:
    benchmark file(s) using compression level #
* `-e#`:
    benchmark file(s) using multiple compression levels, from `-b#` to `-e#` (inclusive)
* `-i#`:
    minimum evaluation time, in seconds (default: 3s), benchmark mode only
* `-B#`, `--block-size=#`:
    cut file(s) into independent blocks of size # (default: no block)
* `--priority=rt`:
    set process priority to real-time

**Output Format:** CompressionLevel#Filename : InputSize -> OutputSize (CompressionRatio), CompressionSpeed, DecompressionSpeed

**Methodology:** For both compression and decompression speed, the entire input is compressed/decompressed in-memory to measure speed. A run lasts at least 1 sec, so when files are small, they are compressed/decompressed several times per run, in order to improve measurement accuracy.

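For example, a quick benchmark over levels 1 to 3 can be sketched as follows (assuming `zstd` is installed; `-i1` shortens each measurement to roughly one second, and the sample data is generated on the fly):

```shell
f=$(mktemp)
# A compressible sample: repeated text, ~180 KiB.
i=0; while [ "$i" -lt 4096 ]; do printf 'the quick brown fox jumps over the lazy dog\n'; i=$((i + 1)); done > "$f"
zstd -b1 -e3 -i1 "$f"     # one result line per level
rm -f "$f"
```
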
ADVANCED COMPRESSION OPTIONS
----------------------------
### --zstd[=options]:
`zstd` provides 22 predefined compression levels.
The selected or default predefined compression level can be changed with
advanced compression options.
The _options_ are provided as a comma-separated list.
You may specify only the options you want to change and the rest will be
taken from the selected or default compression level.
The list of available _options_:

- `strategy`=_strat_, `strat`=_strat_:
    Specify a strategy used by a match finder.

    There are 8 strategies numbered from 1 to 8, from faster to stronger:
    1=ZSTD\_fast, 2=ZSTD\_dfast, 3=ZSTD\_greedy, 4=ZSTD\_lazy,
    5=ZSTD\_lazy2, 6=ZSTD\_btlazy2, 7=ZSTD\_btopt, 8=ZSTD\_btultra.

- `windowLog`=_wlog_, `wlog`=_wlog_:
    Specify the maximum number of bits for a match distance.

    A higher number of bits increases the chance to find a match, which usually
    improves compression ratio.
    It also increases memory requirements for the compressor and decompressor.
    The minimum _wlog_ is 10 (1 KiB) and the maximum is 30 (1 GiB) on 32-bit
    platforms and 31 (2 GiB) on 64-bit platforms.

    Note: If `windowLog` is set to larger than 27, `--long=windowLog` or
    `--memory=windowSize` needs to be passed to the decompressor.

- `hashLog`=_hlog_, `hlog`=_hlog_:
    Specify the maximum number of bits for a hash table.

    Bigger hash tables cause fewer collisions, which usually makes compression
    faster, but they require more memory during compression.

    The minimum _hlog_ is 6 (64 B) and the maximum is 26 (128 MiB).

- `chainLog`=_clog_, `clog`=_clog_:
    Specify the maximum number of bits for a hash chain or a binary tree.

    Higher numbers of bits increase the chance to find a match, which usually
    improves compression ratio.
    It also slows down compression speed and increases memory requirements for
    compression.
    This option is ignored for the ZSTD_fast strategy.

    The minimum _clog_ is 6 (64 B) and the maximum is 28 (256 MiB).

- `searchLog`=_slog_, `slog`=_slog_:
    Specify the maximum number of searches in a hash chain or a binary tree
    using logarithmic scale.

    More searches increase the chance to find a match, which usually increases
    compression ratio but decreases compression speed.

    The minimum _slog_ is 1 and the maximum is 26.

- `searchLength`=_slen_, `slen`=_slen_:
    Specify the minimum searched length of a match in a hash table.

    Larger search lengths usually decrease compression ratio but improve
    decompression speed.

    The minimum _slen_ is 3 and the maximum is 7.

- `targetLen`=_tlen_, `tlen`=_tlen_:
    The impact of this field varies depending on the selected strategy.

    For ZSTD\_btopt and ZSTD\_btultra, it specifies the minimum match length
    that causes the match finder to stop searching for better matches.
    A larger `targetLen` usually improves compression ratio
    but decreases compression speed.

    For ZSTD\_fast, it specifies
    the amount of data skipped between match sampling.
    Impact is reversed: a larger `targetLen` increases compression speed
    but decreases compression ratio.

    For all other strategies, this field has no impact.

    The minimum _tlen_ is 1 and the maximum is 999.

- `overlapLog`=_ovlog_,  `ovlog`=_ovlog_:
    Determine `overlapSize`, the amount of data reloaded from the previous job.
    This parameter is only available when multithreading is enabled.
    Reloading more data improves compression ratio, but decreases speed.

    The minimum _ovlog_ is 0, and the maximum is 9.
    0 means "no overlap", hence completely independent jobs.
    9 means "full overlap", meaning up to `windowSize` is reloaded from the previous job.
    Reducing _ovlog_ by 1 reduces the amount of reload by a factor 2.
    Default _ovlog_ is 6, which means "reload `windowSize / 8`".
    Exception: the maximum compression level (22) has a default _ovlog_ of 9.

- `ldmHashLog`=_ldmhlog_, `ldmhlog`=_ldmhlog_:
    Specify the maximum size for a hash table used for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Bigger hash tables usually improve compression ratio at the expense of more
    memory during compression and a decrease in compression speed.

    The minimum _ldmhlog_ is 6 and the maximum is 26 (default: 20).

- `ldmSearchLength`=_ldmslen_, `ldmslen`=_ldmslen_:
    Specify the minimum searched length of a match for long distance matching.

    This option is ignored unless long distance matching is enabled.

    Larger/very small values usually decrease compression ratio.

    The minimum _ldmslen_ is 4 and the maximum is 4096 (default: 64).

- `ldmBucketSizeLog`=_ldmblog_, `ldmblog`=_ldmblog_:
    Specify the size of each bucket for the hash table used for long distance
    matching.

    This option is ignored unless long distance matching is enabled.

    Larger bucket sizes improve collision resolution but decrease compression
    speed.

    The minimum _ldmblog_ is 0 and the maximum is 8 (default: 3).

- `ldmHashEveryLog`=_ldmhevery_, `ldmhevery`=_ldmhevery_:
    Specify the frequency of inserting entries into the long distance matching
    hash table.

    This option is ignored unless long distance matching is enabled.

    Larger values will improve compression speed. Deviating far from the
    default value will likely result in a decrease in compression ratio.

    The default value is `wlog - ldmhlog`.

### -B#:
Select the size of each compression job.
This parameter is available only when multi-threading is enabled.
Default value is `4 * windowSize`, which means it varies depending on compression level.
`-B#` makes it possible to select a custom value.
Note that job size must respect a minimum value which is enforced transparently.
This minimum is either 1 MB, or `overlapSize`, whichever is larger.

### Example
The following parameters set advanced compression options to those of
predefined level 19 for files bigger than 256 KB:

`--zstd=windowLog=23,chainLog=23,hashLog=22,searchLog=6,searchLength=3,targetLength=48,strategy=6`

BUGS
----
Report bugs at: https://github.com/facebook/zstd/issues

AUTHOR
------
Yann Collet
