1\input texinfo @c -*-texinfo-*-
2@c %**start of header
3@setfilename plzip.info
4@documentencoding ISO-8859-15
5@settitle Plzip Manual
6@finalout
7@c %**end of header
8
9@set UPDATED 3 January 2021
10@set VERSION 1.9
11
12@dircategory Data Compression
13@direntry
14* Plzip: (plzip).               Massively parallel implementation of lzip
15@end direntry
16
17
18@ifnothtml
19@titlepage
20@title Plzip
21@subtitle Massively parallel implementation of lzip
22@subtitle for Plzip version @value{VERSION}, @value{UPDATED}
23@author by Antonio Diaz Diaz
24
25@page
26@vskip 0pt plus 1filll
27@end titlepage
28
29@contents
30@end ifnothtml
31
32@ifnottex
33@node Top
34@top
35
36This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).
37
38@menu
39* Introduction::           Purpose and features of plzip
40* Output::                 Meaning of plzip's output
41* Invoking plzip::         Command line interface
42* Program design::         Internal structure of plzip
43* File format::            Detailed format of the compressed file
44* Memory requirements::    Memory required to compress and decompress
45* Minimum file sizes::     Minimum file sizes required for full speed
46* Trailing data::          Extra data appended to the file
47* Examples::               A small tutorial with examples
48* Problems::               Reporting bugs
49* Concept index::          Index of concepts
50@end menu
51
52@sp 1
53Copyright @copyright{} 2009-2021 Antonio Diaz Diaz.
54
55This manual is free documentation: you have unlimited permission to copy,
56distribute, and modify it.
57@end ifnottex
58
59
60@node Introduction
61@chapter Introduction
62@cindex introduction
63
64@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip}
65is a massively parallel (multi-threaded) implementation of lzip, fully
66compatible with lzip 1.4 or newer. Plzip uses the compression library
67@uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.
68
69@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip}
70is a lossless data compressor with a user interface similar to the one
71of gzip or bzip2. Lzip uses a simplified form of the 'Lempel-Ziv-Markov
72chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
73interoperability. Lzip can compress about as fast as gzip @w{(lzip -0)} or
74compress most files more than bzip2 @w{(lzip -9)}. Decompression speed is
75intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
76a data recovery perspective. Lzip has been designed, written, and tested
77with great care to replace gzip and bzip2 as the standard general-purpose
78compressed format for unix-like systems.
79
80Plzip can compress/decompress large files on multiprocessor machines much
81faster than lzip, at the cost of a slightly reduced compression ratio (0.4
82to 2 percent larger compressed files). Note that the number of usable
83threads is limited by file size; on files larger than a few GB plzip can use
84hundreds of processors, but on files of only a few MB plzip is no faster
85than lzip. @xref{Minimum file sizes}.
86
87For creation and manipulation of compressed tar archives
88@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be
89more efficient than using tar and plzip because tarlz is able to keep the
90alignment between tar members and lzip members.
91@ifnothtml
92@xref{Top,tarlz manual,,tarlz}.
93@end ifnothtml
94
95The lzip file format is designed for data sharing and long-term archiving,
96taking into account both data integrity and decoder availability:
97
98@itemize @bullet
99@item
100The lzip format provides very safe integrity checking and some data
101recovery means. The program
102@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
103can repair bit flip errors (one of the most common forms of data corruption)
104in lzip files, and provides data recovery capabilities, including
105error-checked merging of damaged copies of a file.
106@ifnothtml
107@xref{Data safety,,,lziprecover}.
108@end ifnothtml
109
110@item
111The lzip format is as simple as possible (but not simpler). The lzip
112manual provides the source code of a simple decompressor along with a
detailed explanation of how it works, so that with only the help of the
114lzip manual it would be possible for a digital archaeologist to extract
115the data from a lzip file long after quantum computers eventually render
116LZMA obsolete.
117
118@item
119Additionally the lzip reference implementation is copylefted, which
120guarantees that it will remain free forever.
121@end itemize
122
123A nice feature of the lzip format is that a corrupt byte is easier to repair
the nearer it is to the beginning of the file. Therefore, with the help of
125lziprecover, losing an entire archive just because of a corrupt byte near
126the beginning is a thing of the past.
127
128Plzip uses the same well-defined exit status values used by lzip, which
129makes it safer than compressors returning ambiguous warning values (like
130gzip) when it is used as a back end for other programs like tar or zutils.
131
Plzip will automatically use for each file the largest dictionary size that
exceeds neither the file size nor the limit given. Keep in mind that
134the decompression memory requirement is affected at compression time by the
135choice of dictionary size limit. @xref{Memory requirements}.
136
137When compressing, plzip replaces every file given in the command line
138with a compressed version of itself, with the name "original_name.lz".
139When decompressing, plzip attempts to guess the name for the decompressed
140file from that of the compressed file as follows:
141
142@multitable {anyothername} {becomes} {anyothername.out}
143@item filename.lz  @tab becomes @tab filename
144@item filename.tlz @tab becomes @tab filename.tar
145@item anyothername @tab becomes @tab anyothername.out
146@end multitable
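
The renaming rules in the table above can be sketched in a few lines of
Python. This is only an illustration of the documented mapping, not the code
plzip uses; the function name @code{decompressed_name} is hypothetical, and
the handling of a name that consists of nothing but the suffix is an
assumption.

@verbatim
```python
def decompressed_name(compressed_name):
    """Return the output name that the documented renaming rules would give."""
    for suffix, replacement in ((".tlz", ".tar"), (".lz", "")):
        # Assumption: a name consisting only of the suffix is not stripped.
        if compressed_name.endswith(suffix) and len(compressed_name) > len(suffix):
            return compressed_name[:-len(suffix)] + replacement
    # Any other name is kept and gets ".out" appended.
    return compressed_name + ".out"


if __name__ == "__main__":
    for name in ("filename.lz", "filename.tlz", "anyothername"):
        print(name, "becomes", decompressed_name(name))
```
@end verbatim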
147
148(De)compressing a file is much like copying or moving it; therefore plzip
149preserves the access and modification dates, permissions, and, when
150possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
151the group ID can't be duplicated, the file permission bits S_ISUID and
152S_ISGID are cleared).
153
154Plzip is able to read from some types of non-regular files if either the
155option @samp{-c} or the option @samp{-o} is specified.
156
157Plzip will refuse to read compressed data from a terminal or write compressed
158data to a terminal, as this would be entirely incomprehensible and might
159leave the terminal in an abnormal state.
160
161Plzip will correctly decompress a file which is the concatenation of two or
162more compressed files. The result is the concatenation of the corresponding
163decompressed files. Integrity testing of concatenated compressed files is
164also supported.
165
166
167@node Output
168@chapter Meaning of plzip's output
169@cindex output
170
171The output of plzip looks like this:
172
@example
plzip -v foo
  foo:  6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.

plzip -tvvv foo.lz
  foo.lz:  6.676:1, 14.98% ratio, 85.02% saved.  450560 out,  67493 in. ok
@end example
180
181The meaning of each field is as follows:
182
183@table @code
184@item N:1
185The compression ratio @w{(uncompressed_size / compressed_size)}, shown as
186@w{N to 1}.
187
188@item ratio
189The inverse compression ratio @w{(compressed_size / uncompressed_size)},
190shown as a percentage. A decimal ratio is easily obtained by moving the
191decimal point two places to the left; @w{14.98% = 0.1498}.
192
193@item saved
194The space saved by compression @w{(1 - ratio)}, shown as a percentage.
195
196@item in
197Size of the input data. This is the uncompressed size when compressing, or
198the compressed size when decompressing or testing. Note that plzip always
199prints the uncompressed size before the compressed size when compressing,
200decompressing, testing, or listing.
201
202@item out
203Size of the output data. This is the compressed size when compressing, or
204the decompressed size when decompressing or testing.
205
206@end table
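
As a worked example of the fields above, the following Python sketch
recomputes the figures of the sample output from the two sizes alone. The
function name @code{summary_fields} is hypothetical, and nonzero sizes are
assumed.

@verbatim
```python
def summary_fields(uncompressed_size, compressed_size):
    """Return (N, ratio_percent, saved_percent) for the two sizes given."""
    n_to_1 = uncompressed_size / compressed_size
    ratio = 100.0 * compressed_size / uncompressed_size
    saved = 100.0 - ratio
    return n_to_1, ratio, saved


if __name__ == "__main__":
    n, ratio, saved = summary_fields(450560, 67493)
    print("%.3f:1, %.2f%% ratio, %.2f%% saved" % (n, ratio, saved))
```
@end verbatim

For the sizes of the sample output (450560 and 67493 bytes) this prints
@samp{6.676:1, 14.98% ratio, 85.02% saved}.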
207
208When decompressing or testing at verbosity level 4 (-vvvv), the dictionary
209size used to compress the file is also shown.
210
211LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
212been compressed. Decompressed is used to refer to data which have undergone
213the process of decompression.
214
215
216@node Invoking plzip
217@chapter Invoking plzip
218@cindex invoking
219@cindex options
220@cindex usage
221@cindex version
222
223The format for running plzip is:
224
@example
plzip [@var{options}] [@var{files}]
@end example
228
229@noindent
230If no file names are specified, plzip compresses (or decompresses) from
231standard input to standard output. A hyphen @samp{-} used as a @var{file}
232argument means standard input. It can be mixed with other @var{files} and is
233read just once, the first time it appears in the command line.
234
235plzip supports the following
236@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
237@ifnothtml
238@xref{Argument syntax,,,arg_parser}.
239@end ifnothtml
240
241@table @code
242@item -h
243@itemx --help
244Print an informative help message describing the options and exit.
245
246@item -V
247@itemx --version
248Print the version number of plzip on the standard output and exit.
249This version number should be included in all bug reports.
250
251@anchor{--trailing-error}
252@item -a
253@itemx --trailing-error
254Exit with error status 2 if any remaining input is detected after
255decompressing the last member. Such remaining input is usually trailing
256garbage that can be safely ignored. @xref{concat-example}.
257
258@anchor{--data-size}
259@item -B @var{bytes}
260@itemx --data-size=@var{bytes}
261When compressing, set the size of the input data blocks in bytes. The
input file will be divided into chunks of this size before compression is
263performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value
264is two times the dictionary size, except for option @samp{-0} where it
265defaults to @w{1 MiB}. Plzip will reduce the dictionary size if it is
266larger than the data size specified. @xref{Minimum file sizes}.
267
268@item -c
269@itemx --stdout
270Compress or decompress to standard output; keep input files unchanged. If
271compressing several files, each file is compressed independently. This
272option (or @samp{-o}) is needed when reading from a named pipe (fifo) or
273from a device. Use @w{@samp{lziprecover -cd -i}} to recover as much of the
274decompressed data as possible when decompressing a corrupt file. @samp{-c}
275overrides @samp{-o}. @samp{-c} has no effect when testing or listing.
276
277@item -d
278@itemx --decompress
279Decompress the files specified. If a file does not exist or can't be
280opened, plzip continues decompressing the rest of the files. If a file
281fails to decompress, or is a terminal, plzip exits immediately without
282decompressing the rest of the files.
283
284@item -f
285@itemx --force
286Force overwrite of output files.
287
288@item -F
289@itemx --recompress
290When compressing, force re-compression of files whose name already has
291the @samp{.lz} or @samp{.tlz} suffix.
292
293@item -k
294@itemx --keep
295Keep (don't delete) input files during compression or decompression.
296
297@item -l
298@itemx --list
299Print the uncompressed size, compressed size, and percentage saved of the
300files specified. Trailing data are ignored. The values produced are correct
301even for multimember files. If more than one file is given, a final line
302containing the cumulative sizes is printed. With @samp{-v}, the dictionary
303size, the number of members in the file, and the amount of trailing data (if
304any) are also printed. With @samp{-vv}, the positions and sizes of each
305member in multimember files are also printed.
306
307@samp{-lq} can be used to verify quickly (without decompressing) the
308structural integrity of the files specified. (Use @samp{--test} to verify
309the data integrity). @samp{-alq} additionally verifies that none of the
310files specified contain trailing data.
311
312@item -m @var{bytes}
313@itemx --match-length=@var{bytes}
314When compressing, set the match length limit in bytes. After a match
315this long is found, the search is finished. Valid values range from 5 to
316273. Larger values usually give better compression ratios but longer
317compression times.
318
319@item -n @var{n}
320@itemx --threads=@var{n}
321Set the maximum number of worker threads, overriding the system's default.
322Valid values range from 1 to "as many as your system can support". If this
323option is not used, plzip tries to detect the number of processors in the
324system and use it as default value. When compressing on a @w{32 bit} system,
325plzip tries to limit the memory use to under @w{2.22 GiB} (4 worker threads
326at level -9) by reducing the number of threads below the system's default.
327@w{@samp{plzip --help}} shows the system's default value.
328
329Plzip starts the number of threads required by each file without exceeding
330the value specified. Note that the number of usable threads is limited to
331@w{ceil( file_size / data_size )} during compression (@pxref{Minimum file
332sizes}), and to the number of members in the input during decompression. You
333can find the number of members in a lzip file by running
334@w{@samp{plzip -lv file.lz}}.
335
336@item -o @var{file}
337@itemx --output=@var{file}
If @samp{-c} has not also been specified, write the (de)compressed output to
339@var{file}; keep input files unchanged. If compressing several files, each
340file is compressed independently. This option (or @samp{-c}) is needed when
341reading from a named pipe (fifo) or from a device. @w{@samp{-o -}} is
342equivalent to @samp{-c}. @samp{-o} has no effect when testing or listing.
343
344In order to keep backward compatibility with plzip versions prior to 1.9,
345when compressing from standard input and no other file names are given, the
346extension @samp{.lz} is appended to @var{file} unless it already ends in
347@samp{.lz} or @samp{.tlz}. This feature will be removed in a future version
348of plzip. Meanwhile, redirection may be used instead of @samp{-o} to write
349the compressed output to a file without the extension @samp{.lz} in its
350name: @w{@samp{plzip < file > foo}}.
351
352@item -q
353@itemx --quiet
354Quiet operation. Suppress all messages.
355
356@item -s @var{bytes}
357@itemx --dictionary-size=@var{bytes}
When compressing, set the dictionary size limit in bytes. Plzip will use
for each file the largest dictionary size that exceeds neither
the file size nor this limit. Valid values range from @w{4 KiB} to
361@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
3622^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
363coded in just one byte (@pxref{coded-dict-size}). If the size specified
364does not match one of the valid sizes, it will be rounded upwards by
365adding up to @w{(@var{bytes} / 8)} to it.
366
367For maximum compression you should use a dictionary size limit as large
368as possible, but keep in mind that the decompression memory requirement
369is affected at compression time by the choice of dictionary size limit.
370
371@item -t
372@itemx --test
373Check integrity of the files specified, but don't decompress them. This
374really performs a trial decompression and throws away the result. Use it
375together with @samp{-v} to see information about the files. If a file
376fails the test, does not exist, can't be opened, or is a terminal, plzip
377continues checking the rest of the files. A final diagnostic is shown at
378verbosity level 1 or higher if any file fails the test when testing
379multiple files.
380
381@item -v
382@itemx --verbose
383Verbose mode.@*
384When compressing, show the compression ratio and size for each file
385processed.@*
386When decompressing or testing, further -v's (up to 4) increase the
387verbosity level, showing status, compression ratio, dictionary size,
388decompressed size, and compressed size.@*
389Two or more @samp{-v} options show the progress of (de)compression,
390except for single-member files.
391
392@item -0 .. -9
393Compression level. Set the compression parameters (dictionary size and
394match length limit) as shown in the table below. The default compression
395level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
396@samp{-9} can be much slower than @samp{-0}. These options have no
397effect when decompressing, testing, or listing.
398
399The bidimensional parameter space of LZMA can't be mapped to a linear
400scale optimal for all files. If your files are large, very repetitive,
401etc, you may need to use the options @samp{--dictionary-size} and
402@samp{--match-length} directly to achieve optimal performance.
403
404If several compression levels or @samp{-s} or @samp{-m} options are
given, the last setting is used. For example, @w{@samp{-9 -s64MiB}} is
equivalent to @w{@samp{-s64MiB -m273}}.
407
408@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
409@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
410@item -0 @tab 64 KiB @tab  16 bytes
411@item -1 @tab  1 MiB @tab   5 bytes
412@item -2 @tab  1.5 MiB @tab   6 bytes
413@item -3 @tab  2 MiB @tab   8 bytes
414@item -4 @tab  3 MiB @tab  12 bytes
415@item -5 @tab  4 MiB @tab  20 bytes
416@item -6 @tab  8 MiB @tab  36 bytes
417@item -7 @tab 16 MiB @tab  68 bytes
418@item -8 @tab 24 MiB @tab 132 bytes
419@item -9 @tab 32 MiB @tab 273 bytes
420@end multitable
421
422@item --fast
423@itemx --best
424Aliases for GNU gzip compatibility.
425
426@item --loose-trailing
427When decompressing, testing, or listing, allow trailing data whose first
428bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
"corrupt header" error and the cause is not actually a corrupt header.
431
432@item --in-slots=@var{n}
433Number of @w{1 MiB} input packets buffered per worker thread when
434decompressing from non-seekable input. Increasing the number of packets
435may increase decompression speed, but requires more memory. Valid values
436range from 1 to 64. The default value is 4.
437
438@item --out-slots=@var{n}
439Number of @w{1 MiB} output packets buffered per worker thread when
440decompressing to non-seekable output. Increasing the number of packets
441may increase decompression speed, but requires more memory. Valid values
442range from 1 to 1024. The default value is 64.
443
444@item --check-lib
445Compare the
446@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
447used to compile plzip with the version actually being used at run time and
448exit. Report any differences found. Exit with error status 1 if differences
449are found. A mismatch may indicate that lzlib is not correctly installed or
450that a different version of lzlib has been installed after compiling plzip.
451@w{@samp{plzip -v --check-lib}} shows the version of lzlib being used and
452the value of @samp{LZ_API_VERSION} (if defined).
453@ifnothtml
454@xref{Library version,,,lzlib}.
455@end ifnothtml
456
457@end table
458
459Numbers given as arguments to options may be followed by a multiplier
460and an optional @samp{B} for "byte".
461
462Table of SI and binary prefixes (unit multipliers):
463
464@multitable {Prefix} {kilobyte  (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
465@item Prefix @tab Value               @tab | @tab Prefix @tab Value
466@item k @tab kilobyte  (10^3 = 1000)  @tab | @tab Ki @tab kibibyte (2^10 = 1024)
467@item M @tab megabyte  (10^6)         @tab | @tab Mi @tab mebibyte (2^20)
468@item G @tab gigabyte  (10^9)         @tab | @tab Gi @tab gibibyte (2^30)
469@item T @tab terabyte  (10^12)        @tab | @tab Ti @tab tebibyte (2^40)
470@item P @tab petabyte  (10^15)        @tab | @tab Pi @tab pebibyte (2^50)
471@item E @tab exabyte   (10^18)        @tab | @tab Ei @tab exbibyte (2^60)
472@item Z @tab zettabyte (10^21)        @tab | @tab Zi @tab zebibyte (2^70)
473@item Y @tab yottabyte (10^24)        @tab | @tab Yi @tab yobibyte (2^80)
474@end multitable
475
476@sp 1
477Exit status: 0 for a normal exit, 1 for environmental problems (file not
478found, invalid flags, I/O errors, etc), 2 to indicate a corrupt or
479invalid input file, 3 for an internal consistency error (eg, bug) which
480caused plzip to panic.
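
A script that drives plzip can branch on these values. The mapping below is a
direct transcription of the paragraph above; the names @code{EXIT_STATUS} and
@code{describe_exit_status} are hypothetical.

@verbatim
```python
# Meaning of each documented exit status.
EXIT_STATUS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error, ...)",
    2: "corrupt or invalid input file",
    3: "internal consistency error (a bug)",
}


def describe_exit_status(status):
    """Return a short description of an exit status value."""
    return EXIT_STATUS.get(status, "undocumented status %d" % status)


if __name__ == "__main__":
    for status in range(4):
        print(status, describe_exit_status(status))
```
@end verbatim

The status itself can be obtained, for example, from the @code{returncode}
attribute of the object returned by Python's @code{subprocess.run}.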
481
482
483@node Program design
484@chapter Internal structure of plzip
485@cindex program design
486
487When compressing, plzip divides the input file into chunks and compresses as
488many chunks simultaneously as worker threads are chosen, creating a
489multimember compressed file.
490
491When decompressing, plzip decompresses as many members simultaneously as
492worker threads are chosen. Files that were compressed with lzip will not
493be decompressed faster than using lzip (unless the option @samp{-b} was used)
494because lzip usually produces single-member files, which can't be
495decompressed in parallel.
496
For each input file, a splitter thread and several worker threads are
created, with the main thread acting as the muxer (multiplexer) thread. A "packet
499courier" takes care of data transfers among threads and limits the
500maximum number of data blocks (packets) being processed simultaneously.
501
502The splitter reads data blocks from the input file, and distributes them
503to the workers. The workers (de)compress the blocks received from the
504splitter. The muxer collects processed packets from the workers, and
505writes them to the output file.
506
@verbatim
                             ,------------,
                         ,-->| worker   0 |--,
                         |   `------------'  |
,-------,   ,----------, |   ,------------,  |   ,-------,   ,--------,
| input |-->| splitter |-+-->| worker   1 |--+-->| muxer |-->| output |
| file  |   `----------' |   `------------'  |   `-------'   |  file  |
`-------'                |        ...        |               `--------'
                         |   ,------------,  |
                         `-->| worker N-1 |--'
                             `------------'
@end verbatim
519
520When decompressing from a regular file, the splitter is removed and the
521workers read directly from the input file. If the output file is also a
522regular file, the muxer is also removed and the workers write directly
523to the output file. With these optimizations, the use of RAM is greatly
524reduced and the decompression speed of large files with many members is
525only limited by the number of processors available and by I/O speed.
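
The division of work among splitter, workers, and muxer can be illustrated
with a short Python sketch. It is not how plzip is implemented: zlib from the
Python standard library stands in for lzlib (so the output is not in lzip
format), a thread pool plays the role of the packet courier, and all the
function names are hypothetical.

@verbatim
```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split(data, chunk_size):
    """Splitter: cut the input into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data, chunk_size, workers):
    """Compress each chunk independently; one compressed member per chunk."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands the chunks to the workers and returns the results in
        # input order, which is the job of the muxer.
        return list(pool.map(zlib.compress, split(data, chunk_size)))


def decompress_parallel(members, workers):
    """Decompress the members concurrently and join them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, members))


if __name__ == "__main__":
    original = b"plzip " * 100000
    members = compress_parallel(original, chunk_size=64 * 1024, workers=4)
    assert decompress_parallel(members, workers=4) == original
    print(len(members), "members")
```
@end verbatim

Because every chunk becomes an independent member, the members can later be
decompressed in parallel, which a single-member file does not allow.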
526
527
528@node File format
529@chapter File format
530@cindex file format
531
532Perfection is reached, not when there is no longer anything to add, but
533when there is no longer anything to take away.@*
534--- Antoine de Saint-Exupery
535
536@sp 1
537In the diagram below, a box like this:
538
@verbatim
+---+
|   | <-- the vertical bars might be missing
+---+
@end verbatim
544
545represents one byte; a box like this:
546
@verbatim
+==============+
|              |
+==============+
@end verbatim
552
553represents a variable number of bytes.
554
555@sp 1
556A lzip file consists of a series of "members" (compressed data sets).
557The members simply appear one after another in the file, with no
558additional information before, between, or after them.
559
560Each member has the following structure:
561
@verbatim
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@end verbatim
567
568All multibyte values are stored in little endian order.
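
The fixed-size parts of a member can be read with Python's @code{struct}
module, as in the following sketch. The field widths are those of the diagram
(a 6 byte header and a 20 byte trailer, both little endian); the function
names are hypothetical, and the LZMA stream itself is not examined.

@verbatim
```python
import struct

HEADER = struct.Struct("<4sBB")   # ID string, VN, DS: 6 bytes
TRAILER = struct.Struct("<IQQ")   # CRC32, data size, member size: 20 bytes


def parse_header(member):
    """Return (version, coded_dict_size) from the first 6 bytes of a member."""
    magic, version, coded_dict_size = HEADER.unpack_from(member, 0)
    if magic != b"LZIP":
        raise ValueError("bad magic bytes")
    return version, coded_dict_size


def parse_trailer(member):
    """Return (crc32, data_size, member_size) from the last 20 bytes."""
    if len(member) < HEADER.size + TRAILER.size:
        raise ValueError("member too short")
    return TRAILER.unpack_from(member, len(member) - TRAILER.size)


if __name__ == "__main__":
    # Hypothetical member: the 5 zero bytes only stand in for an LZMA stream.
    fake = b"LZIP" + bytes([1, 0xD3]) + bytes(5) + TRAILER.pack(0, 450560, 31)
    print(parse_header(fake), parse_trailer(fake))
```
@end verbatim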
569
570@table @samp
571@item ID string (the "magic" bytes)
572A four byte string, identifying the lzip format, with the value "LZIP"
573(0x4C, 0x5A, 0x49, 0x50).
574
575@item VN (version number, 1 byte)
576Just in case something needs to be modified in the future. 1 for now.
577
578@anchor{coded-dict-size}
579@item DS (coded dictionary size, 1 byte)
580The dictionary size is calculated by taking a power of 2 (the base size)
581and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
582Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
583Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
584from the base size to obtain the dictionary size.@*
585Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
586Valid values for dictionary size range from 4 KiB to 512 MiB.
587
588@item LZMA stream
589The LZMA stream, finished by an end of stream marker. Uses default values
590for encoder properties.
591@ifnothtml
592@xref{Stream format,,,lzip},
593@end ifnothtml
594@ifhtml
595See
596@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format}
597@end ifhtml
598for a complete description.
599
600@item CRC32 (4 bytes)
601Cyclic Redundancy Check (CRC) of the uncompressed original data.
602
603@item Data size (8 bytes)
604Size of the uncompressed original data.
605
606@item Member size (8 bytes)
607Total size of the member, including header and trailer. This field acts
608as a distributed index, allows the verification of stream integrity, and
609facilitates safe recovery of undamaged members from multimember files.
610
611@end table
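
The coding of the DS byte described above can be exercised with the following
Python sketch. Both function names are hypothetical.
@code{encode_dict_size} searches for the smallest valid size that is not
smaller than the size requested, which is one way of rounding upwards; it is
not taken from the plzip sources.

@verbatim
```python
MIN_DICT_SIZE = 1 << 12   # 4 KiB
MAX_DICT_SIZE = 1 << 29   # 512 MiB


def decode_dict_size(ds):
    """Return the dictionary size in bytes coded in the DS byte."""
    base_log2 = ds & 0x1F          # bits 4-0
    numerator = (ds >> 5) & 0x07   # bits 7-5
    if not 12 <= base_log2 <= 29:
        raise ValueError("invalid base size")
    base_size = 1 << base_log2
    size = base_size - numerator * (base_size // 16)
    if size < MIN_DICT_SIZE:
        raise ValueError("dictionary size below 4 KiB")
    return size


def encode_dict_size(size):
    """Return the DS byte of the smallest valid dictionary size >= size."""
    if not MIN_DICT_SIZE <= size <= MAX_DICT_SIZE:
        raise ValueError("dictionary size out of range")
    candidates = []
    for base_log2 in range(12, 30):
        for numerator in range(8):
            coded = (1 << base_log2) - numerator * (1 << (base_log2 - 4))
            if coded >= size:
                candidates.append((coded, (numerator << 5) | base_log2))
    return min(candidates)[1]


if __name__ == "__main__":
    print(decode_dict_size(0xD3) // 1024, "KiB")
    print(hex(encode_dict_size(300 * 1024)))
```
@end verbatim

@code{decode_dict_size(0xD3)} returns 327680 (@w{320 KiB}), matching the
example above.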
612
613
614@node Memory requirements
615@chapter Memory required to compress and decompress
616@cindex memory requirements
617
618The amount of memory required @strong{per worker thread} for decompression
619or testing is approximately the following:
620
621@itemize @bullet
622@item
For decompression of a regular (seekable) file to another regular file,
or for testing of a regular file: the dictionary size.

@item
For testing of a non-seekable file or of standard input: the dictionary
size plus @w{1 MiB} plus up to the number of @w{1 MiB} input packets
buffered (4 by default).

@item
For decompression of a regular file to a non-seekable file or to
standard output: the dictionary size plus up to the number of @w{1 MiB}
output packets buffered (64 by default).

@item
For decompression of a non-seekable file or of standard input: the
dictionary size plus @w{1 MiB} plus up to the number of @w{1 MiB} input
and output packets buffered (68 by default).
640@end itemize
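
The four cases above can be condensed into one function. The following Python
sketch is only a restatement of the list, giving the upper bound in each
case; the function and parameter names are hypothetical, and the slot counts
default to the values of @samp{--in-slots} and @samp{--out-slots}.

@verbatim
```python
MIB = 1 << 20


def decompression_memory(dict_size, seekable_input, seekable_output,
                         testing=False, in_slots=4, out_slots=64):
    """Approximate upper bound, in bytes, of memory per worker thread."""
    total = dict_size
    if not seekable_input:
        # Non-seekable input: 1 MiB plus the buffered input packets.
        total += MIB + in_slots * MIB
        if not testing:
            # Decompression from non-seekable input also buffers output.
            total += out_slots * MIB
    elif not testing and not seekable_output:
        # Regular input written to non-seekable output.
        total += out_slots * MIB
    return total


if __name__ == "__main__":
    # Decompressing standard input with an 8 MiB dictionary.
    print(decompression_memory(8 * MIB, False, False) // MIB, "MiB")
```
@end verbatim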
641
642@noindent
643The amount of memory required @strong{per worker thread} for compression
644is approximately the following:
645
646@itemize @bullet
647@item
For compression at level -0: @w{1.5 MiB} plus 3.375 times the data size
(@pxref{--data-size}). Default is @w{4.875 MiB}.

@item
For compression at other levels: 11 times the dictionary size plus 3.375
times the data size. The default (level -6) is @w{142 MiB}.
654@end itemize
655
656@noindent
657The following table shows the memory required @strong{per thread} for
658compression at a given level, using the default data size for each level:
659
660@multitable {Level} {Memory required}
661@item Level @tab Memory required
662@item -0 @tab   4.875 MiB
663@item -1 @tab  17.75 MiB
664@item -2 @tab  26.625 MiB
665@item -3 @tab  35.5 MiB
666@item -4 @tab  53.25 MiB
667@item -5 @tab  71 MiB
668@item -6 @tab 142 MiB
669@item -7 @tab 284 MiB
670@item -8 @tab 426 MiB
671@item -9 @tab 568 MiB
672@end multitable
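
The values in the table follow from the two formulas above and the default
data size of each level. The following Python sketch recomputes them; the
names are hypothetical, and the dictionary sizes are copied from the table of
compression levels.

@verbatim
```python
# Dictionary size in MiB for each compression level.
DICT_SIZE_MIB = {0: 0.0625, 1: 1, 2: 1.5, 3: 2, 4: 3,
                 5: 4, 6: 8, 7: 16, 8: 24, 9: 32}


def default_data_size_mib(level):
    """Default data size: 1 MiB at level -0, else twice the dictionary size."""
    return 1 if level == 0 else 2 * DICT_SIZE_MIB[level]


def compression_memory_mib(level, data_size_mib=None):
    """Approximate memory per worker thread, in MiB, when compressing."""
    if data_size_mib is None:
        data_size_mib = default_data_size_mib(level)
    if level == 0:
        return 1.5 + 3.375 * data_size_mib
    # The dictionary size is reduced if it is larger than the data size.
    dict_size_mib = min(DICT_SIZE_MIB[level], data_size_mib)
    return 11 * dict_size_mib + 3.375 * data_size_mib


if __name__ == "__main__":
    for level in range(10):
        print("-%d" % level, compression_memory_mib(level), "MiB")
```
@end verbatim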
673
674
675@node Minimum file sizes
676@chapter Minimum file sizes required for full compression speed
677@cindex minimum file sizes
678
679When compressing, plzip divides the input file into chunks and
680compresses as many chunks simultaneously as worker threads are chosen,
681creating a multimember compressed file.
682
683For this to work as expected (and roughly multiply the compression speed
684by the number of available processors), the uncompressed file must be at
685least as large as the number of worker threads times the chunk size
(@pxref{--data-size}). Otherwise some processors will not get any data to
687compress, and compression will be proportionally slower. The maximum
688speed increase achievable on a given file is limited by the ratio
689@w{(file_size / data_size)}. For example, a tarball the size of gcc or
690linux will scale up to 10 or 14 processors at level -9.
691
692The following table shows the minimum uncompressed file size needed for
693full use of N processors at a given compression level, using the default
694data size for each level:
695
696@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
697@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
698@item Level
699@item -0 @tab   2 MiB @tab   4 MiB @tab   8 MiB @tab  16 MiB @tab  64 MiB @tab 256 MiB
700@item -1 @tab   4 MiB @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab 128 MiB @tab 512 MiB
701@item -2 @tab   6 MiB @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab 192 MiB @tab 768 MiB
702@item -3 @tab   8 MiB @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 256 MiB @tab   1 GiB
703@item -4 @tab  12 MiB @tab  24 MiB @tab  48 MiB @tab  96 MiB @tab 384 MiB @tab 1.5 GiB
704@item -5 @tab  16 MiB @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 512 MiB @tab   2 GiB
705@item -6 @tab  32 MiB @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab   1 GiB @tab   4 GiB
706@item -7 @tab  64 MiB @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   2 GiB @tab   8 GiB
707@item -8 @tab  96 MiB @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab   3 GiB @tab  12 GiB
708@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab   1 GiB @tab   4 GiB @tab  16 GiB
709@end multitable
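
Each entry of the table is the number of processors times the default data
size of the level. The following Python sketch shows the arithmetic, together
with the limit on usable threads given by the ratio
@w{(file_size / data_size)} rounded up; the names are hypothetical.

@verbatim
```python
# Default data size in MiB for each level: 1 MiB at level -0,
# else two times the dictionary size.
DATA_SIZE_MIB = {0: 1, 1: 2, 2: 3, 3: 4, 4: 6, 5: 8, 6: 16, 7: 32, 8: 48, 9: 64}


def min_file_size_mib(level, processors):
    """Minimum uncompressed size, in MiB, for full use of the processors."""
    return processors * DATA_SIZE_MIB[level]


def usable_threads(file_size, data_size, max_threads):
    """Number of worker threads that can get data when compressing."""
    chunks = -(-file_size // data_size)   # ceil(file_size / data_size)
    return min(max_threads, chunks)


if __name__ == "__main__":
    print(min_file_size_mib(9, 256) // 1024, "GiB")
    print(usable_threads(100 << 20, 64 << 20, 8), "threads")
```
@end verbatim

For example, a @w{100 MiB} file compressed at level -9 (data size
@w{64 MiB}) is split into two chunks, so only two worker threads get data no
matter how many processors are available.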
710
711
712@node Trailing data
713@chapter Extra data appended to the file
714@cindex trailing data
715
716Sometimes extra data are found appended to a lzip file after the last
717member. Such trailing data may be:
718
719@itemize @bullet
720@item
721Padding added to make the file size a multiple of some block size, for
722example when writing to a tape. It is safe to append any amount of
723padding zero bytes to a lzip file.
724
725@item
726Useful data added by the user; a cryptographically secure hash, a
727description of file contents, etc. It is safe to append any amount of
728text to a lzip file as long as none of the first four bytes of the text
729match the corresponding byte in the string "LZIP", and the text does not
730contain any zero bytes (null characters). Nonzero bytes and zero bytes
731can't be safely mixed in trailing data.
732
733@item
734Garbage added by some not totally successful copy operation.
735
736@item
737Malicious data added to the file in order to make its total size and
738hash value (for a chosen hash) coincide with those of another file.
739
740@item
741In rare cases, trailing data could be the corrupt header of another
742member. In multimember or concatenated files the probability of
743corruption happening in the magic bytes is 5 times smaller than the
744probability of getting a false positive caused by the corruption of the
745integrity information itself. Therefore it can be considered to be below
746the noise level. Additionally, the test used by plzip to discriminate
747trailing data from a corrupt header has a Hamming distance (HD) of 3,
748and the 3 bit flips must happen in different magic bytes for the test to
749fail. In any case, the option @samp{--trailing-error} guarantees that
750any corrupt header will be detected.
751@end itemize
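
The advice given above for padding and for user-added text can be written as
a small check. The following Python sketch only encodes those rules of thumb;
it is not the test plzip uses to tell trailing data from a corrupt header,
and the function name is hypothetical.

@verbatim
```python
MAGIC = b"LZIP"


def is_safe_trailing_data(data):
    """Return True if data follows the documented rules for safe appending."""
    if all(byte == 0 for byte in data):
        return True        # nothing appended, or zero padding only
    if 0 in data:
        return False       # zero and nonzero bytes mixed
    # None of the first four bytes may match the corresponding magic byte.
    return all(byte != magic for byte, magic in zip(data[:4], MAGIC))


if __name__ == "__main__":
    for sample in (bytes(512), b"sha256: 0123abcd", b"Lorem ipsum", b"a\0b"):
        print(sample[:16], is_safe_trailing_data(sample))
```
@end verbatim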
752
753Trailing data are in no way part of the lzip file format, but tools
754reading lzip files are expected to behave as correctly and usefully as
755possible in the presence of trailing data.
756
757Trailing data can be safely ignored in most cases. In some cases, like
758that of user-added data, they are expected to be ignored. In those cases
759where a file containing trailing data must be rejected, the option
760@samp{--trailing-error} can be used. @xref{--trailing-error}.
761
762
763@node Examples
764@chapter A small tutorial with examples
765@cindex examples
766
767WARNING! Even if plzip is bug-free, other causes may result in a corrupt
768compressed file (bugs in the system libraries, memory errors, etc).
769Therefore, if the data you are going to compress are important, give the
770option @samp{--keep} to plzip and don't remove the original file until you
771verify the compressed file with a command like
772@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during
773compression can only be detected by comparing the compressed file with the
774original because the corruption happens before plzip compresses the RAM
775contents, resulting in a valid compressed file containing wrong data.
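
The same comparison can be made from a script. The following Python sketch
runs @w{@samp{plzip -cd}} and compares its output with the original file
block by block; the function names are hypothetical, and plzip is assumed to
be found in the search path.

@verbatim
```python
import subprocess


def streams_equal(a, b, chunk_size=1 << 16):
    """Compare two buffered binary streams block by block."""
    while True:
        block_a = a.read(chunk_size)
        block_b = b.read(chunk_size)
        if block_a != block_b:
            return False
        if not block_a:          # both streams ended at the same point
            return True


def verify(original_path, compressed_path, command=("plzip", "-cd")):
    """Return True if decompressing compressed_path reproduces original_path."""
    with open(original_path, "rb") as original:
        with subprocess.Popen(list(command) + [compressed_path],
                              stdout=subprocess.PIPE) as proc:
            same = streams_equal(original, proc.stdout)
            if not same:
                proc.kill()      # no need to decompress the rest
    return same and proc.returncode == 0
```
@end verbatim

A call like @code{verify("file", "file.lz")} returns true only if the
decompressed data are identical to the original and plzip exits with
status 0.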
776
777@sp 1
778@noindent
779Example 1: Extract all the files from archive @samp{foo.tar.lz}.
780
@example
  tar -xf foo.tar.lz
or
  plzip -cd foo.tar.lz | tar -xf -
@end example
786
787@sp 1
788@noindent
789Example 2: Replace a regular file with its compressed version @samp{file.lz}
790and show the compression ratio.
791
@example
plzip -v file
@end example
795
796@sp 1
797@noindent
Example 3: Like example 2 but the created @samp{file.lz} has a block size of
799@w{1 MiB}. The compression ratio is not shown.
800
@example
plzip -B 1MiB file
@end example
804
805@sp 1
806@noindent
807Example 4: Restore a regular file from its compressed version
808@samp{file.lz}. If the operation is successful, @samp{file.lz} is removed.
809
@example
plzip -d file.lz
@end example
813
814@sp 1
815@noindent
816Example 5: Verify the integrity of the compressed file @samp{file.lz} and
817show status.
818
@example
plzip -tv file.lz
@end example
822
823@sp 1
824@noindent
825Example 6: Compress a whole device in /dev/sdc and send the output to
826@samp{file.lz}.
827
@example
  plzip -c /dev/sdc > file.lz
or
  plzip /dev/sdc -o file.lz
@end example
833
834@sp 1
835@anchor{concat-example}
836@noindent
837Example 7: The right way of concatenating the decompressed output of two or
838more compressed files. @xref{Trailing data}.
839
@example
Don't do this
  cat file1.lz file2.lz file3.lz | plzip -d -
Do this instead
  plzip -cd file1.lz file2.lz file3.lz
@end example
846
847@sp 1
848@noindent
849Example 8: Decompress @samp{file.lz} partially until @w{10 KiB} of
850decompressed data are produced.
851
@example
plzip -cd file.lz | dd bs=1024 count=10
@end example
855
856@sp 1
857@noindent
858Example 9: Decompress @samp{file.lz} partially from decompressed byte at
859offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).
860
@example
plzip -cd file.lz | dd bs=1000 skip=10 count=5
@end example
864
865
866@node Problems
867@chapter Reporting bugs
868@cindex bugs
869@cindex getting help
870
871There are probably bugs in plzip. There are certainly errors and
872omissions in this manual. If you report them, they will get fixed. If
873you don't, no one will ever know about them and they will remain unfixed
874for all eternity, if not longer.
875
876If you find a bug in plzip, please send electronic mail to
877@email{lzip-bug@@nongnu.org}. Include the version number, which you can
878find by running @w{@samp{plzip --version}}.
879
880
881@node Concept index
882@unnumbered Concept index
883
884@printindex cp
885
886@bye
887