\input texinfo @c -*-texinfo-*-
@c %**start of header
@setfilename plzip.info
@documentencoding ISO-8859-15
@settitle Plzip Manual
@finalout
@c %**end of header

@set UPDATED 3 January 2021
@set VERSION 1.9

@dircategory Data Compression
@direntry
* Plzip: (plzip).               Massively parallel implementation of lzip
@end direntry


@ifnothtml
@titlepage
@title Plzip
@subtitle Massively parallel implementation of lzip
@subtitle for Plzip version @value{VERSION}, @value{UPDATED}
@author by Antonio Diaz Diaz

@page
@vskip 0pt plus 1filll
@end titlepage

@contents
@end ifnothtml

@ifnottex
@node Top
@top

This manual is for Plzip (version @value{VERSION}, @value{UPDATED}).

@menu
* Introduction::           Purpose and features of plzip
* Output::                 Meaning of plzip's output
* Invoking plzip::         Command line interface
* Program design::         Internal structure of plzip
* File format::            Detailed format of the compressed file
* Memory requirements::    Memory required to compress and decompress
* Minimum file sizes::     Minimum file sizes required for full speed
* Trailing data::          Extra data appended to the file
* Examples::               A small tutorial with examples
* Problems::               Reporting bugs
* Concept index::          Index of concepts
@end menu

@sp 1
Copyright @copyright{} 2009-2021 Antonio Diaz Diaz.

This manual is free documentation: you have unlimited permission to copy,
distribute, and modify it.
@end ifnottex


@node Introduction
@chapter Introduction
@cindex introduction

@uref{http://www.nongnu.org/lzip/plzip.html,,Plzip}
is a massively parallel (multi-threaded) implementation of lzip, fully
compatible with lzip 1.4 or newer. Plzip uses the compression library
@uref{http://www.nongnu.org/lzip/lzlib.html,,lzlib}.

@uref{http://www.nongnu.org/lzip/lzip.html,,Lzip}
is a lossless data compressor with a user interface similar to the one
of gzip or bzip2.
Lzip uses a simplified form of the 'Lempel-Ziv-Markov
chain-Algorithm' (LZMA) stream format, chosen to maximize safety and
interoperability. Lzip can compress about as fast as gzip @w{(lzip -0)} or
compress most files more than bzip2 @w{(lzip -9)}. Decompression speed is
intermediate between gzip and bzip2. Lzip is better than gzip and bzip2 from
a data recovery perspective. Lzip has been designed, written, and tested
with great care to replace gzip and bzip2 as the standard general-purpose
compressed format for unix-like systems.

Plzip can compress/decompress large files on multiprocessor machines much
faster than lzip, at the cost of a slightly reduced compression ratio (0.4
to 2 percent larger compressed files). Note that the number of usable
threads is limited by file size; on files larger than a few GB plzip can use
hundreds of processors, but on files of only a few MB plzip is no faster
than lzip. @xref{Minimum file sizes}.

For creation and manipulation of compressed tar archives
@uref{http://www.nongnu.org/lzip/manual/tarlz_manual.html,,tarlz} can be
more efficient than using tar and plzip because tarlz is able to keep the
alignment between tar members and lzip members.
@ifnothtml
@xref{Top,tarlz manual,,tarlz}.
@end ifnothtml

The lzip file format is designed for data sharing and long-term archiving,
taking into account both data integrity and decoder availability:

@itemize @bullet
@item
The lzip format provides very safe integrity checking and some data
recovery means. The program
@uref{http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Data-safety,,lziprecover}
can repair bit flip errors (one of the most common forms of data corruption)
in lzip files, and provides data recovery capabilities, including
error-checked merging of damaged copies of a file.
@ifnothtml
@xref{Data safety,,,lziprecover}.
@end ifnothtml

@item
The lzip format is as simple as possible (but not simpler). The lzip
manual provides the source code of a simple decompressor along with a
detailed explanation of how it works, so that with only the help of the
lzip manual it would be possible for a digital archaeologist to extract
the data from a lzip file long after quantum computers eventually render
LZMA obsolete.

@item
Additionally the lzip reference implementation is copylefted, which
guarantees that it will remain free forever.
@end itemize

A nice feature of the lzip format is that a corrupt byte is easier to repair
the nearer it is to the beginning of the file. Therefore, with the help of
lziprecover, losing an entire archive just because of a corrupt byte near
the beginning is a thing of the past.

Plzip uses the same well-defined exit status values used by lzip, which
makes it safer than compressors returning ambiguous warning values (like
gzip) when it is used as a back end for other programs like tar or zutils.

Plzip will automatically use for each file the largest dictionary size that
exceeds neither the file size nor the limit given. Keep in mind that
the decompression memory requirement is affected at compression time by the
choice of dictionary size limit. @xref{Memory requirements}.

When compressing, plzip replaces every file given in the command line
with a compressed version of itself, with the name "original_name.lz".
When decompressing, plzip attempts to guess the name for the decompressed
file from that of the compressed file as follows:

@multitable {anyothername} {becomes} {anyothername.out}
@item filename.lz  @tab becomes @tab filename
@item filename.tlz @tab becomes @tab filename.tar
@item anyothername @tab becomes @tab anyothername.out
@end multitable

(De)compressing a file is much like copying or moving it; therefore plzip
preserves the access and modification dates, permissions, and, when
possible, ownership of the file just as @samp{cp -p} does. (If the user ID or
the group ID can't be duplicated, the file permission bits S_ISUID and
S_ISGID are cleared).

Plzip is able to read from some types of non-regular files if either the
option @samp{-c} or the option @samp{-o} is specified.

Plzip will refuse to read compressed data from a terminal or write compressed
data to a terminal, as this would be entirely incomprehensible and might
leave the terminal in an abnormal state.

Plzip will correctly decompress a file which is the concatenation of two or
more compressed files. The result is the concatenation of the corresponding
decompressed files. Integrity testing of concatenated compressed files is
also supported.


@node Output
@chapter Meaning of plzip's output
@cindex output

The output of plzip looks like this:

@example
plzip -v foo
  foo: 6.676:1, 14.98% ratio, 85.02% saved, 450560 in, 67493 out.

plzip -tvvv foo.lz
  foo.lz: 6.676:1, 14.98% ratio, 85.02% saved. 450560 out, 67493 in. ok
@end example

The meaning of each field is as follows:

@table @code
@item N:1
The compression ratio @w{(uncompressed_size / compressed_size)}, shown as
@w{N to 1}.

@item ratio
The inverse compression ratio @w{(compressed_size / uncompressed_size)},
shown as a percentage.
A decimal ratio is easily obtained by moving the
decimal point two places to the left; @w{14.98% = 0.1498}.

@item saved
The space saved by compression @w{(1 - ratio)}, shown as a percentage.

@item in
Size of the input data. This is the uncompressed size when compressing, or
the compressed size when decompressing or testing. Note that plzip always
prints the uncompressed size before the compressed size when compressing,
decompressing, testing, or listing.

@item out
Size of the output data. This is the compressed size when compressing, or
the decompressed size when decompressing or testing.

@end table

When decompressing or testing at verbosity level 4 (-vvvv), the dictionary
size used to compress the file is also shown.

LANGUAGE NOTE: Uncompressed = not compressed = plain data; it may never have
been compressed. Decompressed is used to refer to data which have undergone
the process of decompression.


@node Invoking plzip
@chapter Invoking plzip
@cindex invoking
@cindex options
@cindex usage
@cindex version

The format for running plzip is:

@example
plzip [@var{options}] [@var{files}]
@end example

@noindent
If no file names are specified, plzip compresses (or decompresses) from
standard input to standard output. A hyphen @samp{-} used as a @var{file}
argument means standard input. It can be mixed with other @var{files} and is
read just once, the first time it appears in the command line.

plzip supports the following
@uref{http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax,,options}:
@ifnothtml
@xref{Argument syntax,,,arg_parser}.
@end ifnothtml

@table @code
@item -h
@itemx --help
Print an informative help message describing the options and exit.

@item -V
@itemx --version
Print the version number of plzip on the standard output and exit.
This version number should be included in all bug reports.

@anchor{--trailing-error}
@item -a
@itemx --trailing-error
Exit with error status 2 if any remaining input is detected after
decompressing the last member. Such remaining input is usually trailing
garbage that can be safely ignored. @xref{concat-example}.

@anchor{--data-size}
@item -B @var{bytes}
@itemx --data-size=@var{bytes}
When compressing, set the size of the input data blocks in bytes. The
input file will be divided into chunks of this size before compression is
performed. Valid values range from @w{8 KiB} to @w{1 GiB}. Default value
is two times the dictionary size, except for option @samp{-0} where it
defaults to @w{1 MiB}. Plzip will reduce the dictionary size if it is
larger than the data size specified. @xref{Minimum file sizes}.

@item -c
@itemx --stdout
Compress or decompress to standard output; keep input files unchanged. If
compressing several files, each file is compressed independently. This
option (or @samp{-o}) is needed when reading from a named pipe (fifo) or
from a device. Use @w{@samp{lziprecover -cd -i}} to recover as much of the
decompressed data as possible when decompressing a corrupt file. @samp{-c}
overrides @samp{-o}. @samp{-c} has no effect when testing or listing.

@item -d
@itemx --decompress
Decompress the files specified. If a file does not exist or can't be
opened, plzip continues decompressing the rest of the files. If a file
fails to decompress, or is a terminal, plzip exits immediately without
decompressing the rest of the files.

@item -f
@itemx --force
Force overwrite of output files.

@item -F
@itemx --recompress
When compressing, force re-compression of files whose name already has
the @samp{.lz} or @samp{.tlz} suffix.

@item -k
@itemx --keep
Keep (don't delete) input files during compression or decompression.

@item -l
@itemx --list
Print the uncompressed size, compressed size, and percentage saved of the
files specified. Trailing data are ignored. The values produced are correct
even for multimember files. If more than one file is given, a final line
containing the cumulative sizes is printed. With @samp{-v}, the dictionary
size, the number of members in the file, and the amount of trailing data (if
any) are also printed. With @samp{-vv}, the positions and sizes of each
member in multimember files are also printed.

@samp{-lq} can be used to verify quickly (without decompressing) the
structural integrity of the files specified. (Use @samp{--test} to verify
the data integrity). @samp{-alq} additionally verifies that none of the
files specified contain trailing data.

@item -m @var{bytes}
@itemx --match-length=@var{bytes}
When compressing, set the match length limit in bytes. After a match
this long is found, the search is finished. Valid values range from 5 to
273. Larger values usually give better compression ratios but longer
compression times.

@item -n @var{n}
@itemx --threads=@var{n}
Set the maximum number of worker threads, overriding the system's default.
Valid values range from 1 to "as many as your system can support". If this
option is not used, plzip tries to detect the number of processors in the
system and use it as default value. When compressing on a @w{32 bit} system,
plzip tries to limit the memory use to under @w{2.22 GiB} (4 worker threads
at level -9) by reducing the number of threads below the system's default.
@w{@samp{plzip --help}} shows the system's default value.

Plzip starts the number of threads required by each file without exceeding
the value specified.
Note that the number of usable threads is limited to
@w{ceil( file_size / data_size )} during compression (@pxref{Minimum file
sizes}), and to the number of members in the input during decompression. You
can find the number of members in a lzip file by running
@w{@samp{plzip -lv file.lz}}.

@item -o @var{file}
@itemx --output=@var{file}
If @samp{-c} has not also been specified, write the (de)compressed output to
@var{file}; keep input files unchanged. If compressing several files, each
file is compressed independently. This option (or @samp{-c}) is needed when
reading from a named pipe (fifo) or from a device. @w{@samp{-o -}} is
equivalent to @samp{-c}. @samp{-o} has no effect when testing or listing.

In order to keep backward compatibility with plzip versions prior to 1.9,
when compressing from standard input and no other file names are given, the
extension @samp{.lz} is appended to @var{file} unless it already ends in
@samp{.lz} or @samp{.tlz}. This feature will be removed in a future version
of plzip. Meanwhile, redirection may be used instead of @samp{-o} to write
the compressed output to a file without the extension @samp{.lz} in its
name: @w{@samp{plzip < file > foo}}.

@item -q
@itemx --quiet
Quiet operation. Suppress all messages.

@item -s @var{bytes}
@itemx --dictionary-size=@var{bytes}
When compressing, set the dictionary size limit in bytes. Plzip will use
for each file the largest dictionary size that exceeds neither
the file size nor this limit. Valid values range from @w{4 KiB} to
@w{512 MiB}. Values 12 to 29 are interpreted as powers of two, meaning
2^12 to 2^29 bytes. Dictionary sizes are quantized so that they can be
coded in just one byte (@pxref{coded-dict-size}). If the size specified
does not match one of the valid sizes, it will be rounded upwards by
adding up to @w{(@var{bytes} / 8)} to it.

For maximum compression you should use a dictionary size limit as large
as possible, but keep in mind that the decompression memory requirement
is affected at compression time by the choice of dictionary size limit.

@item -t
@itemx --test
Check integrity of the files specified, but don't decompress them. This
really performs a trial decompression and throws away the result. Use it
together with @samp{-v} to see information about the files. If a file
fails the test, does not exist, can't be opened, or is a terminal, plzip
continues checking the rest of the files. A final diagnostic is shown at
verbosity level 1 or higher if any file fails the test when testing
multiple files.

@item -v
@itemx --verbose
Verbose mode.@*
When compressing, show the compression ratio and size for each file
processed.@*
When decompressing or testing, further -v's (up to 4) increase the
verbosity level, showing status, compression ratio, dictionary size,
decompressed size, and compressed size.@*
Two or more @samp{-v} options show the progress of (de)compression,
except for single-member files.

@item -0 .. -9
Compression level. Set the compression parameters (dictionary size and
match length limit) as shown in the table below. The default compression
level is @samp{-6}, equivalent to @w{@samp{-s8MiB -m36}}. Note that
@samp{-9} can be much slower than @samp{-0}. These options have no
effect when decompressing, testing, or listing.

The bidimensional parameter space of LZMA can't be mapped to a linear
scale optimal for all files. If your files are large, very repetitive,
etc, you may need to use the options @samp{--dictionary-size} and
@samp{--match-length} directly to achieve optimal performance.

If several compression levels or @samp{-s} or @samp{-m} options are
given, the last setting is used.
For example, @w{@samp{-9 -s64MiB}} is
equivalent to @w{@samp{-s64MiB -m273}}.

@multitable {Level} {Dictionary size (-s)} {Match length limit (-m)}
@item Level @tab Dictionary size (-s) @tab Match length limit (-m)
@item -0 @tab 64 KiB  @tab 16 bytes
@item -1 @tab 1 MiB   @tab 5 bytes
@item -2 @tab 1.5 MiB @tab 6 bytes
@item -3 @tab 2 MiB   @tab 8 bytes
@item -4 @tab 3 MiB   @tab 12 bytes
@item -5 @tab 4 MiB   @tab 20 bytes
@item -6 @tab 8 MiB   @tab 36 bytes
@item -7 @tab 16 MiB  @tab 68 bytes
@item -8 @tab 24 MiB  @tab 132 bytes
@item -9 @tab 32 MiB  @tab 273 bytes
@end multitable

@item --fast
@itemx --best
Aliases for GNU gzip compatibility.

@item --loose-trailing
When decompressing, testing, or listing, allow trailing data whose first
bytes are so similar to the magic bytes of a lzip header that they can
be confused with a corrupt header. Use this option if a file triggers a
"corrupt header" error and the cause is not indeed a corrupt header.

@item --in-slots=@var{n}
Number of @w{1 MiB} input packets buffered per worker thread when
decompressing from non-seekable input. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid values
range from 1 to 64. The default value is 4.

@item --out-slots=@var{n}
Number of @w{1 MiB} output packets buffered per worker thread when
decompressing to non-seekable output. Increasing the number of packets
may increase decompression speed, but requires more memory. Valid values
range from 1 to 1024. The default value is 64.

@item --check-lib
Compare the
@uref{http://www.nongnu.org/lzip/manual/lzlib_manual.html#Library-version,,version of lzlib}
used to compile plzip with the version actually being used at run time and
exit. Report any differences found. Exit with error status 1 if differences
are found.
A mismatch may indicate that lzlib is not correctly installed or
that a different version of lzlib has been installed after compiling plzip.
@w{@samp{plzip -v --check-lib}} shows the version of lzlib being used and
the value of @samp{LZ_API_VERSION} (if defined).
@ifnothtml
@xref{Library version,,,lzlib}.
@end ifnothtml

@end table

Numbers given as arguments to options may be followed by a multiplier
and an optional @samp{B} for "byte".

Table of SI and binary prefixes (unit multipliers):

@multitable {Prefix} {kilobyte (10^3 = 1000)} {|} {Prefix} {kibibyte (2^10 = 1024)}
@item Prefix @tab Value                  @tab | @tab Prefix @tab Value
@item k @tab kilobyte  (10^3 = 1000) @tab | @tab Ki @tab kibibyte (2^10 = 1024)
@item M @tab megabyte  (10^6)        @tab | @tab Mi @tab mebibyte (2^20)
@item G @tab gigabyte  (10^9)        @tab | @tab Gi @tab gibibyte (2^30)
@item T @tab terabyte  (10^12)       @tab | @tab Ti @tab tebibyte (2^40)
@item P @tab petabyte  (10^15)       @tab | @tab Pi @tab pebibyte (2^50)
@item E @tab exabyte   (10^18)       @tab | @tab Ei @tab exbibyte (2^60)
@item Z @tab zettabyte (10^21)       @tab | @tab Zi @tab zebibyte (2^70)
@item Y @tab yottabyte (10^24)       @tab | @tab Yi @tab yobibyte (2^80)
@end multitable

@sp 1
Exit status: 0 for a normal exit, 1 for environmental problems (file not
found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt or
invalid input file, 3 for an internal consistency error (e.g., a bug) which
caused plzip to panic.


@node Program design
@chapter Internal structure of plzip
@cindex program design

When compressing, plzip divides the input file into chunks and compresses as
many chunks simultaneously as worker threads are chosen, creating a
multimember compressed file.

When decompressing, plzip decompresses as many members simultaneously as
worker threads are chosen.
Files that were compressed with lzip will not
be decompressed faster than using lzip (unless the option @samp{-b} was used)
because lzip usually produces single-member files, which can't be
decompressed in parallel.

For each input file, a splitter thread and several worker threads are
created, with the main thread acting as the muxer (multiplexer) thread. A
"packet courier" takes care of data transfers among threads and limits the
maximum number of data blocks (packets) being processed simultaneously.

The splitter reads data blocks from the input file, and distributes them
to the workers. The workers (de)compress the blocks received from the
splitter. The muxer collects processed packets from the workers, and
writes them to the output file.

@verbatim
                             ,------------,
                         ,-->| worker 0   |--,
                         |   `------------'  |
,-------,   ,----------, |   ,------------,  |   ,-------,   ,--------,
| input |-->| splitter |-+-->| worker 1   |--+-->| muxer |-->| output |
|  file |   `----------' |   `------------'  |   `-------'   |  file  |
`-------'                |        ...        |               `--------'
                         |   ,------------,  |
                         `-->| worker N-1 |--'
                             `------------'
@end verbatim

When decompressing from a regular file, the splitter is removed and the
workers read directly from the input file. If the output file is also a
regular file, the muxer is also removed and the workers write directly
to the output file. With these optimizations, the use of RAM is greatly
reduced and the decompression speed of large files with many members is
only limited by the number of processors available and by I/O speed.
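
The splitter, worker, and muxer roles described above can be sketched in a
few lines of Python. This is only an illustration under stated assumptions,
not plzip's implementation: it uses Python's standard lzma module, which
writes xz streams rather than lzip members, and a thread pool in place of
plzip's own threads and packet courier. The names split_into_chunks and
compress_parallel are hypothetical.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, chunk_size):
    """Splitter role: cut the input into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_parallel(data, chunk_size, workers):
    """Compress every chunk as an independent stream and join them in input order.

    Assumption: xz streams from the lzma module stand in for lzip members.
    """
    chunks = split_into_chunks(data, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, which plays the muxer role.
        members = list(pool.map(lzma.compress, chunks))
    return b"".join(members)


sample = bytes(range(256)) * 4096            # 1 MiB of test data
packed = compress_parallel(sample, chunk_size=256 * 1024, workers=4)
restored = lzma.decompress(packed)           # decodes all concatenated streams
print(restored == sample)
```

Because every chunk becomes a self-contained stream, the streams can be
produced in any order as long as they are written in input order, and a
decoder that accepts concatenated streams recovers the original data. The
price is the slightly lower compression ratio mentioned in the introduction,
since no chunk can refer to data in an earlier chunk.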


@node File format
@chapter File format
@cindex file format

Perfection is reached, not when there is no longer anything to add, but
when there is no longer anything to take away.@*
--- Antoine de Saint-Exupery

@sp 1
In the diagram below, a box like this:

@verbatim
+---+
|   | <-- the vertical bars might be missing
+---+
@end verbatim

represents one byte; a box like this:

@verbatim
+==============+
|              |
+==============+
@end verbatim

represents a variable number of bytes.

@sp 1
A lzip file consists of a series of "members" (compressed data sets).
The members simply appear one after another in the file, with no
additional information before, between, or after them.

Each member has the following structure:

@verbatim
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ID string | VN | DS | LZMA stream | CRC32 |   Data size   |  Member size  |
+--+--+--+--+----+----+=============+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@end verbatim

All multibyte values are stored in little endian order.

@table @samp
@item ID string (the "magic" bytes)
A four byte string, identifying the lzip format, with the value "LZIP"
(0x4C, 0x5A, 0x49, 0x50).

@item VN (version number, 1 byte)
Just in case something needs to be modified in the future. 1 for now.

@anchor{coded-dict-size}
@item DS (coded dictionary size, 1 byte)
The dictionary size is calculated by taking a power of 2 (the base size)
and subtracting from it a fraction between 0/16 and 7/16 of the base size.@*
Bits 4-0 contain the base 2 logarithm of the base size (12 to 29).@*
Bits 7-5 contain the numerator of the fraction (0 to 7) to subtract
from the base size to obtain the dictionary size.@*
Example: 0xD3 = 2^19 - 6 * 2^15 = 512 KiB - 6 * 32 KiB = 320 KiB@*
Valid values for dictionary size range from 4 KiB to 512 MiB.

@item LZMA stream
The LZMA stream, finished by an end of stream marker. Uses default values
for encoder properties.
@ifnothtml
@xref{Stream format,,,lzip},
@end ifnothtml
@ifhtml
See
@uref{http://www.nongnu.org/lzip/manual/lzip_manual.html#Stream-format,,Stream format}
@end ifhtml
for a complete description.

@item CRC32 (4 bytes)
Cyclic Redundancy Check (CRC) of the uncompressed original data.

@item Data size (8 bytes)
Size of the uncompressed original data.

@item Member size (8 bytes)
Total size of the member, including header and trailer. This field acts
as a distributed index, allows the verification of stream integrity, and
facilitates safe recovery of undamaged members from multimember files.

@end table


@node Memory requirements
@chapter Memory required to compress and decompress
@cindex memory requirements

The amount of memory required @strong{per worker thread} for decompression
or testing is approximately the following:

@itemize @bullet
@item
For decompression of a regular (seekable) file to another regular file,
or for testing of a regular file; the dictionary size.

@item
For testing of a non-seekable file or of standard input; the dictionary
size plus @w{1 MiB} plus up to the number of @w{1 MiB} input packets
buffered (4 by default).

@item
For decompression of a regular file to a non-seekable file or to
standard output; the dictionary size plus up to the number of @w{1 MiB}
output packets buffered (64 by default).

@item
For decompression of a non-seekable file or of standard input; the
dictionary size plus @w{1 MiB} plus up to the number of @w{1 MiB} input
and output packets buffered (68 by default).
@end itemize

@noindent
The amount of memory required @strong{per worker thread} for compression
is approximately the following:

@itemize @bullet
@item
For compression at level -0; @w{1.5 MiB} plus 3.375 times the data size
(@pxref{--data-size}). Default is @w{4.875 MiB}.

@item
For compression at other levels; 11 times the dictionary size plus 3.375
times the data size. Default is @w{142 MiB}.
@end itemize

@noindent
The following table shows the memory required @strong{per thread} for
compression at a given level, using the default data size for each level:

@multitable {Level} {Memory required}
@item Level @tab Memory required
@item -0 @tab 4.875 MiB
@item -1 @tab 17.75 MiB
@item -2 @tab 26.625 MiB
@item -3 @tab 35.5 MiB
@item -4 @tab 53.25 MiB
@item -5 @tab 71 MiB
@item -6 @tab 142 MiB
@item -7 @tab 284 MiB
@item -8 @tab 426 MiB
@item -9 @tab 568 MiB
@end multitable


@node Minimum file sizes
@chapter Minimum file sizes required for full compression speed
@cindex minimum file sizes

When compressing, plzip divides the input file into chunks and
compresses as many chunks simultaneously as worker threads are chosen,
creating a multimember compressed file.

For this to work as expected (and roughly multiply the compression speed
by the number of available processors), the uncompressed file must be at
least as large as the number of worker threads times the chunk size
(@pxref{--data-size}). Otherwise some processors will not get any data to
compress, and compression will be proportionally slower. The maximum
speed increase achievable on a given file is limited by the ratio
@w{(file_size / data_size)}. For example, a tarball the size of gcc or
linux will scale up to 10 or 14 processors at level -9.
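
The arithmetic behind these limits can be checked with a short Python
sketch. It assumes the default data size (1 MiB at level -0, twice the
dictionary size otherwise) and the per-thread memory estimates given in
this manual. The function names are hypothetical, and the floor of one
thread for an empty input is an assumption of the sketch, not documented
plzip behavior.

```python
KIB = 1024
MIB = 1024 * KIB

# Dictionary size per level in bytes, copied from the compression level table.
DICT_SIZE = [64 * KIB, 1 * MIB, 3 * MIB // 2, 2 * MIB, 3 * MIB,
             4 * MIB, 8 * MIB, 16 * MIB, 24 * MIB, 32 * MIB]


def default_data_size(level):
    """Default --data-size: 1 MiB at level 0, else twice the dictionary size."""
    return MIB if level == 0 else 2 * DICT_SIZE[level]


def usable_threads(file_size, data_size, max_threads):
    """ceil(file_size / data_size), capped at max_threads (assumed floor of 1)."""
    chunks = (file_size + data_size - 1) // data_size
    return max(1, min(max_threads, chunks))


def min_file_size(level, processors):
    """Smallest input that gives every processor one full chunk."""
    return processors * default_data_size(level)


def compression_memory(level):
    """Approximate memory per worker thread when compressing, in bytes."""
    data = 3.375 * default_data_size(level)
    if level == 0:
        return 1.5 * MIB + data
    return 11 * DICT_SIZE[level] + data


print(usable_threads(600 * MIB, default_data_size(9), 16))   # 10
print(min_file_size(6, 2) // MIB)                            # 32
print(compression_memory(6) / MIB)                           # 142.0
```

A 600 MiB input at level -9 is cut into ten 64 MiB chunks, so asking for 16
threads still leaves six of them idle; lowering the data size with
@samp{-B} is the way to use more threads on such a file.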

The following table shows the minimum uncompressed file size needed for
full use of N processors at a given compression level, using the default
data size for each level:

@multitable {Processors} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB} {512 MiB}
@headitem Processors @tab 2 @tab 4 @tab 8 @tab 16 @tab 64 @tab 256
@item Level
@item -0 @tab 2 MiB   @tab 4 MiB   @tab 8 MiB   @tab 16 MiB  @tab 64 MiB  @tab 256 MiB
@item -1 @tab 4 MiB   @tab 8 MiB   @tab 16 MiB  @tab 32 MiB  @tab 128 MiB @tab 512 MiB
@item -2 @tab 6 MiB   @tab 12 MiB  @tab 24 MiB  @tab 48 MiB  @tab 192 MiB @tab 768 MiB
@item -3 @tab 8 MiB   @tab 16 MiB  @tab 32 MiB  @tab 64 MiB  @tab 256 MiB @tab 1 GiB
@item -4 @tab 12 MiB  @tab 24 MiB  @tab 48 MiB  @tab 96 MiB  @tab 384 MiB @tab 1.5 GiB
@item -5 @tab 16 MiB  @tab 32 MiB  @tab 64 MiB  @tab 128 MiB @tab 512 MiB @tab 2 GiB
@item -6 @tab 32 MiB  @tab 64 MiB  @tab 128 MiB @tab 256 MiB @tab 1 GiB   @tab 4 GiB
@item -7 @tab 64 MiB  @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 2 GiB   @tab 8 GiB
@item -8 @tab 96 MiB  @tab 192 MiB @tab 384 MiB @tab 768 MiB @tab 3 GiB   @tab 12 GiB
@item -9 @tab 128 MiB @tab 256 MiB @tab 512 MiB @tab 1 GiB   @tab 4 GiB   @tab 16 GiB
@end multitable


@node Trailing data
@chapter Extra data appended to the file
@cindex trailing data

Sometimes extra data are found appended to a lzip file after the last
member. Such trailing data may be:

@itemize @bullet
@item
Padding added to make the file size a multiple of some block size, for
example when writing to a tape. It is safe to append any amount of
padding zero bytes to a lzip file.

@item
Useful data added by the user; a cryptographically secure hash, a
description of file contents, etc.
It is safe to append any amount of
text to a lzip file as long as none of the first four bytes of the text
match the corresponding byte in the string "LZIP", and the text does not
contain any zero bytes (null characters). Nonzero bytes and zero bytes
can't be safely mixed in trailing data.

@item
Garbage added by some not totally successful copy operation.

@item
Malicious data added to the file in order to make its total size and
hash value (for a chosen hash) coincide with those of another file.

@item
In rare cases, trailing data could be the corrupt header of another
member. In multimember or concatenated files the probability of
corruption happening in the magic bytes is 5 times smaller than the
probability of getting a false positive caused by the corruption of the
integrity information itself. Therefore it can be considered to be below
the noise level. Additionally, the test used by plzip to discriminate
trailing data from a corrupt header has a Hamming distance (HD) of 3,
and the 3 bit flips must happen in different magic bytes for the test to
fail. In any case, the option @samp{--trailing-error} guarantees that
any corrupt header will be detected.
@end itemize

Trailing data are in no way part of the lzip file format, but tools
reading lzip files are expected to behave as correctly and usefully as
possible in the presence of trailing data.

Trailing data can be safely ignored in most cases. In some cases, like
that of user-added data, they are expected to be ignored. In those cases
where a file containing trailing data must be rejected, the option
@samp{--trailing-error} can be used. @xref{--trailing-error}.


@node Examples
@chapter A small tutorial with examples
@cindex examples

WARNING! Even if plzip is bug-free, other causes may result in a corrupt
compressed file (bugs in the system libraries, memory errors, etc).
Therefore, if the data you are going to compress are important, give the
option @samp{--keep} to plzip and don't remove the original file until you
verify the compressed file with a command like
@w{@samp{plzip -cd file.lz | cmp file -}}. Most RAM errors happening during
compression can only be detected by comparing the compressed file with the
original because the corruption happens before plzip compresses the RAM
contents, resulting in a valid compressed file containing wrong data.

@sp 1
@noindent
Example 1: Extract all the files from archive @samp{foo.tar.lz}.

@example
  tar -xf foo.tar.lz
or
  plzip -cd foo.tar.lz | tar -xf -
@end example

@sp 1
@noindent
Example 2: Replace a regular file with its compressed version @samp{file.lz}
and show the compression ratio.

@example
plzip -v file
@end example

@sp 1
@noindent
Example 3: Like example 2 but the created @samp{file.lz} has a block size of
@w{1 MiB}. The compression ratio is not shown.

@example
plzip -B 1MiB file
@end example

@sp 1
@noindent
Example 4: Restore a regular file from its compressed version
@samp{file.lz}. If the operation is successful, @samp{file.lz} is removed.

@example
plzip -d file.lz
@end example

@sp 1
@noindent
Example 5: Verify the integrity of the compressed file @samp{file.lz} and
show status.

@example
plzip -tv file.lz
@end example

@sp 1
@noindent
Example 6: Compress a whole device in /dev/sdc and send the output to
@samp{file.lz}.

@example
  plzip -c /dev/sdc > file.lz
or
  plzip /dev/sdc -o file.lz
@end example

@sp 1
@anchor{concat-example}
@noindent
Example 7: The right way of concatenating the decompressed output of two or
more compressed files. @xref{Trailing data}.

@example
Don't do this
  cat file1.lz file2.lz file3.lz | plzip -d -
Do this instead
  plzip -cd file1.lz file2.lz file3.lz
@end example

@sp 1
@noindent
Example 8: Decompress @samp{file.lz} partially until @w{10 KiB} of
decompressed data are produced.

@example
plzip -cd file.lz | dd bs=1024 count=10
@end example

@sp 1
@noindent
Example 9: Decompress @samp{file.lz} partially from decompressed byte at
offset 10000 to decompressed byte at offset 14999 (5000 bytes are produced).

@example
plzip -cd file.lz | dd bs=1000 skip=10 count=5
@end example


@node Problems
@chapter Reporting bugs
@cindex bugs
@cindex getting help

There are probably bugs in plzip. There are certainly errors and
omissions in this manual. If you report them, they will get fixed. If
you don't, no one will ever know about them and they will remain unfixed
for all eternity, if not longer.

If you find a bug in plzip, please send electronic mail to
@email{lzip-bug@@nongnu.org}. Include the version number, which you can
find by running @w{@samp{plzip --version}}.


@node Concept index
@unnumbered Concept index

@printindex cp

@bye