XZ Utils FAQ
============

Q: What do the letters XZ mean?

A: Nothing. They are just two letters, which come from the file format
   suffix .xz. The .xz suffix was selected because it seemed to be
   pretty much unused. It has no deeper meaning.


Q: What are LZMA and LZMA2?

A: LZMA stands for Lempel-Ziv-Markov chain-Algorithm. It is the name
   of the compression algorithm designed by Igor Pavlov for 7-Zip.
   LZMA is based on LZ77 and range encoding.

   LZMA2 is an updated version of the original LZMA that fixes a couple
   of practical issues. In the context of XZ Utils, LZMA is called LZMA1
   to emphasize that LZMA is not the same thing as LZMA2. LZMA2 is the
   primary compression algorithm in the .xz file format.


Q: There are many LZMA related projects. How does XZ Utils relate to them?

A: 7-Zip and LZMA SDK are the original projects. LZMA SDK is roughly
   a subset of the 7-Zip source tree.

   p7zip is 7-Zip's command-line tools ported to POSIX-like systems.

   LZMA Utils provide a gzip-like lzma tool for POSIX-like systems.
   LZMA Utils are based on LZMA SDK. XZ Utils are the successor to
   LZMA Utils.

   There are several other projects using LZMA. Most are more or less
   based on LZMA SDK. See <http://7-zip.org/links.html>.


Q: Why is liblzma named liblzma if its primary file format is .xz?
   Shouldn't it be e.g. libxz?

A: When the design of the .xz format began, the idea was to replace
   the .lzma format and use the same .lzma suffix. It would have been
   quite OK to reuse the suffix when there were very few .lzma files
   around. However, the old .lzma format became popular before the
   new format was finished. The new format was renamed to .xz but the
   name of liblzma wasn't changed.


Q: Do XZ Utils support the .7z format?

A: No. Use 7-Zip (Windows) or p7zip (POSIX-like systems) to handle .7z
   files.


Q: I have many .tar.7z files. Can I convert them to .tar.xz without
   spending hours recompressing the data?

A: In the "extra" directory, there is a script named 7z2lzma.bash which
   is able to convert some .7z files to the .lzma format (not .xz). It
   needs the 7za (or 7z) command from p7zip. The script may silently
   produce corrupt output if certain assumptions are not met, so
   decompress the resulting .lzma file and compare it against the
   original before deleting the original file!


Q: I have many .lzma files. Can I quickly convert them to the .xz format?

A: For now, no. Since XZ Utils supports the .lzma format, it's usually
   not too bad to keep the old files in the old format. If you want to
   do the conversion anyway, you need to decompress the .lzma files and
   then recompress to the .xz format.

   Technically, there is a way to make the conversion relatively fast
   (roughly twice the time that normal decompression takes). Writing
   such a tool would take quite a bit of time though, and would probably
   be useful to only a few people. If you really want such a conversion
   tool, contact Lasse Collin and offer some money.


Q: I have installed xz, but my tar doesn't recognize .tar.xz files.
   How can I extract .tar.xz files?

A: xz -dc foo.tar.xz | tar xf -


Q: Can I recover parts of a broken .xz file (e.g. a corrupted CD-R)?

A: It may be possible if the file consists of multiple blocks, which
   typically is not the case if the file was created in single-threaded
   mode. There is no recovery program yet.


Q: Is (some part of) XZ Utils patented?

A: Lasse Collin is not aware of any patents that could affect XZ Utils.
   However, due to the nature of software patents, it's not possible to
   guarantee that XZ Utils isn't affected by any third party patent(s).


Q: Where can I find documentation about the file format and algorithms?

A: The .xz format is documented in xz-file-format.txt. It is a container
   format only, and doesn't include descriptions of any non-trivial
   filters.

   Documenting LZMA and LZMA2 is planned, but for now, the only
   documentation is the source code. Before you begin, you should know
   the basics of LZ77 and range-coding algorithms. LZMA is based on LZ77,
   but LZMA is a lot more complex. Range coding is used to compress
   the final bitstream, much like Huffman coding is used in Deflate.


Q: I cannot find BCJ and BCJ2 filters. Don't they exist in liblzma?

A: The BCJ filter is called "x86" in liblzma. BCJ2 is not included,
   because it requires using more than one encoded output stream.
   A streamable version of BCJ2-style filtering is planned.


Q: I need to use a script that runs "xz -9". On a system with 256 MiB
   of RAM, xz says that it cannot allocate memory. Can I make the
   script work without modifying it?

A: Set a default memory usage limit for compression. You can do it e.g.
   in a shell initialization script such as ~/.bashrc or /etc/profile:

       XZ_DEFAULTS=--memlimit-compress=150MiB
       export XZ_DEFAULTS

   xz will then scale the compression settings down so that the given
   memory usage limit is not reached. This way xz shouldn't run out
   of memory.

   Also check that memory-related resource limits are high enough.
   On most systems, "ulimit -a" will show the current resource limits.


Q: How do I create files that can be decompressed with XZ Embedded?

A: See the documentation in XZ Embedded. In short, something like
   this is a good start:

       xz --check=crc32 --lzma2=preset=6e,dict=64KiB

   Or if a BCJ filter is needed too, e.g. if compressing
   a kernel image for PowerPC:

       xz --check=crc32 --powerpc --lzma2=preset=6e,dict=64KiB

   Adjust the dictionary size to get a good compromise between
   compression ratio and decompressor memory usage. Note that
   in single-call decompression mode of XZ Embedded, a big
   dictionary doesn't increase memory usage.


Q: Will xz support threaded compression?

A: It is planned and has been taken into account when designing
   the .xz file format. Eventually there will probably be three types
   of threading, each method having its own advantages and disadvantages.

   The simplest method is splitting the uncompressed data into blocks
   and compressing them in parallel, independently of each other.
   Since the blocks are compressed independently, they can also be
   decompressed independently. Together with the index feature in .xz,
   this allows using threads to create .xz files for random-access
   reading. This also makes threaded decompression possible, although
   it is not clear if threaded decompression will ever be implemented.

   The independent blocks method has a couple of disadvantages too. It
   will compress worse than a single-block method. Often the difference
   is not too big (maybe 1-2 %) but sometimes it can be too big. Also,
   the memory usage of the compressor increases linearly when adding
   threads.

   Match finder parallelization is another threading method. It has
   been in 7-Zip for ages. It doesn't affect compression ratio or
   memory usage significantly. Among the three threading methods, only
   this is useful when compressing small files (files that are not
   significantly bigger than the dictionary). Unfortunately this method
   scales only to about two CPU cores.

   The third method is pigz-style threading (I use that name because
   pigz <http://www.zlib.net/pigz/> uses that method). It doesn't
   affect compression ratio significantly and scales to many cores.
   The memory usage scales linearly when threads are added. This isn't
   significant with pigz, because Deflate uses only a 32 KiB dictionary,
   but with LZMA2 the memory usage will increase dramatically just like
   with the independent-blocks method. There is also a constant
   computational overhead, which may make the pigz-style method a bit
   dull on dual-core systems compared to the parallel match finder
   method, but with more cores the overhead is not a big deal anymore.

   Combining the threading methods will be possible and also useful.
   E.g. combining match finder parallelization with pigz-style threading
   can cut the memory usage by 50 %.

   It is possible that the single-threaded method will be modified to
   create files identical to the pigz-style method. We'll see once
   pigz-style threading has been implemented in liblzma.


Q: How do I build a program that needs liblzmadec (lzmadec.h)?

A: liblzmadec is part of LZMA Utils. XZ Utils has liblzma, but no
   liblzmadec. The code using liblzmadec should be ported to use
   liblzma instead. If you cannot or don't want to do that, download
   LZMA Utils from <http://tukaani.org/lzma/>.


Q: The default build of liblzma is too big. How can I make it smaller?

A: Give --enable-small to the configure script. Also use the appropriate
   --enable or --disable options to include only those filter encoders
   and decoders and integrity checks that you actually need. Use
   CFLAGS=-Os (with GCC) or equivalent to tell your compiler to optimize
   for size. See INSTALL for information about configure options.

   If the result is still too big, take a look at XZ Embedded. It is
   a separate project, which provides a limited but significantly
   smaller XZ decoder implementation than XZ Utils. You can find it
   at <http://tukaani.org/xz/embedded.html>.
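
   As a rough sketch of the size-reduction advice above, a configure
   invocation might look like the following. The choice of LZMA2 and
   CRC32 only is an assumption made for illustration, not a
   recommendation. Check the option names against INSTALL for your
   version before relying on them.

```shell
# Sketch only: run from the top of an unpacked XZ Utils source tree.
# The filter and check lists below are illustrative assumptions;
# list whatever your application actually needs.
./configure \
    --enable-small \
    --enable-encoders=lzma2 \
    --enable-decoders=lzma2 \
    --enable-checks=crc32 \
    CFLAGS=-Os
make
```

   A build trimmed this way can only handle files that use the filters
   and checks that were left enabled, and some of the bundled
   command-line tools may expect more than this selection provides.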