
History of LZMA Utils and XZ Utils
==================================

Tukaani distribution

    In 2005, there was a small group working on the Tukaani distribution,
    which was a Slackware fork. One of the project's goals was to fit the
    distro on a single 700 MiB ISO-9660 image. Using LZMA instead of gzip
    helped a lot. Roughly speaking, one could fit data that took 1000 MiB
    in gzipped form into 700 MiB with LZMA. Naturally, the compression
    ratio varied across packages, but this was what we got on average.

    Slackware packages have traditionally had .tgz as the filename suffix,
    which is an abbreviation of .tar.gz. A logical name for LZMA
    compressed packages was .tlz, being an abbreviation of .tar.lzma.

    At the end of the year 2007, there was no distribution under the
    Tukaani project anymore, but development of LZMA Utils was kept going.
    Still, there were .tlz packages around, because at least Vector Linux
    (a Slackware-based distribution) used LZMA for its packages.

    The first versions of the modified pkgtools used the LZMA_Alone tool
    from Igor Pavlov's LZMA SDK as is. It was fine, because users wouldn't
    need to interact with LZMA_Alone directly. But people soon wanted to
    use LZMA for other files too, and the interface of LZMA_Alone wasn't
    comfortable for those used to gzip and bzip2.


First steps of LZMA Utils

    The first version of LZMA Utils (4.22.0) included a shell script called
    lzmash. It was a wrapper that had a gzip-like command-line interface. It
    used the LZMA_Alone tool from LZMA SDK to do all the real work. zgrep,
    zdiff, and related scripts from gzip were adapted to work with LZMA and
    were part of the first LZMA Utils release too.

    LZMA Utils 4.22.0 also included lzmadec, which was a small (less than
    10 KiB) decoder-only command-line tool. It was written on top of the
    decoder-only C code found in the LZMA SDK. lzmadec was convenient in
    situations where LZMA_Alone (a few hundred KiB) would be too big.

    lzmash and lzmadec were written by Lasse Collin.


Second generation

    The lzmash script was an ugly and not very secure hack. The last
    version of LZMA Utils to use lzmash was 4.27.1.

    LZMA Utils 4.32.0beta1 introduced a new lzma command-line tool written
    by Ville Koskinen. It was written in C++, and used the encoder and
    decoder from C++ LZMA SDK with some small modifications. This tool
    replaced both the lzmash script and the LZMA_Alone command-line tool
    in LZMA Utils.

    Introducing this new tool caused some temporary incompatibilities,
    because the LZMA_Alone executable was simply named lzma like the new
    command-line tool, but the two had completely different command-line
    interfaces. The file format was still the same.

    Lasse wrote liblzmadec, which was a small decoder-only library based
    on the C code found in the LZMA SDK. liblzmadec had an API similar to
    zlib, although there were some significant differences, which made it
    non-trivial to use in some applications designed for zlib and
    libbzip2.

    The lzmadec command-line tool was converted to use liblzmadec.

    Alexandre Sauvé helped convert the build system to use GNU
    Autotools. This made it easier to test for certain less portable
    features needed by the new command-line tool.

    Since the new command-line tool never got completely finished (for
    example, it didn't support the LZMA_OPT environment variable), the
    intent was to not call 4.32.x stable. Similarly, liblzmadec wasn't
    polished, but appeared to work well enough, so some people started
    using it too.

    Because the development of the third generation of LZMA Utils was
    delayed considerably (3-4 years), the 4.32.x branch had to be kept
    maintained. It got some bug fixes now and then, and finally it was
    decided to call it stable, although most of the missing features were
    never added.


File format problems

    The file format used by LZMA_Alone was primitive. It was designed with
    embedded systems in mind, and thus provided only a minimal set of
    features. The two biggest problems for non-embedded use were the lack
    of magic bytes and the lack of an integrity check.

    Igor and Lasse started developing a new file format with some help
    from Ville Koskinen. Mark Adler, Mikko Pouru, H. Peter Anvin, and
    Lars Wirzenius also helped with some minor things at some point during
    the development. Designing the new format took quite a long time
    (actually, too long a time would be a more appropriate expression).
    This was mostly because Lasse was quite slow at getting things done
    due to personal reasons.

    Originally the new format was supposed to use the same .lzma suffix
    that was already used by the old file format. Switching to the new
    format wouldn't have caused much trouble while the old format wasn't
    yet used by many people. But since the development of the new format
    took such a long time, the old format got quite popular, and it was
    decided that the new file format must use a different suffix.

    It was decided to use .xz as the suffix of the new file format. The
    first stable .xz file format specification was finally released in
    December 2008. In addition to fixing the most obvious problems of
    the old .lzma format, the .xz format added some new features like
    support for multiple filters (compression algorithms), filter chaining
    (like piping on the command line), and limited random-access reading.

    Currently the primary compression algorithm used in .xz is LZMA2.
    It is an extension on top of the original LZMA to fix some practical
    problems: LZMA2 adds support for flushing the encoder and for
    uncompressed chunks, eases stateful decoder implementations, and
    improves support for multithreading. Since LZMA2 is better than the
    original LZMA, the original LZMA is not supported in .xz.


Transition to XZ Utils

    The early versions of XZ Utils were called LZMA Utils. The first
    releases were 4.42.0alphas. They dropped the rest of the C++ LZMA SDK.
    The code was still directly based on LZMA SDK but ported to C and
    converted from a callback API to a stateful API. Later, Igor Pavlov
    made a C version of the LZMA encoder too; the C++-to-C ports in
    LZMA SDK and LZMA Utils were done independently of each other.

    The core of the new LZMA Utils was liblzma, a compression library with
    a zlib-like API. liblzma supported both the old and the new file
    format. The gzip-like lzma command-line tool was rewritten to use
    liblzma.

    The new LZMA Utils code base was renamed to XZ Utils when the name
    of the new file format had been decided. The liblzma compression
    library retained its name though, because changing it would have
    caused unnecessary breakage in applications already using the early
    liblzma snapshots.

    The xz command-line tool can emulate the gzip-like lzma tool when it
    is invoked through an appropriately named symlink (e.g. lzma -> xz).
    Thus, practically all scripts using the lzma tool from LZMA Utils will
    work as is with XZ Utils (and will keep using the old .lzma format).
    Still, the .lzma format is more or less deprecated. XZ Utils will keep
    supporting it, but new applications should use the .xz format, and
    migrating old applications to .xz is often a good idea too.
