• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..03-May-2022-

ALGORITHMSH A D26-Aug-19961.4 KiB3828

LICENSEH A D06-Jul-199617.6 KiB340281

MakefileH A D03-May-2022543 1815

READMEH A D07-Sep-199612.7 KiB312228

README.DOSH A D31-Aug-1996749 2115

bzip.1H A D31-Aug-199610.6 KiB303268

bzip.1.preformattedH A D31-Aug-199613.5 KiB331223

bzip.cH A D03-May-202288.6 KiB3,2432,106

test.batH A D31-Aug-1996239 99

test.cmdH A D31-Aug-1996238 99

words0H A D20-Jul-1996245 85

words1H A D10-Jul-1996104 62

words2H A D09-Jul-1996191 73

words3H A D20-Jul-1996693 2414

words3shH A D31-Aug-1996440 129

README

1
2GREETINGS!
3
4   This is the README for BZIP, my block-sorting file compressor,
5   version 0.21.
6
7   BZIP is distributed under the GNU General Public License version 2;
8   for details, see the file LICENSE.  Pointers to the algorithms used
9   are in ALGORITHMS.  Instructions for use are in bzip.1.preformatted.
10
11   Please read this file carefully.
12
13
14
15HOW TO BUILD
16
17   -- for UNIX:
18
19        Type `make'.     (tough, huh? :-)
20
21        This creates binaries "bzip", and "bunzip",
22        which is a symbolic link to "bzip".
23
24        It also runs four compress-decompress tests to make sure
25        things are working properly.  If all goes well, you should be up &
26        running.  Please be sure to read the output from `make'
27        just to be sure that the tests went ok.
28
29        To install bzip properly:
30
31           -- Copy the binary "bzip" to a publically visible place,
32              possibly /usr/bin, /usr/common/bin or /usr/local/bin.
33
34           -- In that directory, make "bunzip" be a symbolic link
35              to "bzip".
36
37           -- Copy the manual page, bzip.1, to the relevant place.
38              Probably the right place is /usr/man/man1/.
39
40   -- for Windows 95 and NT:
41
42        For a start, do you *really* want to recompile bzip?
43        The standard distribution includes a pre-compiled version
44        for Windows 95 and NT, `BZIP.EXE'.
45
46        Assuming you do, compilation is less straightforward than for
47        Unix platforms.  You can compile either with Microsoft Visual C++ 2.0
48        or later, or with Borland C++ 5.0 or later.
49
50        NOTE [THIS IS IMPORTANT] that it would *appear* that
51        MS VC++ 2.0's optimising compiler has a bug which, at maximum
52        optimisation, gives an executable which produces garbage
53        compressed files.  Proceed with caution.  I do not know whether
54        or not this happens with later versions of VC++.
55
56        Edit the defines starting at line 86 of bzip.c to select your
57        platform/compiler combination, and then compile.  Then check that
58        the resulting executable (assumed to be called BZIP.EXE) works
59        correctly, using the SELFTEST.BAT file.  Bearing in mind the
60        previous paragraph, the self-test is important.
61
62   A manual page is supplied, unformatted (bzip.1),
63   preformatted (bzip.1.preformatted), and preformatted
64   and sanitised for MS-DOS (bzip1.txt).
65
66
67
68COMPILATION NOTES
69
70   bzip should work on any 32-bit machine.  It is known to work
71   [meaning: it has compiled and passed self-tests] on the
72   following platform-os combinations:
73
74      Intel i386/i486        running Linux 1.2.13 and Linux 2.0.0
75      Sun Sparcs (various)   running SunOS 4.1.3 and Solaris 2.5
76      SGI Indy R3000         running Irix 5.3
77      HP 9000/700            running HPUX 9.03
78      HP 9000/300            running NetBSD 1.1
79      Acorn R260             running RISC iX (a BSD 4.? derivative)
80
81      Intel i386/i486        running Windows 95
82
83   I have also heard, but have not myself verified, that bzip works
84   on the following machines:
85
86      Intel i486             running Windows NT 3.51
87      IBM 3090 clone         running OSF/1
88      Dec Alpha              running ?????
89
90   The #defines starting at around line 86 of bzip.c supply some
91   degree of platform-independance.  If you configure bzip for some
92   new far-out platform, please send me the relevant definitions.
93
94   I recommend GNU C for compilation.  The code is standard ANSI C,
95   except for the Unix-specific file handling, so any ANSI C compiler
96   should work.  Note however that the many routines marked INLINE
97   should be inlined by your compiler, else performance will be very
98   poor.  Asking your compiler to unroll loops might give some
99   small improvement too; for gcc, the relevant flag is
100   -funroll-loops.
101
102   On a 386/486 machines, I'd recommend giving gcc the
103   -fomit-frame-pointer flag; this liberates another register for
104   allocation, which measurably improves performance.
105
106   On SPARCs (and, I guess, on many low-range RISC machines) there is no
107   hardware implementation of integer multiply and divide.  This can
108   mean poor decompression performance.  It also means it is important
109   to generate code for the version of the SPARC instruction set you
110   intend to use.  gcc -mcypress (for older sparcs) and gcc
111   -msupersparc (for newer ones) give binaries which run at strikingly
112   different speeds on different flavours of SPARCs.  If you are
113   interested in performance figures, try both.
114
115   If you compile bzip on a new platform or with a new compiler,
116   please be sure to run the four compress-decompress tests, either
117   using the Makefile, or with the test.bat (MSDOS) or test.cmd (OS/2)
118   files.  Some compilers have been seen to introduce subtle bugs
119   when optimising, so this check is important.  Ideally you should
120   then go on to test bzip on a file several megabytes or even
121   tens of megabytes long, just to be 110% sure.  ``Professional
122   programmers are paranoid programmers.'' (anon).
123
124
125
126MAKING IT GO FASTER
127
128   After 0.15 was released, various people asked whether it would
129   be possible to make it compress faster.  The answer falls in
130   three parts:
131
132   1.  Yes, and 0.21 compresses substantially faster than 0.15.
133
134   2.  You can probably compress somewhat faster, even, than 0.21,
135       by tinkering with the sorting algorithms.  However, it is
136       easy to fall into the trap of speeding up the average
137       case a little whilst at the same time imposing a giant
138       (25 times) slowdown on the worst-but-not-uncommon case,
139       files which are highly repetitive.  Beware!
140
141   3.  Are you solving the right problem?  In many situations,
142       it is the *de*compression speed which is the limiting factor
143       on overall usefulness of bzip.  If you want to do some
144       serious hacking on bzip, your hacking could be useful if
145       you could speed up decompression.
146
147       I appreciate that the arithmetic-coding back end imposes a
148       fairly serious restriction on decompression speed.  A possible
149       future option would be to make a variant of bzip which
150       used Huffman-coding (or some such) instead; this would reduce
151       the compression ratio but greatly accelerate decompression.
152       Experimental results welcomed!
153
154
155
156VALIDATION
157
158   Correct operation, in the sense that a compressed file can always be
159   decompressed to reproduce the original, is obviously of paramount
160   importance.  To validate bzip, I used a modified version of
161   Mark Nelson's churn program.  Churn is an automated test driver
162   which recursively traverses a directory structure, using bzip to
163   compress and then decompress each file it encounters, and checking
164   that the decompressed data is the same as the original.  As test
165   material, I used the entirety of my Linux filesystem, constituting
166   390 megabytes in 20,440 files.  The largest file was about seventeen
167   megabytes long.  Included in this filesystem was a directory containing
168   39 specially constructed test files, designed to break the sorting
169   phase of compression, the most elaborate part of the machinery.
170   This included files of zero length, various long, highly repetitive
171   files, and some files which generate blocks with all values the same.
172
173   Validation of version 0.15
174   ~~~~~~~~~~~~~~~~~~~~~~~~~~
175   There were actually six test runs on this filesystem, taking about
176   50 CPU hours on an Intel 486DX4-100 machine:
177
178      One with the block size set to 900k (ie, with the -9 flag, the default).
179
180      One with the block size set to 500k (ie, with -5).
181
182      One with the block size set to 100k (ie, with -1).
183
184      One where the parameters for the arithmetic coder were
185      set to smallB == 14 and smallF == 11, rather than the
186      usual values of 26 and 18.  This was intended to expose
187      possible boundary-case problems with the arithmetic coder;
188      in particular, setting smallB == 14 keeps the coding values
189      all below or equal to 8192.  Doing this, I hoped that the
190      values actually would hit their endpoints from time to time,
191      so I'd see problems if any lurked.  With smallB = 26, the
192      range of values goes up to 2^26 (64 million), which makes
193      potential bugs associated with endpoint effects vastly less
194      likely to be detected.
195
196      One where the block size was set to a trivial value, 173,
197      so as to invoke the blocking/unblocking machinery tens of
198      thousands of times over the run, and expose any potential
199      problem there.
200
201      One with normal settings, the block size set 900k, but
202      compiled with the symbol DEBUG set to 1, which turns on
203      many assertion-checks in the compressor.
204
205   None of these test runs exposed any problems.
206
207   In addition, earlier versions of bzip have been in informal use
208   for a while without difficulties.  The largest file I have tried
209   so far is a log file from a chip-simulator, 52 megabytes long,
210   and that decompressed correctly.
211
212   The distribution does four tests after building bzip.  These tests
213   include test decompressions of pre-supplied compressed files, so
214   they not only test that bzip works correctly on the machine it was
215   built on, but can also decompress files compressed on a different
216   machine.  This guards against unforseen interoperability problems.
217
218   Validation of version 0.21
219   ~~~~~~~~~~~~~~~~~~~~~~~~~~
220   0.21 differs radically from 0.15 in the sorting phase which
221   constitutes the bulk of the work during compression, and in
222   several other non-cosmetic ways, so there was considerable
223   emphasis on trying to break it before release.  100% compatibility
224   with 0.15 was also an issue.  On the other hand, the arithmetic
225   coder is unchanged, so I didn't put special effort into trying
226   to break that.  Testing was done on two filesystems, a Linux
227   filesystem with about 21000 files in 400 megabytes, and a
228   Windows 95 filesystem with 14900 files in about 610 megabytes.
229   The test runs were:
230
231      Linux FS, blocksize = 900k, 0.15 compressing, 0.21 decompressing
232      Linux FS, blocksize = 900k, 0.21 compressing, 0.15 decompressing
233
234      Linux FS, blocksize = 900k, -DDEBUG=1
235      Linux FS, blocksize = 500k, -DDEBUG=1
236      Linux FS, blocksize = 100k, -DDEBUG=1
237
238      Linux FS, blocksize = 900k
239
240      Win95 FS, blocksize = 900k
241
242      A single text file 186 megabytes long.
243
244      My Win95 disk read by Linux as a single entity -- 425 Megabytes.
245
246      Misc other anecdotal tests, incl some on a Sparc box (as a check
247      for endian issues), covering another 140 megabytes of new data.
248
249      Misc tests with Purify 3.0.1 snooping on the proceedings,
250      to check for subscript range errors, &c.
251
252   Overall, the quantity of original files in this validation
253   run is about 1760 megabytes.  Not Bad.
254
255
256Please read and be aware of the following:
257
258COMMERCIAL USE:
259
260   This program may or may not infringe certain US patents
261   pertaining to arithmetic coding and to the block-sorting
262   transformation itself.  Opinions differ as to the precise
263   legal status of some of the algorithms used.  Nevertheless,
264   you should be aware that commercial use of this program
265   could render you liable to unfriendly legal action.
266
267
268WARNING:
269
270   This program (attempts to) compress data by performing several
271   non-trivial transformations on it.  Unless you are 100% familiar
272   with *all* the algorithms contained herein, and with the
273   consequences of modifying them, you should NOT meddle with the
274   compression or decompression machinery.  Incorrect changes can and
275   very likely *will* lead to disastrous loss of data.
276
277
278DISCLAIMER:
279
280   I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
281   USE OF THIS PROGRAM, HOWSOEVER CAUSED.
282
283   Every compression of a file implies an assumption that the
284   compressed file can be decompressed to reproduce the original.
285   Great efforts in design, coding and testing have been made to
286   ensure that this program works correctly.  However, the complexity
287   of the algorithms, and, in particular, the presence of various
288   special cases in the code which occur with very low but non-zero
289   probability make it impossible to rule out the possibility of bugs
290   remaining in the program.  DO NOT COMPRESS ANY DATA WITH THIS
291   PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER
292   SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.
293
294   That is not to say this program is inherently unreliable.  Indeed,
295   I very much hope the opposite is true.  BZIP has been carefully
296   constructed and extensively tested.
297
298End of nasty legalities.
299
300
301I hope you find bzip useful.  Feel free to contact me at
302   sewardj@cs.man.ac.uk
303if you have any suggestions or queries.  Many people mailed me with
304comments, suggestions and patches after the release of 0.15, and the
305changes in 0.21 are largely a result of this feedback.
306
307Julian Seward
308Manchester, UK
30918 July 1996 (version 0.15)
31025 August 1996 (version 0.21)
311
312

README.DOS

1
2Windows 95 & Windows NT users:
3
41.  There's a pre-built executable, bzip.exe, which
5    should work.  You don't need to compile anything.
6    You can run the `test.bat' batch file to check
7    the executable is working ok, if you want.
8
92.  The control-C signal catcher seems pretty dodgy
10    under Windows, at least for the executable supplied.
11    When it catches a control-C, bzip tries to delete
12    its output file, so you don't get left with a half-
13    baked file.  But this sometimes seems to fail
14    under Windows.  Caveat Emptor!  I think I am doing
15    something not-quite-right in the signal catching.
16    Windows-&-C gurus got any suggestions?
17
18    Control-C handling all seems to work fine under Unix.
19
2028 Aug 96
21