README

This is the README for bzip2, a block-sorting file compressor, version
1.0.2. This version is fully compatible with the previous public
releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1.

bzip2-1.0.2 is distributed under a BSD-style license. For details,
see the file LICENSE.

Complete documentation is available in PostScript form (manual.ps),
PDF (manual.pdf, amazingly enough) or HTML (manual_toc.html). A
plain-text version of the manual page is available as bzip2.txt.
A statement about Y2K issues is now included in the file Y2K_INFO.


HOW TO BUILD -- UNIX

Type `make'. This builds the library libbz2.a and then the
programs bzip2 and bzip2recover. Six self-tests are run.
If the self-tests complete OK, carry on to installation:

To install in /usr/bin, /usr/lib, /usr/man and /usr/include, type
   make install
To install somewhere else, e.g., /xxx/yyy/{bin,lib,man,include}, type
   make install PREFIX=/xxx/yyy
If you are (justifiably) paranoid and want to see what 'make install'
is going to do, you can first do
   make -n install                      or
   make -n install PREFIX=/xxx/yyy      respectively.
The -n instructs make to show the commands it would execute, but
not actually execute them.


HOW TO BUILD -- UNIX, shared library libbz2.so.

Do 'make -f Makefile-libbz2_so'. This Makefile seems to work for
Linux-ELF (RedHat 7.2 on an x86 box), with gcc. I make no claims
that it works for any other platform, though I suspect it probably
will work for most platforms employing both ELF and gcc.

bzip2-shared, a client of the shared library, is also built, but not
self-tested. So I suggest you also build using the normal Makefile,
since that conducts a self-test. A second reason to prefer the
version statically linked to the library is that, on x86 platforms,
building shared objects makes a valuable register (%ebx) unavailable
to gcc, resulting in a slowdown of 10%-20%, at least for bzip2.

Important note for people upgrading .so's from 0.9.0/0.9.5 to version
1.0.X: all the functions in the library have been renamed, from (e.g.)
bzCompress to BZ2_bzCompress, to avoid namespace pollution.
Unfortunately this means that the libbz2.so created by
Makefile-libbz2_so will not work with any program which used an older
version of the library. Sorry. I do encourage library clients to
make the effort to upgrade to use version 1.0, since it is both faster
and more robust than previous versions.


HOW TO BUILD -- Windows 95, NT, DOS, Mac, etc.

It's difficult for me to support compilation on all these platforms.
My approach is to collect binaries for these platforms, and put them
on the master web page (http://sources.redhat.com/bzip2). Look there.
However (FWIW), bzip2-1.0.X is very standard ANSI C and should compile
unmodified with MS Visual C. If you have difficulties building, you
might want to read README.COMPILATION.PROBLEMS.

At least using MS Visual C++ 6, you can build from the unmodified
sources by issuing, in a command shell:
   nmake -f makefile.msc
(you may need to first run the MSVC-provided script VCVARS32.BAT
 so as to set up paths to the MSVC tools correctly).


VALIDATION

Correct operation, in the sense that a compressed file can always be
decompressed to reproduce the original, is obviously of paramount
importance. To validate bzip2, I used a modified version of Mark
Nelson's churn program. Churn is an automated test driver which
recursively traverses a directory structure, using bzip2 to compress
and then decompress each file it encounters, and checking that the
decompressed data is the same as the original. There are more details
in Section 4 of the user guide.
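
The heart of a churn-style test is one check applied to every file:
compress, decompress, and compare the result with the original, byte
for byte. The C sketch below shows only that check, under loudly
stated assumptions: round_trip_ok, rle_encode and rle_decode are
hypothetical names, and the codec is a toy run-length coder standing
in for bzip2. It is not bzip2's format and says nothing about bzip2's
own correctness. The directory traversal that churn performs is not
shown.

```c
#include <stdlib.h>
#include <string.h>

/* Toy codec standing in for bzip2 (hypothetical, NOT bzip2's format):
   each run of identical bytes becomes a (count, byte) pair, count 1..255.
   Worst-case output is twice the input length. */
static size_t rle_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    size_t i = 0, o = 0;
    while (i < n) {
        size_t run = 1;
        while (i + run < n && run < 255 && in[i + run] == in[i])
            run++;
        out[o++] = (unsigned char)run;
        out[o++] = in[i];
        i += run;
    }
    return o;
}

/* Inverse of rle_encode. Writes at most cap bytes; returns the number
   of bytes produced, or (size_t)-1 if the output would not fit. */
static size_t rle_decode(const unsigned char *in, size_t n,
                         unsigned char *out, size_t cap)
{
    size_t i, o = 0;
    for (i = 0; i + 1 < n; i += 2) {
        if ((size_t)in[i] > cap - o)
            return (size_t)-1;
        memset(out + o, in[i + 1], in[i]);
        o += in[i];
    }
    return o;
}

/* The churn-style check: returns 1 if decompressing the compressed form
   of data reproduces it exactly, 0 otherwise (including malloc failure). */
static int round_trip_ok(const unsigned char *data, size_t n)
{
    unsigned char *packed = malloc(2 * n + 1);
    unsigned char *unpacked = malloc(n + 1);
    int ok = 0;

    if (packed != NULL && unpacked != NULL) {
        size_t plen = rle_encode(data, n, packed);
        size_t ulen = rle_decode(packed, plen, unpacked, n);
        ok = (ulen == n && memcmp(unpacked, data, n) == 0);
    }
    free(packed);
    free(unpacked);
    return ok;
}
```

In a real harness of this kind, the toy coder would be replaced by
bzip2 itself, and round_trip_ok would be run on every file found while
walking the directory tree.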


Please read and be aware of the following:

WARNING:

   This program (attempts to) compress data by performing several
   non-trivial transformations on it. Unless you are 100% familiar
   with *all* the algorithms contained herein, and with the
   consequences of modifying them, you should NOT meddle with the
   compression or decompression machinery. Incorrect changes can and
   very likely *will* lead to disastrous loss of data.


DISCLAIMER:

   I TAKE NO RESPONSIBILITY FOR ANY LOSS OF DATA ARISING FROM THE
   USE OF THIS PROGRAM, HOWSOEVER CAUSED.

   Every compression of a file implies an assumption that the
   compressed file can be decompressed to reproduce the original.
   Great efforts in design, coding and testing have been made to
   ensure that this program works correctly. However, the complexity
   of the algorithms, and, in particular, the presence of various
   special cases in the code which occur with very low but non-zero
   probability make it impossible to rule out the possibility of bugs
   remaining in the program. DO NOT COMPRESS ANY DATA WITH THIS
   PROGRAM UNLESS YOU ARE PREPARED TO ACCEPT THE POSSIBILITY, HOWEVER
   SMALL, THAT THE DATA WILL NOT BE RECOVERABLE.

   That is not to say this program is inherently unreliable. Indeed,
   I very much hope the opposite is true. bzip2 has been carefully
   constructed and extensively tested.


PATENTS:

   To the best of my knowledge, bzip2 does not use any patented
   algorithms. However, I do not have the resources available to
   carry out a full patent search. Therefore I cannot give any
   guarantee of the above statement.

End of legalities.


WHAT'S NEW IN 0.9.0 (as compared to 0.1pl2)?

   * Approx 10% faster compression, 30% faster decompression
   * -t (test mode) is a lot quicker
   * Can decompress concatenated compressed files
   * Programming interface, so programs can directly read/write .bz2 files
   * Less restrictive (BSD-style) licensing
   * Flag handling more compatible with GNU gzip
   * Much more documentation, i.e., a proper user manual
   * Hopefully, improved portability (at least of the library)

WHAT'S NEW IN 0.9.5?

   * Compression speed is much less sensitive to the input
     data than in previous versions. Specifically, the very
     slow performance caused by repetitive data is fixed.
   * Many small improvements in file and flag handling.
   * A Y2K statement.

WHAT'S NEW IN 1.0.0?

   See the CHANGES file.

WHAT'S NEW IN 1.0.2?

   See the CHANGES file.


I hope you find bzip2 useful. Feel free to contact me at
   jseward@acm.org
if you have any suggestions or queries. Many people mailed me with
comments, suggestions and patches after the releases of bzip-0.15,
bzip-0.21, and bzip2 versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0 and 1.0.1,
and the changes in bzip2 are largely a result of this feedback.
I thank you for your comments.

At least for the time being, bzip2's "home" is (or can be reached via)
http://sources.redhat.com/bzip2.

Julian Seward
jseward@acm.org

Cambridge, UK (and what a great town this is!)

18 July 1996 (version 0.15)
25 August 1996 (version 0.21)
 7 August 1997 (bzip2, version 0.1)
29 August 1997 (bzip2, version 0.1pl2)
23 August 1998 (bzip2, version 0.9.0)
 8 June 1999 (bzip2, version 0.9.5)
 4 Sept 1999 (bzip2, version 0.9.5d)
 5 May 2000 (bzip2, version 1.0pre8)
30 December 2001 (bzip2, version 1.0.2pre1)
README.COMPILATION.PROBLEMS

bzip2-1.0 should compile without problems on the vast majority of
platforms. Using the supplied Makefile, I've built and tested it
myself for x86-linux, sparc-solaris, alpha-linux, x86-cygwin32 and
alpha-tru64unix. With makefile.msc, Visual C++ 6.0 and nmake, you can
build a native Win32 version too. Large file support seems to work
correctly on at least alpha-tru64unix and x86-cygwin32 (on Windows
2000).

When I say "large file" I mean a file of size 2,147,483,648 (2^31)
bytes or above. Many older OSs can't handle files of this size or
larger, but many newer ones can. Large files are pretty huge -- most
files you'll encounter are not Large Files.

Earlier versions of bzip2 (0.1, 0.9.0, 0.9.5) compiled on a wide
variety of platforms without difficulty, and I hope this version will
continue in that tradition. However, in order to support large files,
I've had to include the define -D_FILE_OFFSET_BITS=64 in the Makefile.
This can cause compilation problems on some platforms.

The technique of adding -D_FILE_OFFSET_BITS=64 to get large file
support is, as far as I know, the Recommended Way to get correct large
file support. For more details, see the Large File Support
Specification, published by the Large File Summit, at
   http://www.sas.com/standard/large.file/
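
To make the define concrete: _FILE_OFFSET_BITS=64 asks the C library,
where the platform supports it, to use a 64-bit off_t (the type for
file sizes and offsets) even on 32-bit systems, which is what lets
sizes of 2^31 bytes and above be represented. The C sketch below,
assuming a POSIX system, sets it in the source before any system
header (equivalent to the -D flag on the compiler command line) and
checks the result. The names LARGE_FILE_BYTES, off_t_is_large and
is_large_file are hypothetical. On most 64-bit systems off_t is 64
bits wide regardless of the define.

```c
/* Same effect as -D_FILE_OFFSET_BITS=64, provided it appears before any
   system header is included. Guarded in case the Makefile already sets it. */
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <sys/types.h>

/* 2^31 bytes: the smallest size this document calls a "large file". */
#define LARGE_FILE_BYTES 2147483648LL

/* Returns 1 if off_t is wide enough for offsets of 2^31 bytes and beyond. */
static int off_t_is_large(void)
{
    return sizeof(off_t) >= 8;
}

/* Returns 1 if a file of the given size counts as a "large file". */
static int is_large_file(long long size)
{
    return size >= LARGE_FILE_BYTES;
}
```

If off_t_is_large() returns 0 on a 32-bit build, the define did not
take effect, for example because a system header was included first.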

As a general comment, if you get compilation errors which you think
are related to large file support, try removing the above define from
the Makefile, i.e., delete the line
   BIGFILES=-D_FILE_OFFSET_BITS=64
from the Makefile, and do 'make clean ; make'. This will give you a
version of bzip2 without large file support, which, for most
applications, is probably not a problem.

Alternatively, try some of the platform-specific hints listed below.

You can use the spewG.c program to generate huge files to test bzip2's
large file support, if you are feeling paranoid. Be aware though that
any compilation problems which affect bzip2 will also affect spewG.c,
alas.


Known problems as of 1.0pre8:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

* HP/UX 10.20 and 11.00, using gcc (2.7.2.3 and 2.95.2): A large
  number of warnings appear, including the following:

     /usr/include/sys/resource.h: In function `getrlimit':
     /usr/include/sys/resource.h:168:
        warning: implicit declaration of function `__getrlimit64'
     /usr/include/sys/resource.h: In function `setrlimit':
     /usr/include/sys/resource.h:170:
        warning: implicit declaration of function `__setrlimit64'

  This would appear to be a problem with large file support, header
  files and gcc. gcc may or may not give up at this point. If it
  fails, you might be able to improve matters by adding
     -D__STDC_EXT__=1
  to the BIGFILES variable in the Makefile, i.e., change its definition
  to
     BIGFILES=-D_FILE_OFFSET_BITS=64 -D__STDC_EXT__=1

  Even if gcc does produce a binary which appears to work (i.e., passes
  its self-tests), you might want to test it to see if it works properly
  on large files.


* HP/UX 10.20 and 11.00, using HP's cc compiler.

  No specific problems for this combination, except that you'll need to
  specify the -Ae flag, and zap the gcc-specific stuff
     -Wall -Winline -O2 -fomit-frame-pointer -fno-strength-reduce.
  You should retain -D_FILE_OFFSET_BITS=64 in order to get large
  file support -- which is reported to work OK for this HP/UX + cc
  combination.


* SunOS 4.1.X.

  Amazingly, there are still people out there using this venerable old
  banger. I shouldn't be too rude -- I started life on SunOS, and
  it was a pretty darn good OS, way back then. Anyway:

  SunOS doesn't seem to have strerror(), so you'll have to supply your
  own (or use perror() instead), perhaps by adding this (warning:
  UNTESTED CODE):

     /* Error-message table provided by the SunOS 4 C library. */
     extern int   sys_nerr;
     extern char* sys_errlist[];

     char* strerror ( int errnum )
     {
        if (errnum < 0 || errnum >= sys_nerr)
           return "Unknown error";
        else
           return sys_errlist[errnum];
     }

  Or you could comment out the relevant calls to strerror; they're
  not mission-critical. Or you could upgrade to Solaris. Ha ha ha!
  (what?? you think I've got Bad Attitude?)


* Making a shared library on Solaris. (Not really a compilation
  problem, but many people ask ...)

  Firstly, if you have Solaris 8, either you have libbz2.so already
  on your system, or you can install it from the Solaris CD.

  Secondly, be aware that there are potential naming conflicts
  between the .so file supplied with Solaris 8 and the .so file
  which Makefile-libbz2_so will make. Makefile-libbz2_so creates
  a .so which has the names which I intend to be "official" as
  of version 1.0.0 and onwards. Unfortunately, the .so in
  Solaris 8 appeared before I decided on the final names, so
  the two libraries are incompatible. We have since communicated
  and I hope that the problems will have been solved in the next
  version of Solaris, whenever that might appear.

  All that said: you might be able to get somewhere
  by finding the line in Makefile-libbz2_so which says

     $(CC) -shared -Wl,-soname -Wl,libbz2.so.1.0 -o libbz2.so.1.0.2 $(OBJS)

  and replacing it with

     $(CC) -G -shared -o libbz2.so.1.0.2 -h libbz2.so.1.0 $(OBJS)

  If gcc objects to the combination -fpic -fPIC, get rid of
  the second one, leaving just "-fpic".


That's the end of the currently known compilation problems.