README.Assembler

Notes about CRC Assembly Language Coding.

lrzip-0.21 makes use of an x86 assembly language file
that optimizes CRC computation used in lrzip. It includes
a wrapper C file, 7zCrcT8.c, and the assembler code,
7zCrcT8U.s.

configure should detect your host system properly
and adjust the Makefile accordingly. If you don't
have the nasm assembler, or you have a ppc or other
non-x86 system, the standard C CRC routines will be
compiled and linked in.

If for any reason configure does not properly
detect your system type, or you do not want assembler
modules to be compiled, you can run

ASM=no ./configure

which will automatically exclude the asm module.
Alternatively, change the line

ASM_OBJ=7zCrcT8.o 7zCrcT8U.o
to
ASM_OBJ=7zCrc.o

in Makefile. This will change the dependency tree.

To force assembly module compilation and linking (if
configure does not detect your system type properly),
type

ASM=yes ./configure

or change the Makefile to include the ASM_OBJ files
as described above.

Type `make clean' and then re-run make.
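
The selection rule described above can be sketched roughly as follows. This is a hypothetical Python illustration, not the project's configure script: the function name `pick_crc_objects` and the set of machine names treated as x86 are assumptions made for the example. Only the object file names and the `ASM=yes` / `ASM=no` override come from the text.

```python
# Hypothetical sketch of the rule described in this README; it is NOT the
# project's configure script.
ASM_OBJECTS = ["7zCrcT8.o", "7zCrcT8U.o"]  # C wrapper plus assembler CRC code
C_OBJECTS = ["7zCrc.o"]                    # standard C CRC routines

# Assumption: machine names counted as "x86" for this illustration only.
X86_MACHINES = {"i386", "i486", "i586", "i686"}


def pick_crc_objects(asm_env, machine, has_nasm):
    """Return the object files that ASM_OBJ would list.

    asm_env  -- value of the ASM environment variable ("yes", "no" or None)
    machine  -- host machine name, e.g. "i686" or "ppc"
    has_nasm -- True if the nasm assembler was found
    """
    if asm_env == "no":
        return C_OBJECTS
    if asm_env == "yes":
        return ASM_OBJECTS
    # No override given: use the assembler only on x86 hosts that have nasm.
    if machine in X86_MACHINES and has_nasm:
        return ASM_OBJECTS
    return C_OBJECTS


if __name__ == "__main__":
    print(pick_crc_objects(None, "ppc", True))
    print(pick_crc_objects("yes", "ppc", False))
```

Whichever way the object list changes, the dependency tree changes with it, which is why a `make clean' is needed before rebuilding.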

Peter Hyman
pete@peterhyman.com


README.benchmarks

The first comparison is that of a linux kernel tarball (2.6.37). In all cases
the default options were used. 4 other common compression apps were used for
comparison: 7z, which is an excellent all-round lzma based compression app,
gzip, which is the benchmark fast standard that has good compression, and
bzip2, which is the most commonly used compression on linux. xz was included
for completeness.

In the following tables, lrzip means lrzip default options, lrzip -l means
lrzip using the lzo backend, lrzip -g means using the gzip backend,
lrzip -b means using the bzip2 backend and lrzip -z means using the zpaq
backend.


linux-2.6.37.tar

These are benchmarks performed on a 3GHz quad core Intel Core2 with 8GB ram
using lrzip v0.612 on an SSD drive.

Compression  Size         Percentage  Compress  Decompress
None         430612480    100
7z           63636839     14.8        2m28s     0m6.6s
xz           63291156     14.7        4m02s     0m8.7s
lrzip        64561485     14.9        1m12s     0m4.3s
lrzip -z     51588423     12.0        2m02s     2m08s
lrzip -l     137515997    31.9        0m14s     0m2.7s
lrzip -g     86142459     20.0        0m17s     0m3.0s
lrzip -b     72103197     16.7        0m21s     0m6.5s
bzip2        74060625     17.2        0m48s     0m12.8s
gzip         94512561     21.9        0m17s     0m4.0s
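
The Percentage column is the compressed size as a share of the uncompressed size. A minimal Python sketch of that arithmetic, using sizes taken from the table (the function name `percentage` is just for this example):

```python
def percentage(compressed_bytes, original_bytes):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_bytes / original_bytes


ORIGINAL = 430612480  # size of linux-2.6.37.tar in bytes, from the table

# A few rows from the table: (name, compressed size in bytes).
ROWS = [
    ("7z", 63636839),
    ("lrzip -z", 51588423),
    ("lrzip -l", 137515997),
]

if __name__ == "__main__":
    for name, size in ROWS:
        print(f"{name:10s}{percentage(size, ORIGINAL):6.1f}")
```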


These results are interesting: the compression of lrzip by default is
about the same as 7z, but it's significantly faster thanks to its heavily
multithreaded nature. Zpaq offers by far the best compression but at the cost
of extra time. However, with the heavily threaded nature of lrzip, it's not a
lot longer given how much better its compression is. It's actually faster than
xz on compression on a quad core machine.


Let's take six kernel trees one version apart as a tarball, linux-2.6.31 to
linux-2.6.36. These will show lots of redundant information, but hundreds
of megabytes apart, which lrzip will be very good at compressing. For
simplicity, only 7z will be compared since that's by far the best general
purpose compressor at the moment:

These are benchmarks performed on a 2.53GHz dual core Intel Core2 with 4GB ram
using lrzip v0.5.1. Note that it was running with a 32 bit userspace so only
2GB addressing was possible. However, the benchmark was also run with the -U
option, which allows the whole file to be treated as one large compression
window.

Tarball of 6 consecutive kernel trees.

Compression  Size         Percentage  Compress  Decompress
None         2373713920   100
7z           344088002    14.5        17m26s    1m22s
lrzip        104874109    4.4         11m37s    56s
lrzip -l     223130711    9.4         05m21s    1m01s
lrzip -U     73356070     3.1         08m53s    43s
lrzip -Ul    158851141    6.7         04m31s    35s
lrzip -Uz    62614573     2.6         24m42s    25m30s

Things start getting very interesting now, as lrzip is really starting to
shine. Note how the archive is not that much larger for 6 kernel trees than it
was for one. That's because all the similar data across the kernel trees is
being compressed as one copy and only the differences really make up the extra
size. All compression software does this, but not over such large distances.
If you copy the same data over multiple times, the resulting lrzip archive
doesn't get much larger at all. You might find this example interesting because
the -U option is actually faster as well as providing better compression. The
reason is that the window is not much larger than the amount of ram addressable
(2GB), and it compresses so much more in the rzip stage that it makes up the
time by not needing to compress anywhere near as much data with the backend
compressor.
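
The effect of match distance can be seen in a toy experiment. The sketch below does not use lrzip or rzip at all; it only uses Python's standard zlib and lzma modules as stand-ins for a small-window and a large-window compressor. The 100,000 byte block size and the helper name `sizes_for_repeated_block` are arbitrary choices for the illustration.

```python
import lzma
import random
import zlib


def sizes_for_repeated_block(block_len=100_000, seed=0):
    """Compress an incompressible block stored twice, back to back.

    Returns (zlib_size, lzma_size) in bytes. The second copy starts
    block_len bytes after the first, which is further back than zlib's
    32 KiB window can reach but well within lzma's default dictionary.
    """
    rng = random.Random(seed)
    block = bytes(rng.getrandbits(8) for _ in range(block_len))
    data = block + block
    return len(zlib.compress(data, 9)), len(lzma.compress(data))


if __name__ == "__main__":
    zlib_size, lzma_size = sizes_for_repeated_block()
    print("input bytes:", 200_000)
    print("zlib bytes: ", zlib_size)  # pays for both copies
    print("lzma bytes: ", lzma_size)  # pays for roughly one copy
```

The same idea applies at a larger scale: the further back a compressor can look, the more distant duplicates it can fold into a single copy.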


Using the first of those trees (linux-2.6.31.tar) and simply copying the data
multiple times over gives these results with lrzip -l (lzo):

Copies  Size         Compressed   Compress  Decompress
1       365711360    112151676    0m14.9s   0m5.1s
2       731422720    112151829    0m16.2s   0m6.5s
3       1097134080   112151832    0m17.5s   0m8.1s


I had the amusing thought that this compression software could be used as a
bullshit detector if you were to compress people's speeches because if their
talks were full of catchphrases and not much actual content, it would all be
compressed down. So the larger the final archive, the less bullshit =)

Now let's move on to the other special feature of lrzip, the ability to
compress massive amounts of data on huge ram machines by using massive
compression windows. This is a 10GB virtual image of an installed operating
system and some basic working software on it. The default options on the
8GB machine meant that it was using a 5 GB window.


10GB Virtual image:

These benchmarks were done on the quad core with version 0.612.

Compression  Size         Percentage  Compress Time  Decompress Time
None         10737418240  100.0
gzip         2772899756   25.8        05m47s         2m46s
bzip2        2704781700   25.2        16m15s         6m19s
xz           2272322208   21.2        50m58s         3m52s
7z           2242897134   20.9        26m36s         5m41s
lrzip        1372218189   12.8        10m23s         2m53s
lrzip -U     1095735108   10.2        08m44s         2m45s
lrzip -l     1831894161   17.1        04m53s         2m37s
lrzip -lU    1414959433   13.2        04m48s         2m38s
lrzip -zU    1067169419   9.9         39m32s         39m46s


At this end of the spectrum things really start to heat up. The compression
advantage is massive, with the lzo backend even giving much better results than
7z, and over a ridiculously short time. The improvements in version 0.530 in
scalability with multiple CPUs have a huge impact on compression time here,
with zpaq being faster on quad core than xz is, yet producing a file
less than half the size.

What appears to be a big disappointment here is actually zpaq, which takes more
than 4 times longer than r/lzma for a measly .3% improvement. The reason is that
most of the advantage here is achieved by the rzip first stage since there's a
lot of redundant space over huge distances on a virtual image. The -U option,
which works the memory subsystem rather hard and has a noticeable impact on the
rest of the machine, also does further wonders for the compression (virtually
always) and even the times in this particular case.
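
The "more than 4 times longer" figure can be checked from the table. A small Python sketch, assuming the times are always written as optional minutes followed by seconds (for example 39m32s, 0m6.6s or 56s); the helper name `to_seconds` is made up for this example:

```python
import re

# Optional "<minutes>m" followed by "<seconds>s", e.g. 39m32s, 0m6.6s, 56s.
_DURATION = re.compile(r"(?:(\d+)m)?(\d+(?:\.\d+)?)s")


def to_seconds(text):
    """Convert a table time such as '39m32s' to seconds."""
    match = _DURATION.fullmatch(text)
    if match is None:
        raise ValueError(f"unrecognised time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


if __name__ == "__main__":
    zpaq = to_seconds("39m32s")   # lrzip -zU compress time
    lzma_u = to_seconds("08m44s")  # lrzip -U compress time
    print(f"zpaq takes {zpaq / lzma_u:.1f}x as long as lzma with -U")
```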


Finally testing the same 10GB image on an i7-3930K at 3.2GHz (12 thread CPU!)
with 32GB of ram so the whole image fits in ram with a fast SSD:

Compression  Size         Percentage  Compress Time  Decompress Time
None         10737418240  100.0
gzip         2772899756   25.8        3m56s          2m15s
pbzip2       2705814394   25.2        1m41s          1m46s
lrzip        1095337763   10.2        2m54s          2m21s


Note that with enough ram and CPU, lrzip is actually faster than gzip (which
does compression in place) and comparable on decompression, despite a huge
increase in compression. pbzip2 is faster than both but its compression is
almost no better than gzip.


This should help govern what compression you choose. Small files are nicely
compressed with zpaq. Intermediate files are nicely compressed with lzma.
Large files get excellent results even with lzo provided you have enough ram.
(Small being < 100MB, intermediate <1GB, large >1GB).
Or, to make things easier, just use the default settings all the time and be
happy as lzma gives good results. :D
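
As a rough sketch, the rule of thumb above could be written like this in Python. The function name `suggest_backend` is hypothetical, and two details are assumptions because the text does not state them: MB and GB are taken as decimal units, and a file of exactly 1GB is counted as large. The -z and -l flags and the lzma default are as described earlier in this file.

```python
MB = 10**6  # assumption: decimal megabytes
GB = 10**9  # assumption: decimal gigabytes


def suggest_backend(size_bytes):
    """Return (lrzip flag, backend name) following the rule of thumb.

    An empty flag means the default options, which use lzma.
    The lzo suggestion for large files assumes the machine has enough ram.
    """
    if size_bytes < 100 * MB:
        return ("-z", "zpaq")
    if size_bytes < GB:
        return ("", "lzma")
    return ("-l", "lzo")


if __name__ == "__main__":
    for size in (50 * MB, 500 * MB, 10 * GB):
        flag, backend = suggest_backend(size)
        print(size, flag or "(default)", backend)
```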

Con Kolivas
Saturday, 7th July 2012
