This is a patched version of zlib modified to use
Pentium-optimized assembly code in the deflation algorithm. The files
changed/added by this patch are:

README.586
match.S

The effectiveness of these modifications is a bit marginal, as the
program's bottleneck seems to be mostly L1-cache contention, which
there is no real way to work around without rewriting the basic
algorithm. The speedup on average is around 5-10% (which is generally
less than the amount of variance between successive executions).
However, when used at level 9 compression, the cache contention can
drop enough for the assembly version to achieve a 10-20% speedup (and
sometimes more, depending on the amount of overall redundancy in the
files). Even here, though, cache contention can still be the limiting
factor, depending on the nature of the program using the zlib library.
This may also mean that greater improvements will be seen on a Pentium
with MMX, which suffers much less from L1-cache contention, but I have
not yet verified this.

Note that this code has been tailored for the Pentium in particular,
and will not perform well on the Pentium Pro (due to the use of a
partial register in the inner loop).

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory,
then run:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o
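
Because the expected speedup can be smaller than the run-to-run
variance, a single timing is not a reliable way to judge whether the
assembly build helps on your data. The following is only a rough sketch
of one way to compare levels 6 and 9, taking the fastest of several runs
to damp the noise. It is not part of the patch. It assumes a Python
interpreter is available, and it measures whichever zlib library that
interpreter is linked against, so it only says something about the
patched build if Python is actually using it. The names best_time and
make_sample, and the sample input itself, are made up for illustration.

```python
import time
import zlib


def best_time(data, level, repeats=5):
    """Return the fastest of `repeats` timings (in seconds) for
    compressing `data` at the given zlib compression level."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    return best


def make_sample(size=1 << 20):
    """Build a highly redundant, hypothetical test input of exactly
    `size` bytes. Real files will behave differently."""
    line = b"The quick brown fox jumps over the lazy dog. "
    return (line * (size // len(line) + 1))[:size]


if __name__ == "__main__":
    data = make_sample()
    for level in (6, 9):
        seconds = best_time(data, level)
        print("level %d: %.4f s (best of 5)" % (level, seconds))
```

Run it once against a stock build and once against the ASMV build, on
input that resembles your real workload, and compare the level 9
figures. The minimum is used rather than the mean because it is less
affected by other activity on the machine.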