=head1 NAME

IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

=head1 DESCRIPTION

Common questions answered.

=head1 GENERAL

=head2 Compatibility with Unix compress/uncompress.

Although C<Compress::Zlib> has a pair of functions called C<compress> and
C<uncompress>, they are I<not> related to the Unix programs of the same
name. The C<Compress::Zlib> module is not compatible with Unix
C<compress>.

If you have the C<uncompress> program available, you can use this to read
compressed files:

    open F, "uncompress -c $filename |"
        or die "Cannot run uncompress: $!\n";
    while (<F>)
    {
        ...
    }
    close F;

Alternatively, if you have the C<gunzip> program available, you can use
this to read compressed files:

    open F, "gunzip -c $filename |"
        or die "Cannot run gunzip: $!\n";
    while (<F>)
    {
        ...
    }
    close F;

and this to write compressed files, if you have the C<compress> program
available:

    # compress reads the data from its standard input and the shell
    # redirects the compressed output into $filename
    open F, "| compress -c >$filename"
        or die "Cannot run compress: $!\n";
    print F "data";
    ...
    close F ;

=head2 Accessing .tar.Z files

The C<Archive::Tar> module can optionally use C<Compress::Zlib> (via the
C<IO::Zlib> module) to access tar files that have been compressed with
C<gzip>.
Unfortunately tar files compressed with the Unix C<compress>
utility cannot be read by C<Compress::Zlib> and so cannot be directly
accessed by C<Archive::Tar>.

If the C<uncompress> or C<gunzip> programs are available, you can use one
of these workarounds to read C<.tar.Z> files from C<Archive::Tar>.

Firstly with C<uncompress>:

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "uncompress -c $filename |"
        or die "Cannot run uncompress: $!\n";
    my $tar = Archive::Tar->new(*F);
    ...

and this with C<gunzip>:

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "gunzip -c $filename |"
        or die "Cannot run gunzip: $!\n";
    my $tar = Archive::Tar->new(*F);
    ...

Similarly, if the C<compress> program is available, you can use this to
write a C<.tar.Z> file:

    use strict;
    use warnings;
    use Archive::Tar;
    use IO::File;

    my $fh = IO::File->new( "| compress -c >$filename" )
        or die "Cannot run compress: $!\n";
    my $tar = Archive::Tar->new();
    ...
    $tar->write($fh);
    $fh->close ;

=head2 How do I recompress using a different compression?

This is easier than you might expect if you realise that all the
C<IO::Compress::*> objects are derived from C<IO::File> and that all the
C<IO::Uncompress::*> modules can read from an C<IO::File> filehandle.

So, for example, say you have a file compressed with gzip that you want to
recompress with bzip2. Here is all that is needed to carry out the
recompression.

    use IO::Uncompress::Gunzip ':all';
    use IO::Compress::Bzip2 ':all';

    my $gzipFile = "somefile.gz";
    my $bzipFile = "somefile.bz2";

    my $gunzip = IO::Uncompress::Gunzip->new( $gzipFile )
        or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

    bzip2 $gunzip => $bzipFile
        or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

Note that this technique has a limitation. Some compression file
formats store extra information along with the compressed data payload. For
example, gzip can optionally store the original filename and Zip stores a
lot of information about the original file. If the original compressed file
contains any of this extra information, it will not be transferred to the
new compressed file using the technique above.

=head1 ZIP

=head2 What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?

The following compression formats are supported by C<IO::Compress::Zip> and
C<IO::Uncompress::Unzip>

=over 5

=item * Store (method 0)

No compression at all.

=item * Deflate (method 8)

This is the default compression used when creating a zip file with
C<IO::Compress::Zip>.

=item * Bzip2 (method 12)

Only supported if the C<IO-Compress-Bzip2> module is installed.

=item * Lzma (method 14)

Only supported if the C<IO-Compress-Lzma> module is installed.

=back

=head2 Can I Read/Write Zip files larger than 4 Gig?

Yes, both the C<IO::Compress::Zip> and C<IO::Uncompress::Unzip> modules
support the zip feature called I<Zip64>. That allows them to read/write
files/buffers larger than 4 Gig.

If you are creating a Zip file using the one-shot interface, and any of the
input files is greater than 4 Gig, a zip64 compliant zip file will be
created.

    zip "really-large-file" => "my.zip";

Similarly with the one-shot interface, if the input is a buffer larger than
4 Gig, a zip64 compliant zip file will be created.

    zip \$really_large_buffer => "my.zip";

The one-shot interface allows you to force the creation of a zip64 zip file
by including the C<Zip64> option.

    zip $filehandle => "my.zip", Zip64 => 1;

If you want to create a zip64 zip file with the OO interface you must
specify the C<Zip64> option.

    my $zip = IO::Compress::Zip->new( "whatever", Zip64 => 1 );

When uncompressing with C<IO::Uncompress::Unzip>, it will automatically
detect if the zip file is zip64.
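
As a rough illustration of that automatic detection, the sketch below
forces a Zip64 archive into an in-memory buffer and then reads it back
without passing any Zip64-specific option to C<unzip>. The variable names
and the member name C<hello.txt> are invented for the example, and a tiny
payload is used only to keep it quick to run.

```perl
use strict;
use warnings;

use IO::Compress::Zip     qw(zip   $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical payload; any data will do.
my $input = "hello world\n" x 3;
my ($zipped, $output);

# Force a Zip64 archive even though the payload is tiny.
zip \$input => \$zipped, Name => "hello.txt", Zip64 => 1
    or die "zip failed: $ZipError\n";

# No Zip64 option is passed here; unzip works it out from the archive.
unzip \$zipped => \$output
    or die "unzip failed: $UnzipError\n";

print $output eq $input ? "round trip ok\n" : "round trip failed\n";
```

The same call to C<unzip> should work unchanged on an archive written
without C<Zip64>, which is the point of the detection being automatic.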

If you intend to manipulate the Zip64 zip files created with
C<IO::Compress::Zip> using an external zip/unzip, make sure that it supports
Zip64.

In particular, if you are using Info-Zip you need to have zip version 3.x
or better to update a Zip64 archive and unzip version 6.x to read a zip64
archive.

=head2 Can I write more than 64K entries in a Zip file?

Yes. Zip64 allows this. See previous question.

=head2 Zip Resources

The primary reference for zip files is the "appnote" document available at
L<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>

An alternative is the Info-Zip appnote. This is available from
L<ftp://ftp.info-zip.org/pub/infozip/doc/>

=head1 GZIP

=head2 Gzip Resources

The primary reference for gzip files is RFC 1952
L<https://datatracker.ietf.org/doc/html/rfc1952>

The primary site for gzip is L<http://www.gzip.org>.

=head2 Dealing with concatenated gzip files

If the gunzip program encounters a file containing multiple gzip files
concatenated together it will automatically uncompress them all.
The example below illustrates this behaviour

    $ echo abc | gzip -c >x.gz
    $ echo def | gzip -c >>x.gz
    $ gunzip -c x.gz
    abc
    def

By default C<IO::Uncompress::Gunzip> will I<not> behave like the gunzip
program. It will only uncompress the first gzip data stream in the file, as
shown below

    $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
    abc

To force C<IO::Uncompress::Gunzip> to uncompress all the gzip data streams,
include the C<MultiStream> option, as shown below

    $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
    abc
    def

=head2 Reading bgzip files with IO::Uncompress::Gunzip

A C<bgzip> file consists of a series of valid gzip-compliant data streams
concatenated together. To read a file created by C<bgzip> with
C<IO::Uncompress::Gunzip> use the C<MultiStream> option as shown in the
previous section.

See the section titled "The BGZF compression format" in
L<http://samtools.github.io/hts-specs/SAMv1.pdf> for a definition of
C<bgzip>.

=head1 ZLIB

=head2 Zlib Resources

The primary site for the I<zlib> compression library is
L<http://www.zlib.org>.

=head1 BZIP2

=head2 Bzip2 Resources

The primary site for bzip2 is L<http://www.bzip.org>.

=head2 Dealing with Concatenated bzip2 files

If the bunzip2 program encounters a file containing multiple bzip2 files
concatenated together it will automatically uncompress them all.
The example below illustrates this behaviour

    $ echo abc | bzip2 -c >x.bz2
    $ echo def | bzip2 -c >>x.bz2
    $ bunzip2 -c x.bz2
    abc
    def

By default C<IO::Uncompress::Bunzip2> will I<not> behave like the bunzip2
program. It will only uncompress the first bzip2 data stream in the file, as
shown below

    $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
    abc

To force C<IO::Uncompress::Bunzip2> to uncompress all the bzip2 data streams,
include the C<MultiStream> option, as shown below

    $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
    abc
    def

=head2 Interoperating with Pbzip2

Pbzip2 (L<http://compression.ca/pbzip2/>) is a parallel implementation of
bzip2. The output from pbzip2 consists of a series of concatenated bzip2
data streams.

By default C<IO::Uncompress::Bunzip2> will only uncompress the first bzip2
data stream in a pbzip2 file.
To uncompress the complete pbzip2 file you
must include the C<MultiStream> option, like this.

    bunzip2 $input => \$output, MultiStream => 1
        or die "bunzip2 failed: $Bunzip2Error\n";

=head1 HTTP & NETWORK

=head2 Apache::GZip Revisited

Below is a mod_perl Apache compression module, called C<Apache::GZip>,
taken from
L<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>

    package Apache::GZip;
    #File: Apache::GZip.pm

    use strict vars;
    use Apache::Constants ':common';
    use Compress::Zlib;
    use IO::File;
    use constant GZIP_MAGIC => 0x1f8b;
    use constant OS_MAGIC => 0x03;

    sub handler {
        my $r = shift;
        my ($fh,$gz);
        my $file = $r->filename;
        return DECLINED unless $fh=IO::File->new($file);
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        tie *STDOUT,'Apache::GZip',$r;
        print($_) while <$fh>;
        untie *STDOUT;
        return OK;
    }

    sub TIEHANDLE {
        my($class,$r) = @_;
        # initialize a deflation stream
        my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

        # gzip header -- don't ask how I found out
        $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

        return bless { r   => $r,
                       crc =>  crc32(undef),
                       d   => $d,
                       l   =>  0
                     },$class;
    }

    sub PRINT {
        my $self = shift;
        foreach (@_) {
            # deflate the data
            my $data = $self->{d}->deflate($_);
            $self->{r}->print($data);
            # keep track of its length and crc
            $self->{l} += length($_);
            $self->{crc} = crc32($_,$self->{crc});
        }
    }

    sub DESTROY {
        my $self = shift;

        # flush the output buffers
        my $data = $self->{d}->flush;
        $self->{r}->print($data);

        # print the CRC and the total length (uncompressed)
        $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
    }

    1;

Here's the Apache configuration entry you'll need to make use of it. Once
set, everything in the /compressed directory will be compressed
automagically.

    <Location /compressed>
        SetHandler  perl-script
        PerlHandler Apache::GZip
    </Location>

Although at first sight there seems to be quite a lot going on in
C<Apache::GZip>, you could sum up what the code was doing as follows --
read the contents of the file in C<< $r->filename >>, compress it and write
the compressed data to standard output. That's all.

This code has to jump through a few hoops to achieve this because

=over

=item 1.

The gzip support in C<Compress::Zlib> version 1.x can only work with a real
filesystem filehandle. The filehandles used by Apache modules are not
associated with the filesystem.

=item 2.

That means all the gzip support has to be done by hand - in this case by
creating a tied filehandle to deal with creating the gzip header and
trailer.

=back

C<IO::Compress::Gzip> doesn't have that filehandle limitation (this was one
of the reasons for writing it in the first place). So if
C<IO::Compress::Gzip> is used instead of C<Compress::Zlib> the whole tied
filehandle code can be removed. Here is the rewritten code.

    package Apache::GZip;

    use strict vars;
    use Apache::Constants ':common';
    use IO::Compress::Gzip;
    use IO::File;

    sub handler {
        my $r = shift;
        my $fh;
        my $file = $r->filename;
        return DECLINED unless $fh=IO::File->new($file);
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        # '-' means write the compressed data to standard output
        my $gz = IO::Compress::Gzip->new( '-', Minimal => 1 )
            or return DECLINED ;

        print $gz $_ while <$fh>;

        return OK;
    }

    1;

or even more succinctly, like this, using a one-shot gzip

    package Apache::GZip;

    use strict vars;
    use Apache::Constants ':common';
    use IO::Compress::Gzip qw(gzip);

    sub handler {
        my $r = shift;
        $r->header_out('Content-Encoding'=>'gzip');
        $r->send_http_header;
        return OK if $r->header_only;

        gzip $r->filename => '-', Minimal => 1
            or return DECLINED ;

        return OK;
    }

    1;

The use of one-shot C<gzip> above just reads from C<< $r->filename >> and
writes the compressed data to standard output.

Note the use of the C<Minimal> option in the code above. When using gzip
for Content-Encoding you should I<always> use this option.
In the example
above it will prevent the filename being included in the gzip header and
make the gzip data stream slightly smaller.

=head2 Compressed files and Net::FTP

The C<Net::FTP> module provides two low-level methods called C<stor> and
C<retr> that both return filehandles. These filehandles can be used with the
C<IO::Compress/Uncompress> modules to uncompress or compress files read
from or written to an FTP Server on the fly, without having to create a
temporary file.

Firstly, here is code that uses C<retr> to uncompress a file as it is
read from the FTP Server.

    use Net::FTP;
    use IO::Uncompress::Gunzip qw(:all);

    my $ftp = Net::FTP->new( ... );

    my $retr_fh = $ftp->retr($compressed_filename);
    gunzip $retr_fh => $outFilename, AutoClose => 1
        or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

and this to compress a file as it is written to the FTP Server

    use Net::FTP;
    use IO::Compress::Gzip qw(:all);

    my $stor_fh = $ftp->stor($filename);
    gzip "filename" => $stor_fh, AutoClose => 1
        or die "Cannot compress '$filename': $GzipError\n";

=head1 MISC

=head2 Using C<InputLength> to uncompress data embedded in a larger file/buffer.

A fairly common use-case is where compressed data is embedded in a larger
file/buffer and you want to read both.

As an example consider the structure of a zip file. This is a well-defined
file format that mixes both compressed and uncompressed sections of data in
a single file.

For the purposes of this discussion you can think of a zip file as a sequence
of compressed data streams, each of which is prefixed by an uncompressed
local header. The local header contains information about the compressed
data stream, including the name of the compressed file and, in particular,
the length of the compressed data stream.

To illustrate how to use C<InputLength> here is a script that walks a zip
file and prints out how many lines are in each compressed file (if you
intend to write code to walk through a zip file for real see
L<IO::Uncompress::Unzip/"Walking through a zip file">). Also, although
this example uses the zlib-based compression, the technique can be used by
the other C<IO::Uncompress::*> modules.

    use strict;
    use warnings;

    use IO::File;
    use IO::Uncompress::RawInflate qw(:all);

    use constant ZIP_LOCAL_HDR_SIG    => 0x04034b50;
    use constant ZIP_LOCAL_HDR_LENGTH => 30;

    my $file = $ARGV[0] ;

    my $fh = IO::File->new( "<$file" )
                or die "Cannot open '$file': $!\n";

    # zip files are binary data
    $fh->binmode;

    while (1)
    {
        my $buffer;

        $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH) == ZIP_LOCAL_HDR_LENGTH
            or die "Truncated file: $!\n";

        my $signature = unpack ("V", substr($buffer, 0, 4));

        # Stop at the first record that is not a local header
        last unless $signature == ZIP_LOCAL_HDR_SIG;

        # Read Local Header
        my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
        my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
        my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
        my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
        my $filename_length    = unpack ("v", substr($buffer, 26, 2));
        my $extra_length       = unpack ("v", substr($buffer, 28, 2));

        my $filename ;
        $fh->read($filename, $filename_length) == $filename_length
            or die "Truncated file\n";

        $fh->read($buffer, $extra_length) == $extra_length
            or die "Truncated file\n";

        if ($compressedMethod != 8 && $compressedMethod != 0)
        {
            warn "Skipping file '$filename' - not deflated $compressedMethod\n";
            $fh->read($buffer, $compressedLength) == $compressedLength
                or die "Truncated file\n";
            next;
        }

        # The parentheses matter: == binds more tightly than &
        if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
        {
            die "Streamed Stored not supported for '$filename'\n";
        }

        next if $compressedLength == 0;

        # Done reading the Local Header

        my $inf = IO::Uncompress::RawInflate->new( $fh,
                            Transparent => 1,
                            InputLength => $compressedLength )
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The majority of the code above is concerned with reading the zip local
header data. The code that I want to focus on is at the bottom.

    while (1) {

        # read local zip header data
        # get $filename
        # get $compressedLength

        my $inf = IO::Uncompress::RawInflate->new( $fh,
                            Transparent => 1,
                            InputLength => $compressedLength )
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The call to C<IO::Uncompress::RawInflate> creates a new filehandle C<$inf>
that can be used to read from the parent filehandle C<$fh>, uncompressing
it as it goes. The use of the C<InputLength> option will guarantee that
I<at most> C<$compressedLength> bytes of compressed data will be read from
the C<$fh> filehandle (The only exception is for an error case like a
truncated file or a corrupt data stream).

This means that once RawInflate is finished C<$fh> will be left at the
byte directly after the compressed data stream.
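
The positioning behaviour described above can be tried out on a much
simpler container than a zip file. The sketch below is only an
illustration: it writes one raw deflate stream followed by some plain
bytes to a temporary file, uncompresses the stream with C<InputLength>,
and then reads the plain bytes straight from the parent filehandle. The
payload, the C<TRAILER> marker and the variable names are all made up for
the example.

```perl
use strict;
use warnings;

use File::Temp                 qw(tempfile);
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

# Hypothetical container: one raw deflate stream followed by plain bytes.
my $payload = "line one\nline two\n";
my $trailer = "TRAILER";

my $compressed;
rawdeflate \$payload => \$compressed
    or die "rawdeflate failed: $RawDeflateError\n";

my ($fh, $name) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} $compressed, $trailer;
seek $fh, 0, 0
    or die "Cannot rewind '$name': $!\n";

# Limit the uncompressor to the bytes that belong to the stream.
my $inf = IO::Uncompress::RawInflate->new( $fh,
                    InputLength => length $compressed )
    or die "Cannot uncompress: $RawInflateError\n";

# Slurp everything the stream contains.
my $text = do { local $/; <$inf> };
$inf->close;

# AutoClose was not requested, so $fh is still open; the next read
# should start at the first byte after the compressed stream.
read($fh, my $rest, length $trailer);

print "payload intact\n" if $text eq $payload;
print "data after the stream: $rest\n";
```

Without C<InputLength> the uncompressor would be free to read past the
end of the stream in whole blocks, so the position of C<$fh> afterwards
could not be relied on in the same way.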

Now consider what the code looks like without C<InputLength>

    while (1) {

        # read local zip header data
        # get $filename
        # get $compressedLength

        # read all the compressed data into $data
        read($fh, $data, $compressedLength);

        my $inf = IO::Uncompress::RawInflate->new( \$data,
                            Transparent => 1 )
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n" ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The difference here is the addition of the temporary variable C<$data>.
This is used to store a copy of the compressed data while it is being
uncompressed.

If you know that C<$compressedLength> isn't that big then using temporary
storage won't be a problem. But if C<$compressedLength> is very large or
you are writing an application that other people will use, and so have no
idea how big C<$compressedLength> will be, it could be an issue.

Using C<InputLength> avoids the use of temporary storage and means the
application can cope with large compressed data streams.

One final point -- obviously C<InputLength> can only be used whenever you
know the length of the compressed data beforehand, like here with a zip
file.

=head1 SUPPORT

General feedback/questions/bug reports should be sent to
L<https://github.com/pmqs//issues> (preferred) or
L<https://rt.cpan.org/Public/Dist/Display.html?Name=>.

=head1 SEE ALSO

L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>,
L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>,
L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>,
L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>,
L<IO::Compress::Lzma>, L<IO::Uncompress::UnLzma>,
L<IO::Compress::Xz>, L<IO::Uncompress::UnXz>,
L<IO::Compress::Lzip>, L<IO::Uncompress::UnLzip>,
L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>,
L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>,
L<IO::Compress::Zstd>, L<IO::Uncompress::UnZstd>,
L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>

L<IO::Compress::FAQ|IO::Compress::FAQ>

L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
L<Archive::Tar|Archive::Tar>,
L<IO::Zlib|IO::Zlib>

=head1 AUTHOR

This module was written by Paul Marquess, C<pmqs@cpan.org>.

=head1 MODIFICATION HISTORY

See the Changes file.

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2005-2024 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.