
=head1 NAME

IO::Compress::FAQ -- Frequently Asked Questions about IO::Compress

=head1 DESCRIPTION

Common questions answered.

=head1 GENERAL

=head2 Compatibility with Unix compress/uncompress.

Although C<Compress::Zlib> has a pair of functions called C<compress> and
C<uncompress>, they are I<not> related to the Unix programs of the same
name. The C<Compress::Zlib> module is not compatible with Unix
C<compress>.

If you have the C<uncompress> program available, you can use this to read
compressed files

    open F, "uncompress -c $filename |"
        or die "Cannot run uncompress: $!\n";
    while (<F>)
    {
        ...
    }

Alternatively, if you have the C<gunzip> program available, you can use
this to read compressed files

    open F, "gunzip -c $filename |"
        or die "Cannot run gunzip: $!\n";
    while (<F>)
    {
        ...
    }

and this to write compressed files, if you have the C<compress> program
available

    # compress reads standard input and writes the result to $filename
    open F, "| compress -c >$filename"
        or die "Cannot run compress: $!\n";
    print F "data";
    ...
    close F ;

=head2 Accessing .tar.Z files

The C<Archive::Tar> module can optionally use C<Compress::Zlib> (via the
C<IO::Zlib> module) to access tar files that have been compressed with
C<gzip>. Unfortunately tar files compressed with the Unix C<compress>
utility cannot be read by C<Compress::Zlib> and so cannot be directly
accessed by C<Archive::Tar>.

If the C<uncompress> or C<gunzip> programs are available, you can use one
of these workarounds to read C<.tar.Z> files from C<Archive::Tar>.

Firstly with C<uncompress>

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "uncompress -c $filename |"
        or die "Cannot run uncompress: $!\n";
    my $tar = Archive::Tar->new(*F);
    ...

and this with C<gunzip>

    use strict;
    use warnings;
    use Archive::Tar;

    open F, "gunzip -c $filename |"
        or die "Cannot run gunzip: $!\n";
    my $tar = Archive::Tar->new(*F);
    ...

Similarly, if the C<compress> program is available, you can use this to
write a C<.tar.Z> file

    use strict;
    use warnings;
    use Archive::Tar;
    use IO::File;

    my $fh = IO::File->new( "| compress -c >$filename" )
        or die "Cannot run compress: $!\n";
    my $tar = Archive::Tar->new();
    ...
    $tar->write($fh);
    $fh->close ;

=head2 How do I recompress using a different compression?

This is easier than you might expect if you realise that all the
C<IO::Compress::*> objects are derived from C<IO::File> and that all the
C<IO::Uncompress::*> modules can read from an C<IO::File> filehandle.

So, for example, say you have a file compressed with gzip that you want to
recompress with bzip2. Here is all that is needed to carry out the
recompression.

    use IO::Uncompress::Gunzip ':all';
    use IO::Compress::Bzip2 ':all';

    my $gzipFile = "somefile.gz";
    my $bzipFile = "somefile.bz2";

    my $gunzip = IO::Uncompress::Gunzip->new( $gzipFile )
        or die "Cannot gunzip $gzipFile: $GunzipError\n" ;

    bzip2 $gunzip => $bzipFile
        or die "Cannot bzip2 to $bzipFile: $Bzip2Error\n" ;

Note that this technique has a limitation. Some compression file
formats store extra information along with the compressed data payload. For
example, gzip can optionally store the original filename and Zip stores a
lot of information about the original file. If the original compressed file
contains any of this extra information, it will not be transferred to the
new compressed file using the technique above.
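
The same recompression can be tried out without touching the filesystem.
The sketch below is only an illustration: it assumes in-memory buffers in
place of C<somefile.gz> and C<somefile.bz2>, and the sample text and
variable names are made up for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip      qw(gzip $GzipError);
use IO::Uncompress::Gunzip  qw($GunzipError);
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw(bunzip2 $Bunzip2Error);

# Hypothetical sample data, gzipped into a buffer so no files are needed.
my $original = "some text to recompress\n" x 20;
my $gzipped;
gzip \$original => \$gzipped
    or die "Cannot gzip: $GzipError\n";

# Open a gunzip stream on the gzip data ...
my $gunzip = IO::Uncompress::Gunzip->new( \$gzipped )
    or die "Cannot gunzip: $GunzipError\n";

# ... and hand that stream to the one-shot bzip2 as its input.
my $bzipped;
bzip2 $gunzip => \$bzipped
    or die "Cannot bzip2: $Bzip2Error\n";

# Uncompress the bzip2 data again to check that nothing was lost.
my $roundtrip;
bunzip2 \$bzipped => \$roundtrip
    or die "Cannot bunzip2: $Bunzip2Error\n";

print $roundtrip eq $original ? "recompressed OK\n" : "mismatch\n";
```

Only the payload is carried over; any gzip header fields are dropped,
as described above.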

=head1 ZIP

=head2 What Compression Types do IO::Compress::Zip & IO::Uncompress::Unzip support?

The following compression formats are supported by C<IO::Compress::Zip> and
C<IO::Uncompress::Unzip>

=over 5

=item * Store (method 0)

No compression at all.

=item * Deflate (method 8)

This is the default compression used when creating a zip file with
C<IO::Compress::Zip>.

=item * Bzip2 (method 12)

Only supported if the C<IO::Compress::Bzip2> module is installed.

=item * Lzma (method 14)

Only supported if the C<IO::Compress::Lzma> module is installed.

=back
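
As a rough illustration of choosing a method, the sketch below writes the
same buffer twice, once with Store and once with Deflate, using the
C<Method> option and the C<ZIP_CM_STORE> and C<ZIP_CM_DEFLATE> constants
from the C<:zip_method> export tag. The member name C<data.txt> and the
sample text are assumptions made for the example.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError :zip_method);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Highly repetitive sample data, so Deflate has something to work with.
my $data = "the same line again and again\n" x 100;

my ($stored, $deflated);

# Method 0: the payload is copied into the zip file unchanged.
zip \$data => \$stored, Name => "data.txt", Method => ZIP_CM_STORE
    or die "zip (store) failed: $ZipError\n";

# Method 8: the default, named explicitly here for comparison.
zip \$data => \$deflated, Name => "data.txt", Method => ZIP_CM_DEFLATE
    or die "zip (deflate) failed: $ZipError\n";

printf "input %d bytes, stored %d bytes, deflated %d bytes\n",
    length $data, length $stored, length $deflated;

# Either archive uncompresses to the original data.
my $back;
unzip \$stored => \$back
    or die "unzip failed: $UnzipError\n";
```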

=head2 Can I Read/Write Zip files larger than 4 Gig?

Yes, both the C<IO::Compress::Zip> and C<IO::Uncompress::Unzip> modules
support the zip feature called I<Zip64>. That allows them to read/write
files/buffers larger than 4 Gig.

If you are creating a Zip file using the one-shot interface, and any of the
input files is greater than 4 Gig, a zip64 compliant zip file will be
created.

    zip "really-large-file" => "my.zip";

Similarly with the one-shot interface, if the input is a buffer larger than
4 Gig, a zip64 compliant zip file will be created.

    zip \$really_large_buffer => "my.zip";

The one-shot interface allows you to force the creation of a zip64 zip file
by including the C<Zip64> option.

    zip $filehandle => "my.zip", Zip64 => 1;

If you want to create a zip64 zip file with the OO interface you must
specify the C<Zip64> option.

    my $zip = IO::Compress::Zip->new( "whatever", Zip64 => 1 );

When uncompressing with C<IO::Uncompress::Unzip>, it will automatically
detect if the zip file is zip64.

If you intend to manipulate the Zip64 zip files created with
C<IO::Compress::Zip> using an external zip/unzip, make sure that it supports
Zip64.

In particular, if you are using Info-Zip you need to have zip version 3.x
or better to update a Zip64 archive and unzip version 6.x to read a zip64
archive.
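
To see the C<Zip64> option at work without a 4 Gig input, the sketch below
forces Zip64 on a tiny buffer and reads the result back. The member name
C<payload.txt> and the sample data are assumptions made for the example;
the check for the C<PK\x06\x06> signature relies on the zip64
end-of-central-directory record described in the zip appnote.

```perl
use strict;
use warnings;
use IO::Compress::Zip     qw(zip $ZipError);
use IO::Uncompress::Unzip qw(unzip $UnzipError);

my $data = "small payload\n";
my ($plain, $zip64);

# An ordinary zip file ...
zip \$data => \$plain, Name => "payload.txt"
    or die "zip failed: $ZipError\n";

# ... and the same content with Zip64 forced on.
zip \$data => \$zip64, Name => "payload.txt", Zip64 => 1
    or die "zip64 failed: $ZipError\n";

# The zip64 variant carries extra records, so it is a little larger.
printf "plain: %d bytes, zip64: %d bytes\n", length $plain, length $zip64;

# Look for the zip64 end-of-central-directory signature.
my $has_zip64_record = index($zip64, "PK\x06\x06") >= 0;
print "zip64 record present\n" if $has_zip64_record;

# Unzip needs no option; it detects zip64 by itself.
my $back;
unzip \$zip64 => \$back
    or die "unzip failed: $UnzipError\n";
```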

=head2 Can I write more than 64K entries in a Zip file?

Yes. Zip64 allows this. See the previous question.

=head2 Zip Resources

The primary reference for zip files is the "appnote" document available at
L<http://www.pkware.com/documents/casestudies/APPNOTE.TXT>

An alternative is the Info-Zip appnote. This is available from
L<ftp://ftp.info-zip.org/pub/infozip/doc/>

=head1 GZIP

=head2 Gzip Resources

The primary reference for gzip files is RFC 1952
L<https://datatracker.ietf.org/doc/html/rfc1952>

The primary site for gzip is L<http://www.gzip.org>.

=head2 Dealing with concatenated gzip files

If the gunzip program encounters a file containing multiple gzip files
concatenated together it will automatically uncompress them all.
The example below illustrates this behaviour

    $ echo abc | gzip -c >x.gz
    $ echo def | gzip -c >>x.gz
    $ gunzip -c x.gz
    abc
    def

By default C<IO::Uncompress::Gunzip> will I<not> behave like the gunzip
program. It will only uncompress the first gzip data stream in the file, as
shown below

    $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT'
    abc

To force C<IO::Uncompress::Gunzip> to uncompress all the gzip data streams,
include the C<MultiStream> option, as shown below

    $ perl -MIO::Uncompress::Gunzip=:all -e 'gunzip "x.gz" => \*STDOUT, MultiStream => 1'
    abc
    def
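
The same behaviour can be reproduced inside a single script. The sketch
below is an illustration only: it builds the concatenated data in memory
rather than in C<x.gz>, and the variable names are made up for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build two independent gzip data streams and join them together.
my ($in1, $in2) = ("abc\n", "def\n");
my ($first, $second);
gzip \$in1 => \$first  or die "gzip failed: $GzipError\n";
gzip \$in2 => \$second or die "gzip failed: $GzipError\n";
my $concatenated = $first . $second;

# Default behaviour: stop after the first gzip data stream.
my $single;
gunzip \$concatenated => \$single
    or die "gunzip failed: $GunzipError\n";

# With MultiStream: carry on until the input is exhausted.
my $all;
gunzip \$concatenated => \$all, MultiStream => 1
    or die "gunzip failed: $GunzipError\n";

print "default:     ", length $single, " bytes\n";
print "MultiStream: ", length $all,    " bytes\n";
```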

=head2 Reading bgzip files with IO::Uncompress::Gunzip

A C<bgzip> file consists of a series of valid gzip-compliant data streams
concatenated together. To read a file created by C<bgzip> with
C<IO::Uncompress::Gunzip> use the C<MultiStream> option as shown in the
previous section.

See the section titled "The BGZF compression format" in
L<http://samtools.github.io/hts-specs/SAMv1.pdf> for a definition of
C<bgzip>.

=head1 ZLIB

=head2 Zlib Resources

The primary site for the I<zlib> compression library is
L<http://www.zlib.org>.

=head1 BZIP2

=head2 Bzip2 Resources

The primary site for bzip2 is L<http://www.bzip.org>.

=head2 Dealing with Concatenated bzip2 files

If the bunzip2 program encounters a file containing multiple bzip2 files
concatenated together it will automatically uncompress them all.
The example below illustrates this behaviour

    $ echo abc | bzip2 -c >x.bz2
    $ echo def | bzip2 -c >>x.bz2
    $ bunzip2 -c x.bz2
    abc
    def

By default C<IO::Uncompress::Bunzip2> will I<not> behave like the bunzip2
program. It will only uncompress the first bzip2 data stream in the file, as
shown below

    $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT'
    abc

To force C<IO::Uncompress::Bunzip2> to uncompress all the bzip2 data streams,
include the C<MultiStream> option, as shown below

    $ perl -MIO::Uncompress::Bunzip2=:all -e 'bunzip2 "x.bz2" => \*STDOUT, MultiStream => 1'
    abc
    def
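
C<MultiStream> is also accepted by the OO interface. The sketch below is an
illustration only: it assumes two bzip2 data streams joined in memory
rather than in C<x.bz2>, and reads them back line by line through a single
C<IO::Uncompress::Bunzip2> object.

```perl
use strict;
use warnings;
use IO::Compress::Bzip2     qw(bzip2 $Bzip2Error);
use IO::Uncompress::Bunzip2 qw($Bunzip2Error);

# Two independent bzip2 data streams, joined together.
my ($in1, $in2) = ("abc\n", "def\n");
my ($first, $second);
bzip2 \$in1 => \$first  or die "bzip2 failed: $Bzip2Error\n";
bzip2 \$in2 => \$second or die "bzip2 failed: $Bzip2Error\n";
my $concatenated = $first . $second;

# One object reads across the boundary between the two streams.
my $bz = IO::Uncompress::Bunzip2->new( \$concatenated, MultiStream => 1 )
    or die "bunzip2 failed: $Bunzip2Error\n";

my @lines;
while (defined(my $line = $bz->getline))
{
    push @lines, $line;
}
$bz->close;

print scalar(@lines), " lines read\n";
```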

=head2 Interoperating with Pbzip2

Pbzip2 (L<http://compression.ca/pbzip2/>) is a parallel implementation of
bzip2. The output from pbzip2 consists of a series of concatenated bzip2
data streams.

By default C<IO::Uncompress::Bunzip2> will only uncompress the first bzip2
data stream in a pbzip2 file. To uncompress the complete pbzip2 file you
must include the C<MultiStream> option, like this.

    bunzip2 $input => \$output, MultiStream => 1
        or die "bunzip2 failed: $Bunzip2Error\n";

=head1 HTTP & NETWORK

=head2 Apache::GZip Revisited

Below is a mod_perl Apache compression module, called C<Apache::GZip>,
taken from
L<http://perl.apache.org/docs/tutorials/tips/mod_perl_tricks/mod_perl_tricks.html#On_the_Fly_Compression>

  package Apache::GZip;
  #File: Apache::GZip.pm

  use strict vars;
  use Apache::Constants ':common';
  use Compress::Zlib;
  use IO::File;
  use constant GZIP_MAGIC => 0x1f8b;
  use constant OS_MAGIC => 0x03;

  sub handler {
      my $r = shift;
      my ($fh,$gz);
      my $file = $r->filename;
      return DECLINED unless $fh=IO::File->new($file);
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;

      tie *STDOUT,'Apache::GZip',$r;
      print($_) while <$fh>;
      untie *STDOUT;
      return OK;
  }

  sub TIEHANDLE {
      my($class,$r) = @_;
      # initialize a deflation stream
      my $d = deflateInit(-WindowBits=>-MAX_WBITS()) || return undef;

      # gzip header -- don't ask how I found out
      $r->print(pack("nccVcc",GZIP_MAGIC,Z_DEFLATED,0,time(),0,OS_MAGIC));

      return bless { r   => $r,
                     crc =>  crc32(undef),
                     d   => $d,
                     l   =>  0
                   },$class;
  }

  sub PRINT {
      my $self = shift;
      foreach (@_) {
        # deflate the data
        my $data = $self->{d}->deflate($_);
        $self->{r}->print($data);
        # keep track of its length and crc
        $self->{l} += length($_);
        $self->{crc} = crc32($_,$self->{crc});
      }
  }

  sub DESTROY {
     my $self = shift;

     # flush the output buffers
     my $data = $self->{d}->flush;
     $self->{r}->print($data);

     # print the CRC and the total length (uncompressed)
     $self->{r}->print(pack("LL",@{$self}{qw/crc l/}));
  }

  1;

Here's the Apache configuration entry you'll need to make use of it. Once
set, everything in the /compressed directory will be compressed
automagically.

  <Location /compressed>
     SetHandler  perl-script
     PerlHandler Apache::GZip
  </Location>

Although at first sight there seems to be quite a lot going on in
C<Apache::GZip>, you could sum up what the code is doing as follows --
read the contents of the file in C<< $r->filename >>, compress it and write
the compressed data to standard output. That's all.

This code has to jump through a few hoops to achieve this because

=over

=item 1.

The gzip support in C<Compress::Zlib> version 1.x can only work with a real
filesystem filehandle. The filehandles used by Apache modules are not
associated with the filesystem.

=item 2.

That means all the gzip support has to be done by hand - in this case by
creating a tied filehandle to deal with creating the gzip header and
trailer.

=back

C<IO::Compress::Gzip> doesn't have that filehandle limitation (this was one
of the reasons for writing it in the first place). So if
C<IO::Compress::Gzip> is used instead of C<Compress::Zlib> the whole tied
filehandle code can be removed. Here is the rewritten code.

  package Apache::GZip;

  use strict vars;
  use Apache::Constants ':common';
  use IO::Compress::Gzip;
  use IO::File;

  sub handler {
      my $r = shift;
      my $fh;
      my $file = $r->filename;
      return DECLINED unless $fh=IO::File->new($file);
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;

      # '-' means write the compressed data to standard output
      my $gz = IO::Compress::Gzip->new( '-', Minimal => 1 )
          or return DECLINED ;

      print $gz $_ while <$fh>;

      return OK;
  }

or even more succinctly, like this, using a one-shot gzip

  package Apache::GZip;

  use strict vars;
  use Apache::Constants ':common';
  use IO::Compress::Gzip qw(gzip);

  sub handler {
      my $r = shift;
      $r->header_out('Content-Encoding'=>'gzip');
      $r->send_http_header;
      return OK if $r->header_only;

      gzip $r->filename => '-', Minimal => 1
        or return DECLINED ;

      return OK;
  }

  1;

The use of one-shot C<gzip> above just reads from C<< $r->filename >> and
writes the compressed data to standard output.

Note the use of the C<Minimal> option in the code above. When using gzip
for Content-Encoding you should I<always> use this option. In the example
above it will prevent the filename being included in the gzip header and
make the size of the gzip data stream slightly smaller.
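
The effect of C<Minimal> can be checked without Apache. The sketch below is
an illustration only: it compresses a made-up response body into two
buffers, once with an explicit C<Name> (standing in for the filename that
one-shot C<gzip> would otherwise record) and once with C<Minimal>, then
inspects the two gzip headers.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical response body.
my $body = "<html>hello</html>\n" x 50;

my ($named, $minimal);

# The header records the name "index.html".
gzip \$body => \$named, Name => "index.html"
    or die "gzip failed: $GzipError\n";

# The header is reduced to the mandatory fields.
gzip \$body => \$minimal, Minimal => 1
    or die "gzip failed: $GzipError\n";

# Byte 3 of a gzip header is the FLG field; it is zero when no
# optional header fields (such as the name) are present.
my $flags_minimal = ord substr($minimal, 3, 1);
my $name_in_named   = index($named,   "index.html") >= 0;
my $name_in_minimal = index($minimal, "index.html") >= 0;

printf "with name: %d bytes, minimal: %d bytes\n",
    length $named, length $minimal;
```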

=head2 Compressed files and Net::FTP

The C<Net::FTP> module provides two low-level methods called C<stor> and
C<retr> that both return filehandles. These filehandles can be used with the
C<IO::Compress/Uncompress> modules to compress or uncompress files read
from or written to an FTP Server on the fly, without having to create a
temporary file.

Firstly, here is code that uses C<retr> to uncompress a file as it is
read from the FTP Server.

    use Net::FTP;
    use IO::Uncompress::Gunzip qw(:all);

    my $ftp = Net::FTP->new( ... );

    my $retr_fh = $ftp->retr($compressed_filename);
    gunzip $retr_fh => $outFilename, AutoClose => 1
        or die "Cannot uncompress '$compressed_filename': $GunzipError\n";

and this to compress a file as it is written to the FTP Server

    use Net::FTP;
    use IO::Compress::Gzip qw(:all);

    my $stor_fh = $ftp->stor($filename);
    gzip "filename" => $stor_fh, AutoClose => 1
        or die "Cannot compress '$filename': $GzipError\n";

=head1 MISC

=head2 Using C<InputLength> to uncompress data embedded in a larger file/buffer.

A fairly common use-case is where compressed data is embedded in a larger
file/buffer and you want to read both.

As an example consider the structure of a zip file. This is a well-defined
file format that mixes both compressed and uncompressed sections of data in
a single file.

For the purposes of this discussion you can think of a zip file as a sequence
of compressed data streams, each of which is prefixed by an uncompressed
local header. The local header contains information about the compressed
data stream, including the name of the compressed file and, in particular,
the length of the compressed data stream.

To illustrate how to use C<InputLength> here is a script that walks a zip
file and prints out how many lines are in each compressed file (if you
intend to write code that walks through a zip file for real, see
L<IO::Uncompress::Unzip/"Walking through a zip file">). Also, although
this example uses the zlib-based compression, the technique can be used by
the other C<IO::Uncompress::*> modules.

    use strict;
    use warnings;

    use IO::File;
    use IO::Uncompress::RawInflate qw(:all);

    use constant ZIP_LOCAL_HDR_SIG  => 0x04034b50;
    use constant ZIP_LOCAL_HDR_LENGTH => 30;

    my $file = $ARGV[0] ;

    my $fh = IO::File->new( "<$file" )
                or die "Cannot open '$file': $!\n";

    while (1)
    {
        my $buffer;

        my $x ;
        ($x = $fh->read($buffer, ZIP_LOCAL_HDR_LENGTH)) == ZIP_LOCAL_HDR_LENGTH
            or die "Truncated file: $!\n";

        my $signature = unpack ("V", substr($buffer, 0, 4));

        last unless $signature == ZIP_LOCAL_HDR_SIG;

        # Read Local Header
        my $gpFlag             = unpack ("v", substr($buffer, 6, 2));
        my $compressedMethod   = unpack ("v", substr($buffer, 8, 2));
        my $compressedLength   = unpack ("V", substr($buffer, 18, 4));
        my $uncompressedLength = unpack ("V", substr($buffer, 22, 4));
        my $filename_length    = unpack ("v", substr($buffer, 26, 2));
        my $extra_length       = unpack ("v", substr($buffer, 28, 2));

        my $filename ;
        $fh->read($filename, $filename_length) == $filename_length
            or die "Truncated file\n";

        $fh->read($buffer, $extra_length) == $extra_length
            or die "Truncated file\n";

        if ($compressedMethod != 8 && $compressedMethod != 0)
        {
            warn "Skipping file '$filename' - not deflated $compressedMethod\n";
            $fh->read($buffer, $compressedLength) == $compressedLength
                or die "Truncated file\n";
            next;
        }

        # Bit 3 of the general purpose flag marks a streamed entry.
        # The parentheses are needed because == binds tighter than &.
        if ($compressedMethod == 0 && ($gpFlag & 8) == 8)
        {
            die "Streamed Stored not supported for '$filename'\n";
        }

        next if $compressedLength == 0;

        # Done reading the Local Header

        my $inf = IO::Uncompress::RawInflate->new( $fh,
                            Transparent => 1,
                            InputLength => $compressedLength )
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The majority of the code above is concerned with reading the zip local
header data. The code that I want to focus on is at the bottom.

    while (1) {

        # read local zip header data
        # get $filename
        # get $compressedLength

        my $inf = IO::Uncompress::RawInflate->new( $fh,
                            Transparent => 1,
                            InputLength => $compressedLength )
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The call to C<IO::Uncompress::RawInflate> creates a new filehandle C<$inf>
that can be used to read from the parent filehandle C<$fh>, uncompressing
it as it goes. The use of the C<InputLength> option will guarantee that
I<at most> C<$compressedLength> bytes of compressed data will be read from
the C<$fh> filehandle (the only exception is for an error case like a
truncated file or a corrupt data stream).

This means that once RawInflate is finished C<$fh> will be left at the
byte directly after the compressed data stream.

Now consider what the code looks like without C<InputLength>

    while (1) {

        # read local zip header data
        # get $filename
        # get $compressedLength

        # read all the compressed data into $data
        read($fh, $data, $compressedLength);

        my $inf = IO::Uncompress::RawInflate->new( \$data,
                            Transparent => 1 )
          or die "Cannot uncompress $file [$filename]: $RawInflateError\n"  ;

        my $line_count = 0;

        while (<$inf>)
        {
            ++ $line_count;
        }

        print "$filename: $line_count\n";
    }

The difference here is the addition of the temporary variable C<$data>.
This is used to store a copy of the compressed data while it is being
uncompressed.

If you know that C<$compressedLength> isn't that big then using temporary
storage won't be a problem. But if C<$compressedLength> is very large or
you are writing an application that other people will use, and so have no
idea how big C<$compressedLength> will be, it could be an issue.

Using C<InputLength> avoids the use of temporary storage and means the
application can cope with large compressed data streams.

One final point -- obviously C<InputLength> can only be used whenever you
know the length of the compressed data beforehand, like here with a zip
file.
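
The technique is not tied to zip files. As a minimal sketch, assume a
made-up container format: a 4-byte little-endian length, a raw deflate
data stream of that length, then the literal text C<TRAILER>. The format,
the temporary file and the variable names are all inventions for this
example. With C<InputLength> the trailer can be read from the parent
filehandle as soon as the uncompression has finished.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Compress::RawDeflate   qw(rawdeflate $RawDeflateError);
use IO::Uncompress::RawInflate qw($RawInflateError);

# Compress three lines into a raw deflate data stream.
my $payload = "line one\nline two\nline three\n";
my $compressed;
rawdeflate \$payload => \$compressed
    or die "rawdeflate failed: $RawDeflateError\n";

# Write the hypothetical container: length, compressed data, trailer.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} pack("V", length $compressed), $compressed, "TRAILER";
close $out or die "Cannot close '$path': $!\n";

# Read it back, starting with the length prefix.
open my $fh, '<', $path or die "Cannot open '$path': $!\n";
binmode $fh;
read($fh, my $prefix, 4) == 4 or die "Truncated file\n";
my $compressedLength = unpack "V", $prefix;

# Read no more than $compressedLength bytes from $fh.
my $inf = IO::Uncompress::RawInflate->new( $fh,
                    InputLength => $compressedLength )
    or die "Cannot uncompress: $RawInflateError\n";

my $line_count = 0;
while (defined(my $line = <$inf>))
{
    ++ $line_count;
}
$inf->close;

# $fh now sits directly after the compressed data.
read($fh, my $trailer, 7);
close $fh;

print "lines: $line_count, trailer: $trailer\n";
```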

=head1 SUPPORT

General feedback/questions/bug reports should be sent to
L<https://github.com/pmqs//issues> (preferred) or
L<https://rt.cpan.org/Public/Dist/Display.html?Name=>.

=head1 SEE ALSO

L<Compress::Zlib>, L<IO::Compress::Gzip>, L<IO::Uncompress::Gunzip>, L<IO::Compress::Deflate>, L<IO::Uncompress::Inflate>, L<IO::Compress::RawDeflate>, L<IO::Uncompress::RawInflate>, L<IO::Compress::Bzip2>, L<IO::Uncompress::Bunzip2>, L<IO::Compress::Lzma>, L<IO::Uncompress::UnLzma>, L<IO::Compress::Xz>, L<IO::Uncompress::UnXz>, L<IO::Compress::Lzip>, L<IO::Uncompress::UnLzip>, L<IO::Compress::Lzop>, L<IO::Uncompress::UnLzop>, L<IO::Compress::Lzf>, L<IO::Uncompress::UnLzf>, L<IO::Compress::Zstd>, L<IO::Uncompress::UnZstd>, L<IO::Uncompress::AnyInflate>, L<IO::Uncompress::AnyUncompress>

L<IO::Compress::FAQ|IO::Compress::FAQ>

L<File::GlobMapper|File::GlobMapper>, L<Archive::Zip|Archive::Zip>,
L<Archive::Tar|Archive::Tar>,
L<IO::Zlib|IO::Zlib>

=head1 AUTHOR

This module was written by Paul Marquess, C<pmqs@cpan.org>.

=head1 MODIFICATION HISTORY

See the Changes file.

=head1 COPYRIGHT AND LICENSE

Copyright (c) 2005-2024 Paul Marquess. All rights reserved.

This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.