1[Info-ZIP note, 981119:  this file is based on PKWARE's appnote.txt of
2 15 February 1996, taking into account PKWARE's revised appnote.txt version
3 of 01 September 1998.  It has been unofficially corrected and extended by
4 Info-ZIP without explicit permission by PKWARE.  Although Info-ZIP
5 believes the information to be accurate and complete, it is provided
6 under a disclaimer similar to the PKWARE disclaimer below, differing
7 only in the substitution of "Info-ZIP" for "PKWARE".  In other words,
8 use this information at your own risk, but we think it's correct.
9
10 Specification info from PKWARE that was obviously wrong has been corrected
11 silently (e.g. missing structure fields, wrong numbers).
12 As of PKZIPW 2.50, two new incompatibilities have been introduced by PKWARE;
13 they are noted below.  Note that the "NTFS tag" conflict is currently not
14 real; PKZIPW 2.50 actually tags NTFS files as having come from a FAT
15 file system, too.]
16
17
18Disclaimer
19----------
20
21Although PKWARE will attempt to supply current and accurate
22information relating to its file formats, algorithms, and the
23subject programs, the possibility of error can not be eliminated.
24PKWARE therefore expressly disclaims any warranty that the
25information contained in the associated materials relating to the
26subject programs and/or the format of the files created or
27accessed by the subject programs and/or the algorithms used by
28the subject programs, or any other matter, is current, correct or
29accurate as delivered.  Any risk of damage due to any possible
30inaccurate information is assumed by the user of the information.
31Furthermore, the information relating to the subject programs
32and/or the file formats created or accessed by the subject
33programs and/or the algorithms used by the subject programs is
34subject to change without notice.
35
36
37General Format of a ZIP file
38----------------------------
39
40  Files are stored in arbitrary order.  Large zipfiles can span multiple
41  diskette media.
42
43  Overall zipfile format:
44
45    [local file header + file data + data_descriptor] . . .
46    [central directory] end of central directory record
47
48
49  A.  Local file header:
50
51        local file header signature     4 bytes  (0x04034b50)
52        version needed to extract       2 bytes
53        general purpose bit flag        2 bytes
54        compression method              2 bytes
55        last mod file time              2 bytes
56        last mod file date              2 bytes
57        crc-32                          4 bytes
58        compressed size                 4 bytes
59        uncompressed size               4 bytes
60        filename length                 2 bytes
61        extra field length              2 bytes
62
63        filename (variable size)
64        extra field (variable size)
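
      A minimal sketch in Python (not part of the PKWARE or Info-ZIP
      text) of how the fixed 30-byte part of the layout above might be
      unpacked.  The names LOCAL_HEADER and parse_local_header and the
      dictionary keys are illustrative assumptions; multi-byte fields
      are read in Intel low-byte/high-byte order.

```python
import struct

# signature, version needed, flags, method, time, date, crc-32,
# compressed size, uncompressed size, filename length, extra length
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIG = 0x04034B50


def parse_local_header(buf, offset=0):
    """Return (fields, filename, extra, offset_of_file_data)."""
    (sig, version, flags, method, mtime, mdate, crc,
     csize, usize, name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if sig != LOCAL_SIG:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    fields = {
        "version_needed": version, "flags": flags, "method": method,
        "time": mtime, "date": mdate, "crc32": crc,
        "compressed_size": csize, "uncompressed_size": usize,
    }
    return (fields, buf[name_start:extra_start],
            buf[extra_start:data_start], data_start)
```

      The returned offset points at the first byte of file data, so a
      caller can skip compressed_size bytes to reach the next record
      when bit 3 of the general purpose bit flag is clear.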
65
66
67  B.  Data descriptor:
68
69        data descriptor signature       4 bytes  (0x08074b50)
70        crc-32                          4 bytes
71        compressed size                 4 bytes
72        uncompressed size               4 bytes
73
74      This descriptor exists only if bit 3 of the general
75      purpose bit flag is set (see below).  It is byte aligned
76      and immediately follows the last byte of compressed data.
77      This descriptor is used only when it was not possible to
78      seek in the output zip file, e.g., when the output zip file
79      was standard output or a non seekable device.
80
81  C.  Central directory structure:
82
83      [file header] . . .  end of central dir record
84
85      File header:
86
87        central file header signature   4 bytes  (0x02014b50)
88        version made by                 2 bytes
89        version needed to extract       2 bytes
90        general purpose bit flag        2 bytes
91        compression method              2 bytes
92        last mod file time              2 bytes
93        last mod file date              2 bytes
94        crc-32                          4 bytes
95        compressed size                 4 bytes
96        uncompressed size               4 bytes
97        filename length                 2 bytes
98        extra field length              2 bytes
99        file comment length             2 bytes
100        disk number start               2 bytes
101        internal file attributes        2 bytes
102        external file attributes        4 bytes
103        relative offset of local header 4 bytes
104
105        filename (variable size)
106        extra field (variable size)
107        file comment (variable size)
108
109      End of central dir record:
110
111        end of central dir signature    4 bytes  (0x06054b50)
112        number of this disk             2 bytes
113        number of the disk with the
114        start of the central directory  2 bytes
115        total number of entries in
116        the central dir on this disk    2 bytes
117        total number of entries in
118        the central dir                 2 bytes
119        size of the central directory   4 bytes
120        offset of start of central
121        directory with respect to
122        the starting disk number        4 bytes
123        zipfile comment length          2 bytes
124        zipfile comment (variable size)
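
      One common way to locate the end of central dir record is to
      search backward from the end of the file for its signature.  The
      following Python sketch does that under stated assumptions: the
      helper name find_end_of_central_dir is hypothetical, and the
      requirement that the zipfile comment run exactly to the end of the
      buffer is a choice made for this sketch, not a rule stated above.

```python
import struct

# signature, this disk, disk with central dir start, entries on this disk,
# total entries, central dir size, central dir offset, comment length
EOCD = struct.Struct("<IHHHHIIH")
EOCD_SIG = b"PK\x05\x06"  # 0x06054b50 as stored on disk (low byte first)


def find_end_of_central_dir(buf):
    """Return (position, unpacked fields, comment bytes)."""
    # The record is 22 bytes plus a comment of at most 65535 bytes.
    lowest = max(0, len(buf) - (EOCD.size + 0xFFFF))
    pos = buf.rfind(EOCD_SIG, lowest)
    while pos != -1:
        if pos + EOCD.size <= len(buf):
            fields = EOCD.unpack_from(buf, pos)
            comment_len = fields[7]
            if pos + EOCD.size + comment_len == len(buf):
                return pos, fields, buf[pos + EOCD.size:]
        # Look for an earlier candidate that starts before this one.
        pos = buf.rfind(EOCD_SIG, lowest, pos + 3)
    raise ValueError("end of central dir record not found")
```

      The backward search matters because the signature bytes may also
      occur inside the zipfile comment or in file data.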
125
126
127  D.  Explanation of fields:
128
129      version made by (2 bytes)
130
131          The upper byte indicates the host system (OS) for the
132          file.  Software can use this information to determine
133          the line record format for text files etc.  The current
134          mappings are:
135
136          0 - FAT file system (DOS, OS/2, NT)         + PKZIPW 2.50 VFAT, NTFS
137          1 - Amiga
138          2 - VMS (VAX or Alpha AXP)
139          3 - Unix
140          4 - VM/CMS
141          5 - Atari
142          6 - HPFS file system (OS/2, NT 3.x)
143          7 - Macintosh
144          8 - Z-System
145          9 - CP/M
146          10 - TOPS-20                          [supposedly PKZIPW 2.50 NTFS]
147          11 - NTFS file system (NT)            [used by Info-ZIP, only]
148          12 - SMS/QDOS
149          13 - Acorn RISC OS
150          14 - VFAT file system (Win95, NT)     [Info-ZIP reservation, unused]
151          15 - MVS
152          16 - BeOS (BeBox or PowerMac)
153          17 - Tandem
154          18 thru 255 - unused
155
156          The lower byte indicates the version number of the
157          software used to encode the file.  The value/10
158          indicates the major version number, and the value
159          mod 10 is the minor version number.
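
          As an illustration only, the two bytes of this field could be
          decoded as in the Python sketch below.  The function name
          decode_version_made_by and the short host labels are
          assumptions made for the example; the numbering follows the
          table above.

```python
HOSTS = ["FAT", "Amiga", "VMS", "Unix", "VM/CMS", "Atari", "HPFS",
         "Macintosh", "Z-System", "CP/M", "TOPS-20", "NTFS", "SMS/QDOS",
         "Acorn RISC OS", "VFAT", "MVS", "BeOS", "Tandem"]


def decode_version_made_by(value):
    """Split the 2-byte field into (host name, major, minor)."""
    host = value >> 8                        # upper byte: host system
    major, minor = divmod(value & 0xFF, 10)  # lower byte: value/10, value mod 10
    name = HOSTS[host] if host < len(HOSTS) else "unused"
    return name, major, minor
```

          For example, a field value of 0x0314 decodes to host 3 (Unix)
          and version 2.0.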
160
161      version needed to extract (2 bytes)
162
163          The minimum software version needed to extract the
164          file, mapped as above.
165
166      general purpose bit flag: (2 bytes)
167
168          Bit 0: If set, indicates that the file is encrypted.
169
170          (For Method 6 - Imploding)
171          Bit 1: If the compression method used was type 6,
172                 Imploding, then this bit, if set, indicates
173                 an 8K sliding dictionary was used.  If clear,
174                 then a 4K sliding dictionary was used.
175          Bit 2: If the compression method used was type 6,
176                 Imploding, then this bit, if set, indicates
177                 3 Shannon-Fano trees were used to encode the
178                 sliding dictionary output.  If clear, then 2
179                 Shannon-Fano trees were used.
180
181          (For Method 8 - Deflating)
182          Bit 2  Bit 1
183            0      0    Normal (-en) compression option was used.
184            0      1    Maximum (-ex) compression option was used.
185            1      0    Fast (-ef) compression option was used.
186            1      1    Super Fast (-es) compression option was used.
187
188          Note:  Bits 1 and 2 are undefined if the compression
189                 method is any other.
190
191          Bit 3: If this bit is set, the fields crc-32, compressed size
192                 and uncompressed size are set to zero in the local
193                 header.  The correct values are put in the data descriptor
194                 immediately following the compressed data.  (Note: PKZIP
195                 version 2.04g for DOS only recognizes this bit for method 8
196                 compression, newer versions of PKZIP recognize this bit
197                 for any compression method.)
198                [Info-ZIP note: This bit was introduced by PKZIP 2.04 for
199                 DOS. In general, this feature can only be reliably used
200                 together with compression methods that allow intrinsic
201                 detection of the "end-of-compressed-data" condition. From
202                 the set of compression methods described in this Zip archive
203                 specification, only "deflate" meets this requirement.
204                 In particular, the STORED method does not work!
205                 The Info-ZIP tools recognize this bit regardless of the
206                 compression method; but, they rely on correctly set
207                 "compressed size" information in the central directory entry.]
208
209          Bit 5: If this bit is set, this indicates that the file is compressed
210                 patched data. (Note: Requires PKZIP version 2.70 or greater)
211
212          The upper three bits are reserved and used internally
213          by the software when processing the zipfile.  The
214          remaining bits are unused.
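
          The bit assignments above can be sketched in Python as
          follows.  This is an illustrative reading of the table, not
          code from PKWARE or Info-ZIP; the function name decode_flags
          and the dictionary keys are hypothetical.

```python
DEFLATE_OPTIONS = {
    (0, 0): "normal",      # -en
    (0, 1): "maximum",     # -ex
    (1, 0): "fast",        # -ef
    (1, 1): "super fast",  # -es
}


def decode_flags(flags, method):
    """Interpret the general purpose bit flag for a given compression method."""
    bit1 = (flags >> 1) & 1
    bit2 = (flags >> 2) & 1
    info = {
        "encrypted": bool(flags & 0x0001),        # bit 0
        "data_descriptor": bool(flags & 0x0008),  # bit 3
        "patched_data": bool(flags & 0x0020),     # bit 5
    }
    if method == 6:    # Imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method == 8:  # Deflating; table is keyed as (bit 2, bit 1)
        info["deflate_option"] = DEFLATE_OPTIONS[(bit2, bit1)]
    return info
```

          Bits 1 and 2 are left uninterpreted for every other method,
          matching the note that they are undefined there.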
215
216      compression method: (2 bytes)
217
218          (see accompanying documentation for algorithm
219          descriptions)
220
221          0 - The file is stored (no compression)
222          1 - The file is Shrunk
223          2 - The file is Reduced with compression factor 1
224          3 - The file is Reduced with compression factor 2
225          4 - The file is Reduced with compression factor 3
226          5 - The file is Reduced with compression factor 4
227          6 - The file is Imploded
228          7 - Reserved for Tokenizing compression algorithm
229          8 - The file is Deflated
230          9 - Reserved for enhanced Deflating
231         10 - PKWARE Data Compression Library Imploding
232
233      date and time fields: (2 bytes each)
234
235          The date and time are encoded in standard MS-DOS format.
236          If input came from standard input, the date and time are
237          those at which compression was started for this data.
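
          The text above refers to the standard MS-DOS format without
          spelling it out.  The Python sketch below assumes the usual
          packed layout (date: 7 bits year since 1980, 4 bits month,
          5 bits day; time: 5 bits hour, 6 bits minute, 5 bits of
          seconds divided by two).  The function name dos_to_datetime
          is hypothetical.

```python
import datetime


def dos_to_datetime(dos_date, dos_time):
    """Convert the two 16-bit MS-DOS fields to a naive datetime."""
    year = 1980 + (dos_date >> 9)
    month = (dos_date >> 5) & 0x0F
    day = dos_date & 0x1F
    hour = dos_time >> 11
    minute = (dos_time >> 5) & 0x3F
    second = (dos_time & 0x1F) * 2  # stored with 2-second resolution
    return datetime.datetime(year, month, day, hour, minute, second)
```

          A date field of zero has month and day 0, so this sketch
          raises ValueError for it; callers may prefer to handle that
          case separately.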
238
239      CRC-32: (4 bytes)
240
241          The CRC-32 algorithm was generously contributed by
242          David Schwaderer and can be found in his excellent
243          book "C Programmers Guide to NetBIOS" published by
244          Howard W. Sams & Co. Inc.  The 'magic number' for
245          the CRC is 0xdebb20e3.  The proper CRC pre and post
246          conditioning is used, meaning that the CRC register
247          is pre-conditioned with all ones (a starting value
248          of 0xffffffff) and the value is post-conditioned by
249          taking the one's complement of the CRC residual.
250          If bit 3 of the general purpose flag is set, this
251          field is set to zero in the local header and the correct
252          value is put in the data descriptor and in the central
253          directory.
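
          The pre- and post-conditioning described above can be sketched
          bit by bit in Python.  The constant 0xEDB88320 is the reflected
          polynomial commonly used for this CRC; it is not stated in the
          text above and is an assumption of this sketch, as are the
          function names.  The magic number appears as the raw register
          value after processing the data followed by its own CRC in
          low-byte/high-byte order.

```python
import zlib


def crc32_zip(data):
    """Bitwise CRC-32 with all-ones start value and final complement."""
    crc = 0xFFFFFFFF                       # pre-conditioning
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF                # post-conditioning


def raw_register(data):
    """CRC register before the final one's complement is taken."""
    return crc32_zip(data) ^ 0xFFFFFFFF


payload = b"hello"
check = crc32_zip(payload)
assert check == zlib.crc32(payload)        # same value as zlib's CRC-32
assert raw_register(payload + check.to_bytes(4, "little")) == 0xDEBB20E3
```

          In practice a table-driven routine or zlib.crc32 would be used;
          the loop form is shown only because it makes the conditioning
          steps visible.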
254
255      compressed size: (4 bytes)
256      uncompressed size: (4 bytes)
257
258          The size of the file compressed and uncompressed,
259          respectively.  If bit 3 of the general purpose bit flag
260          is set, these fields are set to zero in the local header
261          and the correct values are put in the data descriptor and
262          in the central directory.
263
264      filename length: (2 bytes)
265      extra field length: (2 bytes)
266      file comment length: (2 bytes)
267
268          The length of the filename, extra field, and comment
269          fields respectively.  The combined length of any
270          directory record and these three fields should not
271          generally exceed 65,535 bytes.  If input came from standard
272          input, the filename length is set to zero.
273
274         [Info-ZIP note:
275          This feature is not yet supported by any PKWARE version of ZIP
276          (at least not in PKZIP for DOS and PKZIP for Windows/WinNT).
277          The Info-ZIP programs handle standard input differently:
278          If input came from standard input, the filename is set to "-"
279          (length one).]
280
281
282      disk number start: (2 bytes)
283
284          The number of the disk on which this file begins.
285
286      internal file attributes: (2 bytes)
287
288          The lowest bit of this field indicates, if set, that
289          the file is apparently an ASCII or text file.  If not
290          set, the file apparently contains binary data.
291          The remaining bits are unused in version 1.0.
292
293      external file attributes: (4 bytes)
294
295          The mapping of the external attributes is
296          host-system dependent (see 'version made by').  For
297          MS-DOS, the low order byte is the MS-DOS directory
298          attribute byte.  If input came from standard input, this
299          field is set to zero.
300
301      relative offset of local header: (4 bytes)
302
303          This is the offset from the start of the first disk on
304          which this file appears, to where the local header should
305          be found.
306
307      filename: (Variable)
308
309          The name of the file, with optional relative path.
310          The path stored should not contain a drive or
311          device letter, or a leading slash.  All slashes
312          should be forward slashes '/' as opposed to
313          backwards slashes '\' for compatibility with Amiga
314          and Unix file systems etc.  If input came from standard
315          input, there is no filename field.
316         [Info-ZIP discrepancy:
317          If input came from standard input, the file name is set
318          to "-" (without the quotes).
319          As far as we know, the PKWARE specification for "input from
320          stdin" is not supported by PKZIP/PKUNZIP for DOS, OS/2, Windows,
321          or Windows NT.]
322
323      extra field: (Variable)
324
325          This is for future expansion.  If additional information
326          needs to be stored in the future, it should be stored
327          here.  Earlier versions of the software can then safely
328          skip this file, and find the next file or header.  This
329          field will be 0 length in version 1.0.
330
331          In order to allow different programs and different types
332          of information to be stored in the 'extra' field in .ZIP
333          files, the following structure should be used for all
334          programs storing data in this field:
335
336          header1+data1 + header2+data2 . . .
337
338          Each header should consist of:
339
340            Header ID - 2 bytes
341            Data Size - 2 bytes
342
343          Note: all fields stored in Intel low-byte/high-byte order.
344
345          The Header ID field indicates the type of data that is in
346          the following data block.
347
348          Header ID's of 0 thru 31 are reserved for use by PKWARE.
349          The remaining ID's can be used by third party vendors for
350          proprietary usage.
351
352          The current Header ID mappings defined by PKWARE are:
353
354          0x0007        AV Info
355          0x0009        OS/2 extended attributes      (also Info-ZIP)
356          0x000a        PKWARE Win95/WinNT FileTimes  [undocumented!]
357          0x000c        PKWARE VAX/VMS                (also Info-ZIP)
358          0x000d        PKWARE Unix
359          0x000f        Patch Descriptor
360
361          The Header ID mappings defined by Info-ZIP and third parties are:
362
363          0x07c8        Info-ZIP Macintosh (old, J. Lee)
364          0x2605        ZipIt Macintosh (first version)
365          0x2705        ZipIt Macintosh v 1.3.5 and newer (w/o full filename)
366          0x334d        Info-ZIP Macintosh (new, D. Haase's 'Mac3' field )
367          0x4341        Acorn/SparkFS (David Pilling)
368          0x4453        Windows NT security descriptor (binary ACL)
369          0x4704        VM/CMS
370          0x470f        MVS
371          0x4b46        FWKCS MD5 (third party, see below)
372          0x4c41        OS/2 access control list (text ACL)
373          0x4d49        Info-ZIP VMS (VAX or Alpha)
374          0x5356        AOS/VS (binary ACL)
375          0x5455        extended timestamp
376          0x5855        Info-ZIP Unix (original; also OS/2, NT, etc.)
377          0x6542        BeOS (BeBox, PowerMac, etc.)
378          0x756e        ASi Unix
379          0x7855        Info-ZIP Unix (new)
380          0xfb4a        SMS/QDOS
381
382          The Data Size field indicates the size of the following
383          data block. Programs can use this value to skip to the
384          next header block, passing over any data blocks that are
385          not of interest.
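
          A minimal Python sketch of the skipping behaviour described
          above: walk the extra field, yielding each Header ID with its
          data block, so that a caller can ignore blocks it does not
          recognize.  The generator name iter_extra_fields is
          hypothetical.

```python
import struct


def iter_extra_fields(extra):
    """Yield (header_id, data) for each block in an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("truncated extra field block")
        yield header_id, extra[pos:pos + size]
        pos += size  # Data Size lets us jump straight to the next header
```

          A reader interested only in, say, the extended timestamp would
          compare header_id against 0x5455 and discard everything else.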
386
387          Note: As stated above, the size of the entire .ZIP file
388                header, including the filename, comment, and extra
389                field should not exceed 64K in size.
390
391          In case two different programs should appropriate the same
392          Header ID value, it is strongly recommended that each
393          program place a unique signature of at least two bytes in
394          size (and preferably 4 bytes or bigger) at the start of
395          each data area.  Every program should verify that its
396          unique signature is present, in addition to the Header ID
397          value being correct, before assuming that it is a block of
398          known type.
399
400          In the following descriptions, note that "Short" means two bytes,
401          "Long" means four bytes, and "Long-Long" means eight bytes,
402          regardless of their native sizes.  Unless specifically noted, all
403          integer fields should be interpreted as unsigned (non-negative)
404          numbers.
405
406
407         -OS/2 Extended Attributes Extra Field:
408          ====================================
409
410          The following is the layout of the OS/2 extended attributes "extra"
411          block.  (Last Revision 19960922)
412
413          Note: all fields stored in Intel low-byte/high-byte order.
414
415          Local-header version:
416
417          Value         Size            Description
418          -----         ----            -----------
419  (OS/2)  0x0009        Short           tag for this extra block type
420          TSize         Short           total data size for this block
421          BSize         Long            uncompressed EA data size
422          CType         Short           compression type
423          EACRC         Long            CRC value for uncompressed EA data
424          (var.)        variable        compressed EA data
425
426          Central-header version:
427
428          Value         Size            Description
429          -----         ----            -----------
430  (OS/2)  0x0009        Short           tag for this extra block type
431          TSize         Short           total data size for this block
432          BSize         Long            size of uncompressed local EA data
433
434          The value of CType is interpreted according to the "compression
435          method" section above; i.e., 0 for stored, 8 for deflated, etc.
436
437          The OS/2 extended attribute structure (FEA2LIST) is compressed and
438          then stored in its entirety within this structure.  There will only
439          ever be one block of data in the variable-length field.
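
          As a sketch only, the local-header version of this block could
          be split into its parts as shown below in Python.  The function
          name parse_os2_ea_local is hypothetical, and the input is
          assumed to be the block body that follows the tag and TSize
          fields.  Decompression of the EA data is left out.

```python
import struct

EA_HEADER = struct.Struct("<LHL")  # BSize, CType, EACRC (10 bytes)


def parse_os2_ea_local(body):
    """Return (bsize, ctype, eacrc, compressed_ea_data)."""
    if len(body) < EA_HEADER.size:
        raise ValueError("OS/2 EA block too short")
    bsize, ctype, eacrc = EA_HEADER.unpack_from(body, 0)
    return bsize, ctype, eacrc, body[EA_HEADER.size:]
```

          When CType is 0 (stored), the returned data should be exactly
          BSize bytes long, which gives a cheap consistency check.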
440
441
442         -OS/2 Access Control List Extra Field:
443          ====================================
444
445          The following is the layout of the OS/2 ACL extra block.
446          (Last Revision 19960922)
447
448          Local-header version:
449
450          Value         Size            Description
451          -----         ----            -----------
452  (ACL)   0x4c41        Short           tag for this extra block type
453          TSize         Short           total data size for this block
454          BSize         Long            uncompressed ACL data size
455          CType         Short           compression type
456          EACRC         Long            CRC value for uncompressed ACL data
457          (var.)        variable        compressed ACL data
458
459          Central-header version:
460
461          Value         Size            Description
462          -----         ----            -----------
463  (ACL)   0x4c41        Short           tag for this extra block type
464          TSize         Short           total data size for this block
465          BSize         Long            size of uncompressed local ACL data
466
467          The value of CType is interpreted according to the "compression
468          method" section above; i.e., 0 for stored, 8 for deflated, etc.
469
470          The uncompressed ACL data consist of a text header of the form
471          "ACL1:%hX,%hd\n", where the first field is the OS/2 ACCINFO acc_attr
472          member and the second is acc_count, followed by acc_count strings
473          of the form "%s,%hx\n", where the first field is acl_ugname (user
474          group name) and the second acl_access.  This block type will be
475          extended for other operating systems as needed.
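
          The text layout of the uncompressed ACL data can be read back
          with a few lines of Python.  This is a sketch under the
          assumption that the data have already been decompressed and
          decoded to a string; the function name parse_os2_acl_text is
          hypothetical.

```python
def parse_os2_acl_text(text):
    """Return (acc_attr, [(acl_ugname, acl_access), ...])."""
    lines = text.split("\n")
    header = lines[0]
    if not header.startswith("ACL1:"):
        raise ValueError("missing ACL1 header")
    attr_hex, count_dec = header[5:].split(",")
    acc_attr = int(attr_hex, 16)   # written with %hX
    acc_count = int(count_dec)     # written with %hd
    entries = []
    for line in lines[1:1 + acc_count]:
        # Split at the last comma so a comma inside the name survives.
        name, access_hex = line.rsplit(",", 1)
        entries.append((name, int(access_hex, 16)))
    if len(entries) != acc_count:
        raise ValueError("fewer ACL entries than acc_count")
    return acc_attr, entries
```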
476
477
478         -Windows NT Security Descriptor Extra Field:
479          ==========================================
480
481          The following is the layout of the NT Security Descriptor (another
482          type of ACL) extra block.  (Last Revision 19960922)
483
484          Local-header version:
485
486          Value         Size            Description
487          -----         ----            -----------
488  (SD)    0x4453        Short           tag for this extra block type
489          TSize         Short           total data size for this block
490          BSize         Long            uncompressed SD data size
491          Version       Byte            version of uncompressed SD data format
492          CType         Short           compression type
493          EACRC         Long            CRC value for uncompressed SD data
494          (var.)        variable        compressed SD data
495
496          Central-header version:
497
498          Value         Size            Description
499          -----         ----            -----------
500  (SD)    0x4453        Short           tag for this extra block type
501          TSize         Short           total data size for this block
502          BSize         Long            size of uncompressed local SD data
503
504          The value of CType is interpreted according to the "compression
505          method" section above; i.e., 0 for stored, 8 for deflated, etc.
506          Version specifies how the compressed data are to be interpreted
507          and allows for future expansion of this extra field type.  Currently
508          only version 0 is defined.
509
510          For version 0, the compressed data are to be interpreted as a single
511          valid Windows NT SECURITY_DESCRIPTOR data structure, in self-relative
512          format.
513
514
515         -PKWARE Win95/WinNT Extra Field:
516          ==============================
517
518          The following description covers PKWARE's undocumented
519          Windows 95 & Windows NT extra field, introduced with the
520          release of PKZIP for Windows 2.50. (Last Revision 19980425)
521
522          This field has a fixed data size of 32 bytes and is stored only
523          as a local extra field.
524
525          Value         Size            Description
526          -----         ----            -----------
527 (WinNT)  0x000a        Short           Tag for this "extra" block type
528          TSize         Short           Total Data Size for this block
529          Unknwn1       Long            ???? (all 0 ?)
530          Unknwn2       Long            ????
531          ModTime       Long-Long       64-bit NTFS last-modified filetime
532          AccTime       Long-Long       64-bit NTFS last-access filetime
533          CreTime       Long-Long       64-bit NTFS creation filetime
534
535          The NTFS filetimes are 64-bit unsigned integers, stored in Intel
536          (least significant byte first) byte order. They determine the
537          number of 1.0E-07 seconds (1/10th microseconds!) past WinNT "epoch",
538          which is "01-Jan-1601 00:00:00 UTC".
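
          A minimal Python sketch of reading the 32-byte block body and
          converting the filetimes.  The names parse_pkware_nt_times and
          filetime_to_datetime are hypothetical; the conversion drops
          the final decimal digit because Python datetimes resolve
          microseconds, not tenths of microseconds.

```python
import datetime
import struct

EPOCH_1601 = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)


def filetime_to_datetime(filetime):
    """Convert a count of 1.0E-07 second units since 1601 to a UTC datetime."""
    return EPOCH_1601 + datetime.timedelta(microseconds=filetime // 10)


def parse_pkware_nt_times(body):
    """body is the 32 bytes that follow the 0x000a tag and TSize."""
    _unknown1, _unknown2, mod, acc, cre = struct.unpack("<LLQQQ", body)
    return {
        "mod": filetime_to_datetime(mod),
        "acc": filetime_to_datetime(acc),
        "cre": filetime_to_datetime(cre),
    }
```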
539
540
541         -PKWARE VAX/VMS Extra Field:
542          ==========================
543
544          The following is the layout of PKWARE's VAX/VMS attributes "extra"
545          block.  (Last Revision 12/17/91)
546
547          Note: all fields stored in Intel low-byte/high-byte order.
548
549          Value         Size            Description
550          -----         ----            -----------
551  (VMS)   0x000c        Short           Tag for this "extra" block type
552          TSize         Short           Total Data Size for this block
553          CRC           Long            32-bit CRC for remainder of the block
554          Tag1          Short           VMS attribute tag value #1
555          Size1         Short           Size of attribute #1, in bytes
556          (var.)        Size1           Attribute #1 data
557          .
558          .
559          .
560          TagN          Short           VMS attribute tag value #N
561          SizeN         Short           Size of attribute #N, in bytes
562          (var.)        SizeN           Attribute #N data
563
564          Rules:
565
566          1. There will be one or more attributes present, which will
567             each be preceded by the above TagX & SizeX values.  These
568             values are identical to the ATR$C_XXXX and ATR$S_XXXX constants
569             which are defined in ATR.H under VMS C.  Neither of these values
570             will ever be zero.
571
572          2. No word alignment or padding is performed.
573
574          3. A well-behaved PKZIP/VMS program should never produce more than
575             one sub-block with the same TagX value.  Also, there will never
576             be more than one "extra" block of type 0x000c in a particular
577             directory record.
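
          The layout and rules above suggest a simple reader, sketched
          here in Python.  The function name parse_pkware_vms is
          hypothetical, the input is assumed to be the block body after
          the tag and TSize fields, and the CRC is returned without
          being verified.

```python
import struct


def parse_pkware_vms(body):
    """Return (crc, {tag: attribute_data}) for a 0x000c block body."""
    (crc,) = struct.unpack_from("<L", body, 0)
    attrs = {}
    pos = 4
    while pos < len(body):
        if pos + 4 > len(body):
            raise ValueError("truncated attribute header")
        tag, size = struct.unpack_from("<HH", body, pos)
        pos += 4  # no alignment or padding between sub-blocks (rule 2)
        if pos + size > len(body):
            raise ValueError("truncated attribute data")
        if tag in attrs:
            raise ValueError("duplicate attribute tag")  # rule 3
        attrs[tag] = body[pos:pos + size]
        pos += size
    return crc, attrs
```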
578
579
580         -Info-ZIP VMS Extra Field:
581          ========================
582
583          The following is the layout of Info-ZIP's VMS attributes extra
584          block for VAX or Alpha AXP.  The local-header and central-header
585          versions are identical.  (Last Revision 19960922)
586
587          Value         Size            Description
588          -----         ----            -----------
589  (VMS2)  0x4d49        Short           tag for this extra block type
590          TSize         Short           total data size for this block
591          ID            Long            block ID
592          Flags         Short           info bytes
593          BSize         Short           uncompressed block size
594          Reserved      Long            (reserved)
595          (var.)        variable        compressed VMS file-attributes block
596
597          The block ID is one of the following unterminated strings:
598
599                "VFAB"          struct FAB
600                "VALL"          struct XABALL
601                "VFHC"          struct XABFHC
602                "VDAT"          struct XABDAT
603                "VRDT"          struct XABRDT
604                "VPRO"          struct XABPRO
605                "VKEY"          struct XABKEY
606                "VMSV"          version (e.g., "V6.1"; truncated at hyphen)
607                "VNAM"          reserved
608
609          The lower three bits of Flags indicate the compression method.  The
610          currently defined methods are:
611
612                0       stored (not compressed)
613                1       simple "RLE"
614                2       deflated
615
616          The "RLE" method simply replaces zero-valued bytes with zero-valued
617          bits and non-zero-valued bytes with a "1" bit followed by the byte
618          value.
619
620          The variable-length compressed data contains only the data
621          corresponding to the indicated structure or string.  Typically
622          multiple VMS2 extra fields are present (each with a unique block type).
623
624
625         -Info-ZIP Macintosh Extra Field:
626          ==============================
627
628          The following is the layout of the (old) Info-ZIP resource-fork extra
629          block for Macintosh.  The local-header and central-header versions
630          are identical.  (Last Revision 19960922)
631
632          Value         Size            Description
633          -----         ----            -----------
634  (Mac)   0x07c8        Short           tag for this extra block type
635          TSize         Short           total data size for this block
636          "JLEE"        beLong          extra-field signature
637          FInfo         16 bytes        Macintosh FInfo structure
638          CrDat         beLong          HParamBlockRec fileParam.ioFlCrDat
639          MdDat         beLong          HParamBlockRec fileParam.ioFlMdDat
640          Flags         beLong          info bits
641          DirID         beLong          HParamBlockRec fileParam.ioDirID
642          VolName       28 bytes        volume name (optional)
643
644          All fields but the first two are in native Macintosh format
645          (big-endian Motorola order, not little-endian Intel).  The least
646          significant bit of Flags is 1 if the file is a data fork, 0
647          otherwise.  In addition, if this extra field is present, the filename
648          has an extra 'd' or 'r' appended to indicate data fork or resource
649          fork.  The 28-byte VolName field may be omitted.
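
          Because this block mixes byte orders, a short Python sketch may
          help.  It reads the block body (everything after the tag and
          TSize, which stay little-endian) in big-endian order.  The
          names JLEE and parse_jlee and the dictionary keys are
          illustrative assumptions.

```python
import struct

# "JLEE", FInfo, CrDat, MdDat, Flags, DirID in big-endian order (36 bytes)
JLEE = struct.Struct(">4s16sLLLL")


def parse_jlee(body):
    """Parse the body of the old Info-ZIP Macintosh extra block."""
    sig, finfo, crdat, mddat, flags, dir_id = JLEE.unpack_from(body, 0)
    if sig != b"JLEE":
        raise ValueError("missing JLEE signature")
    # The 28-byte VolName is optional and may be absent.
    end = JLEE.size + 28
    vol_name = body[JLEE.size:end] if len(body) >= end else None
    return {
        "finfo": finfo, "crdat": crdat, "mddat": mddat,
        "is_data_fork": bool(flags & 1), "dir_id": dir_id,
        "vol_name": vol_name,
    }
```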
650
651
652         -ZipIt Macintosh Extra Field (long):
653          ==================================
654
655          The following is the layout of the ZipIt extra block for Macintosh.
656          The local-header and central-header versions are identical.
657          (Last Revision 19970130)
658
659          Value         Size            Description
660          -----         ----            -----------
661  (Mac2)  0x2605        Short           tag for this extra block type
662          TSize         Short           total data size for this block
663          "ZPIT"        beLong          extra-field signature
664          FnLen         Byte            length of FileName
665          FileName      variable        full Macintosh filename
666          FileType      Byte[4]         four-byte Mac file type string
667          Creator       Byte[4]         four-byte Mac creator string
668
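
          The length-prefixed filename in this block can be handled as in
          the following Python sketch.  The function name
          parse_zipit_long is hypothetical, and the input is assumed to
          be the block body after the tag and TSize fields.

```python
def parse_zipit_long(body):
    """Return (filename, file_type, creator) as bytes."""
    if body[:4] != b"ZPIT":
        raise ValueError("missing ZPIT signature")
    if len(body) < 5:
        raise ValueError("truncated ZipIt block")
    fn_len = body[4]                 # FnLen: one byte
    name_end = 5 + fn_len
    if len(body) < name_end + 8:     # FileType and Creator follow the name
        raise ValueError("truncated ZipIt block")
    filename = body[5:name_end]
    file_type = body[name_end:name_end + 4]
    creator = body[name_end + 4:name_end + 8]
    return filename, file_type, creator
```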
669
670         -ZipIt Macintosh Extra Field (short):
671          ===================================
672
673          The following is the layout of a shortened variant of the
674          ZipIt extra block for Macintosh (without "full name" entry).
675          This variant is used by ZipIt 1.3.5 and newer for entries that
676          do not need a "full Mac filename" record.
677          The local-header and central-header versions are identical.
678          (Last Revision 19980903)
679
680          Value         Size            Description
681          -----         ----            -----------
682  (Mac2b) 0x2705        Short           tag for this extra block type
683          TSize         Short           total data size for this block
684          "ZPIT"        beLong          extra-field signature
685          FileType      Byte[4]         four-byte Mac file type string
686          Creator       Byte[4]         four-byte Mac creator string
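
          As an illustration only, the two ZipIt layouts above could be read
          with a sketch like the following.  The function name parse_zipit
          and the returned dictionary keys are hypothetical; the sketch
          assumes the caller passes one complete block including its
          4-byte tag/TSize header.

```python
import struct

def parse_zipit(block: bytes) -> dict:
    """Parse one complete ZipIt extra block, including its tag/TSize header."""
    tag, tsize = struct.unpack_from("<HH", block, 0)  # block header is little-endian
    data = block[4:4 + tsize]
    if tag not in (0x2605, 0x2705):
        raise ValueError("not a ZipIt extra block")
    if data[:4] != b"ZPIT":
        raise ValueError("missing ZPIT signature")
    pos, name = 4, None
    if tag == 0x2605:                 # long variant: FnLen byte + FileName
        fnlen = data[pos]
        name = data[pos + 1:pos + 1 + fnlen]
        pos += 1 + fnlen
    return {"name": name,
            "type": data[pos:pos + 4],
            "creator": data[pos + 4:pos + 8]}
```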
687
688
689         -Info-ZIP Macintosh Extra Field (new):
690          ====================================
691
692          The following is the layout of the (new) Info-ZIP extra
693          block for Macintosh, designed by Dirk Haase.
694          All values are stored in little-endian byte order.
695          (Last Revision 19981005)
696
697          Local-header version:
698
699          Value         Size            Description
700          -----         ----            -----------
701  (Mac3)  0x334d        Short           tag for this extra block type ("M3")
702          TSize         Short           total data size for this block
703          BSize         Long            uncompressed finder attribute data size
704          Flags         Short           info bits
705          fdType        Byte[4]         Type of the File (4-byte string)
706          fdCreator     Byte[4]         Creator of the File (4-byte string)
707          (CType)       Short           compression type
708          (CRC)         Long            CRC value for uncompressed MacOS data
709          Attribs       variable        finder attribute data (see below)
710
711
712          Central-header version:
713
714          Value         Size            Description
715          -----         ----            -----------
716  (Mac3)  0x334d        Short           tag for this extra block type ("M3")
717          TSize         Short           total data size for this block
718          BSize         Long            uncompressed finder attribute data size
719          Flags         Short           info bits
720          fdType        Byte[4]         Type of the File (4-byte string)
721          fdCreator     Byte[4]         Creator of the File (4-byte string)
722
723          The third bit of Flags in both headers indicates whether
724          the LOCAL extra field is uncompressed (and therefore whether CType
725          and CRC are omitted):
726
727          Bits of the Flags:
728              bit 0           if set, file is a data fork; otherwise unset
729              bit 1           if set, filename will not be changed
730              bit 2           if set, Attribs is uncompressed (no CType, CRC)
731              bit 3           if set, dates and times are 64-bit;
732                              if zero, dates and times are 32-bit
733              bit 4           if set, timezone offset fields for the native
734                              Mac times are omitted (UTC support deactivated)
735              bits 5-15       reserved
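
          The following sketch shows one way the local-header layout and
          the Flags bits could be read together.  It assumes "data" holds
          the TSize bytes that follow the tag/TSize header; the name
          parse_mac3_local and the dictionary keys are hypothetical.

```python
import struct

def parse_mac3_local(data: bytes) -> dict:
    """Split a local-header Mac3 block body into its fixed and variable parts."""
    bsize, flags = struct.unpack_from("<LH", data, 0)   # little-endian
    fd_type, fd_creator = data[6:10], data[10:14]
    pos = 14
    ctype = crc = None
    if not flags & 0x04:              # bit 2 clear: CType and CRC are present
        ctype, crc = struct.unpack_from("<HL", data, pos)
        pos += 6
    return {
        "bsize": bsize,
        "data_fork": bool(flags & 0x01),
        "filename_unchanged": bool(flags & 0x02),
        "times_64bit": bool(flags & 0x08),
        "no_tz_offsets": bool(flags & 0x10),
        "fd_type": fd_type,
        "fd_creator": fd_creator,
        "ctype": ctype,
        "crc": crc,
        "attribs": data[pos:],        # still compressed if ctype is not None
    }
```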
736
737
738          Attributes:
739
740          Attribs is a Mac-specific block of data in little-endian format with
741          the following structure (if compressed, uncompress it first):
742
743          Value         Size            Description
744          -----         ----            -----------
745          fdFlags       Short           Finder Flags
746          fdLocation.v  Short           Finder Icon Location
747          fdLocation.h  Short           Finder Icon Location
748          fdFldr        Short           Folder containing file
749
750          FXInfo        16 bytes        Macintosh FXInfo structure
751            FXInfo-Structure:
752                fdIconID        Short
753                fdUnused[3]     Short       unused but reserved 6 bytes
754                fdScript        Byte        Script flag and number
755                fdXFlags        Byte        More flag bits
756                fdComment       Short       Comment ID
757                fdPutAway       Long        Home Dir ID
758
759          FVersNum      Byte            file version number
760                                        may not be used by MacOS
761          ACUser        Byte            directory access rights
762
763          FlCrDat       ULong           date and time of creation
764          FlMdDat       ULong           date and time of last modification
765          FlBkDat       ULong           date and time of last backup
766            These time numbers are original Mac FileTime values (local time!).
767            Currently, date-time width is 32-bit, but future versions may
768            support 64-bit times (see flags).
769
770          CrGMTOffs     Long(signed!)   difference "local Creat. time - UTC"
771          MdGMTOffs     Long(signed!)   difference "local Modif. time - UTC"
772          BkGMTOffs     Long(signed!)   difference "local Backup time - UTC"
773            These "local time - UTC" differences (stored in seconds) may be
774            used to support timestamp adjustment after inter-timezone transfer.
775            These fields are optional; bit 4 of the flags word controls their
776            presence.
777
778          Charset       Short           TextEncodingBase (Charset)
779                                        valid for the following two fields
780
781          FullPath      variable        Path of the current file.
782                                        Zero terminated string (C-String)
783                                        Currently coded in the native Charset.
784
785          Comment       variable        Finder Comment of the current file.
786                                        Zero terminated string (C-String)
787                                        Currently coded in the native Charset.
788
789
790         -Acorn SparkFS Extra Field:
791          =========================
792
793          The following is the layout of David Pilling's SparkFS extra block
794          for Acorn RISC OS.  The local-header and central-header versions are
795          identical.  (Last Revision 19960922)
796
797          Value         Size            Description
798          -----         ----            -----------
799  (Acorn) 0x4341        Short           tag for this extra block type
800          TSize         Short           total data size for this block
801          "ARC0"        Long            extra-field signature
802          LoadAddr      Long            load address or file type
803          ExecAddr      Long            exec address
804          Attr          Long            file permissions
805          Zero          Long            reserved; always zero
806
807          The following bits of Attr are associated with the given file
808          permissions:
809
810                bit 0           user-writable ('W')
811                bit 1           user-readable ('R')
812                bit 2           reserved
813                bit 3           locked ('L')
814                bit 4           publicly writable ('w')
815                bit 5           publicly readable ('r')
816                bit 6           reserved
817                bit 7           reserved
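
          A minimal sketch of reading this block and rendering the Attr
          bits as a permission string.  It assumes the signature bytes
          appear in the order "ARC0" and that "data" is the block body
          after the tag/TSize header; the function names are hypothetical.

```python
import struct

# (bit number, letter) pairs taken from the table above
ACORN_BITS = [(0, "W"), (1, "R"), (3, "L"), (4, "w"), (5, "r")]

def acorn_attr_string(attr: int) -> str:
    """Render the permission bits of Attr as a string of letters."""
    return "".join(ch for bit, ch in ACORN_BITS if attr & (1 << bit))

def parse_sparkfs(data: bytes) -> dict:
    """Parse the body of a SparkFS block (signature plus four longs)."""
    if len(data) < 20 or data[:4] != b"ARC0":
        raise ValueError("not a SparkFS block")
    load, exec_addr, attr, _zero = struct.unpack_from("<4L", data, 4)
    return {"load": load, "exec": exec_addr, "attr": attr,
            "perms": acorn_attr_string(attr)}
```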
818
819
820         -VM/CMS Extra Field:
821          ==================
822
823          The following is the layout of the file-attributes extra block for
824          VM/CMS.  The local-header and central-header versions are
825          identical.  (Last Revision 19960922)
826
827          Value         Size            Description
828          -----         ----            -----------
829 (VM/CMS) 0x4704        Short           tag for this extra block type
830          TSize         Short           total data size for this block
831          flData        variable        file attributes data
832
833          flData is an uncompressed fldata_t struct.
834
835
836         -MVS Extra Field:
837          ===============
838
839          The following is the layout of the file-attributes extra block for
840          MVS.  The local-header and central-header versions are identical.
841          (Last Revision 19960922)
842
843          Value         Size            Description
844          -----         ----            -----------
845  (MVS)   0x470f        Short           tag for this extra block type
846          TSize         Short           total data size for this block
847          flData        variable        file attributes data
848
849          flData is an uncompressed fldata_t struct.
850
851
852         -PKWARE Unix Extra Field:
853          ========================
854
855          The following is the layout of PKWARE's Unix "extra" block.
856          It was introduced with the release of PKZIP for Unix 2.50.
857          Note: all fields are stored in Intel low-byte/high-byte order.
858          (Last Revision 19980901)
859
860          This field has a minimum data size of 12 bytes and is stored only
861          as a local extra field.
862
863          Value         Size            Description
864          -----         ----            -----------
865 (Unix0)  0x000d        Short           Tag for this "extra" block type
866          TSize         Short           Total Data Size for this block
867          AcTime        Long            time of last access (UTC/GMT)
868          ModTime       Long            time of last modification (UTC/GMT)
869          UID           Short           Unix user ID
870          GID           Short           Unix group ID
871          (var)         variable        Variable length data field
872
873          The variable length data field will contain file type
874          specific data.  Currently the only values allowed are
875          the original "linked to" file names for hard or symbolic links.
876
877          The fixed part of this field has the same layout as Info-ZIP's
878          abandoned "Unix1 timestamps & owner ID info" extra field;
879          only the two tag bytes are different.
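
          As a rough sketch, the fixed 12-byte part and the trailing
          variable data could be separated as follows.  The name
          parse_pkware_unix is hypothetical, and treating the times as
          unsigned follows general note D.1 rather than anything stated
          for this block.

```python
import struct

def parse_pkware_unix(data: bytes) -> dict:
    """Parse the body (after tag/TSize) of a PKWARE Unix extra block."""
    if len(data) < 12:
        raise ValueError("PKWARE Unix block shorter than 12 bytes")
    atime, mtime, uid, gid = struct.unpack_from("<LLHH", data, 0)
    return {"atime": atime, "mtime": mtime, "uid": uid, "gid": gid,
            "var": data[12:]}         # e.g. the "linked to" file name
```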
880
881
882         -PATCH Descriptor Extra Field:
883          ============================
884
885          The following is the layout of the Patch Descriptor "extra"
886          block.
887
888          Note: all fields stored in Intel low-byte/high-byte order.
889
890          Value         Size            Description
891          -----         ----            -----------
892  (Patch) 0x000f        Short           Tag for this "extra" block type
893          TSize         Short           Size of the total "extra" block
894          Version       Short           Version of the descriptor
895          Flags         Long            Actions and reactions (see below)
896          OldSize       Long            Size of the file about to be patched
897          OldCRC        Long            32-bit CRC of the file about to be patched
898          NewSize       Long            Size of the resulting file
899          NewCRC        Long            32-bit CRC of the resulting file
900
901
902          Actions and reactions
903
904          Bits          Description
905          ----          ----------------
906          0             Use for autodetection
907          1             Treat as selfpatch
908          2-3           RESERVED
909          4-5           Action (see below)
910          6-7           RESERVED
911          8-9           Reaction (see below) to absent file
912          10-11         Reaction (see below) to newer file
913          12-13         Reaction (see below) to unknown file
914          14-15         RESERVED
915          16-31         RESERVED
916
917          Actions
918
919          Action       Value
920          ------       -----
921          none         0
922          add          1
923          delete       2
924          patch        3
925
926          Reactions
927
928          Reaction     Value
929          --------     -----
930          ask          0
931          skip         1
932          ignore       2
933          fail         3
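
          The bit assignments above can be unpacked with shifts and
          masks.  The following is only a sketch; the function name and
          dictionary keys are hypothetical.

```python
ACTIONS = ("none", "add", "delete", "patch")
REACTIONS = ("ask", "skip", "ignore", "fail")

def decode_patch_flags(flags: int) -> dict:
    """Decode the 32-bit Flags word of the Patch Descriptor block."""
    return {
        "autodetect": bool(flags & 0x1),              # bit 0
        "selfpatch": bool(flags & 0x2),               # bit 1
        "action": ACTIONS[(flags >> 4) & 3],          # bits 4-5
        "absent": REACTIONS[(flags >> 8) & 3],        # bits 8-9
        "newer": REACTIONS[(flags >> 10) & 3],        # bits 10-11
        "unknown": REACTIONS[(flags >> 12) & 3],      # bits 12-13
    }
```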
934
935
936         -Extended Timestamp Extra Field:
937          ==============================
938
939          The following is the layout of the extended-timestamp extra block.
940          (Last Revision 19970118)
941
942          Local-header version:
943
944          Value         Size            Description
945          -----         ----            -----------
946  (time)  0x5455        Short           tag for this extra block type
947          TSize         Short           total data size for this block
948          Flags         Byte            info bits
949          (ModTime)     Long            time of last modification (UTC/GMT)
950          (AcTime)      Long            time of last access (UTC/GMT)
951          (CrTime)      Long            time of original creation (UTC/GMT)
952
953          Central-header version:
954
955          Value         Size            Description
956          -----         ----            -----------
957  (time)  0x5455        Short           tag for this extra block type
958          TSize         Short           total data size for this block
959          Flags         Byte            info bits (refers to local header!)
960          (ModTime)     Long            time of last modification (UTC/GMT)
961
962          The central-header extra field contains the modification time only,
963          or no timestamp at all.  TSize is used to flag its presence or
964          absence.  But note:
965
966              If "Flags" indicates that Modtime is present in the local header
967              field, it MUST be present in the central header field, too!
968              This correspondence is required because the modification time
969              value may be used to support trans-timezone freshening and
970              updating operations with zip archives.
971
972          The time values are in standard Unix signed-long format, indicating
973          the number of seconds since 1 January 1970 00:00:00.  The times
974          are relative to Coordinated Universal Time (UTC), also sometimes
975          referred to as Greenwich Mean Time (GMT).  To convert to local time,
976          the software must know the local timezone offset from UTC/GMT.
977
978          The lower three bits of Flags in both headers indicate which time-
979          stamps are present in the LOCAL extra field:
980
981                bit 0           if set, modification time is present
982                bit 1           if set, access time is present
983                bit 2           if set, creation time is present
984                bits 3-7        reserved for additional timestamps; not set
985
986          Those times that are present will appear in the order indicated, but
987          any combination of times may be omitted.  (Creation time may be
988          present without access time, for example.)  TSize should equal
989          (1 + 4*(number of set bits in Flags)), as the block is currently
990          defined.  Other timestamps may be added in the future.
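
          A sketch of reading either version of this block.  "data" is
          assumed to be the TSize bytes after the tag/TSize header; the
          function name and key names are hypothetical.  For the central
          version, only the modification time is looked for, and only if
          TSize leaves room for it.

```python
import struct

def parse_ext_timestamp(data: bytes, local: bool = True) -> dict:
    """Return the timestamps present in an extended-timestamp block body."""
    flags = data[0]
    pos = 1
    times = {}
    names = ("mtime", "atime", "creation")   # bit 0, bit 1, bit 2
    if not local:
        names = names[:1]                    # central header: ModTime at most
    for bit, name in enumerate(names):
        if flags & (1 << bit) and pos + 4 <= len(data):
            (times[name],) = struct.unpack_from("<l", data, pos)  # signed long
            pos += 4
    return times
```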
991
992
993         -Info-ZIP Unix Extra Field (type 1):
994          ==================================
995
996          The following is the layout of the old Info-ZIP extra block for
997          Unix.  It has been replaced by the extended-timestamp extra block
998          (0x5455) and the Unix type 2 extra block (0x7855).
999          (Last Revision 19970118)
1000
1001          Local-header version:
1002
1003          Value         Size            Description
1004          -----         ----            -----------
1005  (Unix1) 0x5855        Short           tag for this extra block type
1006          TSize         Short           total data size for this block
1007          AcTime        Long            time of last access (UTC/GMT)
1008          ModTime       Long            time of last modification (UTC/GMT)
1009          UID           Short           Unix user ID
1010          GID           Short           Unix group ID
1011
1012          Central-header version:
1013
1014          Value         Size            Description
1015          -----         ----            -----------
1016  (Unix1) 0x5855        Short           tag for this extra block type
1017          TSize         Short           total data size for this block
1018          AcTime        Long            time of last access (GMT/UTC)
1019          ModTime       Long            time of last modification (GMT/UTC)
1020
1021          The file access and modification times are in standard Unix signed-
1022          long format, indicating the number of seconds since 1 January 1970
1023          00:00:00.  The times are relative to Coordinated Universal Time
1024          (UTC), also sometimes referred to as Greenwich Mean Time (GMT).  To
1025          convert to local time, the software must know the local timezone
1026          offset from UTC/GMT.  The modification time may be used by non-Unix
1027          systems to support inter-timezone freshening and updating of zip
1028          archives.
1029
1030          The local-header extra block may optionally contain UID and GID
1031          info for the file.  The local-header TSize value is the only
1032          indication of this.  Note that Unix UIDs and GIDs are usually
1033          specific to a particular machine, and they generally require root
1034          access to restore.
1035
1036          This extra field type is obsolete, but it has been in use since
1037          mid-1994.  Therefore future archiving software should continue to
1038          support it.  Some guidelines:
1039
1040              An archive member should either contain the old "Unix1"
1041              extra field block or the new extra field types "time" and/or
1042              "Unix2".
1043
1044              If both the old "Unix1" block type and one or both of the new
1045              block types "time" and "Unix2" are found, the "Unix1" block
1046              should be considered invalid and ignored.
1047
1048              Unarchiving software should recognize both old and new extra
1049              field block types, but the info from new types overrides the
1050              old "Unix1" field.
1051
1052              Archiving software should recognize "Unix1" extra fields for
1053              timestamp comparison but never create it for updated, freshened
1054              or new archive members.  When copying existing members to a new
1055              archive, any "Unix1" extra field blocks should be converted to
1056              the new "time" and/or "Unix2" types.
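
          The guidelines above amount to a simple precedence rule when
          reading.  In this sketch, "blocks" is a hypothetical mapping from
          extra-field tag to block body that the caller has already
          collected for one archive member; the function names are
          likewise illustrative.

```python
import struct

TIME, UNIX1, UNIX2 = 0x5455, 0x5855, 0x7855

def parse_unix1(data: bytes) -> dict:
    """Parse a Unix1 body; UID/GID are present only if TSize allows."""
    atime, mtime = struct.unpack_from("<ll", data, 0)
    uid = gid = None
    if len(data) >= 12:               # local header with optional IDs
        uid, gid = struct.unpack_from("<HH", data, 8)
    return {"atime": atime, "mtime": mtime, "uid": uid, "gid": gid}

def effective_blocks(blocks: dict) -> dict:
    """Drop an old Unix1 block when a newer "time" or "Unix2" block exists."""
    if UNIX1 in blocks and (TIME in blocks or UNIX2 in blocks):
        return {tag: body for tag, body in blocks.items() if tag != UNIX1}
    return dict(blocks)
```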
1057
1058
1059         -Info-ZIP Unix Extra Field (type 2):
1060          ==================================
1061
1062          The following is the layout of the new Info-ZIP extra block for
1063          Unix.  (Last Revision 19960922)
1064
1065          Local-header version:
1066
1067          Value         Size            Description
1068          -----         ----            -----------
1069  (Unix2) 0x7855        Short           tag for this extra block type
1070          TSize         Short           total data size for this block
1071          UID           Short           Unix user ID
1072          GID           Short           Unix group ID
1073
1074          Central-header version:
1075
1076          Value         Size            Description
1077          -----         ----            -----------
1078  (Unix2) 0x7855        Short           tag for this extra block type
1079          TSize         Short           total data size for this block
1080
1081          The data size of the central-header version is zero; it is used
1082          solely as a flag that UID/GID info is present in the local-header
1083          extra field.  If additional fields are ever added to the local
1084          version, the central version may be extended to indicate this.
1085
1086          Note that Unix UIDs and GIDs are usually specific to a particular
1087          machine, and they generally require root access to restore.
1088
1089
1090         -ASi Unix Extra Field:
1091          ====================
1092
1093          The following is the layout of the ASi extra block for Unix.  The
1094          local-header and central-header versions are identical.
1095          (Last Revision 19960916)
1096
1097          Value         Size            Description
1098          -----         ----            -----------
1099  (Unix3) 0x756e        Short           tag for this extra block type
1100          TSize         Short           total data size for this block
1101          CRC           Long            CRC-32 of the remaining data
1102          Mode          Short           file permissions
1103          SizDev        Long            symlink'd size OR major/minor dev num
1104          UID           Short           user ID
1105          GID           Short           group ID
1106          (var.)        variable        symbolic link filename
1107
1108          Mode is the standard Unix st_mode field from struct stat, containing
1109          user/group/other permissions, setuid/setgid and symlink info, etc.
1110
1111          If Mode indicates that this file is a symbolic link, SizDev is the
1112          size of the file to which the link points.  Otherwise, if the file
1113          is a device, SizDev contains the standard Unix st_rdev field from
1114          struct stat (includes the major and minor numbers of the device).
1115          SizDev is undefined in other cases.
1116
1117          If Mode indicates that the file is a symbolic link, the final field
1118          will be the name of the file to which the link points.  The file-
1119          name length can be inferred from TSize.
1120
1121          [Note that TSize may incorrectly refer to the data size not counting
1122           the CRC; i.e., it may be four bytes too small.]
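
          A sketch of reading the ASi block body and checking its CRC.
          It assumes the CRC is the usual CRC-32 as computed by zlib and
          that "data" already has the correct length (see the TSize
          caveat above); the name parse_asi is hypothetical.

```python
import stat
import struct
import zlib

def parse_asi(data: bytes) -> dict:
    """Parse an ASi Unix block body and verify the leading CRC-32."""
    crc, mode, sizdev, uid, gid = struct.unpack_from("<LHLHH", data, 0)
    if zlib.crc32(data[4:]) & 0xFFFFFFFF != crc:
        raise ValueError("ASi extra field CRC mismatch")
    # The trailing name is meaningful only when Mode marks a symbolic link.
    link = data[14:] if stat.S_ISLNK(mode) else None
    return {"mode": mode, "sizdev": sizdev, "uid": uid, "gid": gid,
            "link": link}
```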
1123
1124
1125         -BeOS Extra Field:
1126          ================
1127
1128          The following is the layout of the file-attributes extra block for
1129          BeOS.  (Last Revision 19970531)
1130
1131          Local-header version:
1132
1133          Value         Size            Description
1134          -----         ----            -----------
1135  (BeOS)  0x6542        Short           tag for this extra block type
1136          TSize         Short           total data size for this block
1137          BSize         Long            uncompressed file attribute data size
1138          Flags         Byte            info bits
1139          (CType)       Short           compression type
1140          (CRC)         Long            CRC value for uncompressed file attribs
1141          Attribs       variable        file attribute data
1142
1143          Central-header version:
1144
1145          Value         Size            Description
1146          -----         ----            -----------
1147  (BeOS)  0x6542        Short           tag for this extra block type
1148          TSize         Short           total data size for this block
1149          BSize         Long            size of uncompressed local EF block data
1150          Flags         Byte            info bits
1151
1152          The least significant bit of Flags in both headers indicates whether
1153          the LOCAL extra field is uncompressed (and therefore whether CType
1154          and CRC are omitted):
1155
1156                bit 0           if set, Attribs is uncompressed (no CType, CRC)
1157                bits 1-7        reserved; if set, assume error or unknown data
1158
1159          Currently the only supported compression types are deflated (type 8)
1160          and stored (type 0); the latter is not used by Info-ZIP's Zip but is
1161          supported by UnZip.
1162
1163          Attribs is a BeOS-specific block of data in big-endian format with
1164          the following structure (if compressed, uncompress it first):
1165
1166              Value     Size            Description
1167              -----     ----            -----------
1168              Name      variable        attribute name (null-terminated string)
1169              Type      Long            attribute type (32-bit unsigned integer)
1170              Size      Long Long       data size for this sub-block (64 bits)
1171              Data      variable        attribute data
1172
1173          The attribute structure is repeated for every attribute.  The Data
1174          field may contain anything--text, flags, bitmaps, etc.
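
          Once Attribs has been uncompressed, the repeated attribute
          structure can be walked as in the following sketch.  The
          function name is hypothetical, and no bounds checking beyond
          what Python provides is attempted.

```python
import struct

def parse_beos_attribs(buf: bytes) -> list:
    """Return (name, type, data) tuples from an uncompressed Attribs block."""
    attrs = []
    pos = 0
    while pos < len(buf):
        end = buf.index(b"\0", pos)                 # Name is null-terminated
        name = buf[pos:end]
        # Type is a 32-bit and Size a 64-bit big-endian unsigned integer.
        atype, size = struct.unpack_from(">LQ", buf, end + 1)
        start = end + 1 + 12
        attrs.append((name, atype, buf[start:start + size]))
        pos = start + size
    return attrs
```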
1175
1176
1177         -SMS/QDOS Extra Field:
1178          ====================
1179
1180          The following is the layout of the file-attributes extra block for
1181          SMS/QDOS.  The local-header and central-header versions are identical.
1182          (Last Revision 19960929)
1183
1184          Value         Size            Description
1185          -----         ----            -----------
1186  (QDOS)  0xfb4a        Short           tag for this extra block type
1187          TSize         Short           total data size for this block
1188          LongID        Long            extra-field signature
1189          (ExtraID)     Long            additional signature/flag bytes
1190          QDirect       64 bytes        qdirect structure
1191
1192          LongID may be "QZHD" or "QDOS".  In the latter case, ExtraID will
1193          be present.  Its first three bytes are "02\0"; the last byte is
1194          currently undefined.
1195
1196          QDirect contains the file's uncompressed directory info (qdirect
1197          struct).  Its elements are in native (big-endian) format:
1198
1199          d_length      beLong          file length
1200          d_access      byte            file access type
1201          d_type        byte            file type
1202          d_datalen     beLong          data length
1203          d_reserved    beLong          unused
1204          d_szname      beShort         size of filename
1205          d_name        36 bytes        filename
1206          d_update      beLong          time of last update
1207          d_refdate     beLong          file version number
1208          d_backup      beLong          time of last backup (archive date)
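
          The 64-byte qdirect structure maps directly onto a big-endian
          struct format.  This sketch assumes the LongID bytes appear in
          reading order ("QDOS" or "QZHD") and does not interpret
          ExtraID; the function name and keys are hypothetical.

```python
import struct

QDIRECT = struct.Struct(">LBBLLH36sLLL")   # 64 bytes, big-endian

def parse_qdos(data: bytes) -> dict:
    """Parse an SMS/QDOS block body (after the tag/TSize header)."""
    long_id = data[:4]
    if long_id == b"QDOS":
        pos = 8                             # skip LongID and ExtraID
    elif long_id == b"QZHD":
        pos = 4                             # no ExtraID in this case
    else:
        raise ValueError("unknown SMS/QDOS signature")
    (length, access, ftype, datalen, _reserved, szname, name,
     update, refdate, backup) = QDIRECT.unpack_from(data, pos)
    return {"length": length, "access": access, "type": ftype,
            "datalen": datalen, "name": name[:szname],
            "update": update, "refdate": refdate, "backup": backup}
```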
1209
1210
1211         -AOS/VS Extra Field:
1212          ==================
1213
1214          The following is the layout of the extra block for Data General
1215          AOS/VS.  The local-header and central-header versions are identical.
1216          (Last Revision 19961125)
1217
1218          Value         Size            Description
1219          -----         ----            -----------
1220  (AOSVS) 0x5356        Short           tag for this extra block type
1221          TSize         Short           total data size for this block
1222          "FCI\0"       Long            extra-field signature
1223          Version       Byte            version of AOS/VS extra block (10 = 1.0)
1224          Fstat         variable        fstat packet
1225          AclBuf        variable        raw ACL data ($MXACL bytes)
1226
1227          Fstat contains the file's uncompressed fstat packet, which is one of
1228          the following:
1229
1230                normal fstat packet             (P_FSTAT struct)
1231                DIR/CPD fstat packet            (P_FSTAT_DIR struct)
1232                unit (device) fstat packet      (P_FSTAT_UNIT struct)
1233                IPC file fstat packet           (P_FSTAT_IPC struct)
1234
1235          AclBuf contains the raw ACL data; its length is $MXACL.
1236
1237
1238         -FWKCS MD5 Extra Field:
1239          =====================
1240
1241          The following is the layout of the optional extra block used by the
1242          FWKCS utility.  There is no local-header version; the following
1243          applies only to the central header.  (Last Revision 19961207)
1244
1245          Central-header version:
1246
1247          Value         Size            Description
1248          -----         ----            -----------
1249  (MD5)   0x4b46        Short           tag for this extra block type
1250          TSize         Short           total data size for this block (19)
1251          "MD5"         3 bytes         extra-field signature
1252          MD5hash       16 bytes        128-bit MD5 hash of uncompressed data
1253
1254          The MD5 hash in this extra block is used to automatically identify
1255          files independent of their filenames; it is an enhanced contents-
1256          signature.
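
          For illustration, a block with this layout could be assembled
          as follows.  The sketch assumes the 16 hash bytes are stored in
          the order produced by a standard MD5 digest, which the table
          above does not state explicitly; the function name is
          hypothetical.

```python
import hashlib
import struct

def build_fwkcs_md5(content: bytes) -> bytes:
    """Build a central-header FWKCS MD5 extra block for the given file data."""
    body = b"MD5" + hashlib.md5(content).digest()   # 3 + 16 = 19 bytes
    return struct.pack("<HH", 0x4b46, len(body)) + body
```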
1257
1258          FWKCS provides an option to strip this extra field, if
1259          present, from a zipfile central directory. In adding
1260          this extra field, FWKCS preserves Zipfile Authenticity
1261          Verification; if stripping this extra field, FWKCS
1262          preserves all versions of AV through PKZIP version 2.04g.
1263
1264          ``The MD5 algorithm is being placed in the public domain for review
1265          and possible adoption as a standard.'' (Ron Rivest, MIT Laboratory
1266          for Computer Science and RSA Data Security, Inc., April 1992, RFC
1267          1321, 11.76-77).  FWKCS, and FWKCS Contents_Signature System, are
1268          trademarks of Frederick W. Kantor.
1269
1270
1271
1272      file comment: (Variable)
1273
1274          The comment for this file.
1275
1276      number of this disk: (2 bytes)
1277
1278          The number of this disk, which contains the central
1279          directory end record.
1280
1281      number of the disk with the start of the central directory: (2 bytes)
1282
1283          The number of the disk on which the central
1284          directory starts.
1285
1286      total number of entries in the central dir on this disk: (2 bytes)
1287
1288          The number of central directory entries on this disk.
1289
1290      total number of entries in the central dir: (2 bytes)
1291
1292          The total number of files in the zipfile.
1293
1294
1295      size of the central directory: (4 bytes)
1296
1297          The size (in bytes) of the entire central directory.
1298
1299      offset of start of central directory with respect to
1300      the starting disk number:  (4 bytes)
1301
1302          Offset of the start of the central directory on the
1303          disk on which the central directory starts.
1304
1305      zipfile comment length: (2 bytes)
1306
1307          The length of the comment for this zipfile.
1308
1309      zipfile comment: (Variable)
1310
1311          The comment for this zipfile.
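
          The end-of-central-directory fields listed above can be read in
          one step.  The sketch below locates the record by searching
          backwards for its signature (0x06054b50, stored little-endian as
          "PK\5\6"), which is a simplification: a zipfile comment that
          happens to contain those bytes would mislead it.  The function
          name and keys are hypothetical.

```python
import struct

EOCD_SIG = b"PK\x05\x06"

def parse_end_record(buf: bytes) -> dict:
    """Locate and decode the end-of-central-directory record in buf."""
    pos = buf.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("end-of-central-directory record not found")
    (disk, cd_disk, entries_here, total_entries,
     cd_size, cd_offset, comment_len) = struct.unpack_from("<4H2LH", buf, pos + 4)
    comment = buf[pos + 22:pos + 22 + comment_len]
    return {"disk": disk, "cd_disk": cd_disk,
            "entries_here": entries_here, "total_entries": total_entries,
            "cd_size": cd_size, "cd_offset": cd_offset, "comment": comment}
```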
1312
1313
1314  D.  General notes:
1315
1316      1)  All fields unless otherwise noted are unsigned and stored
1317          in Intel low-byte:high-byte, low-word:high-word order.
1318
1319      2)  String fields are not null terminated, since the
1320          length is given explicitly.
1321
1322      3)  Local headers should not span disk boundaries.  Also, even
1323          though the central directory can span disk boundaries, no
1324          single record in the central directory should be split
1325          across disks.
1326
1327      4)  The entries in the central directory may not necessarily
1328          be in the same order that files appear in the zipfile.
1329
1330UnShrinking - Method 1
1331----------------------
1332
1333Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
1334with partial clearing.  The initial code size is 9 bits, and
1335the maximum code size is 13 bits.  Shrinking differs from
1336conventional Dynamic Ziv-Lempel-Welch implementations in several
1337respects:
1338
13391)  The code size is controlled by the compressor, and is not
1340    automatically increased when codes larger than the current
1341    code size are created (but not necessarily used).  When
1342    the decompressor encounters the code sequence 256
1343    (decimal) followed by 1, it should increase the code size
1344    read from the input stream to the next bit size.  No
1345    blocking of the codes is performed, so the next code at
1346    the increased size should be read from the input stream
1347    immediately after where the previous code at the smaller
1348    bit size was read.  Again, the decompressor should not
1349    increase the code size used until the sequence 256,1 is
1350    encountered.
1351
13522)  When the table becomes full, total clearing is not
1353    performed.  Rather, when the compressor emits the code
1354    sequence 256,2 (decimal), the decompressor should clear
1355    all leaf nodes from the Ziv-Lempel tree, and continue to
1356    use the current code size.  The nodes that are cleared
1357    from the Ziv-Lempel tree are then re-used, with the lowest
1358    code value re-used first, and the highest code value
1359    re-used last.  The compressor can emit the sequence 256,2
1360    at any time.
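
The partial clearing in item 2 can be sketched in Python as follows.
The table layout is an assumption made for illustration only: a
dictionary named parent that maps each assigned code above 256 to the
code of its prefix string.  A real decompressor will likely use
arrays instead.

```python
def partial_clear(parent):
    """Remove every leaf code from the table and return the freed codes.

    parent maps each assigned code (above 256) to the code of its prefix.
    """
    # A code is a leaf when no other assigned code uses it as a prefix.
    used_as_prefix = set(parent.values())
    freed = sorted(code for code in parent if code not in used_as_prefix)
    for code in freed:
        del parent[code]
    # The freed codes are re-used lowest value first.
    return freed
```

Note that only codes that are leaves at the moment of clearing are
removed; a code that becomes a leaf because its children were cleared
stays in the table until the next 256,2 sequence.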
1361
1362
1363
1364Expanding - Methods 2-5
1365-----------------------
1366
1367The Reducing algorithm is actually a combination of two
1368distinct algorithms.  The first algorithm compresses repeated
1369byte sequences, and the second algorithm takes the compressed
1370stream from the first algorithm and applies a probabilistic
1371compression method.
1372
1373The probabilistic compression stores an array of 'follower
1374sets' S(j), for j=0 to 255, corresponding to each possible
1375ASCII character.  Each set contains between 0 and 32
1376characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
1377The sets are stored at the beginning of the data area for a
1378Reduced file, in reverse order, with S(255) first, and S(0)
1379last.
1380
1381The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
1382where N(j) is the size of set S(j).  N(j) can be 0, in which
1383case the follower set for S(j) is empty.  Each N(j) value is
1384encoded in 6 bits, followed by N(j) eight bit character values
1385corresponding to S(j)[0] to S(j)[N(j)-1] respectively.  If
1386N(j) is 0, then no values for S(j) are stored, and the value
1387for N(j-1) immediately follows.
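
A minimal Python sketch of loading the follower sets is given below.
It assumes a caller-supplied function read_bits(n) that returns the
next n bits of the input as an integer; both that function and the
name read_follower_sets are hypothetical, and the bit order within a
byte is left to the caller.

```python
def read_follower_sets(read_bits):
    """Return a list of 256 follower sets, indexed by character value."""
    sets = [[] for _ in range(256)]
    for j in range(255, -1, -1):      # S(255) is stored first, S(0) last
        n = read_bits(6)              # N(j), the size of S(j)
        # N(j) eight-bit followers; nothing is stored when N(j) is 0.
        sets[j] = [read_bits(8) for _ in range(n)]
    return sets
```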
1388
The compressed data stream immediately follows the follower
sets.  It can be interpreted for the probabilistic
decompression as follows:
1392
1393
1394let Last-Character <- 0.
1395loop until done
1396    if the follower set S(Last-Character) is empty then
1397        read 8 bits from the input stream, and copy this
1398        value to the output stream.
1399    otherwise if the follower set S(Last-Character) is non-empty then
1400        read 1 bit from the input stream.
1401        if this bit is not zero then
1402            read 8 bits from the input stream, and copy this
1403            value to the output stream.
1404        otherwise if this bit is zero then
1405            read B(N(Last-Character)) bits from the input
1406            stream, and assign this value to I.
1407            Copy the value of S(Last-Character)[I] to the
1408            output stream.
1409
1410    assign the last value placed on the output stream to
1411    Last-Character.
1412end loop
1413
1414
1415B(N(j)) is defined as the minimal number of bits required to
1416encode the value N(j)-1.
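
The loop above and the definition of B can be sketched in Python as
follows.  The names follower_bits and decode_probabilistic are
hypothetical, read_bits(n) is an assumed caller-supplied bit reader,
and the loop is assumed to end once a known number of bytes has been
produced.  The sketch also assumes that one bit is read when N(j) is
1, that is, that the value 0 still occupies one bit.

```python
def follower_bits(n):
    """B(N): bits needed to encode N-1 (assumed to be at least 1)."""
    return max(1, (n - 1).bit_length())

def decode_probabilistic(read_bits, followers, count):
    """Produce count bytes using the follower sets in followers."""
    out = bytearray()
    last = 0                                  # Last-Character
    while len(out) < count:
        s = followers[last]
        if not s:
            ch = read_bits(8)                 # empty set: literal byte
        elif read_bits(1):
            ch = read_bits(8)                 # flag bit 1: literal byte
        else:
            ch = s[read_bits(follower_bits(len(s)))]   # index into S
        out.append(ch)
        last = ch
    return bytes(out)
```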
1417
1418
1419The decompressed stream from above can then be expanded to
1420re-create the original file as follows:
1421
1422
1423let State <- 0.
1424
1425loop until done
1426    read 8 bits from the input stream into C.
1427    case State of
1428        0:  if C is not equal to DLE (144 decimal) then
1429                copy C to the output stream.
1430            otherwise if C is equal to DLE then
1431                let State <- 1.
1432
1433        1:  if C is non-zero then
1434                let V <- C.
1435                let Len <- L(V)
1436                let State <- F(Len).
1437            otherwise if C is zero then
1438                copy the value 144 (decimal) to the output stream.
1439                let State <- 0
1440
1441        2:  let Len <- Len + C
1442            let State <- 3.
1443
1444        3:  move backwards D(V,C) bytes in the output stream
1445            (if this position is before the start of the output
1446            stream, then assume that all the data before the
1447            start of the output stream is filled with zeros).
1448            copy Len+3 bytes from this position to the output stream.
1449            let State <- 0.
1450    end case
1451end loop
1452
1453
The functions F, L, and D are dependent on the 'compression
factor', 1 through 4, and are defined as follows:
1456
1457For compression factor 1:
1458    L(X) equals the lower 7 bits of X.
1459    F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
1460    D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
1461For compression factor 2:
1462    L(X) equals the lower 6 bits of X.
1463    F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
1464    D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
1465For compression factor 3:
1466    L(X) equals the lower 5 bits of X.
1467    F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
1468    D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
1469For compression factor 4:
1470    L(X) equals the lower 4 bits of X.
1471    F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
1472    D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
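
The state machine and the four sets of functions above can be
sketched in one Python routine, since L, F and D differ only in how
many low bits of V hold the length.  The function name expand and
its arguments are hypothetical; the input is the byte stream produced
by the probabilistic decompression.

```python
DLE = 144

def expand(data, factor):
    """Expand the DLE-coded stream for compression factor 1 through 4."""
    low_bits = 8 - factor              # 7, 6, 5 or 4 bits hold the length
    mask = (1 << low_bits) - 1         # 127, 63, 31 or 15
    out = bytearray()
    state = 0
    v = length = 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & mask                      # L(V)
                state = 2 if length == mask else 3     # F(Len)
            else:
                out.append(DLE)                        # escaped DLE
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            distance = (v >> low_bits) * 256 + c + 1   # D(V,C)
            start = len(out) - distance
            for k in range(length + 3):
                pos = start + k
                # Data before the start of the output counts as zeros.
                out.append(out[pos] if pos >= 0 else 0)
            state = 0
    return bytes(out)
```

The copy is done one byte at a time so that a match may overlap the
bytes it is producing.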
1473
1474
1475Imploding - Method 6
1476--------------------
1477
1478The Imploding algorithm is actually a combination of two distinct
1479algorithms.  The first algorithm compresses repeated byte
1480sequences using a sliding dictionary.  The second algorithm is
1481used to compress the encoding of the sliding dictionary output,
1482using multiple Shannon-Fano trees.
1483
1484The Imploding algorithm can use a 4K or 8K sliding dictionary
1485size. The dictionary size used can be determined by bit 1 in the
1486general purpose flag word; a 0 bit indicates a 4K dictionary
1487while a 1 bit indicates an 8K dictionary.
1488
1489The Shannon-Fano trees are stored at the start of the compressed
1490file. The number of trees stored is defined by bit 2 in the
1491general purpose flag word; a 0 bit indicates two trees stored, a
14921 bit indicates three trees are stored.  If 3 trees are stored,
1493the first Shannon-Fano tree represents the encoding of the
1494Literal characters, the second tree represents the encoding of
1495the Length information, the third represents the encoding of the
1496Distance information.  When 2 Shannon-Fano trees are stored, the
1497Length tree is stored first, followed by the Distance tree.
1498
The Literal Shannon-Fano tree, if present, is used to represent
the entire ASCII character set, and contains 256 values.  This
1501tree is used to compress any data not compressed by the sliding
1502dictionary algorithm.  When this tree is present, the Minimum
1503Match Length for the sliding dictionary is 3.  If this tree is
1504not present, the Minimum Match Length is 2.
1505
1506The Length Shannon-Fano tree is used to compress the Length part
1507of the (length,distance) pairs from the sliding dictionary
1508output.  The Length tree contains 64 values, ranging from the
1509Minimum Match Length, to 63 plus the Minimum Match Length.
1510
1511The Distance Shannon-Fano tree is used to compress the Distance
1512part of the (length,distance) pairs from the sliding dictionary
1513output. The Distance tree contains 64 values, ranging from 0 to
151463, representing the upper 6 bits of the distance value.  The
1515distance values themselves will be between 0 and the sliding
1516dictionary size, either 4K or 8K.
1517
1518The Shannon-Fano trees themselves are stored in a compressed
1519format. The first byte of the tree data represents the number of
1520bytes of data representing the (compressed) Shannon-Fano tree
1521minus 1.  The remaining bytes represent the Shannon-Fano tree
1522data encoded as:
1523
    High 4 bits: Number of values at this bit length, minus 1.
                 (Stored 0 - 15 means 1 - 16 values.)
    Low  4 bits: Bit length needed to represent the value, minus 1.
                 (Stored 0 - 15 means 1 - 16 bits.)
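
A short Python sketch of expanding this compressed tree data into a
list of bit lengths follows.  The function name read_bit_lengths is
hypothetical; tree_data is assumed to start at the count byte.

```python
def read_bit_lengths(tree_data):
    """Expand compressed Shannon-Fano tree data into one length per value."""
    nbytes = tree_data[0] + 1              # first byte = byte count minus 1
    lengths = []
    for b in tree_data[1:1 + nbytes]:
        count = (b >> 4) + 1               # high nibble: run length, 1 - 16
        bit_length = (b & 0x0F) + 1        # low nibble: bit length, 1 - 16
        lengths.extend([bit_length] * count)
    return lengths
```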
1526
1527The Shannon-Fano codes can be constructed from the bit lengths
1528using the following algorithm:
1529
15301)  Sort the Bit Lengths in ascending order, while retaining the
1531    order of the original lengths stored in the file.
1532
15332)  Generate the Shannon-Fano trees:
1534
1535    Code <- 0
1536    CodeIncrement <- 0
1537    LastBitLength <- 0
1538    i <- number of Shannon-Fano codes - 1   (either 255 or 63)
1539
1540    loop while i >= 0
1541        Code = Code + CodeIncrement
1542        if BitLength(i) <> LastBitLength then
1543            LastBitLength=BitLength(i)
1544            CodeIncrement = 1 shifted left (16 - LastBitLength)
1545        ShannonCode(i) = Code
1546        i <- i - 1
1547    end loop
1548
1549
15503)  Reverse the order of all the bits in the above ShannonCode()
1551    vector, so that the most significant bit becomes the least
1552    significant bit.  For example, the value 0x1234 (hex) would
1553    become 0x2C48 (hex).
1554
15554)  Restore the order of Shannon-Fano codes as originally stored
1556    within the file.
1557
1558Example:
1559
1560    This example will show the encoding of a Shannon-Fano tree
1561    of size 8.  Notice that the actual Shannon-Fano trees used
1562    for Imploding are either 64 or 256 entries in size.
1563
Tree data:   0x02, 0x42, 0x01, 0x13

    The first byte (0x02) indicates that 3 bytes of tree data follow.
    Decoding these bytes:
1568            0x42 = 5 codes of 3 bits long
1569            0x01 = 1 code  of 2 bits long
1570            0x13 = 2 codes of 4 bits long
1571
1572    This would generate the original bit length array of:
1573    (3, 3, 3, 3, 3, 2, 4, 4)
1574
1575    There are 8 codes in this table for the values 0 thru 7.  Using the
1576    algorithm to obtain the Shannon-Fano codes produces:
1577
1578                                  Reversed     Order     Original
1579Val  Sorted   Constructed Code      Value     Restored    Length
1580---  ------   -----------------   --------    --------    ------
15810:     2      1100000000000000        11       101          3
15821:     3      1010000000000000       101       001          3
15832:     3      1000000000000000       001       110          3
15843:     3      0110000000000000       110       010          3
15854:     3      0100000000000000       010       100          3
15865:     3      0010000000000000       100        11          2
15876:     4      0001000000000000      1000      1000          4
15887:     4      0000000000000000      0000      0000          4
1589
1590
1591The values in the Val, Order Restored and Original Length columns
1592now represent the Shannon-Fano encoding tree that can be used for
1593decoding the Shannon-Fano encoded data.  How to parse the
1594variable length Shannon-Fano values from the data stream is beyond the
1595scope of this document.  (See the references listed at the end of
1596this document for more information.)  However, traditional decoding
1597schemes used for Huffman variable length decoding, such as the
1598Greenlaw algorithm, can be successfully applied.
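
Steps 1 through 4 of the construction can be sketched in Python as
shown below; run on the bit lengths of the example, it yields the
"Order Restored" column of the table.  The function names are
hypothetical.  The sketch relies on Python's sort being stable, which
is what retains the original order among equal bit lengths.

```python
def reverse16(x):
    """Reverse the order of the 16 bits of x."""
    r = 0
    for _ in range(16):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

def shannon_fano_codes(lengths):
    """Return the bit-reversed code for each value, in file order."""
    # Step 1: sort by bit length, keeping file order among equal lengths.
    order = sorted(range(len(lengths)), key=lambda v: lengths[v])
    codes = [0] * len(lengths)
    code = increment = last_length = 0
    # Step 2: walk the sorted list from the longest code to the shortest.
    for i in range(len(order) - 1, -1, -1):
        v = order[i]
        code += increment
        if lengths[v] != last_length:
            last_length = lengths[v]
            increment = 1 << (16 - last_length)
        # Steps 3 and 4: reverse the bits, store under the original value.
        codes[v] = reverse16(code)
    return codes
```

Each returned code is meaningful only together with its bit length;
for instance the value 5 above has the 2-bit code 11 and the value 7
has the 4-bit code 0000.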
1599
1600The compressed data stream begins immediately after the
1601compressed Shannon-Fano data.  The compressed data stream can be
1602interpreted as follows:
1603
1604loop until done
1605    read 1 bit from input stream.
1606
1607    if this bit is non-zero then       (encoded data is literal data)
1608        if Literal Shannon-Fano tree is present
1609            read and decode character using Literal Shannon-Fano tree.
1610        otherwise
1611            read 8 bits from input stream.
1612        copy character to the output stream.
1613    otherwise                   (encoded data is sliding dictionary match)
1614        if 8K dictionary size
1615            read 7 bits for offset Distance (lower 7 bits of offset).
1616        otherwise
1617            read 6 bits for offset Distance (lower 6 bits of offset).
1618
1619        using the Distance Shannon-Fano tree, read and decode the
1620          upper 6 bits of the Distance value.
1621
1622        using the Length Shannon-Fano tree, read and decode
1623          the Length value.
1624
1625        Length <- Length + Minimum Match Length
1626
1627        if Length = 63 + Minimum Match Length
1628            read 8 bits from the input stream,
1629            add this value to Length.
1630
1631        move backwards Distance+1 bytes in the output stream, and
1632        copy Length characters from this position to the output
1633        stream.  (if this position is before the start of the output
1634        stream, then assume that all the data before the start of
1635        the output stream is filled with zeros).
1636end loop
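
The loop above can be sketched in Python as follows.  Bit reading
and Shannon-Fano decoding are deliberately left out: read_bits(n),
decode_literal(), decode_length() and decode_distance() are assumed
caller-supplied functions, and all names here are hypothetical.  The
loop is assumed to be done once out_size bytes, the uncompressed
size, have been produced.

```python
def explode(read_bits, decode_literal, decode_length, decode_distance,
            out_size, dict_8k, has_literal_tree):
    """Decode an Imploded stream into out_size bytes."""
    min_match = 3 if has_literal_tree else 2
    low_bits = 7 if dict_8k else 6
    out = bytearray()
    while len(out) < out_size:
        if read_bits(1):                       # literal data
            if has_literal_tree:
                out.append(decode_literal())
            else:
                out.append(read_bits(8))
        else:                                  # sliding dictionary match
            distance = read_bits(low_bits)             # lower bits
            distance |= decode_distance() << low_bits  # upper 6 bits
            length = decode_length() + min_match
            if length == 63 + min_match:
                length += read_bits(8)
            start = len(out) - (distance + 1)
            for k in range(length):
                pos = start + k
                # Data before the start of the output counts as zeros.
                out.append(out[pos] if pos >= 0 else 0)
    return bytes(out[:out_size])
```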
1637
Tokenizing - Method 7
---------------------
1640
1641This method is not used by PKZIP.
1642
Deflating - Method 8
--------------------
1645
1646The Deflate algorithm is similar to the Implode algorithm using
1647a sliding dictionary of up to 32K with secondary compression
1648from Huffman/Shannon-Fano codes.
1649
1650The compressed data is stored in blocks with a header describing
1651the block and the Huffman codes used in the data block.  The header
1652format is as follows:
1653
1654   Bit 0: Last Block bit     This bit is set to 1 if this is the last
1655                             compressed block in the data.
1656   Bits 1-2: Block type
1657      00 (0) - Block is stored - All stored data is byte aligned.
               Skip bits until next byte, then next word = block length,
               followed by the one's complement of the block length word.
1660               Remaining data in block is the stored data.
1661
1662      01 (1) - Use fixed Huffman codes for literal and distance codes.
1663               Lit Code    Bits             Dist Code   Bits
1664               ---------   ----             ---------   ----
1665                 0 - 143    8                 0 - 31      5
1666               144 - 255    9
1667               256 - 279    7
1668               280 - 287    8
1669
               Literal codes 286-287 and distance codes 30-31 are never
               used but participate in the Huffman construction.
1672
1673      10 (2) - Dynamic Huffman codes.  (See expanding Huffman codes)
1674
      11 (3) - Reserved - Flag an "Error in compressed data" if seen.
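
A Python sketch of reading the block header and a stored block is
given below.  It assumes that the block begins on a byte boundary,
so that the three header bits are the low-order bits of its first
byte; this holds for the first block but not necessarily for later
ones.  The function names are hypothetical.

```python
import struct

def parse_block_header(first_byte):
    """Split the 3 header bits into (last_block, block_type)."""
    last_block = bool(first_byte & 1)          # bit 0
    block_type = (first_byte >> 1) & 3         # bits 1-2
    if block_type == 3:
        raise ValueError("Error in compressed data")
    return last_block, block_type

def read_stored_block(data, offset):
    """Read a stored block whose length word starts at offset."""
    # Block length word, then its one's complement, both low byte first.
    length, complement = struct.unpack_from("<HH", data, offset)
    if length ^ complement != 0xFFFF:
        raise ValueError("Error in compressed data")
    start = offset + 4
    payload = data[start:start + length]
    if len(payload) != length:
        raise ValueError("stored block is truncated")
    return payload, start + length
```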
1676
1677Expanding Huffman Codes
1678-----------------------
1679If the data block is stored with dynamic Huffman codes, the Huffman
1680codes are sent in the following compressed format:
1681
1682   5 Bits: # of Literal codes sent - 257 (257 - 286)
1683           All other codes are never sent.
1684   5 Bits: # of Dist codes - 1           (1 - 32)
1685   4 Bits: # of Bit Length codes - 4     (4 - 19)
1686
1687The Huffman codes are sent as bit lengths and the codes are built as
1688described in the implode algorithm.  The bit lengths themselves are
1689compressed with Huffman codes.  There are 19 bit length codes:
1690
1691   0 - 15: Represent bit lengths of 0 - 15
1692       16: Copy the previous bit length 3 - 6 times.
1693           The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
1694              Example:  Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
1695                        expand to 12 bit lengths of 8 (1 + 6 + 5)
1696       17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
1697       18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
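
The repeat codes can be sketched in Python as follows.  The input is
assumed to be a list of (code, extra) pairs that have already been
decoded from the stream, where extra is the value of the 2, 3 or 7
bits following codes 16, 17 and 18 and is ignored for codes 0 - 15.
The function name is hypothetical.

```python
def expand_code_lengths(items):
    """Expand bit length codes 0 - 18 into a flat list of bit lengths."""
    lengths = []
    for code, extra in items:
        if code <= 15:
            lengths.append(code)                        # literal bit length
        elif code == 16:
            if not lengths:
                raise ValueError("no previous bit length to copy")
            lengths.extend([lengths[-1]] * (3 + extra)) # copy 3 - 6 times
        elif code == 17:
            lengths.extend([0] * (3 + extra))           # 3 - 10 zeros
        elif code == 18:
            lengths.extend([0] * (11 + extra))          # 11 - 138 zeros
        else:
            raise ValueError("invalid bit length code")
    return lengths
```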
1698
1699The lengths of the bit length codes are sent packed 3 bits per value
1700(0 - 7) in the following order:
1701
1702   16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
1703
1704The Huffman codes should be built as described in the Implode algorithm
1705except codes are assigned starting at the shortest bit length, i.e. the
1706shortest code should be all 0's rather than all 1's.  Also, codes with
1707a bit length of zero do not participate in the tree construction.  The
1708codes are then used to decode the bit lengths for the literal and distance
1709tables.
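
The following Python sketch assigns codes in the way just described:
shortest bit length first, starting from all 0's, with zero-length
entries left out.  It follows the commonly used canonical
construction and is not a transcription of PKZIP's own code.  The
function name is hypothetical, and the codes are returned as plain
integers with the first transmitted bit as the most significant bit;
any bit reversal needed by a particular decoder is not shown.

```python
def huffman_codes(lengths):
    """Return one code per value (None where the bit length is 0)."""
    max_len = max(lengths, default=0)
    count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            count[n] += 1                  # zero lengths do not participate
    # First code of each bit length, shortest length first.
    next_code = [0] * (max_len + 2)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + count[bits - 1]) << 1
        next_code[bits] = code
    codes = [None] * len(lengths)
    for value, n in enumerate(lengths):
        if n:
            codes[value] = next_code[n]
            next_code[n] += 1
    return codes
```

Applied to the fixed bit lengths of block type 1 (8, 9, 7 and 8 bits
for literal codes 0-143, 144-255, 256-279 and 280-287), this gives
code 0000000 to value 256 and 00110000 to value 0.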
1710
1711The bit lengths for the literal tables are sent first with the number
1712of entries sent described by the 5 bits sent earlier.  There are up
1713to 286 literal characters; the first 256 represent the respective 8
1714bit character, code 256 represents the End-Of-Block code, the remaining
171529 codes represent copy lengths of 3 thru 258.  There are up to 30
1716distance codes representing distances from 1 thru 32k as described
1717below.
1718
1719                             Length Codes
1720                             ------------
1721      Extra             Extra              Extra              Extra
1722 Code Bits Length  Code Bits Lengths  Code Bits Lengths  Code Bits Length(s)
1723 ---- ---- ------  ---- ---- -------  ---- ---- -------  ---- ---- ---------
1724  257   0     3     265   1   11,12    273   3   35-42    281   5  131-162
1725  258   0     4     266   1   13,14    274   3   43-50    282   5  163-194
1726  259   0     5     267   1   15,16    275   3   51-58    283   5  195-226
1727  260   0     6     268   1   17,18    276   3   59-66    284   5  227-257
1728  261   0     7     269   2   19-22    277   4   67-82    285   0    258
1729  262   0     8     270   2   23-26    278   4   83-98
1730  263   0     9     271   2   27-30    279   4   99-114
1731  264   0    10     272   2   31-34    280   4  115-130
1732
1733                            Distance Codes
1734                            --------------
1735      Extra           Extra             Extra               Extra
1736 Code Bits Dist  Code Bits  Dist   Code Bits Distance  Code Bits Distance
1737 ---- ---- ----  ---- ---- ------  ---- ---- --------  ---- ---- --------
1738   0   0    1      8   3   17-24    16    7  257-384    24   11  4097-6144
1739   1   0    2      9   3   25-32    17    7  385-512    25   11  6145-8192
1740   2   0    3     10   4   33-48    18    8  513-768    26   12  8193-12288
1741   3   0    4     11   4   49-64    19    8  769-1024   27   12 12289-16384
1742   4   1   5,6    12   5   65-96    20    9 1025-1536   28   13 16385-24576
1743   5   1   7,8    13   5   97-128   21    9 1537-2048   29   13 24577-32768
1744   6   2   9-12   14   6  129-192   22   10 2049-3072
1745   7   2  13-16   15   6  193-256   23   10 3073-4096
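
Both tables are regular enough to be generated rather than typed in.
The Python sketch below rebuilds them from the extra-bit counts and
turns a code plus the value of its extra bits into a copy length or
distance.  All names are hypothetical.

```python
def _bases(extra_bits, first):
    """Each code covers 2**extra values, starting where the last ended."""
    bases, value = [], first
    for bits in extra_bits:
        bases.append(value)
        value += 1 << bits
    return bases

# Length codes 257 - 284, then code 285 (length 258, no extra bits).
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4
LENGTH_BASE = _bases(LENGTH_EXTRA, 3) + [258]
LENGTH_EXTRA = LENGTH_EXTRA + [0]

# Distance codes 0 - 29.
DIST_EXTRA = [0] * 4 + [e for e in range(1, 14) for _ in range(2)]
DIST_BASE = _bases(DIST_EXTRA, 1)

def copy_length(code, extra_value):
    """Length for literal/length code 257 - 285."""
    return LENGTH_BASE[code - 257] + extra_value

def copy_distance(code, extra_value):
    """Distance for distance code 0 - 29."""
    return DIST_BASE[code] + extra_value
```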
1746
1747The compressed data stream begins immediately after the
1748compressed header data.  The compressed data stream can be
1749interpreted as follows:
1750
1751do
1752   read header from input stream.
1753
   if stored block
      skip bits until byte aligned
      read count and 1's complement of count
      copy count bytes of the data block to the output stream
   otherwise
      loop until end of block code sent
         decode literal character from input stream
         if literal < 256
            copy character to the output stream
         otherwise
            if literal = end of block
               break from loop
            otherwise
               compute length from the literal code and its extra bits
               decode distance from input stream

               move backwards distance bytes in the output stream, and
               copy length characters from this position to the output
               stream.
1772      end loop
1773while not last block
1774
1775if data descriptor exists
1776   skip bits until byte aligned
1777   check data descriptor signature
1778   read crc and sizes
1779endif
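
When writing a decoder from the description above, it can help to
compare its output against an existing implementation.  The Python
sketch below uses the standard zlib module with a negative window
size, which selects a Deflate stream without any zlib or gzip
wrapper; this is assumed to be the form in which method 8 data is
stored.  The two function names are hypothetical.

```python
import zlib

def inflate_raw(data):
    """Decode a headerless Deflate stream."""
    decoder = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
    return decoder.decompress(data) + decoder.flush()

def deflate_raw(data, level=6):
    """Produce a headerless Deflate stream (used here for test input)."""
    encoder = zlib.compressobj(level, zlib.DEFLATED, -zlib.MAX_WBITS)
    return encoder.compress(data) + encoder.flush()
```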
1780
1781Decryption
1782----------
1783
1784The encryption used in PKZIP was generously supplied by Roger
1785Schlafly.  PKWARE is grateful to Mr. Schlafly for his expert
1786help and advice in the field of data encryption.
1787
1788PKZIP encrypts the compressed data stream.  Encrypted files must
1789be decrypted before they can be extracted.
1790
1791Each encrypted file has an extra 12 bytes stored at the start of
1792the data area defining the encryption header for that file.  The
encryption header is originally set to random values, and then
itself encrypted, using three 32-bit keys.  The key values are
1795initialized using the supplied encryption password.  After each byte
1796is encrypted, the keys are then updated using pseudo-random number
1797generation techniques in combination with the same CRC-32 algorithm
1798used in PKZIP and described elsewhere in this document.
1799
The following are the basic steps required to decrypt a file:
1801
18021) Initialize the three 32-bit keys with the password.
18032) Read and decrypt the 12-byte encryption header, further
1804   initializing the encryption keys.
18053) Read and decrypt the compressed data stream using the
1806   encryption keys.
1807
1808
1809Step 1 - Initializing the encryption keys
1810-----------------------------------------
1811
1812Key(0) <- 305419896
1813Key(1) <- 591751049
1814Key(2) <- 878082192
1815
1816loop for i <- 0 to length(password)-1
1817    update_keys(password(i))
1818end loop
1819
1820
1821Where update_keys() is defined as:
1822
1823
1824update_keys(char):
  Key(0) <- crc32(Key(0),char)
  Key(1) <- Key(1) + (Key(0) & 000000ffH)
  Key(1) <- Key(1) * 134775813 + 1
  Key(2) <- crc32(Key(2),Key(1) >> 24)
1829end update_keys
1830
1831
1832Where crc32(old_crc,char) is a routine that given a CRC value and a
1833character, returns an updated CRC value after applying the CRC-32
1834algorithm described elsewhere in this document.
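
A Python sketch of such a crc32(old_crc,char) routine is given below.
It assumes the usual table-driven form of CRC-32 with the reflected
polynomial EDB88320 (hex), applied to one byte and without the
initial and final inversion that a whole-file CRC uses.  The helper
crc32_via_zlib is hypothetical and only shows how the same value can
be obtained from the standard zlib module.

```python
import zlib

def _make_crc_table():
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table

CRC_TABLE = _make_crc_table()

def crc32(old_crc, char):
    """Update a raw 32-bit CRC value with one byte."""
    return CRC_TABLE[(old_crc ^ char) & 0xFF] ^ (old_crc >> 8)

def crc32_via_zlib(old_crc, char):
    """Same update through zlib, undoing its built-in inversions."""
    return zlib.crc32(bytes([char]), old_crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF
```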
1835
1836
1837Step 2 - Decrypting the encryption header
1838-----------------------------------------
1839
1840The purpose of this step is to further initialize the encryption
1841keys, based on random data, to render a plaintext attack on the
1842data ineffective.
1843
1844
1845Read the 12-byte encryption header into Buffer, in locations
1846Buffer(0) thru Buffer(11).
1847
1848loop for i <- 0 to 11
    C <- Buffer(i) ^ decrypt_byte()
    update_keys(C)
    Buffer(i) <- C
1852end loop
1853
1854
1855Where decrypt_byte() is defined as:
1856
1857
1858unsigned char decrypt_byte()
1859    local unsigned short temp
1860    temp <- Key(2) | 2
1861    decrypt_byte <- (temp * (temp ^ 1)) >> 8
1862end decrypt_byte
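
In a language without fixed-width integers the 16-bit size of temp
and the 8-bit size of the result must be enforced by masking.  A
Python sketch, taking Key(2) as an argument rather than reading a
global (an assumption made to keep the example self-contained):

```python
def decrypt_byte(key2):
    """Return the next keystream byte derived from Key(2)."""
    temp = (key2 | 2) & 0xFFFF                 # unsigned short
    return ((temp * (temp ^ 1)) >> 8) & 0xFF   # unsigned char
```

Only the low 16 bits of Key(2) influence the result.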
1863
1864
After the header is decrypted, the last 1 or 2 bytes in Buffer
should be the high-order word/byte of the CRC for the file being
decrypted, stored in Intel low-byte/high-byte order, or the high-order
byte of the file time if bit 3 of the general purpose bit flag is set.
Versions of PKZIP prior to 2.0 used a 2-byte CRC check; a 1-byte CRC
check is used on versions after 2.0.  This can be used to test if the
password supplied is correct or not.
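
The 1-byte form of this check can be sketched in Python as follows.
The function name and its arguments are hypothetical: header is the
decrypted 12-byte Buffer, crc is the file's 32-bit CRC, flags is the
general purpose bit flag, and file_time is the 2-byte file time
field.

```python
def password_check(header, crc, flags, file_time):
    """Return True when the last header byte matches the expected value."""
    if len(header) != 12:
        raise ValueError("encryption header must be 12 bytes")
    if flags & 0x0008:                       # bit 3 of the flag word
        expected = (file_time >> 8) & 0xFF   # high-order byte of file time
    else:
        expected = (crc >> 24) & 0xFF        # high-order byte of the CRC
    return header[11] == expected
```

A match does not prove that the password is correct, since a wrong
password passes a 1-byte check by chance about once in 256 tries.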
1872
1873
1874Step 3 - Decrypting the compressed data stream
1875----------------------------------------------
1876
1877The compressed data stream can be decrypted as follows:
1878
1879
1880loop until done
1881    read a character into C
1882    Temp <- C ^ decrypt_byte()
    update_keys(Temp)
1884    output Temp
1885end loop
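
Steps 1 through 3 can be combined into the following Python sketch.
The class and function names are hypothetical.  The per-byte CRC
update is obtained from the standard zlib module by undoing its
built-in inversions, and all key arithmetic is masked to 32 bits.
The encrypt function is not described in this document; it is the
assumed inverse of decryption and is included only so that the
sketch can be exercised without a real zipfile.

```python
import zlib

def _crc32(old_crc, byte):
    # Raw one-byte CRC-32 update (no initial or final inversion).
    return zlib.crc32(bytes([byte]), old_crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF

class Keys:
    def __init__(self, password):
        # Step 1: fixed starting values, then mix in the password bytes.
        self.key = [305419896, 591751049, 878082192]
        for byte in password:
            self.update(byte)

    def update(self, byte):
        k = self.key
        k[0] = _crc32(k[0], byte)
        k[1] = (k[1] + (k[0] & 0xFF)) & 0xFFFFFFFF
        k[1] = (k[1] * 134775813 + 1) & 0xFFFFFFFF
        k[2] = _crc32(k[2], k[1] >> 24)

    def decrypt_byte(self):
        temp = (self.key[2] | 2) & 0xFFFF
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

def decrypt(password, data):
    """Return (12-byte header, compressed data) for an encrypted file."""
    keys = Keys(password)
    plain = bytearray()
    for c in data:                     # steps 2 and 3 use the same loop
        temp = c ^ keys.decrypt_byte()
        keys.update(temp)
        plain.append(temp)
    return bytes(plain[:12]), bytes(plain[12:])

def encrypt(password, header, payload):
    """Assumed inverse of decrypt(), for testing only."""
    keys = Keys(password)
    cipher = bytearray()
    for p in header + payload:
        cipher.append(p ^ keys.decrypt_byte())
        keys.update(p)                 # keys always follow the plaintext
    return bytes(cipher)
```

The password is passed as a byte string; how characters outside
ASCII are mapped to bytes is outside the scope of this sketch.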
1886
1887
1888In addition to the above mentioned contributors to PKZIP and PKUNZIP,
1889I would like to extend special thanks to Robert Mahoney for suggesting
1890the extension .ZIP for this software.
1891
1892
1893References:
1894
    Fiala, Edward R., and Greene, Daniel H., "Data compression with
       finite windows", Communications of the ACM, Volume 32, Number 4,
       April 1989, pages 490-505.
1898
1899    Held, Gilbert, "Data Compression, Techniques and Applications,
1900                    Hardware and Software Considerations",
1901       John Wiley & Sons, 1987.
1902
1903    Huffman, D.A., "A method for the construction of minimum-redundancy
1904       codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
1905       pages 1098-1101.
1906
    Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
       Number 10, October 1989, pages 29-37.
1909
    Nelson, Mark, "The Data Compression Book", M&T Books, 1991.
1911
    Storer, James A., "Data Compression, Methods and Theory",
       Computer Science Press, 1988.
1914
1915    Welch, Terry, "A Technique for High-Performance Data Compression",
1916       IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
1917
    Ziv, J. and Lempel, A., "A universal algorithm for sequential data
       compression", IEEE Transactions on Information Theory,
       Volume 23, Number 3, May 1977, pages 337-343.
1921
1922    Ziv, J. and Lempel, A., "Compression of individual sequences via
1923       variable-rate coding", IEEE Transactions on Information Theory,
1924       Volume 24, Number 5, September 1978, pages 530-536.
1925