1.\" Copyright (c) 2003-2007 Tim Kientzle
2.\" All rights reserved.
3.\"
4.\" Redistribution and use in source and binary forms, with or without
5.\" modification, are permitted provided that the following conditions
6.\" are met:
7.\" 1. Redistributions of source code must retain the above copyright
8.\"    notice, this list of conditions and the following disclaimer.
9.\" 2. Redistributions in binary form must reproduce the above copyright
10.\"    notice, this list of conditions and the following disclaimer in the
11.\"    documentation and/or other materials provided with the distribution.
12.\"
13.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
14.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
15.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
16.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
17.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
18.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
19.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
20.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
21.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
22.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
23.\" SUCH DAMAGE.
24.\"
25.\" $FreeBSD: src/lib/libarchive/tar.5,v 1.18 2008/05/26 17:00:23 kientzle Exp $
26.\"
27.Dd May 20, 2004
28.Dt TAR 5
29.Os
30.Sh NAME
31.Nm tar
32.Nd format of tape archive files
33.Sh DESCRIPTION
34The
35.Nm
36archive format collects any number of files, directories, and other
37file system objects (symbolic links, device nodes, etc.) into a single
38stream of bytes.
39The format was originally designed to be used with
40tape drives that operate with fixed-size blocks, but is widely used as
41a general packaging mechanism.
42.Ss General Format
43A
44.Nm
45archive consists of a series of 512-byte records.
46Each file system object requires a header record which stores basic metadata
47(pathname, owner, permissions, etc.) and zero or more records containing any
48file data.
49The end of the archive is indicated by two records consisting
50entirely of zero bytes.
51.Pp
52For compatibility with tape drives that use fixed block sizes,
53programs that read or write tar files always read or write a fixed
54number of records with each I/O operation.
55These
56.Dq blocks
57are always a multiple of the record size.
58The most common block size\(emand the maximum supported by historic
59implementations\(emis 10240 bytes or 20 records.
60(Note: the terms
61.Dq block
62and
63.Dq record
64here are not entirely standard; this document follows the
65convention established by John Gilmore in documenting
66.Nm pdtar . )
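.Pp
The record and block arithmetic described above can be sketched in C as follows.
This is only an illustration, not code from any particular implementation.
The names
tar_data_records, tar_record_is_zero, and tar_padded_archive_size
are hypothetical, and the sketch assumes that a writer pads the final
block with zero-filled records.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TAR_RECORD_SIZE 512u	/* one record, as described above */

/* Number of 512-byte data records needed to hold `size` bytes of file data. */
static uint64_t
tar_data_records(uint64_t size)
{
	return size / TAR_RECORD_SIZE + (size % TAR_RECORD_SIZE != 0);
}

/* Returns 1 if the record consists entirely of zero bytes, 0 otherwise. */
static int
tar_record_is_zero(const unsigned char *rec)
{
	size_t i;

	assert(rec != NULL);
	for (i = 0; i < TAR_RECORD_SIZE; i++)
		if (rec[i] != 0)
			return 0;
	return 1;
}

/*
 * Total bytes written when `used_records` records (headers, data, and the
 * two end-of-archive records) are rounded up to whole blocks of
 * `block_records` records each (assumption: the tail is zero-padded).
 */
static uint64_t
tar_padded_archive_size(uint64_t used_records, unsigned block_records)
{
	uint64_t blocks;

	assert(block_records > 0);
	blocks = used_records / block_records +
	    (used_records % block_records != 0);
	return blocks * block_records * TAR_RECORD_SIZE;
}
```

For example, an archive holding one small file uses one header record,
one data record, and two end-of-archive records; with 20-record blocks
it occupies a single 10240-byte block.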
67.Ss Old-Style Archive Format
68The original tar archive format has been extended many times to
69include additional information that various implementors found
70necessary.
71This section describes the variant implemented by the tar command
72included in
73.At v7 ,
74which is one of the earliest widely-used versions of the tar program.
75.Pp
76The header record for an old-style
77.Nm
78archive consists of the following:
79.Bd -literal -offset indent
80struct header_old_tar {
81	char name[100];
82	char mode[8];
83	char uid[8];
84	char gid[8];
85	char size[12];
86	char mtime[12];
87	char checksum[8];
88	char linkflag[1];
89	char linkname[100];
90	char pad[255];
91};
92.Ed
93All unused bytes in the header record are filled with nulls.
94.Bl -tag -width indent
95.It Va name
96Pathname, stored as a null-terminated string.
97Early tar implementations only stored regular files (including
98hardlinks to those files).
99One common early convention used a trailing "/" character to indicate
100a directory name, allowing directory permissions and owner information
101to be archived and restored.
102.It Va mode
103File mode, stored as an octal number in ASCII.
104.It Va uid , Va gid
105User id and group id of owner, as octal numbers in ASCII.
106.It Va size
107Size of file, as octal number in ASCII.
108For regular files only, this indicates the amount of data
109that follows the header.
110In particular, this field was ignored by early tar implementations
111when extracting hardlinks.
112Modern writers should always store a zero length for hardlink entries.
113.It Va mtime
114Modification time of file, as an octal number in ASCII.
115This indicates the number of seconds since the start of the epoch,
11600:00:00 UTC January 1, 1970.
117Note that negative values should be avoided
118here, as they are handled inconsistently.
119.It Va checksum
120Header checksum, stored as an octal number in ASCII.
121To compute the checksum, set the checksum field to all spaces,
122then sum all bytes in the header using unsigned arithmetic.
123This field should be stored as six octal digits followed by a null and a space
124character.
125Note that many early implementations of tar used signed arithmetic
126for the checksum field, which can cause interoperability problems
127when transferring archives between systems.
128Modern robust readers compute the checksum both ways and accept the
129header if either computation matches.
130.It Va linkflag , Va linkname
131In order to preserve hardlinks and conserve tape, a file
132with multiple links is only written to the archive the first
133time it is encountered.
Each subsequent time it is encountered, the
135.Va linkflag
136is set to an ASCII
137.Sq 1
138and the
139.Va linkname
140field holds the first name under which this file appears.
141(Note that regular files have a null value in the
142.Va linkflag
143field.)
144.El
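.Pp
The checksum rules given under
checksum
above can be sketched as follows.
This is a minimal sketch under stated assumptions: the function names are
hypothetical, the offset 148 is derived from the field sizes in the
structure above, and the reader accepts leading spaces in the stored value.

```c
#include <assert.h>
#include <stddef.h>

#define TAR_RECORD_SIZE 512u
#define TAR_CHKSUM_OFFSET 148u	/* 100 + 8 + 8 + 8 + 12 + 12 */
#define TAR_CHKSUM_SIZE 8u

/* Sum the header with the checksum field treated as spaces, both ways. */
static void
tar_header_sums(const unsigned char *hdr, unsigned long *usum, long *ssum)
{
	size_t i;

	assert(hdr != NULL && usum != NULL && ssum != NULL);
	*usum = 0;
	*ssum = 0;
	for (i = 0; i < TAR_RECORD_SIZE; i++) {
		unsigned char c = hdr[i];

		if (i >= TAR_CHKSUM_OFFSET &&
		    i < TAR_CHKSUM_OFFSET + TAR_CHKSUM_SIZE)
			c = ' ';
		*usum += c;
		/* What a signed-char implementation would have added. */
		*ssum += (c < 128) ? (long)c : (long)c - 256;
	}
}

/* Store the checksum as six octal digits, a null, and a space. */
static void
tar_store_checksum(unsigned char *hdr)
{
	unsigned long usum;
	long ssum;
	int i;

	tar_header_sums(hdr, &usum, &ssum);
	for (i = 5; i >= 0; i--) {
		hdr[TAR_CHKSUM_OFFSET + (unsigned)i] =
		    (unsigned char)('0' + (usum & 7));
		usum >>= 3;
	}
	hdr[TAR_CHKSUM_OFFSET + 6] = '\0';
	hdr[TAR_CHKSUM_OFFSET + 7] = ' ';
}

/* Accept the header if either the unsigned or the signed sum matches. */
static int
tar_checksum_ok(const unsigned char *hdr)
{
	unsigned long stored = 0, usum;
	long ssum;
	size_t i = TAR_CHKSUM_OFFSET;
	size_t end = TAR_CHKSUM_OFFSET + TAR_CHKSUM_SIZE;

	assert(hdr != NULL);
	while (i < end && hdr[i] == ' ')
		i++;
	for (; i < end && hdr[i] >= '0' && hdr[i] <= '7'; i++)
		stored = stored * 8 + (unsigned long)(hdr[i] - '0');
	tar_header_sums(hdr, &usum, &ssum);
	return stored == usum || (ssum >= 0 && stored == (unsigned long)ssum);
}
```

Six octal digits are always sufficient, since the largest possible sum
(512 bytes of value 255) is below 8 to the sixth power.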
145.Pp
146Early tar implementations varied in how they terminated these fields.
147The tar command in
148.At v7
149used the following conventions (this is also documented in early BSD manpages):
150the pathname must be null-terminated;
151the mode, uid, and gid fields must end in a space and a null byte;
152the size and mtime fields must end in a space;
153the checksum is terminated by a null and a space.
154Early implementations filled the numeric fields with leading spaces.
155This seems to have been common practice until the
156.St -p1003.1-88
157standard was released.
158For best portability, modern implementations should fill the numeric
159fields with leading zeros.
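.Pp
The numeric field conventions above suggest a lenient reader and a
strict writer.
The following sketch is one way to do that, not the method of any
particular tar program.
The names tar_parse_octal and tar_format_octal are hypothetical.
The reader skips leading spaces and accepts space or null terminators;
the writer uses leading zeros and a single trailing null, which is only
one of the termination styles listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Parse an octal field of `len` bytes.  Leading spaces are skipped and
 * the digits may be followed only by spaces or nulls.  A field with no
 * digits yields zero.  Returns 0 on success, -1 on a stray character.
 */
static int
tar_parse_octal(const char *field, size_t len, uint64_t *out)
{
	size_t i = 0;
	uint64_t v = 0;

	assert(field != NULL && out != NULL);
	assert(len <= 21);	/* 21 octal digits fit in 63 bits */
	while (i < len && field[i] == ' ')
		i++;
	for (; i < len && field[i] >= '0' && field[i] <= '7'; i++)
		v = (v << 3) | (uint64_t)(field[i] - '0');
	for (; i < len; i++)
		if (field[i] != ' ' && field[i] != '\0')
			return -1;
	*out = v;
	return 0;
}

/*
 * Write `v` as len-1 octal digits with leading zeros, followed by a null.
 * Returns -1 if the value does not fit in the field.
 */
static int
tar_format_octal(uint64_t v, char *field, size_t len)
{
	size_t i;

	assert(field != NULL && len >= 2);
	field[len - 1] = '\0';
	for (i = len - 1; i-- > 0; ) {
		field[i] = (char)('0' + (v & 7));
		v >>= 3;
	}
	return v == 0 ? 0 : -1;
}
```

With this writer, a 12-byte field holds eleven octal digits, so the
largest representable value is one less than 8 to the eleventh power.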
160.Ss Pre-POSIX Archives
161An early draft of
162.St -p1003.1-88
163served as the basis for John Gilmore's
164.Nm pdtar
165program and many system implementations from the late 1980s
166and early 1990s.
167These archives generally follow the POSIX ustar
168format described below with the following variations:
169.Bl -bullet -compact -width indent
170.It
171The magic value is
172.Dq ustar\ \&
173(note the following space).
174The version field contains a space character followed by a null.
175.It
176The numeric fields are generally filled with leading spaces
177(not leading zeros as recommended in the final standard).
178.It
179The prefix field is often not used, limiting pathnames to
180the 100 characters of old-style archives.
181.El
182.Ss POSIX ustar Archives
183.St -p1003.1-88
184defined a standard tar file format to be read and written
185by compliant implementations of
186.Xr tar 1 .
187This format is often called the
188.Dq ustar
189format, after the magic value used
190in the header.
191(The name is an acronym for
192.Dq Unix Standard TAR . )
193It extends the historic format with new fields:
194.Bd -literal -offset indent
195struct header_posix_ustar {
196	char name[100];
197	char mode[8];
198	char uid[8];
199	char gid[8];
200	char size[12];
201	char mtime[12];
202	char checksum[8];
203	char typeflag[1];
204	char linkname[100];
205	char magic[6];
206	char version[2];
207	char uname[32];
208	char gname[32];
209	char devmajor[8];
210	char devminor[8];
211	char prefix[155];
212	char pad[12];
213};
214.Ed
215.Bl -tag -width indent
216.It Va typeflag
217Type of entry.
218POSIX extended the earlier
219.Va linkflag
220field with several new type values:
221.Bl -tag -width indent -compact
222.It Dq 0
223Regular file.
224NUL should be treated as a synonym, for compatibility purposes.
225.It Dq 1
226Hard link.
227.It Dq 2
228Symbolic link.
229.It Dq 3
230Character device node.
231.It Dq 4
232Block device node.
233.It Dq 5
234Directory.
235.It Dq 6
236FIFO node.
237.It Dq 7
238Reserved.
239.It Other
240A POSIX-compliant implementation must treat any unrecognized typeflag value
241as a regular file.
242In particular, writers should ensure that all entries
243have a valid filename so that they can be restored by readers that do not
244support the corresponding extension.
245Uppercase letters "A" through "Z" are reserved for custom extensions.
246Note that sockets and whiteout entries are not archivable.
247.El
248It is worth noting that the
249.Va size
250field, in particular, has different meanings depending on the type.
251For regular files, of course, it indicates the amount of data
252following the header.
253For directories, it may be used to indicate the total size of all
254files in the directory, for use by operating systems that pre-allocate
255directory space.
256For all other types, it should be set to zero by writers and ignored
257by readers.
258.It Va magic
259Contains the magic value
260.Dq ustar
261followed by a NUL byte to indicate that this is a POSIX standard archive.
262Full compliance requires the uname and gname fields be properly set.
263.It Va version
264Version.
265This should be
266.Dq 00
267(two copies of the ASCII digit zero) for POSIX standard archives.
268.It Va uname , Va gname
269User and group names, as null-terminated ASCII strings.
270These should be used in preference to the uid/gid values
271when they are set and the corresponding names exist on
272the system.
273.It Va devmajor , Va devminor
274Major and minor numbers for character device or block device entry.
275.It Va prefix
276First part of pathname.
277If the pathname is too long to fit in the 100 bytes provided by the standard
278format, it can be split at any
279.Pa /
280character with the first portion going here.
281If the prefix field is not empty, the reader will prepend
282the prefix value and a
283.Pa /
284character to the regular name field to obtain the full pathname.
285.El
286.Pp
287Note that all unused bytes must be set to
288.Dv NUL .
289.Pp
290Field termination is specified slightly differently by POSIX
291than by previous implementations.
292The
293.Va magic ,
294.Va uname ,
295and
296.Va gname
297fields must have a trailing
298.Dv NUL .
299The
.Va name ,
301.Va linkname ,
302and
303.Va prefix
304fields must have a trailing
305.Dv NUL
306unless they fill the entire field.
307(In particular, it is possible to store a 256-character pathname if it
308happens to have a
309.Pa /
310as the 156th character.)
311POSIX requires numeric fields to be zero-padded in the front, and allows
312them to be terminated with either space or
313.Dv NUL
314characters.
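.Pp
Because the name and prefix fields may fill their entire width without
a terminating null, a reader cannot treat them as ordinary C strings.
The sketch below shows one way to rebuild the full pathname.
The names tar_field_len and tar_full_path are hypothetical, and the
offsets 0 and 345 are derived from the field sizes in the structure above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_NAME_OFFSET 0u
#define TAR_NAME_SIZE 100u
#define TAR_PREFIX_OFFSET 345u	/* sum of all fields before prefix */
#define TAR_PREFIX_SIZE 155u
#define TAR_PATH_MAX 257u	/* 155 + '/' + 100 + terminating null */

/* Length of a field that is null-terminated unless it fills the field. */
static size_t
tar_field_len(const char *field, size_t size)
{
	const char *nul = memchr(field, '\0', size);

	return nul != NULL ? (size_t)(nul - field) : size;
}

/*
 * Join prefix and name into `out`, which must hold TAR_PATH_MAX bytes.
 * Returns the length of the resulting pathname.
 */
static size_t
tar_full_path(const char *hdr, char *out)
{
	size_t plen, nlen, n = 0;

	assert(hdr != NULL && out != NULL);
	plen = tar_field_len(hdr + TAR_PREFIX_OFFSET, TAR_PREFIX_SIZE);
	nlen = tar_field_len(hdr + TAR_NAME_OFFSET, TAR_NAME_SIZE);
	if (plen > 0) {
		memcpy(out, hdr + TAR_PREFIX_OFFSET, plen);
		n = plen;
		out[n++] = '/';
	}
	memcpy(out + n, hdr + TAR_NAME_OFFSET, nlen);
	n += nlen;
	out[n] = '\0';
	return n;
}
```

When both fields are completely full, the result is the 256-character
pathname mentioned above, with the separator as its 156th character.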
315.Pp
316Currently, most tar implementations comply with the ustar
317format, occasionally extending it by adding new fields to the
318blank area at the end of the header record.
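.Pp
The magic and version values described so far are enough to tell the
three header styles apart.
The following is a rough sketch with hypothetical names (enum tar_flavor,
tar_classify); the offset 257 is derived from the structure above, and
anything without a recognized magic value is simply assumed to be an
old-style header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TAR_MAGIC_OFFSET 257u	/* magic[6] is followed directly by version[2] */

enum tar_flavor {
	TAR_OLD_STYLE,		/* no recognized magic value */
	TAR_POSIX_USTAR,	/* "ustar" + null, version "00" */
	TAR_PRE_POSIX		/* "ustar" + space, version space + null */
};

static enum tar_flavor
tar_classify(const char *hdr)
{
	assert(hdr != NULL);
	/* The literals are split so that "\0" is not read as part of "\000". */
	if (memcmp(hdr + TAR_MAGIC_OFFSET, "ustar\0" "00", 8) == 0)
		return TAR_POSIX_USTAR;
	if (memcmp(hdr + TAR_MAGIC_OFFSET, "ustar " " \0", 8) == 0)
		return TAR_PRE_POSIX;
	return TAR_OLD_STYLE;
}
```

A robust reader would verify the header checksum before trusting this
classification.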
319.Ss Pax Interchange Format
320There are many attributes that cannot be portably stored in a
321POSIX ustar archive.
322.St -p1003.1-2001
323defined a
324.Dq pax interchange format
325that uses two new types of entries to hold text-formatted
326metadata that applies to following entries.
327Note that a pax interchange format archive is a ustar archive in every
328respect.
329The new data is stored in ustar-compatible archive entries that use the
330.Dq x
331or
332.Dq g
333typeflag.
334In particular, older implementations that do not fully support these
335extensions will extract the metadata into regular files, where the
336metadata can be examined as necessary.
337.Pp
338An entry in a pax interchange format archive consists of one or
339two standard ustar entries, each with its own header and data.
340The first optional entry stores the extended attributes
341for the following entry.
342This optional first entry has an "x" typeflag and a size field that
343indicates the total size of the extended attributes.
344The extended attributes themselves are stored as a series of text-format
345lines encoded in the portable UTF-8 encoding.
346Each line consists of a decimal number, a space, a key string, an equals
347sign, a value string, and a new line.
348The decimal number indicates the length of the entire line, including the
349initial length field and the trailing newline.
350An example of such a field is:
351.Dl 25 ctime=1084839148.1212\en
352Keys in all lowercase are standard keys.
353Vendors can add their own keys by prefixing them with an all uppercase
354vendor name and a period.
355Note that, unlike the historic header, numeric values are stored using
356decimal, not octal.
357A description of some common keys follows:
358.Bl -tag -width indent
359.It Cm atime , Cm ctime , Cm mtime
360File access, inode change, and modification times.
361These fields can be negative or include a decimal point and a fractional value.
362.It Cm uname , Cm uid , Cm gname , Cm gid
363User name, group name, and numeric UID and GID values.
The user name and group name stored here are encoded in UTF-8
365and can thus include non-ASCII characters.
366The UID and GID fields can be of arbitrary length.
367.It Cm linkpath
368The full path of the linked-to file.
Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
370.It Cm path
371The full pathname of the entry.
Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
373.It Cm realtime.* , Cm security.*
374These keys are reserved and may be used for future standardization.
375.It Cm size
376The size of the file.
377Note that there is no length limit on this field, allowing conforming
378archives to store files much larger than the historic 8GB limit.
379.It Cm SCHILY.*
380Vendor-specific attributes used by Joerg Schilling's
381.Nm star
382implementation.
383.It Cm SCHILY.acl.access , Cm SCHILY.acl.default
384Stores the access and default ACLs as textual strings in a format
385that is an extension of the format specified by POSIX.1e draft 17.
386In particular, each user or group access specification can include a fourth
387colon-separated field with the numeric UID or GID.
388This allows ACLs to be restored on systems that may not have complete
389user or group information available (such as when NIS/YP or LDAP services
390are temporarily unavailable).
391.It Cm SCHILY.devminor , Cm SCHILY.devmajor
392The full minor and major numbers for device nodes.
.It Cm SCHILY.dev , Cm SCHILY.ino , Cm SCHILY.nlinks
394The device number, inode number, and link count for the entry.
395In particular, note that a pax interchange format archive using Joerg
396Schilling's
397.Cm SCHILY.*
398extensions can store all of the data from
399.Va struct stat .
400.It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
401Libarchive stores POSIX.1e-style extended attributes using
402keys of this form.
403The
404.Ar key
405value is URL-encoded:
406All non-ASCII characters and the two special characters
407.Dq =
408and
409.Dq %
410are encoded as
411.Dq %
412followed by two uppercase hexadecimal digits.
413The value of this key is the extended attribute value
414encoded in base 64.
415XXX Detail the base-64 format here XXX
416.It Cm VENDOR.*
417XXX document other vendor-specific extensions XXX
418.El
419.Pp
420Any values stored in an extended attribute override the corresponding
421values in the regular tar header.
422Note that compliant readers should ignore the regular fields when they
423are overridden.
424This is important, as existing archivers are known to store non-compliant
425values in the standard header fields in this situation.
426There are no limits on length for any of these fields.
427In particular, numeric fields can be arbitrarily large.
All text fields are encoded in UTF-8.
429Compliant writers should store only portable 7-bit ASCII characters in
430the standard ustar header and use extended
431attributes whenever a text value contains non-ASCII characters.
432.Pp
433In addition to the
434.Cm x
435entry described above, the pax interchange format
436also supports a
437.Cm g
438entry.
439The
440.Cm g
441entry is identical in format, but specifies attributes that serve as
442defaults for all subsequent archive entries.
443The
444.Cm g
445entry is not widely used.
446.Pp
447Besides the new
448.Cm x
449and
450.Cm g
451entries, the pax interchange format has a few other minor variations
452from the earlier ustar format.
453The most troubling one is that hardlinks are permitted to have
454data following them.
455This allows readers to restore any hardlink to a file without
456having to rewind the archive to find an earlier entry.
457However, it creates complications for robust readers, as it is no longer
458clear whether or not they should ignore the size field for hardlink entries.
459.Ss GNU Tar Archives
460The GNU tar program started with a pre-POSIX format similar to that
461described earlier and has extended it using several different mechanisms:
462It added new fields to the empty space in the header (some of which was later
463used by POSIX for conflicting purposes);
464it allowed the header to be continued over multiple records;
465and it defined new entries that modify following entries
466(similar in principle to the
467.Cm x
468entry described above, but each GNU special entry is single-purpose,
469unlike the general-purpose
470.Cm x
471entry).
472As a result, GNU tar archives are not POSIX compatible, although
473more lenient POSIX-compliant readers can successfully extract most
474GNU tar archives.
475.Bd -literal -offset indent
476struct header_gnu_tar {
477	char name[100];
478	char mode[8];
479	char uid[8];
480	char gid[8];
481	char size[12];
482	char mtime[12];
483	char checksum[8];
484	char typeflag[1];
485	char linkname[100];
486	char magic[6];
487	char version[2];
488	char uname[32];
489	char gname[32];
490	char devmajor[8];
491	char devminor[8];
492	char atime[12];
493	char ctime[12];
494	char offset[12];
495	char longnames[4];
496	char unused[1];
497	struct {
498		char offset[12];
499		char numbytes[12];
500	} sparse[4];
501	char isextended[1];
502	char realsize[12];
503	char pad[17];
504};
505.Ed
506.Bl -tag -width indent
507.It Va typeflag
508GNU tar uses the following special entry types, in addition to
509those defined by POSIX:
510.Bl -tag -width indent
511.It "7"
512GNU tar treats type "7" records identically to type "0" records,
513except on one obscure RTOS where they are used to indicate the
514pre-allocation of a contiguous file on disk.
515.It "D"
516This indicates a directory entry.
517Unlike the POSIX-standard "5"
518typeflag, the header is followed by data records listing the names
519of files in this directory.
520Each name is preceded by an ASCII "Y"
521if the file is stored in this archive or "N" if the file is not
522stored in this archive.
523Each name is terminated with a null, and
524an extra null marks the end of the name list.
525The purpose of this
526entry is to support incremental backups; a program restoring from
527such an archive may wish to delete files on disk that did not exist
528in the directory when the archive was made.
529.Pp
530Note that the "D" typeflag specifically violates POSIX, which requires
531that unrecognized typeflags be restored as normal files.
532In this case, restoring the "D" entry as a file could interfere
533with subsequent creation of the like-named directory.
534.It "K"
535The data for this entry is a long linkname for the following regular entry.
536.It "L"
537The data for this entry is a long pathname for the following regular entry.
538.It "M"
539This is a continuation of the last file on the previous volume.
540GNU multi-volume archives guarantee that each volume begins with a valid
541entry header.
542To ensure this, a file may be split, with part stored at the end of one volume,
543and part stored at the beginning of the next volume.
544The "M" typeflag indicates that this entry continues an existing file.
545Such entries can only occur as the first or second entry
546in an archive (the latter only if the first entry is a volume label).
547The
548.Va size
549field specifies the size of this entry.
550The
551.Va offset
552field at bytes 369-380 specifies the offset where this file fragment
553begins.
554The
555.Va realsize
556field specifies the total size of the file (which must equal
557.Va size
558plus
559.Va offset ) .
560When extracting, GNU tar checks that the header file name is the one it is
561expecting, that the header offset is in the correct sequence, and that
562the sum of offset and size is equal to realsize.
FreeBSD's version of GNU tar does not handle the corner case of an
archive being continued in the middle of a long name or other
extension header.
566.It "N"
567Type "N" records are no longer generated by GNU tar.
568They contained a
569list of files to be renamed or symlinked after extraction; this was
570originally used to support long names.
571The contents of this record
572are a text description of the operations to be done, in the form
573.Dq Rename %s to %s\en
574or
575.Dq Symlink %s to %s\en ;
576in either case, both
577filenames are escaped using K&R C syntax.
578.It "S"
579This is a
580.Dq sparse
581regular file.
582Sparse files are stored as a series of fragments.
583The header contains a list of fragment offset/length pairs.
584If more than four such entries are required, the header is
585extended as necessary with
586.Dq extra
587header extensions (an older format that is no longer used), or
588.Dq sparse
589extensions.
590.It "V"
591The
592.Va name
593field should be interpreted as a tape/volume header name.
594This entry should generally be ignored on extraction.
595.El
596.It Va magic
597The magic field holds the five characters
598.Dq ustar
599followed by a space.
600Note that POSIX ustar archives have a trailing null.
601.It Va version
602The version field holds a space character followed by a null.
603Note that POSIX ustar archives use two copies of the ASCII digit
604.Dq 0 .
605.It Va atime , Va ctime
606The time the file was last accessed and the time of
607last change of file information, stored in octal as with
608.Va mtime .
609.It Va longnames
610This field is apparently no longer used.
611.It Sparse Va offset / Va numbytes
612Each such structure specifies a single fragment of a sparse
613file.
614The two fields store values as octal numbers.
615The fragments are each padded to a multiple of 512 bytes
616in the archive.
617On extraction, the list of fragments is collected from the
618header (including any extension headers), and the data
619is then read and written to the file at appropriate offsets.
620.It Va isextended
621If this is set to non-zero, the header will be followed by additional
622.Dq sparse header
623records.
624Each such record contains information about as many as 21 additional
625sparse blocks as shown here:
626.Bd -literal -offset indent
627struct gnu_sparse_header {
628	struct {
629		char offset[12];
630		char numbytes[12];
631	} sparse[21];
632	char    isextended[1];
633	char    padding[7];
634};
635.Ed
636.It Va realsize
637A binary representation of the file's complete size, with a much larger range
638than the POSIX file size.
639In particular, with
640.Cm M
641type files, the current entry is only a portion of the file.
642In that case, the POSIX size field will indicate the size of this
643entry; the
644.Va realsize
645field will indicate the total size of the file.
646.El
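.Pp
The sparse-file bookkeeping above reduces to two small calculations:
how many extension records a given number of fragments requires, and how
much archive space the fragment data occupies.
The sketch below uses hypothetical function names and follows the
statement above that each fragment is padded to a multiple of 512 bytes;
it does not parse the header fields themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GNU_SPARSE_IN_HEADER 4u	/* sparse[4] in the main header */
#define GNU_SPARSE_PER_EXT 21u	/* sparse[21] in each extension record */

struct gnu_sparse_header {
	struct {
		char offset[12];
		char numbytes[12];
	} sparse[21];
	char isextended[1];
	char padding[7];
};

/* Compile-time check that the extension record is exactly one record. */
typedef char gnu_sparse_header_is_512
    [(sizeof(struct gnu_sparse_header) == 512) ? 1 : -1];

/* Number of extension records needed for `fragments` sparse fragments. */
static unsigned long
gnu_sparse_ext_records(unsigned long fragments)
{
	if (fragments <= GNU_SPARSE_IN_HEADER)
		return 0;
	return (fragments - GNU_SPARSE_IN_HEADER + GNU_SPARSE_PER_EXT - 1) /
	    GNU_SPARSE_PER_EXT;
}

/* Archive bytes used by fragment data, each fragment padded to 512. */
static uint64_t
gnu_sparse_data_bytes(const uint64_t *numbytes, size_t n)
{
	uint64_t total = 0;
	size_t i;

	assert(numbytes != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += (numbytes[i] + 511) / 512 * 512;
	return total;
}
```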
647.Ss Solaris Tar
648XXX More Details Needed XXX
649.Pp
650Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
651.Dq extended
652format that is fundamentally similar to pax interchange format,
653with the following differences:
654.Bl -bullet -compact -width indent
655.It
656Extended attributes are stored in an entry whose type is
657.Cm X ,
658not
659.Cm x ,
660as used by pax interchange format.
661The detailed format of this entry appears to be the same
662as detailed above for the
663.Cm x
664entry.
665.It
666An additional
667.Cm A
668entry is used to store an ACL for the following regular entry.
669The body of this entry contains a seven-digit octal number
670(whose value is 01000000 plus the number of ACL entries)
671followed by a zero byte, followed by the
672textual ACL description.
673.El
674.Ss Other Extensions
675One common extension, utilized by GNU tar, star, and other newer
676.Nm
677implementations, permits binary numbers in the standard numeric
678fields.
679This is flagged by setting the high bit of the first character.
680This permits 95-bit values for the length and time fields
681and 63-bit values for the uid, gid, and device numbers.
682GNU tar supports this extension for the
683length, mtime, ctime, and atime fields.
684Joerg Schilling's star program supports this extension for
685all numeric fields.
686Note that this extension is largely obsoleted by the extended attribute
687record provided by the pax interchange format.
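.Pp
A reader can decode such a binary field as sketched below.
The name tar_parse_base256 is hypothetical.
The sketch assumes big-endian byte order (most significant byte first),
which this document does not state explicitly, handles only non-negative
values, and rejects values that do not fit in 64 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a numeric field whose first byte has the high bit set.
 * The remaining 7 bits of that byte and all following bytes hold the
 * value (assumed big-endian).  Returns 0 on success, -1 if the field is
 * not flagged as binary or the value exceeds 64 bits.
 */
static int
tar_parse_base256(const unsigned char *field, size_t len, uint64_t *out)
{
	uint64_t v;
	size_t i;

	assert(field != NULL && out != NULL && len > 0);
	if ((field[0] & 0x80) == 0)
		return -1;	/* ordinary octal field */
	v = field[0] & 0x7f;
	for (i = 1; i < len; i++) {
		if ((v >> 56) != 0)
			return -1;	/* next shift would overflow */
		v = (v << 8) | field[i];
	}
	*out = v;
	return 0;
}
```

An 8-byte field decoded this way yields at most a 63-bit value, which
matches the limit given above for the uid, gid, and device numbers.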
688.Pp
689Another early GNU extension allowed base-64 values rather
690than octal.
691This extension was short-lived and such archives are almost never seen.
692However, there is still code in GNU tar to support them; this code is
693responsible for a very cryptic warning message that is sometimes seen when
694GNU tar encounters a damaged archive.
695.Sh SEE ALSO
696.Xr ar 1 ,
697.Xr pax 1 ,
698.Xr tar 1
699.Sh STANDARDS
700The
701.Nm tar
utility is no longer a part of POSIX or the Single UNIX Specification.
703It last appeared in
704.St -susv2 .
705It has been supplanted in subsequent standards by
706.Xr pax 1 .
707The ustar format is currently part of the specification for the
708.Xr pax 1
709utility.
710The pax interchange file format is new with
711.St -p1003.1-2001 .
712.Sh HISTORY
713A
714.Nm tar
715command appeared in Seventh Edition Unix, which was released in January, 1979.
716It replaced the
717.Nm tp
program from Fourth Edition Unix, which in turn replaced the
719.Nm tap
720program from First Edition Unix.
721John Gilmore's
722.Nm pdtar
723public-domain implementation (circa 1987) was highly influential
724and formed the basis of
725.Nm GNU tar .
Joerg Schilling's
727.Nm star
728archiver is another open-source (GPL) archiver (originally developed
729circa 1985) which features complete support for pax interchange
730format.
731