.\" Copyright (c) 2003-2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libarchive/tar.5,v 1.18 2008/05/26 17:00:23 kientzle Exp $
.\"
.Dd May 20, 2004
.Dt TAR 5
.Os
.Sh NAME
.Nm tar
.Nd format of tape archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
The format was originally designed to be used with
tape drives that operate with fixed-size blocks, but is widely used as
a general packaging mechanism.
.Ss General Format
A
.Nm
archive consists of a series of 512-byte records.
Each file system object requires a header record which stores basic metadata
(pathname, owner, permissions, etc.) and zero or more records containing any
file data.
The end of the archive is indicated by two records consisting
entirely of zero bytes.
.Pp
For compatibility with tape drives that use fixed block sizes,
programs that read or write tar files always read or write a fixed
number of records with each I/O operation.
These
.Dq blocks
are always a multiple of the record size.
The most common block size\(emand the maximum supported by historic
implementations\(emis 10240 bytes or 20 records.
(Note: the terms
.Dq block
and
.Dq record
here are not entirely standard; this document follows the
convention established by John Gilmore in documenting
.Nm pdtar . )
.Ss Old-Style Archive Format
The original tar archive format has been extended many times to
include additional information that various implementors found
necessary.
This section describes the variant implemented by the tar command
included in
.At v7 ,
which is one of the earliest widely-used versions of the tar program.
.Pp
The header record for an old-style
.Nm
archive consists of the following:
.Bd -literal -offset indent
struct header_old_tar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char linkflag[1];
	char linkname[100];
	char pad[255];
};
.Ed
All unused bytes in the header record are filled with nulls.
.Bl -tag -width indent
.It Va name
Pathname, stored as a null-terminated string.
Early tar implementations only stored regular files (including
hardlinks to those files).
One common early convention used a trailing "/" character to indicate
a directory name, allowing directory permissions and owner information
to be archived and restored.
.It Va mode
File mode, stored as an octal number in ASCII.
.It Va uid , Va gid
User id and group id of owner, as octal numbers in ASCII.
.It Va size
Size of file, as octal number in ASCII.
For regular files only, this indicates the amount of data
that follows the header.
In particular, this field was ignored by early tar implementations
when extracting hardlinks.
Modern writers should always store a zero length for hardlink entries.
.It Va mtime
Modification time of file, as an octal number in ASCII.
This indicates the number of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
Note that negative values should be avoided
here, as they are handled inconsistently.
.It Va checksum
Header checksum, stored as an octal number in ASCII.
To compute the checksum, set the checksum field to all spaces,
then sum all bytes in the header using unsigned arithmetic.
This field should be stored as six octal digits followed by a null and a space
character.
Note that many early implementations of tar used signed arithmetic
for the checksum field, which can cause interoperability problems
when transferring archives between systems.
Modern robust readers compute the checksum both ways and accept the
header if either computation matches.
.It Va linkflag , Va linkname
In order to preserve hardlinks and conserve tape, a file
with multiple links is only written to the archive the first
time it is encountered.
The next time it is encountered, the
.Va linkflag
is set to an ASCII
.Sq 1
and the
.Va linkname
field holds the first name under which this file appears.
(Note that regular files have a null value in the
.Va linkflag
field.)
.El
.Pp
Early tar implementations varied in how they terminated these fields.
The tar command in
.At v7
used the following conventions (this is also documented in early BSD manpages):
the pathname must be null-terminated;
the mode, uid, and gid fields must end in a space and a null byte;
the size and mtime fields must end in a space;
the checksum is terminated by a null and a space.
Early implementations filled the numeric fields with leading spaces.
This seems to have been common practice until the
.St -p1003.1-88
standard was released.
For best portability, modern implementations should fill the numeric
fields with leading zeros.
.Ss Pre-POSIX Archives
An early draft of
.St -p1003.1-88
served as the basis for John Gilmore's
.Nm pdtar
program and many system implementations from the late 1980s
and early 1990s.
These archives generally follow the POSIX ustar
format described below with the following variations:
.Bl -bullet -compact -width indent
.It
The magic value is
.Dq ustar\ \&
(note the following space).
The version field contains a space character followed by a null.
.It
The numeric fields are generally filled with leading spaces
(not leading zeros as recommended in the final standard).
.It
The prefix field is often not used, limiting pathnames to
the 100 characters of old-style archives.
.El
.Ss POSIX ustar Archives
.St -p1003.1-88
defined a standard tar file format to be read and written
by compliant implementations of
.Xr tar 1 .
This format is often called the
.Dq ustar
format, after the magic value used
in the header.
(The name is an acronym for
.Dq Unix Standard TAR .
)
It extends the historic format with new fields:
.Bd -literal -offset indent
struct header_posix_ustar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char typeflag[1];
	char linkname[100];
	char magic[6];
	char version[2];
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char prefix[155];
	char pad[12];
};
.Ed
.Bl -tag -width indent
.It Va typeflag
Type of entry.
POSIX extended the earlier
.Va linkflag
field with several new type values:
.Bl -tag -width indent -compact
.It Dq 0
Regular file.
NUL should be treated as a synonym, for compatibility purposes.
.It Dq 1
Hard link.
.It Dq 2
Symbolic link.
.It Dq 3
Character device node.
.It Dq 4
Block device node.
.It Dq 5
Directory.
.It Dq 6
FIFO node.
.It Dq 7
Reserved.
.It Other
A POSIX-compliant implementation must treat any unrecognized typeflag value
as a regular file.
In particular, writers should ensure that all entries
have a valid filename so that they can be restored by readers that do not
support the corresponding extension.
Uppercase letters "A" through "Z" are reserved for custom extensions.
Note that sockets and whiteout entries are not archivable.
.El
It is worth noting that the
.Va size
field, in particular, has different meanings depending on the type.
For regular files, of course, it indicates the amount of data
following the header.
For directories, it may be used to indicate the total size of all
files in the directory, for use by operating systems that pre-allocate
directory space.
For all other types, it should be set to zero by writers and ignored
by readers.
.It Va magic
Contains the magic value
.Dq ustar
followed by a NUL byte to indicate that this is a POSIX standard archive.
Full compliance requires the uname and gname fields be properly set.
.It Va version
Version.
This should be
.Dq 00
(two copies of the ASCII digit zero) for POSIX standard archives.
.It Va uname , Va gname
User and group names, as null-terminated ASCII strings.
These should be used in preference to the uid/gid values
when they are set and the corresponding names exist on
the system.
.It Va devmajor , Va devminor
Major and minor numbers for character device or block device entry.
.It Va prefix
First part of pathname.
If the pathname is too long to fit in the 100 bytes provided by the standard
format, it can be split at any
.Pa /
character with the first portion going here.
If the prefix field is not empty, the reader will prepend
the prefix value and a
.Pa /
character to the regular name field to obtain the full pathname.
.El
.Pp
Note that all unused bytes must be set to
.Dv NUL .
.Pp
Field termination is specified slightly differently by POSIX
than by previous implementations.
The
.Va magic ,
.Va uname ,
and
.Va gname
fields must have a trailing
.Dv NUL .
The
.Va name ,
.Va linkname ,
and
.Va prefix
fields must have a trailing
.Dv NUL
unless they fill the entire field.
(In particular, it is possible to store a 256-character pathname if it
happens to have a
.Pa /
as the 156th character.)
POSIX requires numeric fields to be zero-padded in the front, and allows
them to be terminated with either space or
.Dv NUL
characters.
.Pp
Currently, most tar implementations comply with the ustar
format, occasionally extending it by adding new fields to the
blank area at the end of the header record.
.Ss Pax Interchange Format
There are many attributes that cannot be portably stored in a
POSIX ustar archive.
.St -p1003.1-2001
defined a
.Dq pax interchange format
that uses two new types of entries to hold text-formatted
metadata that applies to following entries.
Note that a pax interchange format archive is a ustar archive in every
respect.
The new data is stored in ustar-compatible archive entries that use the
.Dq x
or
.Dq g
typeflag.
In particular, older implementations that do not fully support these
extensions will extract the metadata into regular files, where the
metadata can be examined as necessary.
.Pp
An entry in a pax interchange format archive consists of one or
two standard ustar entries, each with its own header and data.
The first optional entry stores the extended attributes
for the following entry.
This optional first entry has an "x" typeflag and a size field that
indicates the total size of the extended attributes.
The extended attributes themselves are stored as a series of text-format
lines encoded in the portable UTF-8 encoding.
Each line consists of a decimal number, a space, a key string, an equals
sign, a value string, and a newline.
The decimal number indicates the length of the entire line, including the
initial length field and the trailing newline.
An example of such a field is:
.Dl 25 ctime=1084839148.1212\en
Keys in all lowercase are standard keys.
Vendors can add their own keys by prefixing them with an all uppercase
vendor name and a period.
Note that, unlike the historic header, numeric values are stored using
decimal, not octal.
A description of some common keys follows:
.Bl -tag -width indent
.It Cm atime , Cm ctime , Cm mtime
File access, inode change, and modification times.
These fields can be negative or include a decimal point and a fractional value.
.It Cm uname , Cm uid , Cm gname , Cm gid
User name, group name, and numeric UID and GID values.
The user name and group name stored here are encoded in UTF-8
and can thus include non-ASCII characters.
The UID and GID fields can be of arbitrary length.
.It Cm linkpath
The full path of the linked-to file.
Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
.It Cm path
The full pathname of the entry.
Note that this is encoded in UTF-8 and can thus include non-ASCII characters.
.It Cm realtime.* , Cm security.*
These keys are reserved and may be used for future standardization.
.It Cm size
The size of the file.
Note that there is no length limit on this field, allowing conforming
archives to store files much larger than the historic 8GB limit.
.It Cm SCHILY.*
Vendor-specific attributes used by Joerg Schilling's
.Nm star
implementation.
.It Cm SCHILY.acl.access , Cm SCHILY.acl.default
Stores the access and default ACLs as textual strings in a format
that is an extension of the format specified by POSIX.1e draft 17.
In particular, each user or group access specification can include a fourth
colon-separated field with the numeric UID or GID.
This allows ACLs to be restored on systems that may not have complete
user or group information available (such as when NIS/YP or LDAP services
are temporarily unavailable).
.It Cm SCHILY.devminor , Cm SCHILY.devmajor
The full minor and major numbers for device nodes.
.It Cm SCHILY.dev , Cm SCHILY.ino , Cm SCHILY.nlinks
The device number, inode number, and link count for the entry.
In particular, note that a pax interchange format archive using Joerg
Schilling's
.Cm SCHILY.*
extensions can store all of the data from
.Va struct stat .
.It Cm LIBARCHIVE.xattr. Ns Ar namespace Ns . Ns Ar key
Libarchive stores POSIX.1e-style extended attributes using
keys of this form.
The
.Ar key
value is URL-encoded:
All non-ASCII characters and the two special characters
.Dq =
and
.Dq %
are encoded as
.Dq %
followed by two uppercase hexadecimal digits.
The value of this key is the extended attribute value
encoded in base 64.
XXX Detail the base-64 format here XXX
.It Cm VENDOR.*
XXX document other vendor-specific extensions XXX
.El
.Pp
Any values stored in an extended attribute override the corresponding
values in the regular tar header.
Note that compliant readers should ignore the regular fields when they
are overridden.
This is important, as existing archivers are known to store non-compliant
values in the standard header fields in this situation.
There are no limits on length for any of these fields.
In particular, numeric fields can be arbitrarily large.
All text fields are encoded in UTF-8.
Compliant writers should store only portable 7-bit ASCII characters in
the standard ustar header and use extended
attributes whenever a text value contains non-ASCII characters.
.Pp
In addition to the
.Cm x
entry described above, the pax interchange format
also supports a
.Cm g
entry.
The
.Cm g
entry is identical in format, but specifies attributes that serve as
defaults for all subsequent archive entries.
The
.Cm g
entry is not widely used.
.Pp
Besides the new
.Cm x
and
.Cm g
entries, the pax interchange format has a few other minor variations
from the earlier ustar format.
The most troubling one is that hardlinks are permitted to have
data following them.
This allows readers to restore any hardlink to a file without
having to rewind the archive to find an earlier entry.
However, it creates complications for robust readers, as it is no longer
clear whether or not they should ignore the size field for hardlink entries.
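.Pp
The length-prefixed attribute lines described in this section can be
split with very little code.
The following is a minimal sketch, not the code of any particular
implementation: the function name
.Fn pax_next_record
and its argument layout are hypothetical, chosen only for illustration.
It assumes the whole body of an
.Cm x
or
.Cm g
entry is already in memory, and it does not interpret the key or value.

```c
#include <stddef.h>
#include <string.h>

/*
 * Parse one pax extended-header record of the form "LEN key=value\n",
 * where LEN is the decimal length of the whole record, counting the
 * length digits, the space, and the trailing newline.
 *
 * buf/avail describe the attribute data not yet consumed.  On success
 * the record length is returned (the number of bytes to skip to reach
 * the next record) and key/value point into buf; neither is
 * null-terminated.  On malformed input the function returns 0.
 * pax_next_record is a hypothetical helper, not a library function.
 */
static size_t
pax_next_record(const char *buf, size_t avail,
    const char **key, size_t *keylen,
    const char **value, size_t *valuelen)
{
	size_t len = 0, i = 0;
	const char *eq;

	/* Decimal length field; give up once it exceeds the data left. */
	while (i < avail && buf[i] >= '0' && buf[i] <= '9') {
		len = len * 10 + (size_t)(buf[i] - '0');
		if (len > avail)
			return (0);
		i++;
	}
	/* At least one digit must be followed by a single space. */
	if (i == 0 || i >= avail || buf[i] != ' ')
		return (0);
	i++;
	/* The record must still hold an '=' and the closing newline. */
	if (len < i + 2 || buf[len - 1] != '\n')
		return (0);

	/* The key ends at the first '='; the value may contain '='. */
	eq = memchr(buf + i, '=', len - 1 - i);
	if (eq == NULL)
		return (0);
	*key = buf + i;
	*keylen = (size_t)(eq - (buf + i));
	*value = eq + 1;
	*valuelen = (size_t)((buf + len - 1) - (eq + 1));
	return (len);
}
```

.Pp
Calling this function in a loop and advancing the buffer by the returned
length after each call visits every record in the entry body.
Because the stored length governs where a record ends, a value may itself
contain newline characters; a reader that simply splits the data on
newlines would mishandle such records.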
.Ss GNU Tar Archives
The GNU tar program started with a pre-POSIX format similar to that
described earlier and has extended it using several different mechanisms:
It added new fields to the empty space in the header (some of which was later
used by POSIX for conflicting purposes);
it allowed the header to be continued over multiple records;
and it defined new entries that modify following entries
(similar in principle to the
.Cm x
entry described above, but each GNU special entry is single-purpose,
unlike the general-purpose
.Cm x
entry).
As a result, GNU tar archives are not POSIX compatible, although
more lenient POSIX-compliant readers can successfully extract most
GNU tar archives.
.Bd -literal -offset indent
struct header_gnu_tar {
	char name[100];
	char mode[8];
	char uid[8];
	char gid[8];
	char size[12];
	char mtime[12];
	char checksum[8];
	char typeflag[1];
	char linkname[100];
	char magic[6];
	char version[2];
	char uname[32];
	char gname[32];
	char devmajor[8];
	char devminor[8];
	char atime[12];
	char ctime[12];
	char offset[12];
	char longnames[4];
	char unused[1];
	struct {
		char offset[12];
		char numbytes[12];
	} sparse[4];
	char isextended[1];
	char realsize[12];
	char pad[17];
};
.Ed
.Bl -tag -width indent
.It Va typeflag
GNU tar uses the following special entry types, in addition to
those defined by POSIX:
.Bl -tag -width indent
.It "7"
GNU tar treats type "7" records identically to type "0" records,
except on one obscure RTOS where they are used to indicate the
pre-allocation of a contiguous file on disk.
.It "D"
This indicates a directory entry.
Unlike the POSIX-standard "5"
typeflag, the header is followed by data records listing the names
of files in this directory.
Each name is preceded by an ASCII "Y"
if the file is stored in this archive or "N" if the file is not
stored in this archive.
Each name is terminated with a null, and
an extra null marks the end of the name list.
The purpose of this
entry is to support incremental backups; a program restoring from
such an archive may wish to delete files on disk that did not exist
in the directory when the archive was made.
.Pp
Note that the "D" typeflag specifically violates POSIX, which requires
that unrecognized typeflags be restored as normal files.
In this case, restoring the "D" entry as a file could interfere
with subsequent creation of the like-named directory.
.It "K"
The data for this entry is a long linkname for the following regular entry.
.It "L"
The data for this entry is a long pathname for the following regular entry.
.It "M"
This is a continuation of the last file on the previous volume.
GNU multi-volume archives guarantee that each volume begins with a valid
entry header.
To ensure this, a file may be split, with part stored at the end of one volume,
and part stored at the beginning of the next volume.
The "M" typeflag indicates that this entry continues an existing file.
Such entries can only occur as the first or second entry
in an archive (the latter only if the first entry is a volume label).
The
.Va size
field specifies the size of this entry.
The
.Va offset
field at bytes 369-380 specifies the offset where this file fragment
begins.
The
.Va realsize
field specifies the total size of the file (which must equal
.Va size
plus
.Va offset ) .
When extracting, GNU tar checks that the header file name is the one it is
expecting, that the header offset is in the correct sequence, and that
the sum of offset and size is equal to realsize.
FreeBSD's version of GNU tar does not handle the corner case of an
archive's being continued in the middle of a long name or other
extension header.
.It "N"
Type "N" records are no longer generated by GNU tar.
They contained a
list of files to be renamed or symlinked after extraction; this was
originally used to support long names.
The contents of this record
are a text description of the operations to be done, in the form
.Dq Rename %s to %s\en
or
.Dq Symlink %s to %s\en ;
in either case, both
filenames are escaped using K&R C syntax.
.It "S"
This is a
.Dq sparse
regular file.
Sparse files are stored as a series of fragments.
The header contains a list of fragment offset/length pairs.
If more than four such entries are required, the header is
extended as necessary with
.Dq extra
header extensions (an older format that is no longer used), or
.Dq sparse
extensions.
.It "V"
The
.Va name
field should be interpreted as a tape/volume header name.
This entry should generally be ignored on extraction.
.El
.It Va magic
The magic field holds the five characters
.Dq ustar
followed by a space.
Note that POSIX ustar archives have a trailing null.
.It Va version
The version field holds a space character followed by a null.
Note that POSIX ustar archives use two copies of the ASCII digit
.Dq 0 .
.It Va atime , Va ctime
The time the file was last accessed and the time of
last change of file information, stored in octal as with
.Va mtime .
.It Va longnames
This field is apparently no longer used.
.It Sparse Va offset / Va numbytes
Each such structure specifies a single fragment of a sparse
file.
The two fields store values as octal numbers.
The fragments are each padded to a multiple of 512 bytes
in the archive.
On extraction, the list of fragments is collected from the
header (including any extension headers), and the data
is then read and written to the file at appropriate offsets.
.It Va isextended
If this is set to non-zero, the header will be followed by additional
.Dq sparse header
records.
Each such record contains information about as many as 21 additional
sparse blocks as shown here:
.Bd -literal -offset indent
struct gnu_sparse_header {
	struct {
		char offset[12];
		char numbytes[12];
	} sparse[21];
	char isextended[1];
	char padding[7];
};
.Ed
.It Va realsize
A binary representation of the file's complete size, with a much larger range
than the POSIX file size.
In particular, with
.Cm M
type files, the current entry is only a portion of the file.
In that case, the POSIX size field will indicate the size of this
entry; the
.Va realsize
field will indicate the total size of the file.
.El
.Ss Solaris Tar
XXX More Details Needed XXX
.Pp
Solaris tar (beginning with SunOS XXX 5.7 ?? XXX) supports an
.Dq extended
format that is fundamentally similar to pax interchange format,
with the following differences:
.Bl -bullet -compact -width indent
.It
Extended attributes are stored in an entry whose type is
.Cm X ,
not
.Cm x ,
as used by pax interchange format.
The detailed format of this entry appears to be the same
as detailed above for the
.Cm x
entry.
.It
An additional
.Cm A
entry is used to store an ACL for the following regular entry.
The body of this entry contains a seven-digit octal number
(whose value is 01000000 plus the number of ACL entries)
followed by a zero byte, followed by the
textual ACL description.
.El
.Ss Other Extensions
One common extension, utilized by GNU tar, star, and other newer
.Nm
implementations, permits binary numbers in the standard numeric
fields.
This is flagged by setting the high bit of the first character.
This permits 95-bit values for the length and time fields
and 63-bit values for the uid, gid, and device numbers.
GNU tar supports this extension for the
length, mtime, ctime, and atime fields.
Joerg Schilling's star program supports this extension for
all numeric fields.
Note that this extension is largely obsoleted by the extended attribute
record provided by the pax interchange format.
.Pp
Another early GNU extension allowed base-64 values rather
than octal.
This extension was short-lived and such archives are almost never seen.
However, there is still code in GNU tar to support them; this code is
responsible for a very cryptic warning message that is sometimes seen when
GNU tar encounters a damaged archive.
.Sh SEE ALSO
.Xr ar 1 ,
.Xr pax 1 ,
.Xr tar 1
.Sh STANDARDS
The
.Nm tar
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The ustar format is currently part of the specification for the
.Xr pax 1
utility.
The pax interchange file format is new with
.St -p1003.1-2001 .
.Sh HISTORY
A
.Nm tar
command appeared in Seventh Edition Unix, which was released in January, 1979.
It replaced the
.Nm tp
program from Fourth Edition Unix which in turn replaced the
.Nm tap
program from First Edition Unix.
John Gilmore's
.Nm pdtar
public-domain implementation (circa 1987) was highly influential
and formed the basis of
.Nm GNU tar .
Joerg Schilling's
.Nm star
archiver is another open-source (GPL) archiver (originally developed
circa 1985) which features complete support for pax interchange
format.