.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd December 23, 2011
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB format
The PWB binary
.Nm
format is the original format, introduced with cpio as part of the
Programmer's Work Bench system, a variant of 6th Edition UNIX.
It stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Pp
.Bd -literal -offset indent
struct header_pwb_cpio {
        short   h_magic;
        short   h_dev;
        short   h_ino;
        short   h_mode;
        short   h_uid;
        short   h_gid;
        short   h_nlink;
        short   h_majmin;
        long    h_mtime;
        short   h_namesize;
        long    h_filesize;
};
.Ed
.Pp
The
.Va short
fields here are 16-bit integer values, while the
.Va long
fields are 32-bit integers.
Since PWB UNIX, like the 6th Edition UNIX it was based on,
ran only on PDP-11 computers, these values
are in PDP-endian format: each 16-bit word is stored little-endian,
but the more significant word of a 32-bit value comes first.
That is, the long integer whose hexadecimal
representation is 0x12345678 would be stored in four successive bytes
as 0x34, 0x12, 0x78, 0x56.
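.Pp
As an illustration of the byte order described above, the following
C sketch reassembles a PDP-endian 32-bit value from four raw bytes.
The function name
.Fn pdp_long
is hypothetical and is not taken from any cpio implementation.

```c
#include <stdint.h>

/*
 * Decode a 32-bit value stored in PDP-endian order: two 16-bit
 * words, each little-endian, with the more significant word first.
 * (pdp_long is a hypothetical helper name.)
 */
static uint32_t
pdp_long(const unsigned char b[4])
{
	uint32_t hi = (uint32_t)b[0] | ((uint32_t)b[1] << 8);
	uint32_t lo = (uint32_t)b[2] | ((uint32_t)b[3] << 8);

	return (hi << 16) | lo;
}
```

Applied to the bytes 0x34, 0x12, 0x78, 0x56 from the example above,
this yields 0x12345678.
.Pp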
The fields are as follows:
.Bl -tag -width indent
.It Va h_magic
The integer value octal 070707.
.It Va h_dev , Va h_ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va h_mode
The mode specifies both the regular permissions and the file type.
Because the field is a raw copy of the mode field in the inode
representing the file, it also holds two bits that are irrelevant
to the cpio format:
the IALLOC flag, which shows that the inode entry is in use,
and the ILARG flag, which shows that the file it represents is
large enough to have indirect block pointers in the inode.
The mode is decoded as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0100000
IALLOC flag - irrelevant to cpio.
.It 0060000
This masks the file type bits.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0060000
File type value for block special devices.
.It 0010000
ILARG flag - irrelevant to cpio.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.It Va h_uid , Va h_gid
The numeric user id and group id of the owner.
.It Va h_nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va h_majmin
For block special and character special entries,
this field contains the associated device number, with the major
number in the high byte, and the minor number in the low byte.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va h_mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
.It Va h_namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va h_filesize
The size of the file.
Note that this archive format is limited to file sizes of 16
megabytes, because PWB UNIX, like 6th Edition, used only
an unsigned 24-bit integer for the file size internally.
.El
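.Pp
The mode and device-number layouts listed above can be sketched in C
as follows.
The helper names are hypothetical.
The sketch also assumes that a regular file is the case where both
file type bits are clear, which is the 6th Edition convention and is
not spelled out in the table above.

```c
#include <stdint.h>

/*
 * Return an ls-style type character for a PWB-format h_mode value.
 * The IALLOC (0100000) and ILARG (0010000) bits are ignored.
 * Assumption: type bits of zero denote a regular file.
 */
static char
pwb_type_char(uint16_t mode)
{
	switch (mode & 0060000) {	/* file type mask */
	case 0040000:
		return 'd';		/* directory */
	case 0020000:
		return 'c';		/* character special */
	case 0060000:
		return 'b';		/* block special */
	default:
		return '-';		/* regular file (assumed) */
	}
}

/* Split h_majmin: major number in the high byte, minor in the low. */
static unsigned
pwb_major(uint16_t majmin)
{
	return (majmin >> 8) & 0xff;
}

static unsigned
pwb_minor(uint16_t majmin)
{
	return majmin & 0xff;
}
```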
.Pp
The pathname immediately follows the fixed header.
If
.Va h_namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, again with an additional NUL
appended if needed to get the next header at an even offset.
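.Pp
A reader can use the padding rule above to find the next header.
The following C sketch computes the total on-disk size of one entry;
the names
.Dv BIN_HEADER_SIZE
and
.Fn bin_entry_size
are hypothetical, and the 26-byte header size is derived from the
struct shown earlier (nine 16-bit fields and two 32-bit fields).

```c
#include <stddef.h>

/* Nine 2-byte fields plus two 4-byte fields (hypothetical name). */
#define BIN_HEADER_SIZE	26

/*
 * Size in bytes of one binary-format entry: fixed header, pathname
 * (namesize includes the trailing NUL), and file data, with the
 * pathname and the data each padded to an even length.
 */
static size_t
bin_entry_size(size_t namesize, size_t filesize)
{
	size_t n = BIN_HEADER_SIZE;

	n += namesize + (namesize & 1);
	n += filesize + (filesize & 1);
	return n;
}
```

Because the header size is even, padding each part to an even length
keeps every header at an even offset.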
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
.Ss New Binary Format
The new binary
.Nm
format appeared when cpio was adopted into late 7th Edition UNIX.
It is exactly like the PWB binary format, described above, except for
three changes:
.Pp
First, UNIX now ran on more than one hardware type, so the endianness
of 16-bit integers must be determined by observing the magic number at
the start of the header.
The 32-bit integers are still always stored
with the most significant word first, so each of the two
.Va long
fields in the struct shown above was stored as an array of two 16-bit
integers, in the traditional order.
Those 16-bit integers, like all the others
in the struct, were accessed using a macro that byte-swapped them if
necessary.
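.Pp
The byte-order detection described above might look like the
following C sketch.
All function names are hypothetical.
The magic value octal 070707 is 0x71C7, so its two bytes appear as
0xC7, 0x71 in a little-endian archive and 0x71, 0xC7 in a big-endian
one.

```c
/*
 * Inspect the first two header bytes.
 * Returns 0 for little-endian shorts, 1 for big-endian shorts,
 * and -1 if the bytes are not the binary magic number.
 */
static int
bin_byte_order(const unsigned char *p)
{
	if (p[0] == 0xC7 && p[1] == 0x71)
		return 0;
	if (p[0] == 0x71 && p[1] == 0xC7)
		return 1;
	return -1;
}

/* Read one 16-bit field in the detected byte order. */
static unsigned
bin_rd16(const unsigned char *p, int big)
{
	return big ? (((unsigned)p[0] << 8) | p[1])
	    : (((unsigned)p[1] << 8) | p[0]);
}

/* Read a 32-bit field: two 16-bit words, most significant first. */
static unsigned long
bin_rd32(const unsigned char *p, int big)
{
	return ((unsigned long)bin_rd16(p, big) << 16) |
	    bin_rd16(p + 2, big);
}
```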
.Pp
Next, 7th Edition had more file types to store, and the IALLOC and ILARG
flag bits were re-purposed to accommodate these.
The revised use of the various bits is as follows:
.Pp
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link body is stored as file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for world, group, and user following standard POSIX conventions.
.El
.Pp
Finally, the file size field now represents a signed 32-bit integer in
the underlying file system, so the maximum file size has increased to
2 gigabytes.
.Pp
Note that there is no obvious way to tell which of the two binary
formats an archive uses, other than to see which one makes more
sense.
The typical error scenario is that a PWB format archive
unpacked as if it were in the new format will create named sockets
instead of directories, and then fail to unpack files that should
go in those directories.
Running
.Nm bsdcpio Fl itv
on an unknown archive will make it obvious which it is: if it's
PWB format, directories will be listed with an 's' instead of
a 'd' as the first character of the mode string, and the larger
files will have a '?' in that position.
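.Pp
The misreading described above follows directly from the two mode
tables.
The C sketch below, using the hypothetical name
.Fn newbin_type_char ,
decodes the type character under the new binary rules.
It is an illustration only, not the code used by
.Nm bsdcpio .

```c
/*
 * Return an ls-style type character for a new-binary-format mode.
 * Unknown type values map to '?'.  (Hypothetical helper.)
 */
static char
newbin_type_char(unsigned mode)
{
	switch (mode & 0170000) {	/* file type mask */
	case 0140000:
		return 's';		/* socket */
	case 0120000:
		return 'l';		/* symbolic link */
	case 0100000:
		return '-';		/* regular file */
	case 0060000:
		return 'b';		/* block special */
	case 0040000:
		return 'd';		/* directory */
	case 0020000:
		return 'c';		/* character special */
	case 0010000:
		return 'p';		/* named pipe or FIFO */
	default:
		return '?';
	}
}
```

A PWB directory carries IALLOC (0100000) plus the directory type
(0040000), which the new rules read as 0140000, a socket.
A large PWB file carries IALLOC plus ILARG (0010000), giving 0110000,
which matches no type in the new table.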
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the binary formats above, but
represents them as 6-character or 11-character octal values.
.Pp
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the new binary format.
The name and file body follow the fixed header.
Unlike the binary formats, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
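.Pp
Reading a field from this header amounts to parsing a fixed-width
octal string that has no terminator.
A minimal C sketch, with the hypothetical name
.Fn odc_field ,
follows; rejecting any non-octal character is a choice made here,
not a requirement stated by the format.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Parse a fixed-width octal field of len characters (6 or 11 in
 * the odc header).  The field is not NUL-terminated.
 * Returns -1 if a character is not an octal digit.
 */
static int64_t
odc_field(const char *p, size_t len)
{
	int64_t v = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		if (p[i] < '0' || p[i] > '7')
			return -1;
		v = (v << 3) | (p[i] - '0');
	}
	return v;
}
```

Six octal digits hold 18 bits and eleven hold 33 bits, which is where
the limits of this format come from.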
.Ss New ASCII Format
The
.Dq new
ASCII format uses 8-byte hexadecimal fields for
all numbers and splits device numbers into separate fields
for major and minor numbers.
.Pp
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the new binary format above.
.Bl -tag -width indent
.It Va c_magic
The string
.Dq 070701 .
.It Va c_check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports files of at most 4 gigabytes (unlike the
older ASCII format, which supports 8 gigabyte files).
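.Pp
The field encoding and padding rules above can be sketched in C as
follows.
The names
.Dv NEWC_HEADER_SIZE ,
.Fn newc_field ,
and
.Fn newc_pad4
are hypothetical; the 110-byte header size is derived from the struct
above (a 6-byte magic plus thirteen 8-byte fields).
Accepting both upper- and lower-case hexadecimal digits is an
assumption made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define NEWC_HEADER_SIZE	110	/* 6 + 13 * 8 */

/*
 * Parse one 8-character hexadecimal field (not NUL-terminated).
 * Returns -1 if a character is not a hexadecimal digit.
 */
static int64_t
newc_field(const char *p)
{
	int64_t v = 0;
	int i, c, d;

	for (i = 0; i < 8; i++) {
		c = (unsigned char)p[i];
		if (c >= '0' && c <= '9')
			d = c - '0';
		else if (c >= 'a' && c <= 'f')
			d = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			d = c - 'A' + 10;
		else
			return -1;
		v = (v << 4) | d;
	}
	return v;
}

/* Number of NUL bytes needed to round n up to a multiple of four. */
static size_t
newc_pad4(size_t n)
{
	return (4 - (n & 3)) & 3;
}
```

The padding after the pathname is
.Li newc_pad4(NEWC_HEADER_SIZE + namesize) ,
and the padding after the file data is
.Li newc_pad4(filesize) .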
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the first one that
appears in the archive.
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the magic field is set
to
.Dq 070702
and the
.Va c_check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
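.Pp
The checksum described above is a plain byte sum.
A minimal C sketch, with the hypothetical name
.Fn newc_checksum ,
relies on unsigned 32-bit arithmetic wrapping around to keep only the
low 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum all bytes of the file data as unsigned values.
 * uint32_t arithmetic wraps modulo 2^32, so only the
 * least-significant 32 bits of the sum are kept.
 */
static uint32_t
newc_checksum(const unsigned char *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += data[i];
	return sum;
}
```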
.Ss HP variants
The
.Nm cpio
implementation distributed with HP-UX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer a part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the new binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear?  Who invented it?  When did HP come out with their variant?  When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is mis-named, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The binary formats are limited to 16 bits for user id, group id,
device, and inode numbers.
They are limited to 16 megabyte and 2
gigabyte file sizes for the older and newer variants, respectively.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.