.\" Copyright (c) 2007 Tim Kientzle
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD: src/lib/libarchive/cpio.5,v 1.2 2008/05/26 17:00:23 kientzle Exp $
.\"
.Dd October 5, 2007
.Dt CPIO 5
.Os
.Sh NAME
.Nm cpio
.Nd format of cpio archive files
.Sh DESCRIPTION
The
.Nm
archive format collects any number of files, directories, and other
file system objects (symbolic links, device nodes, etc.) into a single
stream of bytes.
.Ss General Format
Each file system object in a
.Nm
archive comprises a header record with basic numeric metadata
followed by the full pathname of the entry and the file data.
The header record stores a series of integer values that generally
follow the fields in
.Va struct stat .
(See
.Xr stat 2
for details.)
The variants differ primarily in how they store those integers
(binary, octal, or hexadecimal).
The header is followed by the pathname of the
entry (the length of the pathname is stored in the header)
and any file data.
The end of the archive is indicated by a special record with
the pathname
.Dq TRAILER!!! .
.Ss PWB format
XXX Any documentation of the original PWB/UNIX 1.0 format? XXX
.Ss Old Binary Format
The old binary
.Nm
format stores numbers as 2-byte and 4-byte binary values.
Each entry begins with a header in the following format:
.Bd -literal -offset indent
struct header_old_cpio {
        unsigned short   c_magic;
        unsigned short   c_dev;
        unsigned short   c_ino;
        unsigned short   c_mode;
        unsigned short   c_uid;
        unsigned short   c_gid;
        unsigned short   c_nlink;
        unsigned short   c_rdev;
        unsigned short   c_mtime[2];
        unsigned short   c_namesize;
        unsigned short   c_filesize[2];
};
.Ed
.Pp
The
.Va unsigned short
fields here are 16-bit integer values.
Each two-element array,
.Va c_mtime
and
.Va c_filesize ,
holds a single 32-bit value split into two 16-bit halves, as
described under
.Va mtime
below.
The fields are as follows:
.Bl -tag -width indent
.It Va magic
The integer value octal 070707.
Because this value reads differently when its two bytes are swapped,
it can be used to determine whether the archive was
written with little-endian or big-endian integers.
.It Va dev , Va ino
The device and inode numbers from the disk.
These are used by programs that read
.Nm
archives to determine when two entries refer to the same file.
Programs that synthesize
.Nm
archives should be careful to set these to distinct values for each entry.
.It Va mode
The mode specifies both the regular permissions and the file type.
It consists of several bit fields as follows:
.Bl -tag -width "MMMMMMM" -compact
.It 0170000
This masks the file type bits.
.It 0140000
File type value for sockets.
.It 0120000
File type value for symbolic links.
For symbolic links, the link target is stored as the file data.
.It 0100000
File type value for regular files.
.It 0060000
File type value for block special devices.
.It 0040000
File type value for directories.
.It 0020000
File type value for character special devices.
.It 0010000
File type value for named pipes or FIFOs.
.It 0004000
SUID bit.
.It 0002000
SGID bit.
.It 0001000
Sticky bit.
On some systems, this modifies the behavior of executables and/or directories.
.It 0000777
The lower 9 bits specify read/write/execute permissions
for the owner, group, and others, following standard POSIX conventions.
.El
.It Va uid , Va gid
The numeric user id and group id of the owner.
.It Va nlink
The number of links to this file.
Directories always have a value of at least two here.
Note that hardlinked files include file data with every copy in the archive.
.It Va rdev
For block special and character special entries,
this field contains the associated device number.
For all other entry types, it should be set to zero by writers
and ignored by readers.
.It Va mtime
Modification time of the file, indicated as the number
of seconds since the start of the epoch,
00:00:00 UTC January 1, 1970.
The four-byte integer is stored with the most-significant 16 bits first
followed by the least-significant 16 bits.
Each of the two 16-bit values is stored in the native byte order
of the machine that wrote the archive.
.It Va namesize
The number of bytes in the pathname that follows the header.
This count includes the trailing NUL byte.
.It Va filesize
The size of the file.
Note that this archive format is limited to
four gigabyte file sizes.
See
.Va mtime
above for a description of the storage of four-byte integers.
.El
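.Pp
A reader has to recover the byte order before it can decode any other
field.
The C fragment below is a minimal sketch of one way to do that, not code
from any particular
.Nm
implementation; the function names
.Fn old_read16 ,
.Fn old_detect_order ,
.Fn old_read32 ,
and
.Fn mode_type_name
are hypothetical.
It assumes the caller has already read the 26-byte header (thirteen
16-bit values) into memory.
.Bd -literal -offset indent
#include <stdint.h>

/* Read one 16-bit header field in the given byte order. */
uint16_t
old_read16(const unsigned char *p, int little_endian)
{
        if (little_endian)
                return (uint16_t)(p[0] | (p[1] << 8));
        return (uint16_t)((p[0] << 8) | p[1]);
}

/* Use c_magic (the first two header bytes) to find the byte order.
   The value 070707 reads differently when its bytes are swapped.
   Returns 1 for little-endian, 0 for big-endian, -1 for neither. */
int
old_detect_order(const unsigned char *hdr)
{
        if (old_read16(hdr, 1) == 070707)
                return 1;
        if (old_read16(hdr, 0) == 070707)
                return 0;
        return -1;
}

/* Decode a two-element c_mtime or c_filesize field: the most
   significant 16 bits come first, and each half uses the
   archive's byte order. */
uint32_t
old_read32(const unsigned char *p, int little_endian)
{
        return ((uint32_t)old_read16(p, little_endian) << 16) |
            old_read16(p + 2, little_endian);
}

/* Name the file type encoded in the c_mode bits under mask 0170000. */
const char *
mode_type_name(uint32_t mode)
{
        switch (mode & 0170000) {
        case 0140000: return "socket";
        case 0120000: return "symbolic link";
        case 0100000: return "regular file";
        case 0060000: return "block device";
        case 0040000: return "directory";
        case 0020000: return "character device";
        case 0010000: return "fifo";
        default:      return "unknown";
        }
}
.Ed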
.Pp
The pathname immediately follows the fixed header.
If the
.Va namesize
is odd, an additional NUL byte is added after the pathname.
The file data is then appended, padded with NUL
bytes to an even length.
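.Pp
Combining the 26-byte header with these padding rules gives the total
size of an entry, which a reader can use to skip to the next header.
The following C function is a minimal sketch of that calculation; its name,
.Fn old_entry_size ,
is hypothetical.
.Bd -literal -offset indent
#include <stdint.h>

/* Bytes occupied by one old binary entry: the 26-byte header, the
   pathname (namesize bytes, including its NUL) padded to an even
   length, and the file data padded to an even length. */
uint64_t
old_entry_size(uint32_t namesize, uint32_t filesize)
{
        uint64_t name = (uint64_t)namesize + (namesize & 1);
        uint64_t data = (uint64_t)filesize + (filesize & 1);

        return 26 + name + data;
}
.Ed
.Pp
For example, the
.Dq TRAILER!!!
entry has a
.Va namesize
of 11 and no file data, so it occupies 26 + 12 = 38 bytes.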
.Pp
Hardlinked files are not given special treatment;
the full file contents are included with each copy of the
file.
.Ss Portable ASCII Format
.St -susv2
standardized an ASCII variant that is portable across all
platforms.
It is commonly known as the
.Dq old character
format or as the
.Dq odc
format.
It stores the same numeric fields as the old binary format, but
represents them as 6-character or 11-character octal values.
.Bd -literal -offset indent
struct cpio_odc_header {
        char    c_magic[6];
        char    c_dev[6];
        char    c_ino[6];
        char    c_mode[6];
        char    c_uid[6];
        char    c_gid[6];
        char    c_nlink[6];
        char    c_rdev[6];
        char    c_mtime[11];
        char    c_namesize[6];
        char    c_filesize[11];
};
.Ed
.Pp
The fields are identical to those in the old binary format,
except that the magic field holds the six characters
.Dq 070707 .
The name and file body follow the fixed header.
Unlike the old binary format, there is no additional padding
after the pathname or file contents.
If the files being archived are themselves entirely ASCII, then
the resulting archive will be entirely ASCII, except for the
NUL byte that terminates the name field.
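.Pp
Because the fields are fixed-width and not NUL-terminated, a reader
has to convert each one using its declared width.
The following C function is a minimal sketch of that conversion,
assuming every field is zero-padded on the left; the name
.Fn odc_parse_octal
is hypothetical.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

/* Parse a fixed-width octal field such as c_mode[6] or
   c_filesize[11].  The field is not NUL-terminated, so its width
   is passed explicitly.  Returns 0 on success, or -1 if any
   character is not an octal digit. */
int
odc_parse_octal(const char *field, size_t width, uint64_t *out)
{
        uint64_t v = 0;
        size_t i;

        for (i = 0; i < width; i++) {
                if (field[i] < '0' || field[i] > '7')
                        return -1;
                v = (v << 3) | (uint64_t)(field[i] - '0');
        }
        *out = v;
        return 0;
}
.Ed
.Pp
The fixed header is 76 bytes long (nine 6-character fields and two
11-character fields), so the pathname starts at offset 76 of each entry.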
.Ss New ASCII Format
The
.Dq new
ASCII format uses 8-character hexadecimal fields for
all numbers and splits each device number into separate fields
for the major and minor numbers.
.Bd -literal -offset indent
struct cpio_newc_header {
        char    c_magic[6];
        char    c_ino[8];
        char    c_mode[8];
        char    c_uid[8];
        char    c_gid[8];
        char    c_nlink[8];
        char    c_mtime[8];
        char    c_filesize[8];
        char    c_devmajor[8];
        char    c_devminor[8];
        char    c_rdevmajor[8];
        char    c_rdevminor[8];
        char    c_namesize[8];
        char    c_check[8];
};
.Ed
.Pp
Except as specified below, the fields here match those specified
for the old binary format above.
.Bl -tag -width indent
.It Va magic
The string
.Dq 070701 .
.It Va check
This field is always set to zero by writers and ignored by readers.
See the next section for more details.
.El
.Pp
The pathname is followed by NUL bytes so that the total size
of the fixed header plus pathname is a multiple of four.
Likewise, the file data is padded to a multiple of four bytes.
Note that this format supports only 4 gigabyte files (unlike the
older ASCII format, which supports 8 gigabyte files).
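.Pp
The padding rules reduce to arithmetic on the 110-byte fixed header,
which consists of the 6-character magic field and thirteen 8-character
fields.
The C functions below are a minimal sketch of that arithmetic; the names
.Fn pad4 ,
.Fn newc_data_offset ,
and
.Fn newc_entry_size
are hypothetical.
.Bd -literal -offset indent
#include <stdint.h>

/* Round n up to the next multiple of four. */
uint64_t
pad4(uint64_t n)
{
        return (n + 3) & ~(uint64_t)3;
}

/* Offset from the start of an entry to its file data: the 110-byte
   header plus the pathname (namesize includes the NUL), padded to
   a multiple of four. */
uint64_t
newc_data_offset(uint32_t namesize)
{
        return pad4(110 + (uint64_t)namesize);
}

/* Total bytes one entry occupies, including the data padding. */
uint64_t
newc_entry_size(uint32_t namesize, uint32_t filesize)
{
        return newc_data_offset(namesize) + pad4(filesize);
}
.Ed
.Pp
For example, an entry named
.Dq a
(a
.Va namesize
of 2) with 5 bytes of data has its data at offset 112 and
occupies 120 bytes in total.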
.Pp
In this format, hardlinked files are handled by setting the
filesize to zero for each entry except the last one that
appears in the archive.
.Ss New CRC Format
The CRC format is identical to the new ASCII format described
in the previous section except that the magic field is set
to
.Dq 070702
and the
.Va check
field is set to the sum of all bytes in the file data.
This sum is computed treating all bytes as unsigned values
and using unsigned arithmetic.
Only the least-significant 32 bits of the sum are stored.
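.Pp
The checksum can be accumulated in a single pass over the file data.
The following C function is a minimal sketch; its name,
.Fn crc_format_checksum ,
is hypothetical, and a streaming writer would instead carry the running
sum across buffers.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

/* Compute the c_check value for the 070702 format: the sum of all
   file data bytes taken as unsigned values.  uint32_t arithmetic
   wraps modulo 2^32, which keeps only the low 32 bits. */
uint32_t
crc_format_checksum(const unsigned char *data, size_t len)
{
        uint32_t sum = 0;
        size_t i;

        for (i = 0; i < len; i++)
                sum += data[i];
        return sum;
}
.Ed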
.Ss HP variants
The
.Nm cpio
implementation distributed with HP-UX used XXXX but stored
device numbers differently XXX.
.Ss Other Extensions and Variants
Sun Solaris uses additional file types to store extended file
data, including ACLs and extended attributes, as special
entries in cpio archives.
.Pp
XXX Others? XXX
.Sh SEE ALSO
.Xr cpio 1 ,
.Xr tar 5
.Sh STANDARDS
The
.Nm cpio
utility is no longer part of POSIX or the Single UNIX Specification.
It last appeared in
.St -susv2 .
It has been supplanted in subsequent standards by
.Xr pax 1 .
The portable ASCII format is currently part of the specification for the
.Xr pax 1
utility.
.Sh HISTORY
The original cpio utility was written by Dick Haight
while working in AT&T's Unix Support Group.
It appeared in 1977 as part of PWB/UNIX 1.0, the
.Dq Programmer's Work Bench
derived from
.At v6
that was used internally at AT&T.
Both the old binary and old character formats were in use
by 1980, according to the System III source released
by SCO under their
.Dq Ancient Unix
license.
The character format was adopted as part of
.St -p1003.1-88 .
XXX when did "newc" appear?  Who invented it?  When did HP come out with their variant?  When did Sun introduce ACLs and extended attributes? XXX
.Sh BUGS
The
.Dq CRC
format is misnamed, as it uses a simple checksum and
not a cyclic redundancy check.
.Pp
The old binary format is limited to 16 bits for user id,
group id, device, and inode numbers.
It is limited to 4 gigabyte file sizes.
.Pp
The old ASCII format is limited to 18 bits for
the user id, group id, device, and inode numbers.
It is limited to 8 gigabyte file sizes.
.Pp
The new ASCII format is limited to 4 gigabyte file sizes.
.Pp
None of the cpio formats store user or group names,
which are essential when moving files between systems with
dissimilar user or group numbering.
.Pp
Especially when writing older cpio variants, it may be necessary
to map actual device/inode values to synthesized values that
fit the available fields.
With very large filesystems, this may be necessary even for
the newer formats.
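.Pp
One possible approach, sketched below in C, is to assign a small
sequential number to each distinct device/inode pair the first time it
is seen, so that hardlinks still share a value.
The structure names, the function
.Fn ino_map_lookup ,
and the fixed capacity
.Dv MAP_MAX
are all hypothetical.
A linear search keeps the sketch short; a real writer would probably
use a hash table and would still need to check that the result fits the
target field (for example, 16 bits in the old binary format).
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

#define MAP_MAX 1024    /* hypothetical fixed capacity */

struct ino_map_entry {
        uint64_t dev, ino;      /* values from the file system */
        uint32_t synth;         /* value written to the archive */
};

struct ino_map {
        struct ino_map_entry e[MAP_MAX];
        size_t n;               /* entries in use */
        uint32_t next;          /* last number handed out */
};

/* Return the synthesized number for (dev, ino), assigning the next
   free number the first time a pair is seen, so that hardlinks keep
   matching values.  Returns 0 if the table is full. */
uint32_t
ino_map_lookup(struct ino_map *m, uint64_t dev, uint64_t ino)
{
        size_t i;

        for (i = 0; i < m->n; i++)
                if (m->e[i].dev == dev && m->e[i].ino == ino)
                        return m->e[i].synth;
        if (m->n == MAP_MAX)
                return 0;
        m->e[m->n].dev = dev;
        m->e[m->n].ino = ino;
        m->e[m->n].synth = ++m->next;
        return m->e[m->n++].synth;
}
.Ed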
326