.\" $OpenBSD: file.1,v 1.21 2003/06/13 18:31:14 deraadt Exp $
.\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
.\"
.\" Copyright (c) Ian F. Darwin 1986-1995.
.\" Software written by Ian F. Darwin and others;
.\" maintained 1995-present by Christos Zoulas and others.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice immediately at the beginning of the file, without modification,
.\"    this list of conditions, and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd July 30, 1997
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm file
.Op Fl vbczL
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar file Op Ar ...
.Sh DESCRIPTION
This manual page documents version 3.22 of the
.Nm
command.
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The first test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Dq text
(the file contains only
.Tn ASCII
characters and is probably safe to read on an
.Tn ASCII
terminal),
.Dq executable
(the file contains the result of compiling a program
in a form understandable to some
.Ux
kernel or another),
or
.Dq data
meaning anything else (data is usually binary or non-printable).
.Pp
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa /etc/magic
or the program itself,
.Em "preserve these keywords" .
.Pp
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.Aq Pa sys/stat.h .
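.Pp
The filesystem tests described above can be sketched roughly as follows.
This is an illustration in Python, not the program's own C code;
the function name filesystem_test and the strings it returns are
hypothetical, and only the file kinds named in the text are covered.

```python
import os
import stat


def filesystem_test(path):
    """Classify special or empty files; return None for ordinary data.

    Hypothetical helper: the name and the returned strings are
    illustrative and are not taken from the file(1) sources.
    """
    st = os.lstat(path)  # lstat(2): examine the link itself, not its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # nothing decided here; later test groups would run next
```

A result of None stands for the case in which the filesystem tests do
not succeed and the magic number tests are tried next.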
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Pa a.out
file, whose format is defined in
.Aq Pa a.out.h
and possibly
.Aq Pa exec.h
in the standard include directory.
These files have a
.Dq magic number
stored in a particular place
near the beginning of the file that tells the
.Ux
operating system
that the file is a binary executable, and which of several types thereof.
.Pp
The concept of magic number has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information in these files is read from the magic file
.Pa /etc/magic .
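.Pp
The idea of an invariant identifier at a fixed offset can be sketched
in a few lines of Python.
The table below is an assumption made for illustration: it hard-codes
three widely known signatures instead of reading
/etc/magic, and the names SIGNATURES and magic_test are hypothetical.

```python
# Illustrative signatures only; a real magic file holds many more entries
# and supports numeric types, masks and continuation lines.
# Each entry is (offset, expected bytes, description).
SIGNATURES = [
    (0, b"\x7fELF", "ELF file"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "tar archive"),
]


def magic_test(data):
    """Return the description of the first signature found at its offset."""
    for offset, expected, description in SIGNATURES:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None  # no fixed-format identifier recognised
```

Entries are tried in table order and the first match wins, which is why
the order of entries matters.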
.Pp
If an argument appears to be an
.Tn ASCII
file,
.Nm
attempts to guess its language.
The language tests look for particular strings (cf.
.Pa names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Li struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives) and determine whether an unknown file should be
labelled as
.Dq ASCII text
or
.Dq data .
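.Pp
A minimal Python sketch of the language tests follows.
It is not the real routine: the keyword table holds only the two
examples given above, the 4096-byte window is an assumed stand-in for
the first few blocks, and the names KEYWORDS and language_test as well
as the two language descriptions are hypothetical.

```python
# Keyword table built from the two examples in the text; the real list
# lives in names.h and is much longer.
KEYWORDS = [
    (b".br", "troff input text"),
    (b"struct", "C program text"),
]


def language_test(data, limit=4096):
    """Guess a language from whitespace-separated words near the start."""
    head = data[:limit]  # assumed size for "the first few blocks"
    if any(byte > 0x7F for byte in head):
        return "data"  # bytes outside the ASCII range: not a text file
    words = head.split()
    for keyword, description in KEYWORDS:
        if keyword in words:
            return description
    return "ASCII text"  # plain text with no recognised keyword
```

Because a single keyword decides the answer, ordinary prose that happens
to contain the word struct would be reported as a C program, which
illustrates why these tests are the least reliable and run last.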
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl v
Print the version of the program and exit.
.It Fl m Ar list
Specify an alternate
.Ar list
of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.It Fl z
Try to look inside compressed files.
.It Fl b
Do not prepend filenames to output lines (brief mode).
.It Fl c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.Fl m
to debug a new magic file before installing it.
.It Fl f Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Dq -
as a filename argument.
.It Fl L
Cause symlinks to be followed, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
.El
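.Pp
How the list of magic files might be chosen can be sketched in Python
as follows.
The precedence shown (the argument of the m option first, then the
MAGIC environment variable, then /etc/magic) is an assumption made for
this illustration and is not stated by this page;
the name magic_file_list is hypothetical.

```python
import os

DEFAULT_MAGIC = "/etc/magic"


def magic_file_list(m_option=None, environ=None):
    """Return the magic files to consult, in order.

    Assumed precedence: -m argument, then $MAGIC, then /etc/magic.
    """
    environ = os.environ if environ is None else environ
    spec = m_option or environ.get("MAGIC") or DEFAULT_MAGIC
    # The specification is a single file or a colon-separated list.
    return [name for name in spec.split(":") if name]
```

For example, magic_file_list("/tmp/new.magic:/etc/magic", {}) yields the
two file names in the order given.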
.Sh ENVIRONMENT
.Bl -tag -width indent
.It Ev MAGIC
Default magic number files.
.El
.Sh FILES
.Bl -tag -width /etc/magic -compact
.It Pa /etc/magic
default list of magic numbers
.El
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Pp
>10     string  language impress\       (imPRESS data)
.Pp
in an existing magic file would have to be changed to
.Pp
>10     string  language\e impress      (imPRESS data)
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Pp
0       string          \ebegindata     Andrew Toolkit document
.Pp
in an existing magic file would have to be changed to
.Pp
0       string          \e\ebegindata   Andrew Toolkit document
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm file
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Ql &
operator, used as,
for example,
.Pp
>16     long&0x7fffffff >0              not stripped
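.Pp
The masked comparison in that entry can be sketched in Python:
read a 32-bit value at offset 16, AND it with 0x7fffffff, and report a
match if the result is greater than zero.
The function name masked_long_test and its byteorder parameter are
hypothetical additions for illustration; a plain long is read in the
native byte order of the machine.

```python
import struct


def masked_long_test(data, offset, mask, op, operand, byteorder="="):
    """Read a 32-bit value at offset, AND it with mask, compare to operand.

    byteorder is a struct prefix: "=" native, ">" big-endian,
    "<" little-endian (an illustrative knob, not magic file syntax).
    """
    (value,) = struct.unpack_from(byteorder + "l", data, offset)
    value &= mask  # the x&y part of "x&y op z"
    if op == ">":
        return value > operand
    if op == "<":
        return value < operand
    if op == "=":
        return value == operand
    raise ValueError("unsupported operator: " + op)
```

With a mask of 0x7fffffff the top bit is discarded before the
comparison, so a field holding only that bit compares as zero.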
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
.An Christos Zoulas
(address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /etc/magic.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Ux
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version, was written by
.An Ian F. Darwin Aq ian@darwinisys.com
without looking at anybody else's source code.
.Pp
.An John Gilmore
revised the code extensively, making it better than
the first version.
.An Geoff Collyer
found several inadequacies
and provided some magic file entries.
.Pp
Altered by
.An Rob McMahon Aq cudcv@warwick.ac.uk ,
1989, to extend the
.Ql &
operator from simple
.Dq x&y != 0
to
.Dq x&y op z .
.Pp
Altered by
.An Guy Harris Aq guy@auspex.com ,
1993, to:
.Bl -item -offset indent
.It
put the
.Dq old-style
.Ql &
operator back the way it was, because
.Bl -enum -offset indent
.It
Rob McMahon's change broke the
previous style of usage,
.It
The SunOS
.Dq new-style
.Ql &
operator, which this version of
.Nm
supports, also handles
.Dq x&y op z ,
.It
Rob's change wasn't documented in any case;
.El
.It
put in multiple levels of
.Ql > ;
.It
put in
.Dq beshort ,
.Dq leshort ,
etc. keywords to look at numbers in the
file in a specific byte order, rather than in the native byte order of
the process running
.Nm file .
.El
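.Pp
The difference between the byte-order keywords can be sketched in
Python with the struct module.
The function name read_short is hypothetical, and reading the value as
unsigned is an assumption made to keep the example short.

```python
import struct

# struct format for each keyword: ">" big-endian, "<" little-endian,
# "=" native byte order of the running process.
SHORT_FORMATS = {"beshort": ">H", "leshort": "<H", "short": "=H"}


def read_short(data, offset, kind):
    """Return the 16-bit value at offset, decoded as the keyword asks."""
    (value,) = struct.unpack_from(SHORT_FORMATS[kind], data, offset)
    return value
```

The two bytes 0x01 0x02 decode to 0x0102 as a beshort and to 0x0201 as
a leshort, whatever machine runs the test; a plain short gives one or
the other depending on the host.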
.Pp
Currently maintained by
.An Christos Zoulas Aq christos@zoulas.com .
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by
.An John Gilmore
from his public-domain
.Nm tar
program.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
Better yet, the magic file should be compiled into binary (say,
.Xr ndbm 3
or, better yet, fixed-length
.Tn ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of
.Tn ASCII
files.
.Pp
The support for
.Tn ASCII
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
There should be an
.Dq else
clause to follow a series of continuation lines.
.Pp
The magic file and keywords should have regular expression support.
Their use of
.Tn ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.Xr troff 1
commands vs man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Em ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Ql *
for the offset value.
.Pp
Another optimization would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc., once we
have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.,
.Dq From\ \&
as first 5 chars of file) because
they are not as good as other guesses (e.g.,
.Dq Newsgroups:
versus
.Qq Return-Path: ) .
Still, if the others don't pan out, it should be
possible to use the first guess.
.Pp
This program is slower than some vendors'
.Nm
commands.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Em ftp.astron.com
in the directory
.Pa /pub/file/file-X.YY.tar.gz .