.\" $OpenBSD: file.1,v 1.15 2001/10/04 23:02:32 pjanzen Exp $
.\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
.Dd July 30, 1997
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm file
.Op Fl vczL
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar file Op Ar ...
.Sh DESCRIPTION
This manual page documents version 3.22 of the
.Nm
command.
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The first test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Dq text
(the file contains only
.Tn ASCII
characters and is probably safe to read on an
.Tn ASCII
terminal),
.Dq executable
(the file contains the result of compiling a program
in a form understandable to some
.Ux
kernel or another),
or
.Dq data
meaning anything else (data is usually binary or non-printable).
.Pp
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa /etc/magic
or the program itself,
.Em "preserve these keywords" .
.Pp
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.Aq Pa sys/stat.h .
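.Pp
As a rough illustration of this first group of tests, the C sketch below
classifies a path using only the result of
.Xr lstat 2 .
The helper name fs_type and the strings it returns are hypothetical;
they are not taken from the
.Nm
source.
File types are tested only when the matching macro is defined,
mirroring the rule stated above.
```c
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: classify a path from lstat(2) results alone.
 * Returns NULL for a non-empty regular file, i.e. when the contents
 * would have to be examined by the later test groups.
 */
const char *
fs_type(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) == -1)
		return "cannot stat";
#ifdef S_ISLNK
	if (S_ISLNK(sb.st_mode))
		return "symbolic link";
#endif
#ifdef S_ISFIFO
	if (S_ISFIFO(sb.st_mode))
		return "fifo (named pipe)";
#endif
#ifdef S_ISSOCK
	if (S_ISSOCK(sb.st_mode))
		return "socket";
#endif
	if (S_ISDIR(sb.st_mode))
		return "directory";
	if (S_ISCHR(sb.st_mode))
		return "character special";
	if (S_ISBLK(sb.st_mode))
		return "block special";
	if (S_ISREG(sb.st_mode) && sb.st_size == 0)
		return "empty";
	return NULL;
}
```
A NULL result stands for the case that the magic number and language
tests would go on to examine.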
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Pa a.out
file, whose format is defined in
.Aq Pa a.out.h
and possibly
.Aq Pa exec.h
in the standard include directory.
These files have a
.Dq magic number
stored in a particular place
near the beginning of the file that tells the
.Ux
operating system
that the file is a binary executable, and which of several types thereof.
.Pp
The concept of magic number has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information in these files is read from the magic file
.Pa /etc/magic .
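.Pp
The idea of an invariant identifier at a fixed offset can be sketched
in C as a table lookup.
The structure, the function name match_magic and the three table rows
are illustrative assumptions; a real implementation reads its entries
from the magic file rather than compiling them in.
```c
#include <stddef.h>
#include <string.h>

/* One illustrative entry: an invariant byte string at a fixed offset. */
struct magic_entry {
	size_t offset;
	const unsigned char *bytes;
	size_t len;
	const char *desc;
};

/* Hard-coded for the sketch only; these rows are not from /etc/magic. */
static const struct magic_entry table[] = {
	{ 0, (const unsigned char *)"\037\213", 2, "gzip compressed data" },
	{ 0, (const unsigned char *)"%PDF-", 5, "PDF document" },
	{ 257, (const unsigned char *)"ustar", 5, "tar archive" },
};

/* Return the description of the first matching entry, or NULL. */
const char *
match_magic(const unsigned char *buf, size_t n)
{
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		const struct magic_entry *m = &table[i];

		/* Only compare when the whole identifier lies inside buf. */
		if (m->offset + m->len <= n &&
		    memcmp(buf + m->offset, m->bytes, m->len) == 0)
			return m->desc;
	}
	return NULL;
}
```
The first matching row wins in this sketch, so the order of the rows
matters.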
.Pp
If an argument appears to be an
.Tn ASCII
file,
.Nm
attempts to guess its language.
The language tests look for particular strings (see
.Pa names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Li struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives) and determine whether an unknown file should be
labelled as
.Dq ASCII text
or
.Dq data .
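.Pp
A minimal C sketch of such a keyword scan follows.
It splits the text on white space and compares whole tokens against a
small table.
The table contents, the returned strings and the name guess_language
are assumptions made for illustration and do not reproduce
.Pa names.h .
```c
#include <stddef.h>
#include <string.h>

struct keyword {
	const char *token;
	const char *lang;
};

/* Illustrative table built from the two examples in the text. */
static const struct keyword keywords[] = {
	{ ".br", "troff input text" },
	{ "struct", "C program text" },
};

/* Return a guess for NUL-terminated text, or NULL if no keyword is seen. */
const char *
guess_language(const char *text)
{
	const char *p = text;

	while (*p != '\0') {
		size_t len, i;

		p += strspn(p, " \t\n");	/* skip leading white space */
		len = strcspn(p, " \t\n");	/* length of the next token */
		for (i = 0; i < sizeof(keywords) / sizeof(keywords[0]); i++)
			if (len == strlen(keywords[i].token) &&
			    strncmp(p, keywords[i].token, len) == 0)
				return keywords[i].lang;
		p += len;
	}
	return NULL;
}
```
Comparing whole tokens keeps a word such as constructor from matching
struct, but a single stray keyword is still enough to mislead the
guess.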
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl v
Print the version of the program and exit.
.It Fl m Ar list
Specify an alternate
.Ar list
of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.It Fl z
Try to look inside compressed files.
.It Fl c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.Fl m
to debug a new magic file before installing it.
.It Fl f Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Dq -
as a filename argument.
.It Fl L
Cause symlinks to be followed, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
.El
.Sh ENVIRONMENT
.Bl -tag -width indent
.It Ev MAGIC
Default magic number files.
.El
.Sh FILES
.Bl -tag -width /etc/magic -compact
.It Pa /etc/magic
default list of magic numbers
.El
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Pp
>10     string  language impress\       (imPRESS data)
.Pp
in an existing magic file would have to be changed to
.Pp
>10     string  language\e impress      (imPRESS data)
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Pp
0       string          \ebegindata     Andrew Toolkit document
.Pp
in an existing magic file would have to be changed to
.Pp
0       string          \e\ebegindata   Andrew Toolkit document
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Xr file 1
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Ql &
operator, used as,
for example,
.Pp
>16     long&0x7fffffff >0              not stripped
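.Pp
As we read it, the entry above masks the long at offset 16 with
0x7fffffff and succeeds when the result is greater than zero.
The C sketch below shows that mask-then-compare step in isolation.
The enum, the function name masked_test and the use of unsigned 32-bit
arithmetic are assumptions made for the sketch, not a description of
how
.Nm
stores or compares values internally.
```c
#include <stdint.h>

/* Hypothetical relation codes for the comparison that follows the mask. */
enum relation { REL_EQ, REL_GT, REL_LT };

/*
 * Apply the mask first, then compare the masked value with the operand.
 * Returns 1 when the test succeeds and 0 otherwise.
 */
int
masked_test(uint32_t value, uint32_t mask, enum relation op, uint32_t operand)
{
	uint32_t v = value & mask;

	switch (op) {
	case REL_EQ:
		return v == operand;
	case REL_GT:
		return v > operand;
	case REL_LT:
		return v < operand;
	}
	return 0;
}
```
With the mask 0x7fffffff the top bit is ignored, so a value of
0x80000000 fails the greater-than-zero test while 0x80000005 passes.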
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
.An Christos Zoulas
(address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /etc/magic.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Ux
since at least Research Version 6
(man page dated January, 1975).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version, was written by
.An Ian F. Darwin Aq ian@darwinisys.com
without looking at anybody else's source code.
.Pp
.An John Gilmore
revised the code extensively, making it better than
the first version.
.An Geoff Collyer
found several inadequacies
and provided some magic file entries.
.Pp
Altered by
.An Rob McMahon Aq cudcv@warwick.ac.uk ,
1989, to extend the
.Ql &
operator from simple
.Dq x&y != 0
to
.Dq x&y op z .
.Pp
Altered by
.An Guy Harris Aq guy@auspex.com ,
1993, to:
.Bl -item -offset indent
.It
put the
.Dq old-style
.Ql &
operator back the way it was, because
.Bl -enum -offset indent
.It
Rob McMahon's change broke the
previous style of usage,
.It
The SunOS
.Dq new-style
.Ql &
operator, which this version of
.Nm
supports, also handles
.Dq x&y op z ,
.It
Rob's change wasn't documented in any case;
.El
.It
put in multiple levels of
.Ql > ;
.It
put in
.Dq beshort ,
.Dq leshort ,
etc. keywords to look at numbers in the
file in a specific byte order, rather than in the native byte order of
the process running
.Nm file .
.El
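.Pp
What a byte-order-specific keyword implies can be sketched in C:
the value is assembled from individual bytes, so the result does not
depend on the byte order of the machine doing the reading.
The function names read_beshort and read_leshort are hypothetical
and merely echo the keywords above.
```c
#include <stdint.h>

/* Big-endian 16-bit value: the first byte is the most significant. */
uint16_t
read_beshort(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Little-endian 16-bit value: the first byte is the least significant. */
uint16_t
read_leshort(const unsigned char *p)
{
	return (uint16_t)((p[1] << 8) | p[0]);
}
```
For the two bytes 0x01 0x02 the first function yields 0x0102 and the
second 0x0201, on any host.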
.Pp
Currently maintained by
.An Christos Zoulas Aq christos@zoulas.com .
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by
.An John Gilmore
from his public-domain
.Nm tar
program.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
Better yet, the magic file should be compiled into binary (say,
.Xr ndbm 3
or, better yet, fixed-length
.Tn ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of
.Tn ASCII
files.
.Pp
The support for
.Tn ASCII
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
There should be an
.Dq else
clause to follow a series of continuation lines.
.Pp
The magic file and keywords should have regular expression support.
Their use of
.Tn ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords,
e.g., to distinguish
.Xr troff 1
commands from man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Em ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Ql *
for the offset value.
.Pp
Another optimization would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc., once we
have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.,
.Dq "From "
as first 5 chars of file) because
they are not as good as other guesses (e.g.,
.Dq Newsgroups:
versus
.Qq Return-Path: ) .
Still, if the others don't pan out, it should be
possible to use the first guess.
.Pp
This program is slower than some vendors'
.Nm
commands.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Em ftp.astron.com
in the directory
.Pa /pub/file/file-X.YY.tar.gz .