.\" xref: /dragonfly/contrib/file/doc/file.man (revision dcd37f7d)
.\" $File: file.man,v 1.82 2009/11/04 22:30:34 christos Exp $
.Dd October 9, 2008
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bchikLNnprsvz0
.Op Fl -apple
.Op Fl -mime-encoding
.Op Fl -mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar
.Ek -words
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl -help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Sq text
printed.
Don't do as Berkeley did and change
.Sq shell commands text
to
.Sq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
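.Pp
The idea behind a magic test can be sketched in a few lines of Python.
This is an illustration only, not the code that
.Nm
uses: the table and the function name are hypothetical, and real entries
come from the magic files described above.
The sketch relies on two well-known identifiers, the four bytes at the start
of an ELF file and the two bytes at the start of a gzip stream.
.Bd -literal -offset indent
```python
# Sketch of a magic test: compare the bytes found at a fixed offset
# with a known identifier. The table is hand-written for illustration.
ELF_MAGIC = bytes([0x7F]) + b"ELF"   # first four bytes of an ELF file
GZIP_MAGIC = bytes([0x1F, 0x8B])     # first two bytes of a gzip stream

# Each entry: (offset, expected bytes, description). Hypothetical names.
MAGIC_TABLE = [
    (0, ELF_MAGIC, "ELF"),
    (0, GZIP_MAGIC, "gzip compressed data"),
]


def magic_test(data):
    """Return the description of the first matching entry, or None."""
    for offset, expected, description in MAGIC_TABLE:
        if data[offset:offset + len(expected)] == expected:
            return description
    return None
```
.Ed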
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Sq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Sq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
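.Pp
A much simplified version of these text tests might look like the Python
sketch below.
It is an assumption-laden illustration rather than the algorithm
.Nm
implements: it handles only ASCII and UTF-8, ignores NEL, and the set of
permitted control characters, the function name and the labels are chosen
for the example.
.Bd -literal -offset indent
```python
# Control characters tolerated in text: BEL BS TAB LF VT FF CR ESC.
# This set is an assumption made for the sketch.
ALLOWED_CONTROLS = {7, 8, 9, 10, 11, 12, 13, 27}


def classify_text(data):
    """Return a rough label for a byte string, or "data" if none fits."""
    if any(b < 32 and b not in ALLOWED_CONTROLS for b in data):
        return "data"
    candidates = (("ascii", "ASCII text"), ("utf-8", "UTF-8 Unicode text"))
    for encoding, label in candidates:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        break
    else:
        return "data"  # no candidate character set decoded cleanly
    # Report line terminators only when one non-LF style is used throughout.
    crlf = data.count(bytes([13, 10]))
    lone_cr = data.count(bytes([13])) - crlf
    lone_lf = data.count(bytes([10])) - crlf
    if crlf and not lone_cr and not lone_lf:
        label += ", with CRLF line terminators"
    elif lone_cr and not crlf and not lone_lf:
        label += ", with CR line terminators"
    return label
```
.Ed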
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Sq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It text
Various types of text files (this test will try to guess the text encoding,
irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Looks for known tokens inside text files.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl h , -no-dereference
Do not follow symlinks
(on systems that support symbolic links).
This is the default if the
environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Causes the
.Nm
command to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Sq ASCII text .
In order for this option to work,
.Nm
changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories etc.), and makes use of an alternative
.Sq magic
file.
(See the FILES section, below.)
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match, keep going.
Subsequent matches will
have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
.It Fl L , -dereference
Follow symlinks, like the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
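.Pp
A program reading output produced with
.Fl 0
can split each record at the null character, which stays unambiguous even
when a filename contains the separator.
The Python sketch below assumes the record layout described above
(filename, null character, separator, padding, description); the function
name is hypothetical and the default separator is the documented
.Sq \&: .
.Bd -literal -offset indent
```python
NUL = chr(0)  # the null character written after the filename


def parse_print0_record(record, separator=":"):
    """Split one output record into (filename, description)."""
    name, _, rest = record.partition(NUL)
    # The separator is still printed after the null character; drop it.
    if rest.startswith(separator):
        rest = rest[len(separator):]
    # Remove alignment padding and the trailing newline.
    return name, rest.strip()
```
.Ed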
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Sq .mgc
to the value of this variable as appropriate.
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled
by the
.Fl L
and
.Fl h
options.
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1 ,
.Xr file 1posix
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10	string	language impress\ 	(imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10	string	language\e impress	(imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0	string		\ebegindata	Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0	string		\e\ebegindata	Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used as,
for example,
.Bd -literal -offset indent
>16	long&0x7fffffff	>0		not stripped
.Ed
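.Pp
Read literally, the entry above takes the four-byte value at offset 16,
masks it with 0x7fffffff, and matches if the result is greater than zero.
The Python sketch below shows that arithmetic only.
It is not the parser
.Nm
uses; the function name is hypothetical, and little-endian byte order is
an assumption made for the example.
.Bd -literal -offset indent
```python
import struct


def masked_long_test(data, offset, mask, threshold):
    """Evaluate a test shaped like: >16 long&0x7fffffff >0."""
    if len(data) < offset + 4:
        return False  # not enough bytes to read a four-byte value
    # Assumption: the value is a little-endian signed 32-bit integer.
    (value,) = struct.unpack_from("<l", data, offset)
    return (value & mask) > threshold
```
.Ed
.Pp
With these assumptions, a value of 0x80000000 at offset 16 does not match,
because masking clears its only set bit.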
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ).
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
	  dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:      text/x-c
file:        application/x-executable
/dev/hda:    application/x-not-regular-file
/dev/wd0a:   application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contributions of the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000:
Handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
support and merge MIME and non-MIME magic, support directories as well
as files of magic, apply many bug fixes and improve the build system.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Dv tar.h
and
.Dv is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.Pp
.Nm
uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of
text
files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Sq how good
a guess is.
We end up removing guesses (e.g.
.Sq "From\ "
as first 5 chars of file) because
they are not as good as other guesses (e.g.
.Sq Newsgroups:
versus
.Sq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This manual page, and particularly this section, is too long.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv ftp.astron.com
in the directory
.Dv /pub/file/file-X.YZ.tar.gz .