.\" $File: file.man,v 1.79 2008/11/06 22:49:08 rrt Exp $
.Dd October 9, 2008
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl bchikLnNprsvz0
.Op Fl -mime-type
.Op Fl -mime-encoding
.Op Fl e Ar testname
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Nm
.Op Fl -help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Sq text
printed.
Don't do as Berkeley did and change
.Sq shell commands text
to
.Sq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.In sys/stat.h .
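.Pp
As a rough sketch only, and not the code that
.Nm
itself uses, the filesystem tests could be approximated in Python with the
standard os and stat modules.
The helper name
.Fn classify_by_stat
and the strings it returns are assumptions made for this illustration:
.Bd -literal -offset indent
```python
import os
import stat


def classify_by_stat(path):
    """Describe a path from lstat() alone, or return None if its
    contents would have to be read (hypothetical helper)."""
    st = os.lstat(path)  # lstat: do not follow symbolic links
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode):
        return "block special"
    if stat.S_ISCHR(mode):
        return "character special"
    if stat.S_ISREG(mode) and st.st_size == 0:
        return "empty"
    return None  # regular, non-empty file: go on to the magic tests
```
.Ed
A result of None in this sketch stands for a regular, non-empty file,
which is the case handed on to the magic tests.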
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
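.Pp
The idea of an invariant identifier at a fixed offset can be sketched in
Python as follows.
This is an illustration under stated assumptions, not how
.Nm
parses its magic files: the helper names
.Fn has_magic
and
.Fn looks_like_elf
are hypothetical, and the only format checked is ELF, whose files begin
with the byte 0x7f followed by the letters ELF.
.Bd -literal -offset indent
```python
# 0x7f followed by the ASCII letters "ELF"
ELF_MAGIC = bytes.fromhex("7f454c46")


def has_magic(data, offset, magic):
    """True if the bytes at a fixed offset equal the expected identifier."""
    return data[offset:offset + len(magic)] == magic


def looks_like_elf(path):
    """Read only the first four bytes and compare them with ELF_MAGIC."""
    with open(path, "rb") as f:
        header = f.read(len(ELF_MAGIC))
    return has_magic(header, 0, ELF_MAGIC)
```
.Ed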
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Sq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Sq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
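.Pp
A minimal Python sketch of the line-terminator check described above is
given here for illustration.
The function name
.Fn line_terminators
is an assumption, NEL is left out for brevity, and
.Nm
itself may count terminators differently:
.Bd -literal -offset indent
```python
CR = bytes([13])  # carriage return
LF = bytes([10])  # line feed


def line_terminators(data):
    """List the kinds of line terminator present in a bytes object."""
    crlf = data.count(CR + LF)
    bare_cr = data.count(CR) - crlf  # CR not followed by LF
    bare_lf = data.count(LF) - crlf  # LF not preceded by CR
    found = []
    if crlf:
        found.append("CRLF")
    if bare_cr:
        found.append("CR")
    if bare_lf:
        found.append("LF")
    return found
```
.Ed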
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.\&
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Sq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It text
Various types of text files (this test will try to guess the text encoding,
irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Looks for known tokens inside text files.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned. Defaults to
.Sq \&: .
.It Fl h , -no-dereference
Do not follow symbolic links
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Causes the
.Nm
command to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Sq ASCII text .
In order for this option to work,
.Nm
changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
.Sq magic
file.
(See the FILES section below.)
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match, keep going.
Subsequent matches will have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Sq "\-r"
option.)
.It Fl L , -dereference
Follow symbolic links, as the like-named option in
.Xr ls 1
does
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory, it will be used instead.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes the output easy to split with
.Xr cut 1 .
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Sq .mgc
to the value of this variable as appropriate.
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1 ,
.Xr file 1posix
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10	string	language impress\ 	(imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10	string	language\e impress	(imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0	string		\ebegindata	Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0	string		\e\ebegindata	Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used as,
for example,
.Bd -literal -offset indent
>16	long&0x7fffffff	>0		not stripped
.Ed
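.Pp
Read as a test, the entry above takes the four-byte value at offset 16,
masks it with 0x7fffffff and succeeds if the result is greater than zero.
A hedged Python sketch of that evaluation follows; the helper
.Fn masked_long_gt
is hypothetical, and the sketch assumes that the
.Sq long
type is four bytes in host byte order:
.Bd -literal -offset indent
```python
import struct


def masked_long_gt(data, offset, mask, threshold):
    """Read a 4-byte host-order integer at offset, AND it with mask,
    and report whether the result exceeds threshold."""
    if len(data) < offset + 4:
        return False  # not enough data for the field
    (value,) = struct.unpack_from("=L", data, offset)
    return (value & mask) > threshold


def not_stripped(data):
    """Mirror of the sample entry: long at offset 16, mask 0x7fffffff, > 0."""
    return masked_long_gt(data, 16, 0x7FFFFFFF, 0)
```
.Ed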
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ).
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
	  dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:      text/x-c
file:        application/x-executable
/dev/hda:    application/x-not-regular-file
/dev/wd0a:   application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000:
Handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
support and merge MIME and non-MIME magic, support directories as well
as files of magic, apply many bug fixes and improve the build system.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Dv tar.h
and
.Dv is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.Pp
.Nm
uses several algorithms that favor speed over accuracy,
thus it can be misled about the contents of
text
files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Sq how good
a guess is.
We end up removing guesses (e.g.\&
.Sq "From\ "
as first 5 chars of file) because
they are not as good as other guesses (e.g.\&
.Sq Newsgroups:
versus
.Sq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This manual page, and particularly this section, is too long.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv ftp.astron.com
in the directory
.Dv /pub/file/file-X.YZ.tar.gz .