xref: /386bsd/usr/share/man/cat1/file.0 (revision a2142627)
NAME
file - determine file type

file [ ] [ namefile ] [ magicfile ] file ...

file tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order: filesystem
tests, magic number tests, and language tests. The first test that
succeeds causes the file type to be printed.

The type printed will usually contain one of the words text (the
file contains only ASCII characters and is probably safe to read on
an ASCII terminal), executable (the file contains the result of
compiling a program in a form understandable to some UNIX kernel or
another), or data, meaning anything else (data is usually `binary'
or non-printable). Exceptions are well-known file formats (core
files, tar archives) that are known to contain binary data. When
modifying the magic file or the program itself, preserve these
keywords. People depend on knowing that all the readable files in a
directory have the word ``text'' printed. Don't do as one computer
vendor did - change ``shell commands text'' to ``shell script''.

The filesystem
tests are based on examining the return from a system call. The
program checks to see if the file is empty, or if it's some sort of
special file. Any known file types appropriate to the system you are
running on (sockets and symbolic links on 4.2BSD, named pipes
(FIFOs) on System V) are intuited if they are defined in the system
header file.

The magic number tests are used to check for files with data in
particular fixed formats. The canonical example of this is a binary
executable (compiled program) file, whose format is defined in
header files in the standard include directory. These files have a
`magic number' stored in a particular place near the beginning of
the file that tells the UNIX operating system that the file is a
binary executable, and which of several types thereof. The concept
of `magic number' has been applied by extension to data files. Any
file with some invariant identifier at a small fixed offset into the
file can usually be described in this way. The information in these
files is read from the magic file.

If an argument appears to be an ASCII file, file
attempts to guess its language. The language tests look for
particular strings (cf. names.h) that can appear anywhere in the
first few blocks of a file. For example, one keyword indicates that
the file is most likely a troff input file, just as another keyword
indicates a C program. These tests are less reliable than the
previous two groups, so they are performed last. The language test
routines also test for some miscellany (such as archives) and
determine whether an unknown file should be labelled as `ascii text'
or `data'.

Use magicfile to specify an alternate file of magic numbers. One
option causes a checking printout of the parsed form of the magic
file. This is usually used in conjunction with magicfile to debug a
new magic file before installing it.
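
As an illustration only, the magic number tests and the final
`ascii text' or `data' decision described above can be sketched in a
few lines of Python. This is not the program's code: the names MAGIC
and classify are hypothetical, and the three table entries are
well-known signatures chosen for the example, not entries taken from
the distributed magic file. The filesystem and language tests are
left out.

```python
# Illustrative sketch only. MAGIC and classify() are hypothetical names;
# the table holds a few well-known signatures, not the real magic file.
MAGIC = [
    # (offset into file, invariant bytes at that offset, description)
    (0, b"\x1f\x8b", "gzip compressed data"),
    (0, b"%!", "PostScript document text"),
    (257, b"ustar", "tar archive"),
]


def classify(data: bytes) -> str:
    """Classify leading file bytes: magic tests first, then text or data."""
    if not data:
        return "empty"
    # Magic number tests: the first entry whose bytes match wins.
    for offset, signature, description in MAGIC:
        if data[offset:offset + len(signature)] == signature:
            return description
    # Fallback: printable ASCII plus common whitespace counts as text.
    if all(32 <= byte < 127 or byte in b"\t\n\r\f" for byte in data):
        return "ascii text"
    return "data"
```

Because the table is scanned from top to bottom, the order of its
entries decides which description is printed when more than one
entry could match.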

Another option specifies that the names of the files to be examined
are to be read (one per line) from namefile before the argument
list.
Either namefile or at least one filename argument must be present;
to test the standard input, use ``-'' as a filename argument.

- default list of magic numbers
- description of magic file format.
- tools for examining non-textfiles.

This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein. Its behaviour is mostly compatible with the
System V program of the same name. This version knows more magic,
however, so it will produce different (albeit more accurate) output
in many cases.

The one significant difference between this version and System V is
that this version treats any white space as a delimiter, so that
spaces in pattern strings must be escaped. For example,

    >10  string  language impress  (imPRESS data)

in an existing magic file would have to be changed to

    >10  string  language\ impress  (imPRESS data)

The Sun Microsystems implementation of System V compatibility
includes a file(1) command that has some extensions. My version
differs from Sun's only in minor ways. The significant one is the
`&' operator, which Sun's program expects as, for example,

    >16  long&0x7fffffff  >0  not stripped

This would be entered in my version as

    >16  long &0x7fffffff  not stripped

which is a little less general; it simply tests
(location 16)&0x7fffffff and returns its truth value as a C
expression.

The magic file
entries have been collected from various sources, mainly USENET, and
contributed by various authors. Ian Darwin (address below) will
collect additional or corrected magic file entries. A consolidation
of magic file entries will be distributed periodically.

The order of entries in the magic file is significant. Depending on
what system you are using, the order that they are put together may
be incorrect. If your old file command uses a magic file, keep the
old magic file around for comparison purposes (rename it).

There has been a file command in every UNIX since at least Research
Version 6 (man page dated January, 1975).
The System V version introduced one significant change: the external
list of magic number types. This slowed the program down slightly
but made it a lot more flexible.

This program, based on the System V version, was written by Ian
Darwin without looking at anybody else's source code.

John Gilmore revised the code extensively, making it better than the
first version. Geoff Collyer found several inadequacies and provided
some magic file entries. The program has undergone continued
evolution since.

Copyright (c) Ian F. Darwin, 1986 and 1987. Written by Ian F.
Darwin, UUCP address {utzoo | ihnp4}!darwin!ian, Internet address
ian@sq.com, postal address: P.O. Box 603, Station F, Toronto,
Ontario, CANADA M4Y 2L8.

and written by and copyright by Henry Spencer, utzoo!henry.

This software is not subject to any license of the American
Telephone and Telegraph Company or of the Regents of the University
of California.

Permission is granted to anyone to use this software for any purpose
on any computer system, and to alter it and redistribute it freely,
subject to the following restrictions:

1. The author is not responsible for the consequences of use of this
   software, no matter how awful, even if they arise from flaws in
   it.

2. The origin of this software must not be misrepresented, either by
   explicit claim or by omission. Since few users ever read sources,
   credits must appear in the documentation.

3. Altered versions must be plainly marked as such, and must not be
   misrepresented as being the original software. Since few users
   ever read sources, credits must appear in the documentation.

4. This notice may not be removed or altered.

A few support files (getopt, strtok) distributed with this package
are by Henry Spencer and are subject to the same terms as above.

A few simple support files (strtol, strchr) distributed with this
package are in the public domain; they are so marked.

The files and were written by John Gilmore from his public-domain
program, and are not covered by the above restrictions.

There must be a way to automate the
construction of the Magic file from all the glop in magdir. What is
it?

file uses several algorithms that favor speed over accuracy, thus it
can be misled about the contents of ASCII files.

The support for ASCII files (primarily for programming languages) is
simplistic, inefficient and requires recompilation to update.

Should there be an ``else'' clause to follow a series of
continuation lines?

Is it worthwhile to implement recursive file inspection, so that
compressed files, uuencoded, etc., can say ``compressed ascii text''
or ``compressed executable'' or ``compressed tar archive'' or
whatever?

The magic file and keywords should have regular expression support.

It might be advisable to allow upper-case letters in keywords for,
e.g., troff commands vs. man page macros. Regular expression support
would make this easy.

The program doesn't grok FORTRAN. It should be able to figure
FORTRAN by seeing some keywords which appear indented at the start
of line. Regular expression support would make this easy.

The list of keywords probably belongs in the Magic file. This could
be done by using some keyword like `*' for the offset value.

The program should malloc the magic file structures, rather than
using a fixed-size array as at present.

The magic file should be compiled into binary (or better yet,
fixed-length ASCII strings for use in heterogeneous network
environments) for faster startup. Then the program would run as fast
as the Version 7 program of the same name, with the flexibility of
the System V version. But then there would have to be yet another
magic number for the file.

Another optimisation would be to sort the magic file so that we can
just run down all the tests for the first byte, first word, first
long, etc., once we have fetched it. Complain about conflicts in the
magic file entries. Make a rule that the magic entries sort based on
file offset rather than position within the magic file?
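
One way to picture the sorting and conflict-checking idea is the
Python sketch below. It is a hypothetical illustration, not anything
the program does today: the function name sort_and_check, the
(offset, value, description) tuple layout, and the definition of a
conflict (same offset and value, different descriptions) are all
assumptions made for the example.

```python
from collections import defaultdict


def sort_and_check(entries):
    """Sort (offset, value, description) entries by offset; list conflicts.

    A conflict is assumed here to mean two entries that test the same
    value at the same offset but print different descriptions.
    """
    # sorted() is stable, so entries at one offset keep their file order.
    ordered = sorted(entries, key=lambda entry: entry[0])
    descriptions = defaultdict(set)
    for offset, value, description in ordered:
        descriptions[(offset, value)].add(description)
    conflicts = sorted(
        key for key, found in descriptions.items() if len(found) > 1
    )
    return ordered, conflicts
```

Sorting by offset changes which entry is tried first, so a rule like
this would interact with the existing dependence on entry order.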

The program should provide a way to give an estimate of ``how good''
a guess is. We end up removing guesses (e.g. ``From '' as first 5
chars of file) because they are not as good as other guesses (e.g.
``Newsgroups:'' versus ``Return-Path:''). Still, if the others don't
pan out, it should be possible to use the first guess.

Perhaps the program should automatically try all tests with
byte-swapping done, to avoid having to figure out the byte-swapped
values when constructing the magic file. Of course this will run
more slowly, so it should probably be an option (-a?).
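
The idea of trying a test in both byte orders can be sketched as
follows. This is a hypothetical Python illustration, not part of the
program; the function name match_long_either_order and its return
strings are assumptions made for the example.

```python
import struct
from typing import Optional


def match_long_either_order(data: bytes, offset: int, magic: int) -> Optional[str]:
    """Compare the 32-bit value at offset with magic in both byte orders.

    Returns which byte order matched, or None if neither did.
    """
    field = data[offset:offset + 4]
    if len(field) < 4:
        return None  # file too short to hold a long at this offset
    if struct.unpack(">I", field)[0] == magic:
        return "big-endian"
    if struct.unpack("<I", field)[0] == magic:
        return "little-endian"
    return None
```

Each test costs two comparisons instead of one, which is the
slowdown the paragraph above anticipates.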

This manual page, and particularly this section, is too long.