.\" $OpenBSD: file.1,v 1.21 2003/06/13 18:31:14 deraadt Exp $
.\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
.\"
.\" Copyright (c) Ian F. Darwin 1986-1995.
.\" Software written by Ian F. Darwin and others;
.\" maintained 1995-present by Christos Zoulas and others.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\" notice immediately at the beginning of the file, without modification,
.\" this list of conditions, and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\" notice, this list of conditions and the following disclaimer in the
.\" documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd July 30, 1997
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm file
.Op Fl vbczL
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar file Op Ar ...
.Sh DESCRIPTION
This manual page documents version 3.22 of the
.Nm
command.
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The first test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Dq text
(the file contains only
.Tn ASCII
characters and is probably safe to read on an
.Tn ASCII
terminal),
.Dq executable
(the file contains the result of compiling a program
in a form understandable to some
.Ux
kernel or another),
or
.Dq data
meaning anything else (data is usually binary or non-printable).
.Pp
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa /etc/magic
or the program itself,
.Em "preserve these keywords" .
.Pp
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.Aq Pa sys/stat.h .
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Pa a.out
file, whose format is defined in
.Aq Pa a.out.h
and possibly
.Aq Pa exec.h
in the standard include directory.
These files have a
.Dq magic number
stored in a particular place
near the beginning of the file that tells the
.Ux
operating system
that the file is a binary executable, and which of several types thereof.
.Pp
The concept of magic number has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the magic file
.Pa /etc/magic .
.Pp
If an argument appears to be an
.Tn ASCII
file,
.Nm
attempts to guess its language.
The language tests look for particular strings (cf
.Pa names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Li struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives) and determine whether an unknown file should be
labelled as
.Dq ASCII text
or
.Dq data .
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl v
Print the version of the program and exit.
.It Fl m Ar list
Specify an alternate
.Ar list
of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.It Fl z
Try to look inside compressed files.
.It Fl b
Do not prepend filenames to output lines (brief mode).
.It Fl c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.Fl m
to debug a new magic file before installing it.
.It Fl f Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Dq -
as a filename argument.
.It Fl L
Cause symlinks to be followed, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
.El
.Sh ENVIRONMENT
.Bl -tag -width indent
.It Ev MAGIC
Default magic number files.
.El
.Sh FILES
.Bl -tag -width /etc/magic -compact
.It Pa /etc/magic
default list of magic numbers
.El
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Pp
>10 string language impress\ (imPRESS data)
.Pp
in an existing magic file would have to be changed to
.Pp
>10 string language\e impress (imPRESS data)
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Pp
0 string \ebegindata Andrew Toolkit document
.Pp
in an existing magic file would have to be changed to
.Pp
0 string \e\ebegindata Andrew Toolkit document
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm file
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Ql &
operator, used as,
for example,
.Pp
>16 long&0x7fffffff >0 not stripped
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
.An Christos Zoulas
(address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /etc/magic.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Ux
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version, was written by
.An Ian F. Darwin Aq ian@darwinisys.com
without looking at anybody else's source code.
.Pp
.An John Gilmore
revised the code extensively, making it better than
the first version.
.An Geoff Collyer
found several inadequacies
and provided some magic file entries.
.Pp
Altered by
.An Rob McMahon Aq cudcv@warwick.ac.uk ,
1989, to extend the
.Ql &
operator from simple
.Dq x&y != 0
to
.Dq x&y op z .
.Pp
Altered by
.An Guy Harris Aq guy@auspex.com ,
1993, to:
.Bl -item -offset indent
.It
put the
.Dq old-style
.Ql &
operator back the way it was, because
.Bl -enum -offset indent
.It
Rob McMahon's change broke the
previous style of usage,
.It
The SunOS
.Dq new-style
.Ql &
operator, which this version of
.Nm
supports, also handles
.Dq x&y op z ,
.It
Rob's change wasn't documented in any case;
.El
.It
put in multiple levels of
.Ql > ;
.It
put in
.Dq beshort ,
.Dq leshort ,
etc. keywords to look at numbers in the
file in a specific byte order, rather than in the native byte order of
the process running
.Nm file .
.El
.Pp
Currently maintained by
.An Christos Zoulas Aq christos@zoulas.com .
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by
.An John Gilmore
from his public-domain
.Nm tar
program.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
Better yet, the magic file should be compiled into binary (say,
.Xr ndbm 3
or, better yet, fixed-length
.Tn ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of
.Tn ASCII
files.
.Pp
The support for
.Tn ASCII
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
There should be an
.Dq else
clause to follow a series of continuation lines.
.Pp
The magic file and keywords should have regular expression support.
Their use of
.Tn ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords,
e.g., for
.Xr troff 1
commands vs man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Em ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Ql *
for the offset value.
.Pp
Another optimization would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc, once we
have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.,
.Dq From\ \&
as first 5 chars of file) because
they are not as good as other guesses (e.g.,
.Dq Newsgroups:
versus
.Qq Return-Path: ) .
Still, if the others don't pan out, it should be
possible to use the first guess.
.Pp
This program is slower than some vendors'
.Nm
commands.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Em ftp.astron.com
in the directory
.Pa /pub/file/file-X.YY.tar.gz .
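.Sh EXAMPLES
The masking form of the
.Ql &
operator mentioned under
.Sx STANDARDS CONFORMANCE
applies a mask to the value read from the file before the comparison is made.
The fragment below is only an illustrative sketch of how such entries
might look in a private magic file.
The tag value and the message on the first line are hypothetical and are
not taken from any distributed magic file; the continuation line repeats
the example given earlier in this page.
.Bd -literal -offset indent
# Hypothetical format: a 4-byte big-endian tag at offset 0.
0 belong 0x4d594654 hypothetical MYFT data
# Continuation line: mask off the top bit of the long at offset 16
# and print the message only if the remaining value is greater than 0.
>16 long&0x7fffffff >0 not stripped
.Ed
.Pp
A file of this kind would normally be checked with
.Fl c
together with
.Fl m
before being installed, as described for those options above.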