.\" $OpenBSD: file.1,v 1.15 2001/10/04 23:02:32 pjanzen Exp $
.\" $FreeBSD: src/usr.bin/file/file.1,v 1.16 2000/03/01 12:19:39 sheldonh Exp $
.Dd July 30, 1997
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm file
.Op Fl vczL
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar file Op Ar ...
.Sh DESCRIPTION
This manual page documents version 3.22 of the
.Nm
command.
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The first test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Dq text
(the file contains only
.Tn ASCII
characters and is probably safe to read on an
.Tn ASCII
terminal),
.Dq executable
(the file contains the result of compiling a program
in a form understandable to some
.Ux
kernel or another),
or
.Dq data
meaning anything else (data is usually binary or non-printable).
.Pp
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa /etc/magic
or the program itself,
.Em "preserve these keywords" .
.Pp
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.Aq Pa sys/stat.h .
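.Pp
As a rough illustration of the filesystem tests (a sketch only, not the
code used by
.Nm ) ,
the following C fragment classifies a file from the mode and size
returned by
.Xr stat 2
or
.Xr lstat 2 .
The function name
.Fn fs_type
and the labels it returns are assumptions made for this example.
.Bd -literal -offset indent
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not taken from the file(1) sources: map the
 * st_mode and st_size fields of a struct stat to a short label.
 * NULL means "ordinary non-empty file; try the later tests".
 */
const char *
fs_type(mode_t mode, off_t size)
{
	if (S_ISDIR(mode))
		return "directory";
	if (S_ISCHR(mode))
		return "character special";
	if (S_ISBLK(mode))
		return "block special";
	/* The remaining types exist only where <sys/stat.h> defines them. */
#ifdef S_ISFIFO
	if (S_ISFIFO(mode))
		return "fifo (named pipe)";
#endif
#ifdef S_ISLNK
	if (S_ISLNK(mode))
		return "symbolic link";
#endif
#ifdef S_ISSOCK
	if (S_ISSOCK(mode))
		return "socket";
#endif
	if (S_ISREG(mode) && size == 0)
		return "empty";
	return NULL;
}
.Ed
.Pp
A caller would pass the
.Fa st_mode
and
.Fa st_size
fields of the
.Vt struct stat
it obtained; a
.Dv NULL
return means the magic number and language tests should be tried next.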
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Pa a.out
file, whose format is defined in
.Aq Pa a.out.h
and possibly
.Aq Pa exec.h
in the standard include directory.
These files have a
.Dq magic number
stored in a particular place
near the beginning of the file that tells the
.Ux
operating system
that the file is a binary executable, and which of several types thereof.
.Pp
The concept of magic number has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the magic file
.Pa /etc/magic .
.Pp
If an argument appears to be an
.Tn ASCII
file,
.Nm
attempts to guess its language.
The language tests look for particular strings (cf.\&
.Pa names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Li struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives) and determine whether an unknown file should be
labelled as
.Dq ASCII text
or
.Dq data .
.Pp
The options are as follows:
.Bl -tag -width Ds
.It Fl v
Print the version of the program and exit.
.It Fl m Ar list
Specify an alternate
.Ar list
of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
.It Fl z
Try to look inside compressed files.
.It Fl c
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with
.Fl m
to debug a new magic file before installing it.
.It Fl f Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Dq -
as a filename argument.
.It Fl L
Follow symbolic links, like the option of the same name in
.Xr ls 1
(on systems that support symbolic links).
.El
.Sh ENVIRONMENT
.Bl -tag -width indent
.It Ev MAGIC
Default magic number files.
.El
.Sh FILES
.Bl -tag -width /etc/magic -compact
.It Pa /etc/magic
default list of magic numbers
.El
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behaviour is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Pp
>10 string language impress\ (imPRESS data)
.Pp
in an existing magic file would have to be changed to
.Pp
>10 string language\e impress (imPRESS data)
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Pp
0 string \ebegindata Andrew Toolkit document
.Pp
in an existing magic file would have to be changed to
.Pp
0 string \e\ebegindata Andrew Toolkit document
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Xr file 1
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Ql &
operator, used, for example, as:
.Pp
>16 long&0x7fffffff >0 not stripped
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
.An Christos Zoulas
(address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /etc/magic.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Ux
since at least Research Version 6
(man page dated January, 1975).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version, was written by
.An Ian F. Darwin Aq ian@darwinisys.com
without looking at anybody else's source code.
.Pp
.An John Gilmore
revised the code extensively, making it better than
the first version.
.An Geoff Collyer
found several inadequacies
and provided some magic file entries.
.Pp
Altered by
.An Rob McMahon Aq cudcv@warwick.ac.uk ,
1989, to extend the
.Ql &
operator from simple
.Dq x&y != 0
to
.Dq x&y op z .
.Pp
Altered by
.An Guy Harris Aq guy@auspex.com ,
1993, to:
.Bl -item -offset indent
.It
put the
.Dq old-style
.Ql &
operator back the way it was, because
.Bl -enum -offset indent
.It
Rob McMahon's change broke the
previous style of usage,
.It
the SunOS
.Dq new-style
.Ql &
operator, which this version of
.Nm
supports, also handles
.Dq x&y op z ,
.It
Rob's change wasn't documented in any case;
.El
.It
put in multiple levels of
.Ql > ;
.It
put in
.Dq beshort ,
.Dq leshort ,
etc.\& keywords to look at numbers in the
file in a specific byte order, rather than in the native byte order of
the process running
.Nm file .
.El
.Pp
Currently maintained by
.An Christos Zoulas Aq christos@zoulas.com .
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by
.An John Gilmore
from his public-domain
.Nm tar
program.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
Better yet, the magic file should be compiled into binary (say,
.Xr ndbm 3
or, better yet, fixed-length
.Tn ASCII
strings for use in heterogeneous network environments) for faster startup.
Then the program would run as fast as the Version 7 program of the same name,
with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of
.Tn ASCII
files.
.Pp
The support for
.Tn ASCII
files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
There should be an
.Dq else
clause to follow a series of continuation lines.
.Pp
The magic file and keywords should have regular expression support.
Their use of
.Tn ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.Xr troff 1
commands vs man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok \s-2FORTRAN\s0.
It should be able to figure out \s-2FORTRAN\s0 by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Em ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Ql *
for the offset value.
.Pp
Another optimization would be to sort
the magic file so that we can just run down all the
tests for the first byte, first word, first long, etc., once we
have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.,
.Dq "From "
as first 5 chars of file) because
they are not as good as other guesses (e.g.,
.Dq Newsgroups:
versus
.Qq Return-Path: ) .
Still, if the others don't pan out, it should be
possible to use the first guess.
.Pp
This program is slower than some vendors'
.Nm
commands.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Em ftp.astron.com
in the directory
.Pa /pub/file/file-X.YY.tar.gz .