.\" $File: file.man,v 1.94 2011/04/20 19:08:44 christos Exp $
.Dd April 20, 2011
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bchiklLNnprsvz0
.Op Fl -apple
.Op Fl -mime-encoding
.Op Fl -mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Ar
.Ek -words
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl -help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Sq text
printed.
Don't do as Berkeley did and change
.Sq shell commands text
to
.Sq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Sq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Sq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Sq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text encoding,
irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Looks for known tokens inside text files.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl h , -no-dereference
Do not follow symlinks
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Causes the file command to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Sq ASCII text .
In order for this option to work,
.Nm
changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
.Sq magic
file.
(See the FILES section, below.)
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match, keep going.
Subsequent matches will have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Sq "\-r"
option.)
.It Fl l , -list
Print information about the strength of each magic pattern.
The patterns are listed sorted, in the order used for matching.
.It Fl L , -dereference
Follow symlinks, as the like-named option in
.Xr ls 1
does (on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes the output easy to process with
.Xr cut 1 .
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Sq .mgc
to the value of this variable as appropriate.
However,
.Pa file
has to exist in order for
.Pa file.mime
to be considered.
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1 ,
.Xr file 1posix
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used as,
for example,
.Bd -literal -offset indent
>16 long&0x7fffffff >0 not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order in which
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ) .
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:    C program text
file:      ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
           dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda:  block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:    text/x-c
file:      application/x-executable
/dev/hda:  application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file

.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the `&' operator by Rob McMahon
.Aq cudcv@warwick.ac.uk ,
1989.
.Pp
Guy Harris
.Aq guy@netapp.com
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000: handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com
July, 2000, to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Dv tar.h
and
.Dv is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh BUGS
.Pp
Please report bugs and send patches to the bug tracker at
.Pa http://bugs.gw.com/
or the mailing list at
.Aq file@mx.gw.com .
.Sh TODO
.Pp
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then
pick the last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
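.Pp
One possible reading of the suggestion above, as a minimal C sketch rather
than a design.
The names
.Dv cand_push
and
.Dv cand_pick
and the structure
.Dv candidates
are hypothetical and do not exist in the
.Nm
sources; the fixed-size array is an assumption made to keep the example short.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on descriptions collected for one file. */
#define MAX_CANDIDATES 16

/* Hypothetical list of possible outputs; not part of file(1). */
struct candidates {
    const char *desc[MAX_CANDIDATES];
    size_t n;
};

/* Record one possible description; returns -1 if the list is full. */
static int
cand_push(struct candidates *c, const char *desc)
{
    assert(c != NULL);
    if (c->n >= MAX_CANDIDATES)
        return -1;
    c->desc[c->n++] = desc;
    return 0;
}

/* Return the last-pushed description, or dflt if nothing was pushed. */
static const char *
cand_pick(const struct candidates *c, const char *dflt)
{
    assert(c != NULL);
    return c->n > 0 ? c->desc[c->n - 1] : dflt;
}
```
.Ed
.Pp
With an empty list,
.Dv cand_pick
returns the default it is given (for example
.Sq data ) ;
after pushing
.Sq ASCII text
and then
.Sq C program text
it returns
.Sq C program text .
Each test would only store a pointer, so the single output routine could
decide between MIME, APPLE and plain output at the very end.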
.Pp
Continue to squash all magic bugs.
See Debian BTS for a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
Fixes Debian bug #271672.
This would require more
complex store/load code in apprentice.
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make file -ki work, i.e. give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office2007 documents to
figure out what they are.
.Pp
Don't complain when ~/.magic is not compiled.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
from
.Dv ftp.astron.com
as the file
.Dv /pub/file/file-X.YZ.tar.gz .