.\" $File: file.man,v 1.79 2008/11/06 22:49:08 rrt Exp $
.Dd October 9, 2008
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl bchikLnNprsvz0
.Op Fl -mime-type
.Op Fl -mime-encoding
.Op Fl e Ar testname
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Nm
.Op Fl -help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Sq text
printed.
Don't do as Berkeley did and change
.Sq shell commands text
to
.Sq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are recognized if they are defined in
the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
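.Pp
As a rough illustration of how byte ranges can separate character sets,
the following C program is a minimal sketch that distinguishes only
ASCII text, UTF-8 text, and
.Sq data .
It is not the code that
.Nm
uses: the function names and result strings are hypothetical,
the set of accepted control characters is an assumption, and the
UTF-8 check is structural only (it does not reject overlong
encodings or surrogates).
.Bd -literal -offset indent
#include <stddef.h>
#include <stdio.h>

/* Hypothetical sketch, not the classifier used by file(1). */

/* Printable ASCII plus a few common control characters
   (assumed set): BEL, BS, HT, LF, VT, FF, CR and ESC. */
static int is_text_byte(unsigned char c)
{
    if (c >= 0x20 && c <= 0x7e)
        return 1;
    switch (c) {
    case 0x07: case 0x08: case 0x09: case 0x0a:
    case 0x0b: case 0x0c: case 0x0d: case 0x1b:
        return 1;
    default:
        return 0;
    }
}

/* ASCII text: every byte is a text byte. */
static int looks_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!is_text_byte(buf[i]))
            return 0;
    return 1;
}

/* UTF-8 text: ASCII text bytes, or a lead byte followed by the right
   number of 10xxxxxx continuation bytes.  Structural check only. */
static int looks_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;                       /* continuation bytes expected */

        if (c < 0x80) {
            if (!is_text_byte(c))
                return 0;
            i++;
            continue;
        }
        if ((c & 0xe0) == 0xc0)
            n = 1;
        else if ((c & 0xf0) == 0xe0)
            n = 2;
        else if ((c & 0xf8) == 0xf0)
            n = 3;
        else
            return 0;                   /* stray continuation or bad lead */
        if (n >= len - i)
            return 0;                   /* sequence runs past the buffer */
        for (size_t k = 1; k <= n; k++)
            if ((buf[i + k] & 0xc0) != 0x80)
                return 0;
        i += n + 1;
    }
    return 1;
}

/* Try the narrower test first, then fall back to "data". */
static const char *classify(const unsigned char *buf, size_t len)
{
    if (looks_ascii(buf, len))
        return "ASCII text";
    if (looks_utf8(buf, len))
        return "UTF-8 Unicode text";
    return "data";
}

int main(void)
{
    static const unsigned char ascii[] = "hello, world";
    static const unsigned char utf8[] = { 'c', 'a', 'f', 0xc3, 0xa9 };
    static const unsigned char bin[] = { 0x7f, 'E', 'L', 'F', 0x00 };

    puts(classify(ascii, sizeof ascii - 1));  /* drop the trailing NUL */
    puts(classify(utf8, sizeof utf8));        /* "cafe" with an e-acute */
    puts(classify(bin, sizeof bin));
    return 0;
}
.Ed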
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Sq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Sq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.\&
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Sq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width encoding
.It apptype
.Dv EMX
application type (only on EMX).
.It text
Various types of text files (this test will try to guess the text encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Looks for known tokens inside text files.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl h , -no-dereference
Causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the
environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Causes
.Nm
to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Sq ASCII text .
To do this,
.Nm
changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
.Sq magic
file.
(See the
.Sx FILES
section, below.)
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match; keep going.
Subsequent matches will have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
.It Fl L , -dereference
Causes symlinks to be followed, as the like-named option in
.Xr ls 1
does
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory, it will be used instead.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of analyzed files, so that it appears as if
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2 ,
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easier to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Sq .mgc
to the value of this variable as appropriate.
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks or not.
If it is set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled
by the
.Fl L
and
.Fl h
options.
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1 ,
.Xr file 1posix
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used as,
for example,
.Bd -literal -offset indent
>16 long&0x7fffffff >0 not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ).
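.Pp
To make the
.Sq &
operator from
.Sx STANDARDS CONFORMANCE
concrete, the following C program is a minimal sketch of how an entry such as
.Dl >16 long&0x7fffffff >0 not stripped
could be evaluated: read the four-byte
.Sq long
at offset 16 in the machine's native byte order, mask it with 0x7fffffff,
and report the description if the result is greater than 0.
The leading
.Sq >
marks the entry as a continuation of the preceding test, which this
sketch ignores.
The structure and function names are hypothetical and do not describe the
internals of
.Nm .
.Bd -literal -offset indent
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical representation of one magic entry, e.g.
       >16 long&0x7fffffff >0 not stripped
   Names are illustrative only. */
struct magic_test {
    size_t offset;        /* where to read: 16 */
    uint32_t mask;        /* operand of the & operator: 0x7fffffff */
    uint32_t value;       /* right-hand side of the > test: 0 */
    const char *desc;     /* text printed on a match */
};

/* Returns 1 if the test matches buf, 0 otherwise. */
static int run_test(const struct magic_test *t,
                    const unsigned char *buf, size_t len)
{
    uint32_t v;

    if (len < sizeof v || t->offset > len - sizeof v)
        return 0;                           /* too short: no match */
    memcpy(&v, buf + t->offset, sizeof v);  /* "long": native byte order */
    v &= t->mask;                           /* apply the & mask */
    return v > t->value;                    /* the > comparison */
}

int main(void)
{
    struct magic_test t = { 16, 0x7fffffffu, 0, "not stripped" };
    unsigned char buf[32] = { 0 };
    uint32_t field = 1234;  /* arbitrary nonzero value for the demo */

    /* All-zero field at offset 16: the test does not match. */
    if (!run_test(&t, buf, sizeof buf))
        puts("zero field: no match");

    /* Nonzero field: the masked value is > 0, so the test matches. */
    memcpy(buf + 16, &field, sizeof field);
    if (run_test(&t, buf, sizeof buf))
        puts(t.desc);
    return 0;
}
.Ed
.Pp
Because the mask clears the sign bit, the result is the same whether the
value is compared as signed or unsigned.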
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
 dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c: text/x-c
file: application/x-executable
/dev/hda: application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contributions of the
.Sq &
operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000,
to handle the
.Fl i
option to output MIME type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
support and merge MIME and non-MIME magic, support directories as well
as files of magic, apply many bug fixes and improve the build system.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Dv tar.h
and
.Dv is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.Pp
.Nm
uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of text files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Sq how good
a guess is.
We end up removing guesses (e.g.\&
.Sq From\ \&
as the first 5 characters of a file) because
they are not as good as other guesses (e.g.\&
.Sq Newsgroups:
versus
.Sq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This manual page, and particularly this section, is too long.
.Sh RETURN CODE
.Nm
returns 0 on success, and non-zero on error.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv ftp.astron.com
as the file
.Pa /pub/file/file-X.YZ.tar.gz .