.\" $NetBSD: file.1,v 1.5 2010/05/14 16:51:32 joerg Exp $
.\"
.\" $File: file.man,v 1.79 2008/11/06 22:49:08 rrt Exp $
.Dd October 9, 2008
.Dt FILE 1
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl 0bchikLNnprsvz
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version 5.03 of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are recognized if they are defined in the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa /usr/share/misc/magic.mgc ,
or the files in the directory
.Pa /usr/share/misc/magic
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl 0 , Fl Fl print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It text
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Dq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Looks for known tokens inside text files.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It elf
Prints ELF file details.
.It soft
Consults magic files.
.It tar
Examines tar files.
.El
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl h , Fl Fl no-dereference
Do not follow symlinks
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl Fl help
Print a help message and exit.
.It Fl i , Fl Fl mime
Output mime type strings rather than the more
traditional human readable ones.
Thus
.Nm
may say
.Dq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
In order for this option to work,
.Nm
changes the way
it handles files recognized by the command itself (such as many of the
text file types, directories, etc.), and makes use of an alternative
.Dq magic
file.
(See the
.Sx FILES
section below.)
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match, keep going.
Subsequent matches will have the string
.Dq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
.It Fl L , Fl Fl dereference
Follow symlinks, as the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar list
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames so that they align in the output.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
The environment variable
.Ev POSIXLY_CORRECT
controls, on systems that support symbolic links, whether
.Nm
will attempt to follow symlinks.
If it is set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
.Sh FILES
.Bl -tag -width /usr/share/misc/magic.mgc -compact
.It Pa /usr/share/misc/magic.mgc
Default compiled list of magic.
.It Pa /usr/share/misc/magic
Directory containing default magic files.
.El
.Sh EXIT STATUS
.Ex -std
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:    C program text
file:      ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
           dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda:  block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:    text/x-c
file:      application/x-executable
/dev/hda:  application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file
.Ed
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic 5
.Sh STANDARDS
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16 long\*[Am]0x7fffffff \*[Gt]0 not stripped
.Ed
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa /usr/share/misc/magic.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the
.Sq \*[Am]
operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000:
Handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007 to 2008, to improve MIME
support and merge MIME and non-MIME magic, support directories as well
as files of magic, apply many bug fixes and improve the build system.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.Pp
.Nm
uses several algorithms that favor speed over accuracy;
thus it can be misled about the contents of text files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate of
.Sq how good
a guess is.
We end up removing guesses (e.g.
.Sq "From\ "
as first 5 chars of file) because
they are not as good as other guesses (e.g.
.Sq Newsgroups:
versus
.Sq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
in the directory
.Pa /pub/file/file-X.YZ.tar.gz .