.\" $File: file.man,v 1.140 2020/06/07 17:41:07 christos Exp $
.Dd June 7, 2020
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Bk -words
.Op Fl bcdEhiklLNnprsSvzZ0
.Op Fl Fl apple
.Op Fl Fl exclude-quiet
.Op Fl Fl extension
.Op Fl Fl mime-encoding
.Op Fl Fl mime-type
.Op Fl e Ar testname
.Op Fl F Ar separator
.Op Fl f Ar namefile
.Op Fl m Ar magicfiles
.Op Fl P Ar name=value
.Ar
.Ek
.Nm
.Fl C
.Op Fl m Ar magicfiles
.Nm
.Op Fl Fl help
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Tn UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Dq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying magic files or the program itself, make sure to
.Em "preserve these keywords" .
Users depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in the system header file
.In sys/stat.h .
.Pp
The magic tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Dq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Tn UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Dq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or the files in the directory
.Pa __MAGIC__
if the compiled file does not exist.
In addition, if
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
exists, it will be used in preference to the system magic files.
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (cf.
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives, JSON files).
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl Fl apple
Causes the file command to output the file type and creator code as
used by older MacOS versions.
The code consists of eight letters,
the first four describing the file type, the last four the creator.
This option works properly only for file formats that have the
apple-style output defined.
.It Fl b , Fl Fl brief
Do not prepend filenames to output lines (brief mode).
.It Fl C , Fl Fl compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file or directory.
.It Fl c , Fl Fl checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl d
Prints internal debugging information to stderr.
.It Fl E
On filesystem errors (file not found, etc.), issue an error message
and exit, instead of reporting the error as regular output and
continuing as POSIX mandates.
.It Fl e , Fl Fl exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
.Dv EMX
application type (only on EMX).
.It ascii
Various types of text files (this test will try to guess the text
encoding, irrespective of the setting of the
.Sq encoding
option).
.It encoding
Different text encodings for soft magic tests.
.It tokens
Ignored for backwards compatibility.
.It cdf
Prints details of Compound Document Files.
.It compress
Checks for, and looks inside, compressed files.
.It csv
Checks Comma Separated Value files.
.It elf
Prints ELF file details, provided soft magic tests are enabled and the
elf magic is found.
.It json
Examines JSON (RFC-7159) files by parsing them for compliance.
.It soft
Consults magic files.
.It tar
Examines tar files by verifying the checksum of the 512 byte tar header.
Excluding this test can provide more detailed content description by using
the soft magic method.
.It text
A synonym for
.Sq ascii .
.El
.It Fl Fl exclude-quiet
Like
.Fl Fl exclude
but ignore tests that
.Nm
does not know about.
This is intended for compatibility with older versions of
.Nm .
.It Fl Fl extension
Print a slash-separated list of valid extensions for the file type found.
.It Fl F , Fl Fl separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
Defaults to
.Sq \&: .
.It Fl f , Fl Fl files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
Please note that
.Ar namefile
is unwrapped and the enclosed filenames are processed when this option is
encountered and before any further options processing is done.
This allows one to process multiple lists of files with different command line
arguments on the same
.Nm
invocation.
Thus if you want to set the delimiter, you need to do it before you specify
the list of files, like:
.Dq Fl F Ar @ Fl f Ar namefile ,
instead of:
.Dq Fl f Ar namefile Fl F Ar @ .
.It Fl h , Fl Fl no-dereference
Do not follow symlinks
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is not defined.
.It Fl i , Fl Fl mime
Causes the file command to output mime type strings rather than the more
traditional human readable ones.
Thus it may say
.Sq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
.It Fl Fl mime-type , Fl Fl mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , Fl Fl keep-going
Don't stop at the first match, keep going.
Subsequent matches will
have the string
.Sq "\[rs]012\- "
prepended.
(If you want a newline, see the
.Fl r
option.)
The magic pattern with the highest strength (see the
.Fl l
option) comes first.
.It Fl l , Fl Fl list
Shows a list of patterns and their strength sorted descending by
.Xr magic __FSECTION__
strength
which is used for the matching (see also the
.Fl k
option).
.It Fl L , Fl Fl dereference
Follow symlinks, like the like-named option in
.Xr ls 1
(on systems that support symbolic links).
This is the default if the environment variable
.Ev POSIXLY_CORRECT
is defined.
.It Fl m , Fl Fl magic-file Ar magicfiles
Specify an alternate list of files and directories containing magic.
This can be a single item, or a colon-separated list.
If a compiled magic file is found alongside a file or directory,
it will be used instead.
.It Fl N , Fl Fl no-pad
Don't pad filenames so that they align in the output.
.It Fl n , Fl Fl no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl p , Fl Fl preserve-date
On systems that support
.Xr utime 3
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl P , Fl Fl parameter Ar name=value
Set various parameter limits.
.Bl -column "elf_phnum" "Default" "XXXXXXXXXXXXXXXXXXXXXXXXXXX" -offset indent
.It Sy "Name" Ta Sy "Default" Ta Sy "Explanation"
.It Li bytes Ta 1048576 Ta max number of bytes to read from file
.It Li elf_notes Ta 256 Ta max ELF notes processed
.It Li elf_phnum Ta 2048 Ta max ELF program sections processed
.It Li elf_shnum Ta 32768 Ta max ELF sections processed
.It Li indir Ta 50 Ta recursion limit for indirect magic
.It Li name Ta 50 Ta use count limit for name/use magic
.It Li regex Ta 8192 Ta length limit for regex searches
.El
.It Fl r , Fl Fl raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , Fl Fl special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2
since on some systems it reports a zero size for raw disk partitions.
.It Fl S , Fl Fl no-sandbox
On systems where libseccomp
.Pa ( https://github.com/seccomp/libseccomp )
is available, the
.Fl S
flag disables sandboxing which is enabled by default.
This option is needed for
.Nm
to execute external decompressing programs,
i.e. when the
.Fl z
flag is specified and the built-in decompressors are not available.
On systems where sandboxing is not available, this option has no effect.
.It Fl v , Fl Fl version
Print the version of the program and exit.
.It Fl z , Fl Fl uncompress
Try to look inside compressed files.
.It Fl Z , Fl Fl uncompress-noreport
Try to look inside compressed files, but report information only about
the contents, not the compression.
.It Fl 0 , Fl Fl print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.Pp
If this option is given more than once, then
.Nm
prints just the filename followed by a NUL followed by the description
(or ERROR: text) followed by a second NUL for each entry.
.It Fl Fl help
Print a help message and exit.
.El
.Sh ENVIRONMENT
The environment variable
.Ev MAGIC
can be used to set the default magic file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq Pa .mgc
to the value of this variable as appropriate.
The environment variable
.Ev POSIXLY_CORRECT
controls, on systems that support symbolic links, whether
.Nm
will attempt to follow symlinks or not.
If set, then
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
.Sh FILES
.Bl -tag -width __MAGIC__.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic.
.It Pa __MAGIC__
Directory containing default magic files.
.El
.Sh EXIT STATUS
.Nm
will exit with
.Dv 0
if the operation was successful or
.Dv >0
if an error was encountered.
The following errors cause diagnostic messages, but don't affect the program
exit code (as POSIX requires), unless
.Fl E
is specified:
.Bl -bullet -compact -offset indent
.It
A file cannot be found
.It
There is no permission to read a file
.It
The file type cannot be determined
.El
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c:   C program text
file:     ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
          dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)

$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector

$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda:   x86 boot sector
/dev/hda1:  Linux/i386 ext2 filesystem
/dev/hda2:  x86 boot sector
/dev/hda3:  x86 boot sector, extended partition table
/dev/hda4:  Linux/i386 ext2 filesystem
/dev/hda5:  Linux/i386 swap file
/dev/hda6:  Linux/i386 swap file
/dev/hda7:  Linux/i386 swap file
/dev/hda8:  Linux/i386 swap file
/dev/hda9:  empty
/dev/hda10: empty

$ file -i file.c file /dev/{wd0a,hda}
file.c:      text/x-c
file:        application/x-executable
/dev/hda:    application/x-not-regular-file
/dev/wd0a:   application/x-not-regular-file

.Ed
.Sh SEE ALSO
.Xr hexdump 1 ,
.Xr od 1 ,
.Xr strings 1 ,
.Xr magic __FSECTION__
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
\*[Gt]10	string	language impress\ 	(imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
\*[Gt]10	string	language\e impress	(imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example
.Bd -literal -offset indent
0	string		\ebegindata	Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0	string		\e\ebegindata	Andrew Toolkit document
.Ed
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
This version differs from Sun's only in minor ways.
It includes the extension of the
.Sq \*[Am]
operator, used as,
for example,
.Bd -literal -offset indent
\*[Gt]16	long\*[Am]0x7fffffff	\*[Gt]0		not stripped
.Ed
.Sh SECURITY
On systems where libseccomp
.Pa ( https://github.com/seccomp/libseccomp )
is available,
.Nm
limits system calls to only the ones necessary for the
operation of the program.
This enforcement does not provide any security benefit when
.Nm
is asked to decompress input files by running external programs with
the
.Fl z
option.
To enable execution of external decompressors, one needs to disable
sandboxing using the
.Fl S
flag.
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ) .
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
.Aq ian@darwinsys.com
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
The
.Sq \*[Am]
operator was contributed by Rob McMahon,
.Aq cudcv@warwick.ac.uk ,
1989.
.Pp
Guy Harris,
.Aq guy@netapp.com ,
made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas
.Aq christos@astron.com .
.Pp
Altered by Chris Lowth
.Aq chris@lowth.com ,
2000: handle the
.Fl i
option to output mime type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer
.Aq enf@pobox.com ,
July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
Altered by Reuben Thomas
.Aq rrt@sc3d.org ,
2007-2011, to improve MIME support, merge MIME and non-MIME magic,
support directories as well as files of magic, apply many bug fixes,
update and fix a lot of magic, improve the build system, improve the
documentation, and rewrite the Python bindings in pure Python.
.Pp
The list of contributors to the
.Sq magic
directory (magic files)
is too long to include here.
You know who you are; thank you.
Many contributors are listed in the source files.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
COPYING in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
Please report bugs and send patches to the bug tracker at
.Pa https://bugs.astron.com/
or the mailing list at
.Aq file@astron.com
(visit
.Pa https://mailman.astron.com/mailman/listinfo/file
first to subscribe).
.Sh TODO
Fix output so that tests for MIME and APPLE flags are not needed all
over the place, and actual output is only done in one place.
This needs a design.
Suggestion: push possible outputs on to a list, then pick the
last-pushed (most specific, one hopes) value at the end, or
use a default if the list is empty.
This should not slow down evaluation.
.Pp
The handling of
.Dv MAGIC_CONTINUE
and printing \e012- between entries is clumsy and complicated; refactor
and centralize.
.Pp
Some of the encoding logic is hard-coded in encoding.c and could be moved
to the magic files if we had a !:charset annotation.
.Pp
Continue to squash all magic bugs.
See Debian BTS for a good source.
.Pp
Store arbitrarily long strings, for example for %s patterns, so that
they can be printed out.
Fixes Debian bug #271672.
This can be done by allocating strings in a string pool, storing the
string pool at the end of the magic file and converting all the string
pointers to relative offsets from the string pool.
.Pp
Add syntax for relative offsets after current level (Debian bug #466037).
.Pp
Make file -ki work, i.e. give multiple MIME types.
.Pp
Add a zip library so we can peek inside Office2007 documents to
print more details about their contents.
.Pp
Add an option to print URLs for the sources of the file descriptions.
.Pp
Combine script searches and add a way to map executable names to MIME
types (e.g. have a magic value for !:mime which causes the resulting
string to be looked up in a table).
This would avoid adding the same magic repeatedly for each new
hash-bang interpreter.
.Pp
When a file descriptor is available, we can skip and adjust the buffer
instead of the hacky buffer management we do now.
.Pp
Fix
.Dq name
and
.Dq use
to check for consistency at compile time (duplicate
.Dq name ,
.Dq use
pointing to undefined
.Dq name
).
Make
.Dq name
/
.Dq use
more efficient by keeping a sorted list of names.
Special-case ^ to flip endianness in the parser so that it does not
have to be escaped, and document it.
.Pp
If the offsets specified internally in the file exceed the buffer size
(the
.Dv HOWMANY
variable in file.h), then we don't seek to that offset, but we give up.
It would be better if buffer management was done when the file descriptor
is available, so that we can move around the file.
One must be careful though because this has performance (and thus security)
considerations.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Pa ftp.astron.com
in the directory
.Pa /pub/file/file-X.YZ.tar.gz .