=head1 NAME

magicrescue - Scans a block device and extracts known file types by looking at
magic bytes.

=head1 SYNOPSIS

B<magicrescue> [ I<options> ] I<devices>

=head1 DESCRIPTION

Magic Rescue opens I<devices> for reading, scans them for file types it knows
how to recover and calls an external program to extract them. It looks at
"magic bytes" in file contents, so it can be used both as an undelete utility
and for recovering a corrupted drive or partition. It works on any file system,
but on very fragmented file systems it can only recover the first chunk of
each file. These chunks are sometimes as big as 50MB, however.

To invoke B<magicrescue>, you must specify at least one device and the B<-d>
and B<-r> options. See the L</USAGE> section in this manual for getting
started.

=head1 OPTIONS

=over 7

=item B<-b> I<blocksize>

Default: 1. This will direct B<magicrescue> to only consider files that start
at a multiple of the I<blocksize> argument. The option applies only to the
recipes following it, so by specifying it multiple times it can be used to get
different behavior for different recipes.

Using this option you can usually get better performance, but fewer files will
be found. In particular, files with leading garbage (e.g. many mp3 files) and
files contained inside other files are likely to be skipped. Also, some file
systems don't align small files to block boundaries, so those won't be found
this way either.

If you don't know your file system's block size, just use the value 512, which
is almost always the hardware sector size.

=item B<-d> I<directory>

Mandatory. Output directory for found files. Make sure you have plenty of free
space in this directory, especially when extracting very common file types such
as jpeg or gzip files. Also make sure the file system is able to handle
thousands of files in a single directory, i.e.
don't use FAT if you are
extracting many files.

You should not place the output directory on the same block device you are
trying to rescue files from. This might add the same file to the block device
ahead of the current reading position, causing B<magicrescue> to find the same
file again later. In the worst theoretical case, this could cause a
loop where the same file is extracted thousands of times until disk space is
exhausted. You are also likely to overwrite the deleted files you were looking
for in the first place.

=item B<-r> I<recipe>

Mandatory. Recipe name, file, or directory. Specify this as either a plain
name (e.g. C<jpeg-jfif>) or a path (e.g. F<recipes/jpeg-jfif>). If it doesn't
find such a file in the current directory, it will look in F<./recipes> and
F<PREFIX/share/magicrescue/recipes>, where I<PREFIX> is the path you installed
to, e.g. F</usr/local>.

If I<recipe> is a directory, all files in that directory will be treated as
recipes.

Browse the F<PREFIX/share/magicrescue/recipes> directory to see what recipes
are available. A recipe is a text file, and you should read the comments
inside it before using it. Either use the recipe as it is or copy it somewhere
and modify it.

For information on creating your own recipes, see the L</RECIPES> section.

=item B<-I> I<file>

Reads input files from I<file> in addition to those listed on the command line.
If I<file> is C<->, read from standard input. Each line will be interpreted as
a file name.

=item B<-M> I<output_mode>

Produce machine-readable output to stdout. I<output_mode> can be:

=over

=item B<i>

Print each input file name before processing

=item B<o>

Print each output file name after processing

=item B<io>

Print both input and output file names. Input file names will be prefixed by
C<i> and a space. Output file names will be prefixed by C<o> and a space.

=back

Nothing else will be written to standard output in this mode.

=item B<-O> [B<+>|B<->|B<=>][B<0x>]I<offset>

Resume from the specified I<offset> in the first device. If prefixed with
B<0x> it will be interpreted as a hex number.

The number may be prefixed with a sign:

=over

=item B<=>

Seek to an absolute position (default)

=item B<+>

Seek to a relative position. On regular files this does the same as the above.

=item B<->

Seek to EOF, minus the offset.

=back

=back

=head1 USAGE

Say you have destroyed the file system on /dev/hdb1 and you want to extract
all the jpeg files you lost. This guide assumes you have installed Magic
Rescue in F</usr/local>, which is the default.

Make sure DMA and other optimizations are enabled on your disk, or it will take
hours. In Linux, use hdparm to set these options:

 $ hdparm -d 1 -c 1 -u 1 /dev/hdb

Choose your output directory, somewhere with lots of disk space.

 $ mkdir ~/output

Look in the F</usr/local/share/magicrescue/recipes> directory for the recipes
you want. Magic Rescue comes with recipes for some common file types, and you
can make your own too (see the next section). Open the recipes you want to use
in a text editor and read their comments. Most recipes require 3rd party
software to work, and you may want to modify some parameters (such as
B<min_output_file>) to suit your needs.

Then invoke B<magicrescue>:

 $ magicrescue -r jpeg-jfif -r jpeg-exif -d ~/output /dev/hdb1

It will scan through your entire hard disk, so it may take a while. You can
stop it and resume later if you want to. To do so, interrupt it (with CTRL+C)
and note the progress information saying what address it got to. Then restart
it later with the B<-O> option.

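
As a rough sketch of resuming, the snippet below builds the command line for a
restart. The offset C<0x1A2B3C00> is a made-up placeholder, not real output;
substitute the address B<magicrescue> reported when you interrupted it, and
keep the B<0x> prefix only if that address is in hex. The recipes, output
directory and device are the ones assumed in the example above.

```shell
# Sketch only: the offset is a hypothetical placeholder. Replace it with the
# address magicrescue reported in its progress information.
offset=0x1A2B3C00

# Same recipes, output directory and device as the first run, plus -O.
# The command is printed rather than executed so it can be checked first.
resume_cmd="magicrescue -O $offset -r jpeg-jfif -r jpeg-exif -d $HOME/output /dev/hdb1"
echo "$resume_cmd"
```

Printing the command first is only a precaution: B<-O> applies to the first
device, so it is worth confirming that the device order matches the original
run before starting it.
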

When it has finished you will probably find thousands of .jpg files in
F<~/output>, including things you never knew were in your browser cache. Sorting
through all those files can be a huge task, so you may want to use software or
scripts to do it.

First, try to eliminate duplicates with the B<dupemap>(1) tool included in this
package.

 $ dupemap delete,report ~/output

If you are performing an undelete operation you will want to get rid
of all the rescued files that also appear on the live file system. See the
B<dupemap>(1) manual for instructions on doing this.

If that's not enough, you can use B<magicsort>(1) to get a better overview:

 $ magicsort ~/output

=head1 RECIPES

=head2 Creating recipe files

A recipe file is a relatively simple file of 3-5 lines of text. It describes
how to recognise the beginning of the file and what to do when a file is
recognised. For example, all jfif images start with the bytes C<0xff 0xd8>.
The string C<JFIF> follows at byte offset 6. Look at F<recipes/jpeg-jfif> in
the source distribution to follow this example.

Matching magic data is done with a "match operation" that looks like this:

I<offset> I<operation> I<parameter>

where I<offset> is a decimal integer saying how many bytes from the beginning
of the file this data is located, I<operation> refers to a built-in match
operation in B<magicrescue>, and I<parameter> is specific to that operation.

=over

=item *

The B<string> operation matches a string of any length. In the jfif example
this is four bytes. You can use escape characters, like C<\n> or C<\xA7>.

=item *

The B<int32> operation matches 4 bytes ANDed with a bit mask. To match all
four bytes, use the bit mask C<FFFFFFFF>. If you have no idea what a bit mask
is, just use the B<string> operation instead.
The mask C<FFFF0000> in the jfif
example matches the first two bytes.

=item *

The B<char> operation is like "string", except it only matches a single
character.

=back

To learn these patterns for a given file type, look at files of the desired
type in a hex editor, search through the resource files for the B<file>(1)
utility (L<http://freshmeat.net/projects/file>) and/or search the Internet for
a reference on the format.

If all the operations match, we have found the start of the file. Finding the
end of the file is a much harder problem, and therefore it is delegated to an
external shell command, which is named by the B<command> directive. This
command receives the block device's file descriptor on stdin and must write to
the file given to it in the C<$1> variable. Apart from that, the command can do
anything it wants to try and extract the file.

For some file types (such as jpeg), a tool already exists that can do this.
However, many programs misbehave when told to read from the middle of a huge
block device. Some seek to byte 0 before reading (this can be fixed by
prefixing C<cat|>, but some refuse to work on a file they can't seek in).
Others try to read the whole file into memory before doing anything, which will
of course fail on a multi-gigabyte block device. And some fail completely to
parse a partially corrupted file.

This means that you may have to write your own tool or wrap an existing program
in some scripts that make it behave better. For example, this could be to
extract the first 10MB into a temporary file and let the program work on that.
Or perhaps you can use F<tools/safecat> if the file may be very large.

=head2 Recipe format reference

Empty lines and lines starting with C<#> will be skipped. A recipe contains a
series of match operations to find the content and a series of directives to
specify what to do with it.

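
To make that shape concrete, here is a minimal sketch that writes a
hypothetical recipe modelled on the jfif description in this manual. It is not
the recipe shipped in F<recipes/jpeg-jfif>; the file name C<my-jfif> and the
temporary directory are made up for illustration, and the B<command> line
assumes that B<jpegtran> is installed and is a suitable extractor. The
individual operations and directives are described in the rest of this
section.

```shell
# Sketch only: a hypothetical recipe, not the one shipped with Magic Rescue.
recipe_dir=$(mktemp -d)
cat > "$recipe_dir/my-jfif" <<'EOF'
# Match operations: offset, operation, parameter(s).
# The first one is used for scanning, so it is a string operation.
6 string JFIF
0 int32 FFD80000 FFFF0000

# Directives. Assumes jpegtran is installed; it reads the device on stdin
# and writes the extracted image to the file named in $1.
extension jpg
command jpegtran -copy all -outfile "$1"
EOF

# Count the match operations, i.e. the lines that start with an integer offset.
match_ops=$(grep -c '^[0-9]' "$recipe_dir/my-jfif")
echo "$match_ops match operations"
```

Because B<-r> also accepts a path, a recipe written this way could be tried
with C<magicrescue -r "$recipe_dir/my-jfif" -d ~/output /dev/hdb1>.
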

Lines of the format I<offset> I<operation> I<parameter> will add a match
operation to the list. Match operations will be tried in the order they appear
in the recipe, and they must all match for the recipe to succeed. The
I<offset> describes what offset this data will be found at, counting from the
beginning of the file. I<operation> can have the following values:

=over 7

=item B<string> I<string>

The parameter is a character sequence that may contain escape
sequences such as C<\xFF>.

=item B<char> I<character>

The parameter is a single character (byte), or an escape sequence.

=item B<int32> I<value> I<bitmask>

Both I<value> and I<bitmask> are expressed as 8-character hex strings.
I<bitmask> will be ANDed with the data, and the result will be compared
to I<value>. The byte order is as you see it in the hex editor,
i.e. big-endian.

=back

The first match operation in a recipe is special: it is used to scan
through the file. Only the B<char> and B<string> operations can be used there.
To add more operation types, look at the instructions in F<magicrescue.c>.

A line that doesn't start with an integer is a directive. This can be:

=over 7

=item B<extension> I<ext>

Mandatory. I<ext> names the file extension for this type, such as C<jpg>.

=item B<command> I<command>

Mandatory. When all the match operations succeed, this I<command> will be
executed to extract the file from the block device. I<command> is passed to
the shell with the block device's file descriptor (seeked to the right byte) on
stdin. The shell variable C<$1> will contain the file its output should be
written to, and it must respect this. Otherwise B<magicrescue> cannot tell
whether it succeeded.

=item B<rename> I<command>

Optional. After a successful extraction this command will be run.
Its purpose
is to gather enough information about the file to rename it to something more
meaningful. The script must not perform the rename itself, but it
should write to standard output the string C<RENAME>, followed by a space,
followed by the new file name. Nothing else may be written to standard
output. If the file should not be renamed, nothing should be written to
standard output. Standard input and C<$1> will work like with the B<command>
directive.

=item B<min_output_file> I<size>

Default: 100. Output files smaller than this size will be deleted.

=item B<allow_overlap> I<bytes>

By default, recipes will not match on overlapping byte ranges.
B<allow_overlap> disables this, and it should always be used for recipes where
the extracted file may be larger than it was on disk. If I<bytes> is negative,
overlap checking will be completely disabled. Otherwise, overlap checking will
be in effect for everything but the last I<bytes> of the output. For example,
if the output may be up to 512 bytes bigger than the input, B<allow_overlap>
should be set to 512.

=back

To test whether your recipe actually works, either just run it on your hard
disk or use the F<tools/checkrecipe> script to pick out files that should match
but don't.

If you have created a recipe that works, please mail it to me at jbj@knef.dk so
I can include it in the distribution.

=head1 WHEN TO NOT USE MAGIC RESCUE

Magic Rescue is not meant to be a universal application for file recovery. It
will give good results when you are extracting known file types from an
unusable file system, but for many other cases there are better tools
available.

=over

=item *

If there are intact partitions present somewhere, use B<gpart> to find them.

=item *

If the file system's internal data structures are more or less undamaged, use
B<The Sleuth Kit>.
At the time of writing, it only supports NTFS, FAT, ext[23]
and FFS, though.

=item *

If Magic Rescue does not have a recipe for the file type you are trying to
recover, try B<foremost> instead. It recognizes more file types, but in most
cases it extracts them simply by copying out a fixed number of bytes after it
has found the start of the file. This makes postprocessing the output files
more difficult.

=back

In many cases you will want to use Magic Rescue in addition to the tools
mentioned above. They are not mutually exclusive, e.g. combining
B<magicrescue> with B<dls> from The Sleuth Kit could give good results. In
many cases you'll want to use B<magicrescue> to extract its known file types
and another utility to extract the rest.

When combining the results of more than one tool, B<dupemap>(1) can be used to
eliminate duplicates.

=head1 SEE ALSO

=over

=item Similar programs

=over

=item B<gpart>(8)

L<http://www.stud.uni-hannover.de/user/76201/gpart/>. Tries to rebuild the
partition table by scanning the disk for lost partitions.

=item B<foremost>(1)

L<http://foremost.sourceforge.net>. Does the same thing as B<magicrescue>,
except that its "recipes" are less complex. Finding the end of the file must
happen by either matching an EOF string or just extracting a fixed number of
bytes every time. It supports more file types than Magic Rescue, but extracted
files usually have lots of trailing garbage, so removal of duplicates and
sorting by size is not possible.

=item B<The Sleuth Kit>

L<http://www.sleuthkit.org/sleuthkit/>. This popular package of utilities is
extremely useful for undeleting files from a FAT/NTFS/ext2/ext3/FFS file system
that's not completely corrupted. Most of the utilities are not very useful if
the file system has been corrupted or overwritten.
It is based on
The Coroner's Toolkit (L<http://www.porcupine.org/forensics/tct.html>).

=item JPEG recovery tools

This seems to be the file type most people are trying to recover. Available
utilities include L<http://www.cgsecurity.org/?photorec.html>,
L<http://codesink.org/recover.html>, and
L<http://www.vanheusden.com/findfile/>.

=back

=item Getting disk images from failed disks

B<dd>(1), B<rescuept>(1),
L<http://www.garloff.de/kurt/linux/ddrescue/>,
L<http://www.kalysto.org/utilities/dd_rhelp/>,
L<http://vanheusden.com/recoverdm/>,
L<http://myrescue.sourceforge.net>

=item Processing B<magicrescue>'s output

B<dupemap>(1), B<file>(1), B<magicsort>(1), L<http://ccorr.sourceforge.net>

=item Authoring recipes

B<magic>(4), B<hexedit>(1), L<http://wotsit.org>

=item Filesystem-specific undelete utilities

There are too many to count, especially for ext2 and FAT. Find them on
Google and Freshmeat.

=back

=head1 AUTHOR

Jonas Jensen <jbj@knef.dk>

=head1 LATEST VERSION

You can find the latest version at L<https://github.com/jbj/magicrescue>
