1=head1 NAME
2
3magicrescue - Scans a block device and extracts known file types by looking at
4magic bytes.
5
6=head1 SYNOPSIS
7
8B<magicrescue> [ I<options> ] I<devices>
9
10=head1 DESCRIPTION
11
12Magic Rescue opens I<devices> for reading, scans them for file types it knows
13how to recover and calls an external program to extract them.  It looks at
14"magic bytes" in file contents, so it can be used both as an undelete utility
15and for recovering a corrupted drive or partition.  It works on any file system,
16but on very fragmented file systems it can only recover the first chunk of
17each file.  These chunks are sometimes as big as 50MB, however.
18
19To invoke B<magicrescue>, you must specify at least one device and the B<-d>
20and B<-r> options.  See the L</USAGE> section in this manual for getting
21started.
22
23=head1 OPTIONS
24
25=over 7
26
27=item B<-b> I<blocksize>
28
29Default: 1.  This will direct B<magicrescue> to only consider files that start
30at a multiple of the I<blocksize> argument.  The option applies only to the
31recipes following it, so by specifying it multiple times it can be used to get
32different behavior for different recipes.
33
34Using this option you can usually get better performance, but fewer files will
35be found.  In particular, files with leading garbage (e.g. many mp3 files) and
36files contained inside other files are likely to be skipped.  Also, some file
37systems don't align small files to block boundaries, so those won't be found
38this way either.
39
40If you don't know your file system's block size, just use the value 512, which
41is almost always the hardware sector size.
42
43=item B<-d> I<directory>
44
45Mandatory.  Output directory for found files.  Make sure you have plenty of free
46space in this directory, especially when extracting very common file types such
47as jpeg or gzip files.  Also make sure the file system is able to handle
48thousands of files in a single directory, i.e. don't use FAT if you are
49extracting many files.
50
51You should not place the output directory on the same block device you are
52trying to rescue files from.  This might add the same file to the block device
53ahead of the current reading position, causing B<magicrescue> to find the same
54file again later.  In the worst theoretical case, this could cause a
55loop where the same file is extracted thousands of times until disk space is
56exhausted.  You are also likely to overwrite the deleted files you were looking
57for in the first place.
58
59=item B<-r> I<recipe>
60
Mandatory.  Recipe name, file, or directory.  Specify this as either a plain
name (e.g. C<jpeg-jfif>) or a path (e.g. F<recipes/jpeg-jfif>).  If
B<magicrescue> doesn't find such a file in the current directory, it will look
in F<./recipes> and F<PREFIX/share/magicrescue/recipes>, where I<PREFIX> is the
path you installed to, e.g. F</usr/local>.
66
67If I<recipe> is a directory, all files in that directory will be treated as
68recipes.
69
70Browse the F<PREFIX/share/magicrescue/recipes> directory to see what recipes
71are available.  A recipe is a text file, and you should read the comments
72inside it before using it.  Either use the recipe as it is or copy it somewhere
73and modify it.
74
75For information on creating your own recipes, see the L</RECIPES> section.
76
77=item B<-I> I<file>
78
79Reads input files from I<file> in addition to those listed on the command line.
80If I<file> is C<->, read from standard input.  Each line will be interpreted as
81a file name.
82
83=item B<-M> I<output_mode>
84
85Produce machine-readable output to stdout.  I<output_mode> can be:
86
87=over
88
89=item B<i>
90
91Print each input file name before processing
92
93=item B<o>
94
95Print each output file name after processing
96
97=item B<io>
98
99Print both input and output file names.  Input file names will be prefixed by
100C<i> and a space.  Output file names will be prefixed by C<o> and a space.
101
102=back
103
104Nothing else will be written to standard output in this mode.
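
As a minimal sketch of consuming this output, the following shell function
labels lines in the B<io> format.  The function name C<split_io> and the file
names in the demonstration are hypothetical, and the sketch assumes that every
line starts with the one-letter prefix and a space as described above.

```shell
# Hypothetical helper: label the lines produced in "io" mode.
# Reads lines of the form "i NAME" or "o NAME" on standard input.
split_io() {
    while IFS= read -r line; do
        case $line in
            'i '*) printf 'input: %s\n' "${line#i }" ;;
            'o '*) printf 'output: %s\n' "${line#o }" ;;
        esac
    done
}

# Self-contained demonstration with made-up file names:
printf 'i /dev/hdb1\no output/example.jpg\n' | split_io
```

In real use, the output of a B<magicrescue> run with B<-M io> would be piped
into such a function instead of the C<printf> line.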
105
106=item B<-O> [B<+>|B<->|B<=>][B<0x>]I<offset>
107
108Resume from the specified I<offset> in the first device.  If prefixed with
109B<0x> it will be interpreted as a hex number.
110
111The number may be prefixed with a sign:
112
113=over
114
115=item B<=>
116
117Seek to an absolute position (default)
118
119=item B<+>
120
121Seek to a relative position.  On regular files this does the same as the above.
122
123=item B<->
124
125Seek to EOF, minus the offset.
126
127=back
128
129=back
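
The sign and B<0x> prefixes of the B<-O> option can be illustrated with a small
shell sketch.  The function C<resolve_offset> is hypothetical and is not how
B<magicrescue> parses its arguments; it only mirrors the description above,
and it treats B<+> like B<=>, which the text above states for regular files.

```shell
# Hypothetical helper, not part of magicrescue: compute the absolute byte
# position that a -O style argument refers to.
#   $1 = offset argument, e.g. "=4096", "0x1000", "+4096" or "-512"
#   $2 = total size of the device or file in bytes
resolve_offset() {
    spec=$1
    size=$2
    case $spec in
        -*) sign='-'; number=${spec#-} ;;
        +*) sign='+'; number=${spec#+} ;;
        =*) sign='='; number=${spec#=} ;;
        *)  sign='='; number=$spec ;;
    esac
    # Shell arithmetic understands the 0x prefix for hexadecimal numbers.
    value=$(($number))
    if [ "$sign" = '-' ]; then
        echo $((size - value))   # counted back from EOF
    else
        echo "$value"            # "=" and "+" are treated alike here
    fi
}

resolve_offset 0x1000 1000000    # prints 4096
resolve_offset -512 1000000      # prints 999488
```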
130
131=head1 USAGE
132
133Say you have destroyed the file system on /dev/hdb1 and you want to extract
134all the jpeg files you lost.  This guide assumes you have installed Magic
135Rescue in F</usr/local>, which is the default.
136
137Make sure DMA and other optimizations are enabled on your disk, or it will take
138hours.  In Linux, use hdparm to set these options:
139
140    $ hdparm -d 1 -c 1 -u 1 /dev/hdb
141
142Choose your output directory, somewhere with lots of disk space.
143
144    $ mkdir ~/output
145
146Look in the F</usr/local/share/magicrescue/recipes> directory for the recipes
147you want.  Magic Rescue comes with recipes for some common file types, and you
148can make your own too (see the next section).  Open the recipes you want to use
149in a text editor and read their comments.  Most recipes require 3rd party
150software to work, and you may want to modify some parameters (such as
151B<min_output_file>) to suit your needs.
152
Then invoke B<magicrescue>:
154
155    $ magicrescue -r jpeg-jfif -r jpeg-exif -d ~/output /dev/hdb1
156
It will scan through your entire hard disk, so it may take a while.  You can
stop it and resume later if you want to.  To do so, interrupt it (with CTRL+C)
and note the progress information saying what address it got to.  Then restart
it later with the B<-O> option.
161
When it has finished you will probably find thousands of .jpg files in
F<~/output>, including things you never knew were in your browser cache.
Sorting through all those files can be a huge task, so you may want to use
software or scripts to do it.
166
167First, try to eliminate duplicates with the B<dupemap>(1) tool included in this
168package.
169
170    $ dupemap delete,report ~/output
171
172If you are performing an undelete operation you will want to get rid
173of all the rescued files that also appear on the live file system.  See the
174B<dupemap>(1) manual for instructions on doing this.
175
If that's not enough, you can use B<magicsort>(1) to get a better overview:
177
178    $ magicsort ~/output
179
180=head1 RECIPES
181
182=head2 Creating recipe files
183
A recipe file is a relatively simple file of 3-5 lines of text.  It describes
how to recognise the beginning of the file and what to do when a file is
recognised.  For example, all jfif images start with the bytes C<0xff 0xd8>,
and the string C<JFIF> is found at offset 6, counting from zero.  Look at
F<recipes/jpeg-jfif> in the source distribution to follow this example.
189
190Matching magic data is done with a "match operation" that looks like this:
191
192I<offset> I<operation> I<parameter>
193
194where I<offset> is a decimal integer saying how many bytes from the beginning
195of the file this data is located, I<operation> refers to a built-in match
196operation in B<magicrescue>, and I<parameter> is specific to that operation.
197
198=over
199
200=item *
201
202The B<string> operation matches a string of any length.  In the jfif example
203this is four bytes.  You can use escape characters, like C<\n> or C<\xA7>.
204
205=item *
206
207The B<int32> operation matches 4 bytes ANDed with a bit mask.  To match all
208four bytes, use the bit mask C<FFFFFFFF>.  If you have no idea what a bit mask
209is, just use the B<string> operation instead.  The mask C<FFFF0000> in the jfif
210example matches the first two bytes.
211
212=item *
213
The B<char> operation is like B<string>, except it only matches a single
character.
216
217=back
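
The effect of the bit mask in the B<int32> operation can be tried out in the
shell.  The helper C<int32_match> below is a hypothetical illustration, not
code from B<magicrescue>, and the value C<ffd80000> is an assumption chosen to
go with the mask C<FFFF0000> mentioned above.

```shell
# Hypothetical helper: apply an int32-style match to four bytes of data.
#   $1 = four data bytes as 8 hex digits, in the order they appear in the file
#   $2 = value to compare against, $3 = bit mask (both 8 hex digits)
int32_match() {
    if [ $(( 0x$1 & 0x$3 )) -eq $(( 0x$2 )) ]; then
        echo match
    else
        echo "no match"
    fi
}

# FF D8 FF E0 is a typical start of a jfif file; the mask keeps two bytes.
int32_match ffd8ffe0 ffd80000 ffff0000    # prints "match"
# With a full mask, the last two bytes must equal the value as well.
int32_match ffd8ffe0 ffd80000 ffffffff    # prints "no match"
```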
218
219To learn these patterns for a given file type, look at files of the desired
220type in a hex editor, search through the resource files for the B<file>(1)
221utility (L<http://freshmeat.net/projects/file>) and/or search the Internet for
222a reference on the format.
223
224If all the operations match, we have found the start of the file.  Finding the
225end of the file is a much harder problem, and therefore it is delegated to an
226external shell command, which is named by the B<command> directive.  This
227command receives the block device's file descriptor on stdin and must write to
228the file given to it in the C<$1> variable.  Apart from that, the command can do
229anything it wants to try and extract the file.
230
For some file types (such as jpeg), a tool already exists that can do this.
However, many programs misbehave when told to read from the middle of a huge
block device.  Some seek to byte 0 before reading (this can be fixed by
prefixing C<cat|>, but some programs refuse to work on a file they can't seek
in).  Others try to read the whole file into memory before doing anything,
which will of course fail on a multi-gigabyte block device.  And some fail
completely when parsing a partially corrupted file.
238
239This means that you may have to write your own tool or wrap an existing program
240in some scripts that make it behave better.  For example, this could be to
241extract the first 10MB into a temporary file and let the program work on that.
242Or perhaps you can use F<tools/safecat> if the file may be very large.
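
The temporary-file approach might be sketched as follows.  The function
C<extract_limited> is hypothetical and not shipped with Magic Rescue.  It
assumes the wrapped tool reads from standard input and writes the extracted
file to standard output, and that C<mktemp> and C<head -c> are available on
your system.

```shell
# Hypothetical wrapper, not shipped with magicrescue: copy at most 10 MB from
# standard input into a temporary file, then run a tool on that copy.
#   $1 = output file
#   remaining arguments = a tool that reads stdin and writes the extracted
#                         file to stdout (an assumption of this sketch)
extract_limited() {
    out=$1
    shift
    tmp=$(mktemp) || return 1
    head -c 10485760 > "$tmp"      # 10 MB taken from the current position
    "$@" < "$tmp" > "$out"         # the tool now sees a small, seekable file
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Self-contained demonstration, with cat standing in for a real tool:
demo_out=$(mktemp)
printf 'example data' | extract_limited "$demo_out" cat
cat "$demo_out"
rm -f "$demo_out"
```

A recipe's B<command> line could then run a script containing such a function,
passing C<"$1"> on as the output file.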
243
244=head2 Recipe format reference
245
246Empty lines and lines starting with C<#> will be skipped.  A recipe contains a
247series of match operations to find the content and a series of directives to
248specify what to do with it.
249
250Lines of the format I<offset> I<operation> I<parameter> will add a match
251operation to the list.  Match operations will be tried in the order they appear
252in the recipe, and they must all match for the recipe to succeed.  The
253I<offset> describes what offset this data will be found at, counting from the
254beginning of the file.  I<operation> can have the following values:
255
256=over 7
257
258=item B<string> I<string>
259
260The parameter is a character sequence that may contain escape
261sequences such as \xFF.
262
263=item B<char> I<character>
264
265The parameter is a single character (byte), or an escape sequence.
266
267=item B<int32> I<value> I<bitmask>
268
269Both I<value> and I<bitmask> are expressed as 8-character hex strings.
270I<bitmask> will be ANDed with the data, and the result will be compared
271to I<value>.  The byte order is as you see it in the hex editor,
272i.e. big-endian.
273
274=back
275
The first match operation in a recipe is special: it will be used to scan
through the file.  Only the B<char> and B<string> operations can be used there.
To add more operation types, look at the instructions in F<magicrescue.c>.
279
280A line that doesn't start with an integer is a directive.  This can be:
281
282=over 7
283
284=item B<extension> I<ext>
285
286Mandatory.  I<ext> names the file extension for this type, such as C<jpg>.
287
288=item B<command> I<command>
289
Mandatory.  When all the match operations succeed, this I<command> will be
executed to extract the file from the block device.  I<command> is passed to
the shell with the block device's file descriptor (positioned at the right
byte) on stdin.  The shell variable C<$1> will contain the file its output
should be written to, and it must respect this.  Otherwise B<magicrescue>
cannot tell whether it succeeded.
296
297=item B<rename> I<command>
298
Optional.  After a successful extraction this command will be run.  Its purpose
is to gather enough information about the file to rename it to something more
meaningful.  The script must not perform the rename itself, but it should write
to standard output the string C<RENAME>, followed by a space, followed by the
new file name.  Nothing else may be written to standard output.  If the file
should not be renamed, nothing should be written to standard output.  Standard
input and C<$1> will work like with the B<command> directive.
307
308=item B<min_output_file> I<size>
309
310Default: 100.  Output files less than this size will be deleted.
311
312=item B<allow_overlap> I<bytes>
313
314By default, recipes will not match on overlapping byte ranges.
315B<allow_overlap> disables this, and it should always be used for recipes where
316the extracted file may be larger than it was on disk.  If I<bytes> is negative,
317overlap checking will be completely disabled.  Otherwise, overlap checking will
318be in effect for everything but the last I<bytes> of the output.  For example,
319if the output may be up to 512 bytes bigger than the input, B<allow_overlap>
320should be set to 512.
321
322=back
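
Putting the pieces together, a complete recipe might look like the one written
out by the shell sketch below.  This is an illustration and not the recipe
shipped with Magic Rescue: the file name F<my-jfif>, the C<int32> value and
the use of C<jpegtran> as the extraction command are assumptions.  Note that
the first match operation is a B<string>, as required.

```shell
# Write a hypothetical recipe to a scratch directory.  The match lines follow
# the jfif description in this manual; the file name and the jpegtran command
# are assumptions made for illustration.
recipe_dir=$(mktemp -d)
cat > "$recipe_dir/my-jfif" <<'EOF'
# Example recipe: jfif images (illustrative only)
6 string JFIF
0 int32 ffd80000 ffff0000

extension jpg
command jpegtran -copy all -outfile "$1"
min_output_file 100
EOF

# Show the directives, i.e. the lines that are neither match operations,
# comments nor empty.
grep -v -e '^[0-9]' -e '^#' -e '^$' "$recipe_dir/my-jfif"
```

Such a file could then be given to the B<-r> option by its path.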
323
324To test whether your recipe actually works, either just run it on your hard
325disk or use the F<tools/checkrecipe> script to pick out files that should match
326but don't.
327
328If you have created a recipe that works, please mail it to me at jbj@knef.dk so
329I can include it in the distribution.
330
=head1 WHEN NOT TO USE MAGIC RESCUE
332
333Magic Rescue is not meant to be a universal application for file recovery.  It
334will give good results when you are extracting known file types from an
335unusable file system, but for many other cases there are better tools
336available.
337
338=over
339
340=item *
341
342If there are intact partitions present somewhere, use B<gpart> to find them.
343
344=item *
345
If the file system's internal data structures are more or less undamaged, use
B<The Sleuth Kit>.  At the time of writing, it only supports NTFS, FAT, ext[23]
and FFS, though.
349
350=item *
351
352If Magic Rescue does not have a recipe for the file type you are trying to
353recover, try B<foremost> instead.  It recognizes more file types, but in most
354cases it extracts them simply by copying out a fixed number of bytes after it
355has found the start of the file.  This makes postprocessing the output files
356more difficult.
357
358=back
359
In many cases you will want to use Magic Rescue in addition to the tools
mentioned above.  They are not mutually exclusive, e.g. combining
B<magicrescue> with B<dls> from The Sleuth Kit could give good results.  A
common approach is to use B<magicrescue> to extract its known file types and
another utility to extract the rest.
365
366When combining the results of more than one tool, B<dupemap>(1) can be used to
367eliminate duplicates.
368
369=head1 SEE ALSO
370
371=over
372
373=item Similar programs
374
375=over
376
377=item B<gpart>(8)
378
379L<http://www.stud.uni-hannover.de/user/76201/gpart/>.  Tries to rebuild the
380partition table by scanning the disk for lost partitions.
381
382=item B<foremost>(1)
383
384L<http://foremost.sourceforge.net>.  Does the same thing as B<magicrescue>,
385except that its "recipes" are less complex.  Finding the end of the file must
386happen by either matching an EOF string or just extracting a fixed number of
387bytes every time.  It supports more file types than Magic Rescue, but extracted
388files usually have lots of trailing garbage, so removal of duplicates and
389sorting by size is not possible.
390
391=item B<The Sleuth Kit>
392
393L<http://www.sleuthkit.org/sleuthkit/>.  This popular package of utilities is
394extremely useful for undeleting files from a FAT/NTFS/ext2/ext3/FFS file system
395that's not completely corrupted.  Most of the utilities are not very useful if
396the file system has been corrupted or overwritten.  It is based on
397The Coroner's Toolkit (L<http://www.porcupine.org/forensics/tct.html>).
398
399=item JPEG recovery tools
400
401This seems to be the file type most people are trying to recover.  Available
402utilities include L<http://www.cgsecurity.org/?photorec.html>,
403L<http://codesink.org/recover.html>, and
404L<http://www.vanheusden.com/findfile/>.
405
406=back
407
408=item Getting disk images from failed disks
409
410B<dd>(1), B<rescuept>(1),
411L<http://www.garloff.de/kurt/linux/ddrescue/>,
412L<http://www.kalysto.org/utilities/dd_rhelp/>,
413L<http://vanheusden.com/recoverdm/>,
414L<http://myrescue.sourceforge.net>
415
416=item Processing B<magicrescue>'s output
417
418B<dupemap>(1), B<file>(1), B<magicsort>(1), L<http://ccorr.sourceforge.net>
419
420=item Authoring recipes
421
422B<magic>(4), B<hexedit>(1), L<http://wotsit.org>
423
424=item Filesystem-specific undelete utilities
425
There are too many to count, especially for ext2 and FAT.  Search for them on
Google and Freshmeat.
428
429=back
430
431=head1 AUTHOR
432
433Jonas Jensen <jbj@knef.dk>
434
435=head1 LATEST VERSION
436
437You can find the latest version at L<https://github.com/jbj/magicrescue>
438
439