This is md5deep, a set of cross-platform tools to compute hashes, or
message digests, for any number of files while optionally recursively
digging through the directory structure. It can also take a list of known
hashes and display the filenames of input files whose hashes either do or
do not match any of the known hashes. This version supports MD5, SHA-1,
SHA-256, Tiger, and Whirlpool hashes.

See the file [NEWS](NEWS) for a list of changes between releases.

See the file [COPYING](COPYING) for information about the licensing for this program.

See the file [INSTALL](INSTALL) for (generic) compilation and installation
instructions. Here's the short version that should just work in many cases:

```shell
sh bootstrap.sh # runs autoconf, automake
./configure
make
make install
```

Note that you normally must be root to install to the default location;
the sudo command is helpful for doing so. You can specify an alternate
installation location using the --prefix option to the configure script.
For example, to install the programs to /home/foo/bin, use:

> `$ ./configure --prefix=/home/foo`

There is complete documentation on how to use the program on the
project's homepage, [https://github.com/jessek/hashdeep](https://github.com/jessek/hashdeep).

## md5deep vs. hashdeep

For historical reasons, the program has different options and features
when run with the names "hashdeep" and "md5deep."

hashdeep has a feature called "audit" which:

> \* Can also use a list of known hashes to audit a set of FILES. Errors
> are reported to standard error. If no FILES are specified, reads from
> standard input.
>
> - `-a` Audit mode. Each input file is compared against the set of knowns.
>   An audit is said to pass if each input file is matched against exactly
>   one file in the set of knowns. Any collisions, new files, or missing
>   files will make the audit fail. Using this flag alone produces a
>   message, either "Audit passed" or "Audit Failed".
> - `-v` prints the number of files in each category.
> - `-v -v` prints all discrepancies.
> - `-v -v -v` prints the results for every file examined and every known file.
> - `-k <file>` The -k option must be used to load the audit file.

To perform an audit:

> `$ hashdeep -r dir > /tmp/auditfile` (generate the audit file)
>
> `$ hashdeep -a -k /tmp/auditfile -r dir` (test the audit)

Notice that the audit is performed with a standard hashdeep output
file. (Internally, the audit is computed as part of the hashing process.)

## Unicode Issues

Modern POSIX-based computer systems consider filenames to be a
sequence of bytes that are rendered as the application wishes. This
means that filenames typically contain ASCII but can contain UTF-8,
UTF-16, latin1, or even invalid Unicode encodings.

Windows-based systems have one set of API calls for ASCII-based
filenames and another set for filenames encoded as UCS-2, which
"produces a fixed-length format by simply using the code point as the
16-bit code unit and produces exactly the same result as UTF-16 for
63,488 code points in the range 0-0xFFFF" according to
[Wikipedia](http://en.wikipedia.org/wiki/UTF-16/UCS-2). However, the
factual accuracy of this statement is disputed on Wikipedia's talk page.
It's pretty clear that nobody is entirely sure what Windows actually does,
and Windows itself may not be consistent.

Version 3 of this program addressed this issue by using the TCHAR
type to hold filenames on Windows and by refusing to print them,
printing a "?" instead. Version 4 of this program translates TCHAR
strings to UTF-8 std::string values at the earliest opportunity using the
[Windows function WideCharToMultiByte](http://msdn.microsoft.com/en-us/library/dd374130%28v=vs.85%29.aspx).
Flags have been added to escape Unicode when it is printed.
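The escaping flags and their output format are not described here, so the following is only a sketch of one way a UTF-8 filename could be escaped for display. The helper name `escape_utf8()` and the `U+XXXX` / `\xHH` output format are assumptions for illustration, not hashdeep's actual API. It keeps printable ASCII, rewrites each well-formed UTF-8 sequence as its code point, and falls back to a byte escape for the invalid encodings that POSIX filenames may contain.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: the name escape_utf8() and the output format are
// assumptions for illustration, not hashdeep's actual API or flags.
// Printable ASCII is copied unchanged. Every other well-formed UTF-8
// sequence becomes "U+XXXX"; a byte that does not start a complete,
// well-formed sequence becomes "\xHH". This simplified decoder does not
// reject overlong encodings or surrogate code points.
std::string escape_utf8(const std::string &in)
{
    std::string out;
    char buf[16];
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char c = static_cast<unsigned char>(in[i]);
        if (c >= 0x20 && c < 0x7F) {            // printable ASCII: keep as-is
            out += static_cast<char>(c);
            ++i;
            continue;
        }
        // Determine the sequence length and the payload bits of the lead byte.
        std::size_t len = 0;                     // 0 means "not a valid lead byte"
        unsigned long cp = 0;
        if (c < 0x80)                { len = 1; cp = c; }        // ASCII control
        else if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; }

        // Every following byte must be a continuation byte (10xxxxxx).
        bool ok = len > 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            unsigned char cc = static_cast<unsigned char>(in[i + k]);
            if ((cc & 0xC0) != 0x80)
                ok = false;
            else
                cp = (cp << 6) | (cc & 0x3F);
        }

        if (ok) {
            std::snprintf(buf, sizeof buf, "U+%04lX", cp);
            i += len;
        } else {
            std::snprintf(buf, sizeof buf, "\\x%02X", static_cast<unsigned>(c));
            ++i;                                 // skip only the offending byte
        }
        out += buf;
    }
    return out;
}
```

For example, `escape_utf8("caf\xC3\xA9")` returns `"cafU+00E9"`, while a lone `0xFF` byte comes out as `\xFF`. Escaping only at print time matches the approach described above: filenames stay as UTF-8 `std::string` values internally and are made safe for the terminal at the last moment.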
There is (apparently) no way on Windows to open a UTF-8 filename; it needs to be
converted back to a wide-character filename with MultiByteToWideChar.

Fortunately, we never really need to convert back.

Notice that on Windows the files being hashed can have Unicode characters
in their names, but the file containing the hashes must have an ASCII name.

COMPILING FOR WINDOWS:
> `-D_UNICODE` causes `TCHAR` to be defined as `wchar_t`.

COMPILING FOR POSIX:
> `_UNICODE` is not defined, causing `TCHAR` to be defined as `char`.

Previously, Win32 functions were controlled with #ifdef statements, like this:

```C
#ifdef _WIN32
    _wfullpath(d_name, fn, PATH_MAX);
#else
    if (NULL == realpath(fn, d_name))
        return TRUE;
#endif
```

There was also a file called tchar-local.h which actually changed the semantics
of functions on different platforms, with things like this:

```C
#define _tcsncpy strncpy
#define _tstat_t struct stat
```

This made the code very difficult to maintain.

With the 4.0 rewrite, we have replaced this code with C++ functions that return
objects where possible and that avoid the use of #defines. The platform-specific
calls are confined to small wrapper functions, so the mainline code never calls
realpath() directly. You can see this in cycles.cpp:

```cpp
#include <string>
#include <limits.h>   // PATH_MAX on POSIX
#include <stdlib.h>   // realpath() on POSIX, _fullpath()/_wfullpath() on Windows
#ifdef _WIN32
#include <tchar.h>    // TCHAR and _tfullpath()
#endif

/* Return the canonicalized absolute pathname in UTF-8 on Windows and POSIX systems */
std::string get_realpath(const TCHAR *fn)
{
#ifdef _WIN32
    /*
     * Expand a relative path to the full path.
     * http://msdn.microsoft.com/en-us/library/506720ff(v=vs.80).aspx
     * _tfullpath() maps to _wfullpath() when _UNICODE is defined.
     */
    TCHAR absPath[PATH_MAX];
    if (_tfullpath(absPath, fn, PATH_MAX) == 0) return "";
    return tchar_to_utf8(absPath);   // UTF-16 to UTF-8 helper defined elsewhere in hashdeep
#else
    char resolved_name[PATH_MAX];
    if (realpath(fn, resolved_name) == 0) return "";
    return std::string(resolved_name);
#endif
}
```

You can install MinGW and then simply configure with something like this:

> `$ export PATH=$PATH:/usr/local/i386-mingw32-4.3.0/bin`
>
> `$ ./configure --host=i386-mingw32`

## Hash Algorithm References

The MD5 algorithm is defined in RFC 1321:
http://www.ietf.org/rfc/rfc1321.txt

The SHA-1 algorithm is defined in FIPS 180-1:
http://www.itl.nist.gov/fipspubs/fip180-1.htm

The SHA-256 algorithm is defined in FIPS 180-2:
http://csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf

The Tiger algorithm is defined at:
http://www.cs.technion.ac.il/~biham/Reports/Tiger/

The Whirlpool algorithm is defined at:
http://planeta.terra.com.br/informatica/paulobarreto/WhirlpoolPage.html

## Theory of Operation

* main.cpp
  * sets up the system
* dig.cpp
  * iterates through the individual directories
  * calls hash_file() in hash.cpp for each file to hash
* hash.cpp
  * performs the hashing of each file
* display.cpp
  * stores/displays the results
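The module roles listed above can be illustrated with a compact, single-file C++17 sketch. It is not hashdeep's code: only the name `hash_file()` comes from the list (its signature here is an assumption), while `file_result`, `dig()`, `display()` and `run()` are hypothetical names chosen to mirror the files. A 64-bit FNV-1a hash is used purely as a dependency-free placeholder for the real MD5/SHA/Tiger/Whirlpool digests.

```cpp
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical sketch of the module roles; apart from the name hash_file(),
// every name and signature here is an assumption. FNV-1a (64-bit) stands in
// for a real digest; hashdeep itself computes MD5, SHA-1, SHA-256, etc.

struct file_result {
    std::string path;
    std::uint64_t digest;
};

// hash.cpp role: hash the contents of one file.
bool hash_file(const fs::path &p, std::uint64_t &digest)
{
    std::ifstream in(p, std::ios::binary);
    if (!in) return false;
    std::uint64_t h = 0xcbf29ce484222325ULL;     // FNV-1a 64-bit offset basis
    char buf[4096];
    while (in) {
        in.read(buf, sizeof buf);                // last read may be partial
        for (std::streamsize k = 0; k < in.gcount(); ++k) {
            h ^= static_cast<unsigned char>(buf[k]);
            h *= 0x100000001b3ULL;               // FNV-1a 64-bit prime
        }
    }
    if (in.bad()) return false;                  // real I/O error, not just EOF
    digest = h;
    return true;
}

// dig.cpp role: walk the directory tree and call hash_file() per regular file.
std::vector<file_result> dig(const fs::path &root)
{
    std::vector<file_result> results;
    for (const auto &entry : fs::recursive_directory_iterator(root)) {
        if (!entry.is_regular_file()) continue;
        std::uint64_t d = 0;
        if (hash_file(entry.path(), d))
            results.push_back({entry.path().string(), d});
    }
    return results;
}

// display.cpp role: print each stored result as "digest  path".
void display(const std::vector<file_result> &results)
{
    for (const auto &r : results)
        std::printf("%016llx  %s\n",
                    static_cast<unsigned long long>(r.digest), r.path.c_str());
}

// main.cpp role: set up and run the pipeline for one root directory.
void run(const fs::path &root)
{
    display(dig(root));
}
```

Calling `run(".")` prints one line per regular file beneath the current directory. Keeping traversal, hashing, and output in separate functions means a different digest could be swapped into `hash_file()` without touching the directory walk or the display code.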