Because I kept ignoring this list of TODO items, I have
moved them all to the SourceForge Feature Request Tracker.
See https://sourceforge.net/tracker/?group_id=67079&atid=516781
for all current TODO items.

October 9, 2011 status report:

md5deep/hashdeep:
* share the same code base
* completely rewritten in C++
* multi-threaded (-j4 runs 4 worker threads and one primary thread).
  - default is 1 worker per CPU.
  - specify 0 workers to turn off multi-threading
* each worker opens the files
* Compiles for windows32 and windows64 using mingw on Fedora Core 15
* Full regression testing operational. All known features are now testable and tested.
* -hh now prints more help

Here's future stuff to do:
* Performance tuning:
  1 - don't create hash_hex strings for algorithms we don't use.
  2 - Currently hashlist is a multimap. It might be better to make it
      a tr1::unordered_set, but that would require having the buckets be
      vectors, since a single hash can have more than one file attached to it.
* Style:
  - Currently hashlist is a subclass of a multimap; it should be an opaque object that
* Nice graphs showing performance and speedup.
* Test with actual hash collisions of files that have same MD5 but different SHA1
* Add a file pattern option that specifies what files are to be hashed.
* Improve usage so that -hh prints ALL options.
* Memory-mapped files:
  - Figure out why memory-mapped files are slower than unbuffered io.
    1 - Memory-mapped files can generate a SIGSEGV or SIGBUS if the mapped region is not available.
        http://linux.die.net/man/2/mmap
        This needs to be explicitly handled with a signal handler.
    2 - Memory-mapped files can be handled on Windows.
  - Until we get this, memory-mapped files are off by default.
* Should have a better strategy for "Known file not used" in audit_check, because I don't want to modify the map.
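
A minimal sketch of the hashlist idea from the performance-tuning item
above, assuming C++11's std::unordered_map (the standardized successor
of the tr1 containers) in place of the tr1::unordered_set named there.
The class name hashlist_sketch, its methods, and the use of std::string
for hashes and file names are hypothetical and are not taken from the
md5deep source. Each hash maps to a vector of file names, so a single
hash can have more than one file attached to it, and the container is
a private member rather than a base class.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration; not the hashlist class from the md5deep source.
class hashlist_sketch {
public:
    // Record that file_name has the given hex-encoded hash.
    void add(const std::string &hash_hex, const std::string &file_name) {
        by_hash_[hash_hex].push_back(file_name);
    }

    // Return every file recorded under hash_hex (empty if there is none).
    std::vector<std::string> find(const std::string &hash_hex) const {
        auto it = by_hash_.find(hash_hex);
        if (it == by_hash_.end()) return {};
        return it->second;
    }

    // Number of distinct hashes stored.
    std::size_t distinct_hashes() const { return by_hash_.size(); }

private:
    // One bucket (vector of file names) per hash value.
    std::unordered_map<std::string, std::vector<std::string>> by_hash_;
};
```

Lookups in an unordered map take constant time on average, against
logarithmic time for a multimap. Whether that is a real gain for
hashdeep would still have to be measured.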
==

Below is a list of what Simson Garfinkel did to bring hashdeep and md5deep from version 3 to version 4:
1 - Started with a new copy of the source
2 - Fixed the autoconf files.
3 - Migrated to C++ (hashdeep is now hashdeep.cpp)
4 - Added DFXML to the multihash program.
5 - Added an output mode to the multihash program that exactly matches
    the output mode of the single hash md5deep, sha1deep, etc. This was
    done by using md5deep's output functions directly.
6 - Added a mode to the multihash program that exactly matches
    md5deep's command-line options, and made that mode the default when
    the command name changes. This was done by using md5deep's command line
    parser.
7 - Changed state to a C++ class.
8 - Removed the current_file stuff from state to create another C++ class.
9 - Modified the hash() function to take the file being hashed and a place to put it.
10 - Started replacing char * arrays with std::string. Consider replacing TCHAR strings with vector<TCHAR>.
11 - Replaced the hashtable object with the STL map
12 - Migrated all hash databases to a single class (hashlist.cpp)
13 - Fixed -k so that it loads into that database
14 - Fixed audit mode so that it reads from that database.
15 - Loading hashes should return a string with the set of hashes that were added.
16 - Went through entire program looking for dead code.
17 - Migrated to multi-threaded producer/consumer architecture with
     the file searching (dig) being the producer and hash() being the
     consumer. (This code was taken from bulk_extractor)
18 - Added "-j" option to control how many threads; -j0 turns off threading.
19 - Made error printing threadsafe
20 - Removed the file_name_annotation and instead explicitly remember
     the piecewise start and stop.
21 - Figured out why threading caused a piecewise problem. It turns out
     that the SHA1 implementation was not threadsafe; replaced it with a
     different one.
22 - Got hashlist matching working again
23 - Added support for the CommonCrypto functions on Mac. This makes SHA1
     and SHA256 go dramatically faster.

================
Remaining:

* - Change all codes to UTF8 in all modes of operation
* - Option to include BOM in plain text output mode.