Because I kept ignoring this list of TODO items, I have
moved them all to the SourceForge Feature Request Tracker.
See https://sourceforge.net/tracker/?group_id=67079&atid=516781
for all current TODO items.

October 9, 2011 status report:

md5deep/hashdeep:
* md5deep and hashdeep now share the same code base
* completely rewritten in C++
* multi-threaded (-j4 runs 4 worker threads and one primary thread)
  - the default is 1 worker per CPU
  - specify 0 workers (-j0) to turn off multi-threading
* each worker opens the files it hashes
* compiles for 32-bit and 64-bit Windows using MinGW on Fedora Core 15
* full regression testing is operational; all known features are now testable and tested
* -hh now prints more help
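
The -j behaviour described above (one worker per CPU by default, -j0 to disable threading) could translate into a thread count roughly as follows. This is a sketch, not hashdeep's code. The function name worker_count and the convention that a negative value means "-j was not given" are assumptions.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper mapping the -j argument to a number of worker threads.
//   requested < 0  : -j not given, use one worker per CPU
//   requested == 0 : threading disabled; the primary thread does the hashing
//   requested > 0  : use exactly that many workers
unsigned worker_count(int requested) {
    if (requested >= 0) {
        return static_cast<unsigned>(requested);
    }
    unsigned ncpu = std::thread::hardware_concurrency();
    // hardware_concurrency() returns 0 when the CPU count cannot be determined.
    return ncpu == 0 ? 1 : ncpu;
}
```

With this convention, -j4 yields 4 workers, -j0 yields none, and omitting -j yields one worker per CPU, falling back to 1 if the count is unknown.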

Here's future stuff to do:
* Performance tuning:
  1 - don't create hash_hex strings for algorithms we don't use.
  2 - Currently hashlist is a multimap. It might be better to make it
      a tr1::unordered_set, but that would require having the buckets be
      vectors, since a single hash can have more than one file attached to it.
* Style:
  - Currently hashlist is a subclass of a multimap; it should be an opaque object that
* Nice graphs showing performance and speedup.
* Test with actual hash collisions: files that have the same MD5 but different SHA1 hashes.
* Add a file pattern option that specifies which files are to be hashed.
* Improve the usage output so that -hh prints ALL options.
* Memory-mapped files:
  - Figure out why memory-mapped files are slower than unbuffered I/O.
    1 - Memory-mapped files can generate a SIGSEGV or SIGBUS if the mapped region is not available.
        http://linux.die.net/man/2/mmap
        This needs to be explicitly handled with a signal handler.
    2 - Memory-mapped files can be handled on Windows.
  - Until this is resolved, memory-mapped files are off by default.
* We need a better strategy for "Known file not used" in audit_check, because I don't want to modify the map.
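
The hashlist and audit items above could be combined along these lines. This is a minimal sketch, not hashdeep's hashlist class. It assumes C++11's std::unordered_map (the standardized successor of the tr1 containers), keyed by hash with a std::vector of files per key, which is one reading of "buckets as vectors". It also assumes a separate std::unordered_set that records which hashes audit mode matched, so the map itself is never modified. The names hash_db and known_file are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical record for one known file (e.g. loaded with -k).
struct known_file {
    std::string path;
    unsigned long long size;
};

// Opaque hash database: each hash maps to a vector of files,
// because several files may share one hash.
class hash_db {
public:
    void add(const std::string &hash, const known_file &kf) {
        table_[hash].push_back(kf);  // first use of a hash creates an empty vector
    }

    // Audit lookup: marks the hash as "used" in a separate set,
    // so the table of known files is never modified.
    const std::vector<known_file> *match(const std::string &hash) {
        auto it = table_.find(hash);
        if (it == table_.end()) {
            return nullptr;
        }
        used_.insert(hash);
        return &it->second;
    }

    // Known files whose hash was never matched ("Known file not used").
    std::vector<known_file> unused() const {
        std::vector<known_file> out;
        for (const auto &entry : table_) {
            if (used_.count(entry.first) == 0) {
                out.insert(out.end(), entry.second.begin(), entry.second.end());
            }
        }
        return out;
    }

private:
    std::unordered_map<std::string, std::vector<known_file>> table_;
    std::unordered_set<std::string> used_;
};
```

The sketch tracks use per hash rather than per file. In the multi-threaded build, match() would also need a mutex (or per-thread used sets merged at the end), because workers call it concurrently.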
==

Below is a list of what Simson Garfinkel did to bring hashdeep and md5deep from version 3 to version 4:
1  - Started with a new copy of the source.
2  - Fixed the autoconf files.
3  - Migrated to C++ (hashdeep is now hashdeep.cpp).
4  - Added DFXML output to the multihash program.
5  - Added an output mode to the multihash program that exactly matches
     the output of the single-hash programs (md5deep, sha1deep, etc.). This was
     done by calling md5deep's output functions directly.
6  - Added a mode to the multihash program that accepts exactly the same
     command-line options as the single-hash programs, and made that mode
     the default when the program is invoked under a different command name.
     This was done by using md5deep's command-line parser.
7  - Changed state into a C++ class.
8  - Moved the current_file stuff out of state into another C++ class.
9  - Modified the hash() function to take the file being hashed and a place to put the result.
10 - Started replacing char * arrays with std::string. Consider replacing TCHAR strings with vector<TCHAR>.
11 - Replaced the hashtable object with an STL map.
12 - Migrated all hash databases to a single class (hashlist.cpp).
13 - Fixed -k so that it loads into that database.
14 - Fixed audit mode so that it reads from that database.
15 - Loading hashes should return a string with the set of hashes that were added.
16 - Went through the entire program looking for dead code.
17 - Migrated to a multi-threaded producer/consumer architecture, with
     the file search (dig) as the producer and hash() as the
     consumer. (This code was taken from bulk_extractor.)
18 - Added a "-j" option to control the number of threads; -j0 turns off threading.
19 - Made error printing threadsafe.
20 - Removed the file_name_annotation and instead explicitly remember
     the piecewise start and stop offsets.
21 - Figured out why threading triggered a problem with piecewise hashing. It turned out
     that the SHA1 implementation was not threadsafe; replaced it with a
     different one.
22 - Got hashlist matching working again.
23 - Added support for the CommonCrypto functions on the Mac. This makes SHA1
     and SHA256 dramatically faster.
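
The producer/consumer design of item 17, together with the threadsafe error printing of item 19, can be illustrated with a small C++17 sketch. This is not the bulk_extractor or hashdeep code. The names work_queue, print_error and run_pipeline are hypothetical, and the hash_one callback stands in for hash().

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <optional>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Queue of file names between the producer (directory walk) and the consumers.
class work_queue {
public:
    void push(std::string path) {
        {
            std::lock_guard<std::mutex> lock(m_);
            q_.push(std::move(path));
        }
        cv_.notify_one();
    }

    // Called by the producer once no more files will be queued.
    void close() {
        {
            std::lock_guard<std::mutex> lock(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until a file is available; returns nullopt once closed and empty.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) {
            return std::nullopt;
        }
        std::string path = std::move(q_.front());
        q_.pop();
        return path;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::string> q_;
    bool closed_ = false;
};

// Threadsafe error printing: one mutex serializes writes to stderr so that
// messages from different workers do not interleave.
std::mutex error_mutex;
void print_error(const std::string &msg) {
    std::lock_guard<std::mutex> lock(error_mutex);
    std::cerr << msg << '\n';
}

// Runs nworkers consumer threads over `paths`. hash_one stands in for hash(),
// must itself be threadsafe, and returns false on failure.
// Returns the number of paths the consumers processed.
template <typename F>
std::size_t run_pipeline(const std::vector<std::string> &paths,
                         unsigned nworkers, F hash_one) {
    assert(nworkers > 0 && "with -j0 the caller hashes on the primary thread");
    work_queue q;
    std::mutex count_mutex;
    std::size_t processed = 0;
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nworkers; ++i) {
        workers.emplace_back([&] {
            while (std::optional<std::string> path = q.pop()) {
                if (!hash_one(*path)) {
                    print_error("error hashing " + *path);
                }
                std::lock_guard<std::mutex> lock(count_mutex);
                ++processed;
            }
        });
    }
    for (const std::string &p : paths) {
        q.push(p);  // producer side: stands in for dig()
    }
    q.close();
    for (std::thread &t : workers) {
        t.join();
    }
    return processed;
}
```

Closing the queue lets each consumer drain the remaining work and then exit, so the primary thread can simply join the workers. With -j0 no pipeline is needed; the primary thread would call hash() directly.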

================
Remaining:

* Use UTF-8 encoding in all modes of operation.
* Add an option to include a BOM in plain-text output mode.
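
The BOM option could be little more than the following sketch, assuming plain-text output is already UTF-8 encoded. A UTF-8 byte order mark is the encoding of U+FEFF, the three bytes EF BB BF. The names write_plain_text and render_plain_text, and the include_bom flag, are hypothetical, since the option does not exist yet.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// UTF-8 encoding of U+FEFF, the byte order mark.
const char UTF8_BOM[] = "\xEF\xBB\xBF";

// Writes plain-text output, preceded by a BOM when include_bom is set.
// include_bom stands in for the proposed (not yet existing) option.
void write_plain_text(std::ostream &out, const std::string &utf8_text,
                      bool include_bom) {
    if (include_bom) {
        out << UTF8_BOM;
    }
    out << utf8_text;
}

// Convenience wrapper that returns the output as a string.
std::string render_plain_text(const std::string &utf8_text, bool include_bom) {
    std::ostringstream out;
    write_plain_text(out, utf8_text, include_bom);
    return out.str();
}
```

Because the BOM is written once, before any hash lines, consumers that do not expect it (for example, tools that re-read the output with -k) only ever see it at the very start of the file.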
84