1David Anderson. 2January 2014. 3 4In 2013 I realized I could simplify some code in libdwarf 5(another project) by removing some complicated and messy 6special memory allocation code. Substituting by use of 7malloc/free and a table of used values. The used-values needed 8to a preserve a special feature of the original allocations 9related to a special handle: any allocations would be freed 10by closing the special handle. 11 12So I decided to implement various searches (mostly tree 13searches) using the tsearch interface definitions. 14I wrote these using a dwarf_ prefix to distinguish 15these from any libc implementation and to make the 16source fit in easily with libdwarf conventions. 17 18I added a tsearch() implementation using hashing too. 19The GNU-only hsearch() function does not let the hash grow 20(the tsearch() hash here grows automatically as needed) 21and has no hdelete() or hwalk() capability, so hsearch() and 22friends did not seem like a good interface set even though 23the GNU-designed hsearch_r() interface set is nicely designed. 24 25The attached C code is a small set of implementations of 26C search code. Based on algorithms in Knuth Volume 3 27(2nd Ed) and Sedgewick 4th Edition. 28 29All the code here is implemented by me. I have never read the 30source to any other implementations of the tsearch algorighms 31or interfaces so this is a clean-room implementation. 32 33The hash version weakens the correspondence to tree based 34tsearch because, well, it's not a tree. So twalk() behaves 35differently than a tree-based version and an initialization. 36 37Non-GNU libc usually has no tdestroy(). The set of functions 38here provides a tdestroy() for all the tree/hash function 39sets here. 40 41 42================== 43WHY TSEARCH? 44 45The interfaces are of a well-known design. While 46the interfaces are based on an obsolete style of writing 47interfaces, they are a known set. So the routines provided 48here are a direct substitution. 49 50The tdestroy() function is available in GNU libc, but is 51not part of the standard tsearch() set, nor is tdestroy() 52defined in the Single Unix Specification. 53 54The hash version of tdelete() cannot always return a non-null 55value even if there are records left. Not being a tree at 56all, it would cost runtime to provide a non-null return in 57some cases even when there are records left from tdelete(). 58The return value from tdelete() is problematic in all versions, 59since it purports to return the parent of the deleted node, 60but what if the deleted node was the root? 61 62================== 63LICENSING: 64 65I wanted this to be usable in any context, hence the use of 66the BSD open-source license. 67 68================== 69INTERFACE ODDITIES: 70 71It's not really easy to understand whether any given 72call is returning 73 success, action succeeded 74 tsearch success - added record 75 tsearch success - record already in tree 76 fail due to internal error or improper argument. 77 requested action impossible 78 (tfind() fails to find a record, or tdelete() fails 79 to find the record to delete) 80 81 82Speaking of C here, so, there is no try/catch available. 83 84It might have been nice if twalk() let the user-provided 85action code indicate to twalk() that the user wanted it to stop 86walking the tree. 87 88I won't be changing the interface though. 89I am staying with the standard interfaces. 90 91================== 92DOCUMENTATION ODDITIES: 93 94The GNU/Linux man pages on tsearch() and friends are nearly 95useless as you won't understand how to use the functions from 96reading that man page. 97 98The prototype for tfind() in the man page is slightly wrong 99while the headers in <search.h> are correct. 100 101================== 102IMPLEMENTATION ODDITIES: 103 104The code uses names like dwarf_tsearch() instead of just 105tsearch() so one can have both this tsearch() and GNU or 106a standard tsearch code available simultaneously with no 107interference at runtime. The code here operates like GNU or 108UNIX tsearch() but is internally incompatible. We'll usually 109refer to tsearch() not dwarf_tsearch() (etc) but we mean either 110and both unless otherwise specified here. 111 112The use of tsearch() and friends is a little tricky. 113That is just the way it is. The best reference is the code 114in tsearch_tester.c. 115 116See the trivial set of #defines in tsearch_tester.c that 117result in a standard-based naming of the functions. 118 119The return value from tdelete() is particularly problematic 120because it purports to return a pointer to another record 121than the one deleted, yet no such record necessarily exists. 122And in the case of 123 124The hash version requires use of the new function 125dwarf_initialize_search_hash() to provide a hashing 126function. And the hash version does not set the root pointer 127to NULL on tdelete() of the last record, since that would 128result in losing the hashing function pointer. Use tdestroy() 129to free up any remaining space. 130 131dwarf_tdump() is an invention and unique to this code. It is 132for debugging. It prints a representation of the data in the 133tree or hash to stdout. 134 135The #include .h file set used here makes it much easier 136for me to move these files to another project as necessary. 137You will probably want to revise the #include set a little. 138 139 140================== 141OTHER DIRECTORIES: 142 143testdata contains a set of sample allocations, where 144lines starting with 'a' mean add and 'd' means delete. 145These are used for testing the library code. 146 147scripts contains some python scripts for converting 148results produced by printf added to libdwarf alloca() 149code. These were used massage the print output 150from running dwarfdump on real object files 151into the testcase/* format. 152It is unlikely you will find them much use 153except as starting points. 154 155 156================== 157REFERENCES: 158 159Donald E Knuth, The Art of Computer Programming Volume 3, Second Edition. 160(Quite difficult to interpret some parts of Knuth's 161algorithms, but then I always found Knuth hard to 162understand. Still, Knuth is the essential source for 163algorithms.) 164 165The Open Group, The Single UNIX Specification, Version 3(2001). 166The Single Unix Specification (or whatever it is called now) 167is a reference source everyone should have. 168 169Robert Sedgewick and Kevin Wayne, Algorithms (4th Ed). 170This is the crucial reference on red-black trees. 171There is a crucial fix in the errata of the 3rd printing. 172In addition, in dwarf_tsearchred.c in two places I had to fix things, 173look for 'Mistake' in the source. 174 175 176http://www.prevanders.net/tsearch.html 177 178 179