1*b8851fccSafresh1 2*b8851fccSafresh1 3*b8851fccSafresh1 4*b8851fccSafresh1 5*b8851fccSafresh1 6*b8851fccSafresh1 7*b8851fccSafresh1 sdbm - Substitute DBM 8*b8851fccSafresh1 or 9*b8851fccSafresh1 Berkeley ndbm for Every UN*X[1] Made Simple 10*b8851fccSafresh1 11*b8851fccSafresh1 Ozan (oz) Yigit 12*b8851fccSafresh1 13*b8851fccSafresh1 The Guild of PD Software Toolmakers 14*b8851fccSafresh1 Toronto - Canada 15*b8851fccSafresh1 16*b8851fccSafresh1 oz@nexus.yorku.ca 17*b8851fccSafresh1 18*b8851fccSafresh1 19*b8851fccSafresh1 20*b8851fccSafresh1Implementation is the sincerest form of flattery. - L. Peter 21*b8851fccSafresh1Deutsch 22*b8851fccSafresh1 23*b8851fccSafresh1A The Clone of the ndbm library 24*b8851fccSafresh1 25*b8851fccSafresh1 The sources accompanying this notice - sdbm - consti- 26*b8851fccSafresh1tute the first public release (Dec. 1990) of a complete 27*b8851fccSafresh1clone of the Berkeley UN*X ndbm library. The sdbm library is 28*b8851fccSafresh1meant to clone the proven functionality of ndbm as closely 29*b8851fccSafresh1as possible, including a few improvements. It is practical, 30*b8851fccSafresh1easy to understand, and compatible. The sdbm library is not 31*b8851fccSafresh1derived from any licensed, proprietary or copyrighted 32*b8851fccSafresh1software. 33*b8851fccSafresh1 34*b8851fccSafresh1 The sdbm implementation is based on a 1978 algorithm 35*b8851fccSafresh1[Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". 36*b8851fccSafresh1In the course of searching for a substitute for ndbm, I pro- 37*b8851fccSafresh1totyped three different external-hashing algorithms [Lar78, 38*b8851fccSafresh1Fag79, Lit80] and ultimately chose Larson's algorithm as a 39*b8851fccSafresh1basis of the sdbm implementation. The Bell Labs dbm (and 40*b8851fccSafresh1therefore ndbm) is based on an algorithm invented by Ken 41*b8851fccSafresh1Thompson, [Tho90, Tor87] and predates Larson's work. 42*b8851fccSafresh1 43*b8851fccSafresh1 The sdbm programming interface is totally compatible 44*b8851fccSafresh1with ndbm and includes a slight improvement in database ini- 45*b8851fccSafresh1tialization. It is also expected to be binary-compatible 46*b8851fccSafresh1under most UN*X versions that support the ndbm library. 47*b8851fccSafresh1 48*b8851fccSafresh1 The sdbm implementation shares the shortcomings of the 49*b8851fccSafresh1ndbm library, as a side effect of various simplifications to 50*b8851fccSafresh1the original Larson algorithm. It does produce holes in the 51*b8851fccSafresh1page file as it writes pages past the end of file. (Larson's 52*b8851fccSafresh1paper include a clever solution to this problem that is a 53*b8851fccSafresh1result of using the hash value directly as a block address.) 54*b8851fccSafresh1On the other hand, extensive tests seem to indicate that 55*b8851fccSafresh1sdbm creates fewer holes in general, and the resulting page- 56*b8851fccSafresh1files are smaller. The sdbm implementation is also faster 57*b8851fccSafresh1than ndbm in database creation. Unlike the ndbm, the sdbm 58*b8851fccSafresh1_________________________ 59*b8851fccSafresh1 60*b8851fccSafresh1 [1] UN*X is not a trademark of any (dis)organization. 61*b8851fccSafresh1 62*b8851fccSafresh1 63*b8851fccSafresh1 64*b8851fccSafresh1 65*b8851fccSafresh1 66*b8851fccSafresh1 67*b8851fccSafresh1 68*b8851fccSafresh1 69*b8851fccSafresh1 70*b8851fccSafresh1 - 2 - 71*b8851fccSafresh1 72*b8851fccSafresh1 73*b8851fccSafresh1store operation will not "wander away" trying to split its 74*b8851fccSafresh1data pages to insert a datum that cannot (due to elaborate 75*b8851fccSafresh1worst-case situations) be inserted. (It will fail after a 76*b8851fccSafresh1pre-defined number of attempts.) 77*b8851fccSafresh1 78*b8851fccSafresh1Important Compatibility Warning 79*b8851fccSafresh1 80*b8851fccSafresh1 The sdbm and ndbm libraries cannot share databases: one 81*b8851fccSafresh1cannot read the (dir/pag) database created by the other. 82*b8851fccSafresh1This is due to the differences between the ndbm and sdbm 83*b8851fccSafresh1algorithms[2], and the hash functions used. It is easy to 84*b8851fccSafresh1convert between the dbm/ndbm databases and sdbm by ignoring 85*b8851fccSafresh1the index completely: see dbd, dbu etc. 86*b8851fccSafresh1 87*b8851fccSafresh1 88*b8851fccSafresh1Notice of Intellectual Property 89*b8851fccSafresh1 90*b8851fccSafresh1The entire sdbm library package, as authored by me, Ozan S. 91*b8851fccSafresh1Yigit, is hereby placed in the public domain. As such, the 92*b8851fccSafresh1author is not responsible for the consequences of use of 93*b8851fccSafresh1this software, no matter how awful, even if they arise from 94*b8851fccSafresh1defects in it. There is no expressed or implied warranty for 95*b8851fccSafresh1the sdbm library. 96*b8851fccSafresh1 97*b8851fccSafresh1 Since the sdbm library package is in the public domain, 98*b8851fccSafresh1this original release or any additional public-domain 99*b8851fccSafresh1releases of the modified original cannot possibly (by defin- 100*b8851fccSafresh1ition) be withheld from you. Also by definition, You (singu- 101*b8851fccSafresh1lar) have all the rights to this code (including the right 102*b8851fccSafresh1to sell without permission, the right to hoard[3] and the 103*b8851fccSafresh1right to do other icky things as you see fit) but those 104*b8851fccSafresh1rights are also granted to everyone else. 105*b8851fccSafresh1 106*b8851fccSafresh1 Please note that all previous distributions of this 107*b8851fccSafresh1software contained a copyright (which is now dropped) to 108*b8851fccSafresh1protect its origins and its current public domain status 109*b8851fccSafresh1against any possible claims and/or challenges. 110*b8851fccSafresh1 111*b8851fccSafresh1Acknowledgments 112*b8851fccSafresh1 113*b8851fccSafresh1 Many people have been very helpful and supportive. A 114*b8851fccSafresh1partial list would necessarily include Rayan Zacherissen 115*b8851fccSafresh1(who contributed the man page, and also hacked a MMAP 116*b8851fccSafresh1_________________________ 117*b8851fccSafresh1 118*b8851fccSafresh1 [2] Torek's discussion [Tor87] indicates that 119*b8851fccSafresh1dbm/ndbm implementations use the hash value to traverse 120*b8851fccSafresh1the radix trie differently than sdbm and as a result, 121*b8851fccSafresh1the page indexes are generated in different order. For 122*b8851fccSafresh1more information, send e-mail to the author. 123*b8851fccSafresh1 [3] You cannot really hoard something that is avail- 124*b8851fccSafresh1able to the public at large, but try if it makes you 125*b8851fccSafresh1feel any better. 126*b8851fccSafresh1 127*b8851fccSafresh1 128*b8851fccSafresh1 129*b8851fccSafresh1 130*b8851fccSafresh1 131*b8851fccSafresh1 132*b8851fccSafresh1 133*b8851fccSafresh1 134*b8851fccSafresh1 135*b8851fccSafresh1 136*b8851fccSafresh1 - 3 - 137*b8851fccSafresh1 138*b8851fccSafresh1 139*b8851fccSafresh1version of sdbm), Arnold Robbins, Chris Lewis, Bill David- 140*b8851fccSafresh1sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me 141*b8851fccSafresh1started in the first place), Johannes Ruschein (who did the 142*b8851fccSafresh1minix port) and David Tilbrook. I thank you all. 143*b8851fccSafresh1 144*b8851fccSafresh1Distribution Manifest and Notes 145*b8851fccSafresh1 146*b8851fccSafresh1This distribution of sdbm includes (at least) the following: 147*b8851fccSafresh1 148*b8851fccSafresh1 CHANGES change log 149*b8851fccSafresh1 README this file. 150*b8851fccSafresh1 biblio a small bibliography on external hashing 151*b8851fccSafresh1 dba.c a crude (n/s)dbm page file analyzer 152*b8851fccSafresh1 dbd.c a crude (n/s)dbm page file dumper (for conversion) 153*b8851fccSafresh1 dbe.1 man page for dbe.c 154*b8851fccSafresh1 dbe.c Janick's database editor 155*b8851fccSafresh1 dbm.c a dbm library emulation wrapper for ndbm/sdbm 156*b8851fccSafresh1 dbm.h header file for the above 157*b8851fccSafresh1 dbu.c a crude db management utility 158*b8851fccSafresh1 hash.c hashing function 159*b8851fccSafresh1 makefile guess. 160*b8851fccSafresh1 pair.c page-level routines (posted earlier) 161*b8851fccSafresh1 pair.h header file for the above 162*b8851fccSafresh1 readme.ms troff source for the README file 163*b8851fccSafresh1 sdbm.3 man page 164*b8851fccSafresh1 sdbm.c the real thing 165*b8851fccSafresh1 sdbm.h header file for the above 166*b8851fccSafresh1 tune.h place for tuning & portability thingies 167*b8851fccSafresh1 util.c miscellaneous 168*b8851fccSafresh1 169*b8851fccSafresh1 dbu is a simple database manipulation program[4] that 170*b8851fccSafresh1tries to look like Bell Labs' cbt utility. It is currently 171*b8851fccSafresh1incomplete in functionality. I use dbu to test out the rou- 172*b8851fccSafresh1tines: it takes (from stdin) tab separated key/value pairs 173*b8851fccSafresh1for commands like build or insert or takes keys for commands 174*b8851fccSafresh1like delete or look. 175*b8851fccSafresh1 176*b8851fccSafresh1 dbu <build|creat|look|insert|cat|delete> dbmfile 177*b8851fccSafresh1 178*b8851fccSafresh1 dba is a crude analyzer of dbm/sdbm/ndbm page files. It 179*b8851fccSafresh1scans the entire page file, reporting page level statistics, 180*b8851fccSafresh1and totals at the end. 181*b8851fccSafresh1 182*b8851fccSafresh1 dbd is a crude dump program for dbm/ndbm/sdbm data- 183*b8851fccSafresh1bases. It ignores the bitmap, and dumps the data pages in 184*b8851fccSafresh1sequence. It can be used to create input for the dbu util- 185*b8851fccSafresh1ity. Note that dbd will skip any NULLs in the key and data 186*b8851fccSafresh1fields, thus is unsuitable to convert some peculiar 187*b8851fccSafresh1_________________________ 188*b8851fccSafresh1 189*b8851fccSafresh1 [4] The dbd, dba, dbu utilities are quick hacks and 190*b8851fccSafresh1are not fit for production use. They were developed 191*b8851fccSafresh1late one night, just to test out sdbm, and convert some 192*b8851fccSafresh1databases. 193*b8851fccSafresh1 194*b8851fccSafresh1 195*b8851fccSafresh1 196*b8851fccSafresh1 197*b8851fccSafresh1 198*b8851fccSafresh1 199*b8851fccSafresh1 200*b8851fccSafresh1 201*b8851fccSafresh1 202*b8851fccSafresh1 - 4 - 203*b8851fccSafresh1 204*b8851fccSafresh1 205*b8851fccSafresh1databases that insist in including the terminating null. 206*b8851fccSafresh1 207*b8851fccSafresh1 I have also included a copy of the dbe (ndbm DataBase 208*b8851fccSafresh1Editor) by Janick Bergeron [janick@bnr.ca] for your pleas- 209*b8851fccSafresh1ure. You may find it more useful than the little dbu util- 210*b8851fccSafresh1ity. 211*b8851fccSafresh1 212*b8851fccSafresh1 dbm.[ch] is a dbm library emulation on top of ndbm (and 213*b8851fccSafresh1hence suitable for sdbm). Written by Robert Elz. 214*b8851fccSafresh1 215*b8851fccSafresh1 The sdbm library has been around in beta test for quite 216*b8851fccSafresh1a long time, and from whatever little feedback I received 217*b8851fccSafresh1(maybe no news is good news), I believe it has been func- 218*b8851fccSafresh1tioning without any significant problems. I would, of 219*b8851fccSafresh1course, appreciate all fixes and/or improvements. Portabil- 220*b8851fccSafresh1ity enhancements would especially be useful. 221*b8851fccSafresh1 222*b8851fccSafresh1Implementation Issues 223*b8851fccSafresh1 224*b8851fccSafresh1 Hash functions: The algorithm behind sdbm implementa- 225*b8851fccSafresh1tion needs a good bit-scrambling hash function to be effec- 226*b8851fccSafresh1tive. I ran into a set of constants for a simple hash func- 227*b8851fccSafresh1tion that seem to help sdbm perform better than ndbm for 228*b8851fccSafresh1various inputs: 229*b8851fccSafresh1 230*b8851fccSafresh1 /* 231*b8851fccSafresh1 * polynomial conversion ignoring overflows 232*b8851fccSafresh1 * 65599 nice. 65587 even better. 233*b8851fccSafresh1 */ 234*b8851fccSafresh1 long 235*b8851fccSafresh1 dbm_hash(char *str, int len) { 236*b8851fccSafresh1 unsigned long n = 0; 237*b8851fccSafresh1 238*b8851fccSafresh1 while (len--) 239*b8851fccSafresh1 n = n * 65599 + *str++; 240*b8851fccSafresh1 return n; 241*b8851fccSafresh1 } 242*b8851fccSafresh1 243*b8851fccSafresh1 There may be better hash functions for the purposes of 244*b8851fccSafresh1dynamic hashing. Try your favorite, and check the pagefile. 245*b8851fccSafresh1If it contains too many pages with too many holes, (in rela- 246*b8851fccSafresh1tion to this one for example) or if sdbm simply stops work- 247*b8851fccSafresh1ing (fails after SPLTMAX attempts to split) when you feed 248*b8851fccSafresh1your NEWS history file to it, you probably do not have a 249*b8851fccSafresh1good hashing function. If you do better (for different 250*b8851fccSafresh1types of input), I would like to know about the function you 251*b8851fccSafresh1use. 252*b8851fccSafresh1 253*b8851fccSafresh1 Block sizes: It seems (from various tests on a few 254*b8851fccSafresh1machines) that a page file block size PBLKSIZ of 1024 is by 255*b8851fccSafresh1far the best for performance, but this also happens to limit 256*b8851fccSafresh1the size of a key/value pair. Depending on your needs, you 257*b8851fccSafresh1may wish to increase the page size, and also adjust PAIRMAX 258*b8851fccSafresh1(the maximum size of a key/value pair allowed: should always 259*b8851fccSafresh1 260*b8851fccSafresh1 261*b8851fccSafresh1 262*b8851fccSafresh1 263*b8851fccSafresh1 264*b8851fccSafresh1 265*b8851fccSafresh1 266*b8851fccSafresh1 267*b8851fccSafresh1 268*b8851fccSafresh1 - 5 - 269*b8851fccSafresh1 270*b8851fccSafresh1 271*b8851fccSafresh1be at least three words smaller than PBLKSIZ.) accordingly. 272*b8851fccSafresh1The system-wide version of the library should probably be 273*b8851fccSafresh1configured with 1024 (distribution default), as this appears 274*b8851fccSafresh1to be sufficient for most common uses of sdbm. 275*b8851fccSafresh1 276*b8851fccSafresh1Portability 277*b8851fccSafresh1 278*b8851fccSafresh1 This package has been tested in many different UN*Xes 279*b8851fccSafresh1even including minix, and appears to be reasonably portable. 280*b8851fccSafresh1This does not mean it will port easily to non-UN*X systems. 281*b8851fccSafresh1 282*b8851fccSafresh1Notes and Miscellaneous 283*b8851fccSafresh1 284*b8851fccSafresh1 The sdbm is not a very complicated package, at least 285*b8851fccSafresh1not after you familiarize yourself with the literature on 286*b8851fccSafresh1external hashing. There are other interesting algorithms in 287*b8851fccSafresh1existence that ensure (approximately) single-read access to 288*b8851fccSafresh1a data value associated with any key. These are directory- 289*b8851fccSafresh1less schemes such as linear hashing [Lit80] (+ Larson varia- 290*b8851fccSafresh1tions), spiral storage [Mar79] or directory schemes such as 291*b8851fccSafresh1extensible hashing [Fag79] by Fagin et al. I do hope these 292*b8851fccSafresh1sources provide a reasonable playground for experimentation 293*b8851fccSafresh1with other algorithms. See the June 1988 issue of ACM Com- 294*b8851fccSafresh1puting Surveys [Enb88] for an excellent overview of the 295*b8851fccSafresh1field. 296*b8851fccSafresh1 297*b8851fccSafresh1References 298*b8851fccSafresh1 299*b8851fccSafresh1 300*b8851fccSafresh1[Lar78] 301*b8851fccSafresh1 P.-A. Larson, "Dynamic Hashing", BIT, vol. 18, pp. 302*b8851fccSafresh1 184-201, 1978. 303*b8851fccSafresh1 304*b8851fccSafresh1[Tho90] 305*b8851fccSafresh1 Ken Thompson, private communication, Nov. 1990 306*b8851fccSafresh1 307*b8851fccSafresh1[Lit80] 308*b8851fccSafresh1 W. Litwin, "Linear Hashing: A new tool for file and 309*b8851fccSafresh1 table addressing", Proceedings of the 6th Conference on 310*b8851fccSafresh1 Very Large Dabatases (Montreal), pp. 212-223, Very 311*b8851fccSafresh1 Large Database Foundation, Saratoga, Calif., 1980. 312*b8851fccSafresh1 313*b8851fccSafresh1[Fag79] 314*b8851fccSafresh1 R. Fagin, J. Nievergelt, N. Pippinger, and H. R. 315*b8851fccSafresh1 Strong, "Extendible Hashing - A Fast Access Method for 316*b8851fccSafresh1 Dynamic Files", ACM Trans. Database Syst., vol. 4, 317*b8851fccSafresh1 no.3, pp. 315-344, Sept. 1979. 318*b8851fccSafresh1 319*b8851fccSafresh1[Wal84] 320*b8851fccSafresh1 Rich Wales, "Discussion of 'dbm' data base system", 321*b8851fccSafresh1 USENET newsgroup unix.wizards, Jan. 1984. 322*b8851fccSafresh1 323*b8851fccSafresh1[Tor87] 324*b8851fccSafresh1 Chris Torek, "Re: dbm.a and ndbm.a archives", 325*b8851fccSafresh1 326*b8851fccSafresh1 327*b8851fccSafresh1 328*b8851fccSafresh1 329*b8851fccSafresh1 330*b8851fccSafresh1 331*b8851fccSafresh1 332*b8851fccSafresh1 333*b8851fccSafresh1 334*b8851fccSafresh1 - 6 - 335*b8851fccSafresh1 336*b8851fccSafresh1 337*b8851fccSafresh1 USENET newsgroup comp.unix, 1987. 338*b8851fccSafresh1 339*b8851fccSafresh1[Mar79] 340*b8851fccSafresh1 G. N. Martin, "Spiral Storage: Incrementally Augment- 341*b8851fccSafresh1 able Hash Addressed Storage", Technical Report #27, 342*b8851fccSafresh1 University of Varwick, Coventry, U.K., 1979. 343*b8851fccSafresh1 344*b8851fccSafresh1[Enb88] 345*b8851fccSafresh1 R. J. Enbody and H. C. Du, "Dynamic Hashing 346*b8851fccSafresh1 Schemes",ACM Computing Surveys, vol. 20, no. 2, pp. 347*b8851fccSafresh1 85-113, June 1988. 348*b8851fccSafresh1 349*b8851fccSafresh1 350*b8851fccSafresh1 351*b8851fccSafresh1 352*b8851fccSafresh1 353*b8851fccSafresh1 354*b8851fccSafresh1 355*b8851fccSafresh1 356*b8851fccSafresh1 357*b8851fccSafresh1 358*b8851fccSafresh1 359*b8851fccSafresh1 360*b8851fccSafresh1 361*b8851fccSafresh1 362*b8851fccSafresh1 363*b8851fccSafresh1 364*b8851fccSafresh1 365*b8851fccSafresh1 366*b8851fccSafresh1 367*b8851fccSafresh1 368*b8851fccSafresh1 369*b8851fccSafresh1 370*b8851fccSafresh1 371*b8851fccSafresh1 372*b8851fccSafresh1 373*b8851fccSafresh1 374*b8851fccSafresh1 375*b8851fccSafresh1 376*b8851fccSafresh1 377*b8851fccSafresh1 378*b8851fccSafresh1 379*b8851fccSafresh1 380*b8851fccSafresh1 381*b8851fccSafresh1 382*b8851fccSafresh1 383*b8851fccSafresh1 384*b8851fccSafresh1 385*b8851fccSafresh1 386*b8851fccSafresh1 387*b8851fccSafresh1 388*b8851fccSafresh1 389*b8851fccSafresh1 390*b8851fccSafresh1 391*b8851fccSafresh1 392*b8851fccSafresh1 393*b8851fccSafresh1 394*b8851fccSafresh1 395*b8851fccSafresh1 396*b8851fccSafresh1 397