Name | Date | Size | #Lines | LOC | ||
---|---|---|---|---|---|---|
.. | 11-Oct-2021 | - | ||||
t/ | H | 11-Oct-2021 | - | 211 | 175 | |
.gitignore | H A D | 11-Oct-2021 | 31 | 4 | 3 | |
CHANGES | H A D | 11-Oct-2021 | 900 | 19 | 15 | |
Makefile.PL | H A D | 11-Oct-2021 | 548 | 18 | 14 | |
README | H A D | 11-Oct-2021 | 11.2 KiB | 397 | 232 | |
README.too | H A D | 11-Oct-2021 | 436 | 15 | 8 | |
SDBM_File.pm | H A D | 11-Oct-2021 | 3.7 KiB | 140 | 12 | |
SDBM_File.xs | H A D | 11-Oct-2021 | 2.8 KiB | 143 | 122 | |
biblio | H A D | 11-Oct-2021 | 1,013 | 65 | 52 | |
dba.c | H A D | 11-Oct-2021 | 1.8 KiB | 88 | 70 | |
dbd.c | H A D | 11-Oct-2021 | 2.5 KiB | 114 | 93 | |
dbe.1 | H A D | 11-Oct-2021 | 1.4 KiB | 47 | 45 | |
dbe.c | H A D | 11-Oct-2021 | 15 KiB | 432 | 353 | |
dbu.c | H A D | 11-Oct-2021 | 6.5 KiB | 249 | 215 | |
grind | H A D | 11-Oct-2021 | 201 | 10 | 7 | |
hash.c | H A D | 11-Oct-2021 | 1.2 KiB | 48 | 26 | |
pair.c | H A D | 11-Oct-2021 | 7.5 KiB | 314 | 191 | |
pair.h | H A D | 11-Oct-2021 | 747 | 28 | 23 | |
readme.ms | H A D | 11-Oct-2021 | 11.4 KiB | 354 | 338 | |
sdbm.3 | H A D | 11-Oct-2021 | 9 KiB | 296 | 294 | |
sdbm.c | H A D | 11-Oct-2021 | 14.9 KiB | 566 | 364 | |
sdbm.h | H A D | 11-Oct-2021 | 5 KiB | 206 | 135 | |
tune.h | H A D | 11-Oct-2021 | 469 | 24 | 8 | |
typemap | H A D | 11-Oct-2021 | 1,016 | 47 | 44 | |
util.c | H A D | 11-Oct-2021 | 965 | 48 | 40 |
README
1 2 3 4 5 6 7 sdbm - Substitute DBM 8 or 9 Berkeley ndbm for Every UN*X[1] Made Simple 10 11 Ozan (oz) Yigit 12 13 The Guild of PD Software Toolmakers 14 Toronto - Canada 15 16 oz@nexus.yorku.ca 17 18 19 20Implementation is the sincerest form of flattery. - L. Peter 21Deutsch 22 23A The Clone of the ndbm library 24 25 The sources accompanying this notice - sdbm - consti- 26tute the first public release (Dec. 1990) of a complete 27clone of the Berkeley UN*X ndbm library. The sdbm library is 28meant to clone the proven functionality of ndbm as closely 29as possible, including a few improvements. It is practical, 30easy to understand, and compatible. The sdbm library is not 31derived from any licensed, proprietary or copyrighted 32software. 33 34 The sdbm implementation is based on a 1978 algorithm 35[Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". 36In the course of searching for a substitute for ndbm, I pro- 37totyped three different external-hashing algorithms [Lar78, 38Fag79, Lit80] and ultimately chose Larson's algorithm as a 39basis of the sdbm implementation. The Bell Labs dbm (and 40therefore ndbm) is based on an algorithm invented by Ken 41Thompson, [Tho90, Tor87] and predates Larson's work. 42 43 The sdbm programming interface is totally compatible 44with ndbm and includes a slight improvement in database ini- 45tialization. It is also expected to be binary-compatible 46under most UN*X versions that support the ndbm library. 47 48 The sdbm implementation shares the shortcomings of the 49ndbm library, as a side effect of various simplifications to 50the original Larson algorithm. It does produce holes in the 51page file as it writes pages past the end of file. (Larson's 52paper include a clever solution to this problem that is a 53result of using the hash value directly as a block address.) 54On the other hand, extensive tests seem to indicate that 55sdbm creates fewer holes in general, and the resulting page- 56files are smaller. The sdbm implementation is also faster 57than ndbm in database creation. Unlike the ndbm, the sdbm 58_________________________ 59 60 [1] UN*X is not a trademark of any (dis)organization. 61 62 63 64 65 66 67 68 69 70 - 2 - 71 72 73store operation will not "wander away" trying to split its 74data pages to insert a datum that cannot (due to elaborate 75worst-case situations) be inserted. (It will fail after a 76pre-defined number of attempts.) 77 78Important Compatibility Warning 79 80 The sdbm and ndbm libraries cannot share databases: one 81cannot read the (dir/pag) database created by the other. 82This is due to the differences between the ndbm and sdbm 83algorithms[2], and the hash functions used. It is easy to 84convert between the dbm/ndbm databases and sdbm by ignoring 85the index completely: see dbd, dbu etc. 86 87 88Notice of Intellectual Property 89 90The entire sdbm library package, as authored by me, Ozan S. 91Yigit, is hereby placed in the public domain. As such, the 92author is not responsible for the consequences of use of 93this software, no matter how awful, even if they arise from 94defects in it. There is no expressed or implied warranty for 95the sdbm library. 96 97 Since the sdbm library package is in the public domain, 98this original release or any additional public-domain 99releases of the modified original cannot possibly (by defin- 100ition) be withheld from you. Also by definition, You (singu- 101lar) have all the rights to this code (including the right 102to sell without permission, the right to hoard[3] and the 103right to do other icky things as you see fit) but those 104rights are also granted to everyone else. 105 106 Please note that all previous distributions of this 107software contained a copyright (which is now dropped) to 108protect its origins and its current public domain status 109against any possible claims and/or challenges. 110 111Acknowledgments 112 113 Many people have been very helpful and supportive. A 114partial list would necessarily include Rayan Zacherissen 115(who contributed the man page, and also hacked a MMAP 116_________________________ 117 118 [2] Torek's discussion [Tor87] indicates that 119dbm/ndbm implementations use the hash value to traverse 120the radix trie differently than sdbm and as a result, 121the page indexes are generated in different order. For 122more information, send e-mail to the author. 123 [3] You cannot really hoard something that is avail- 124able to the public at large, but try if it makes you 125feel any better. 126 127 128 129 130 131 132 133 134 135 136 - 3 - 137 138 139version of sdbm), Arnold Robbins, Chris Lewis, Bill David- 140sen, Henry Spencer, Geoff Collyer, Rich Salz (who got me 141started in the first place), Johannes Ruschein (who did the 142minix port) and David Tilbrook. I thank you all. 143 144Distribution Manifest and Notes 145 146This distribution of sdbm includes (at least) the following: 147 148 CHANGES change log 149 README this file. 150 biblio a small bibliography on external hashing 151 dba.c a crude (n/s)dbm page file analyzer 152 dbd.c a crude (n/s)dbm page file dumper (for conversion) 153 dbe.1 man page for dbe.c 154 dbe.c Janick's database editor 155 dbm.c a dbm library emulation wrapper for ndbm/sdbm 156 dbm.h header file for the above 157 dbu.c a crude db management utility 158 hash.c hashing function 159 makefile guess. 160 pair.c page-level routines (posted earlier) 161 pair.h header file for the above 162 readme.ms troff source for the README file 163 sdbm.3 man page 164 sdbm.c the real thing 165 sdbm.h header file for the above 166 tune.h place for tuning & portability thingies 167 util.c miscellaneous 168 169 dbu is a simple database manipulation program[4] that 170tries to look like Bell Labs' cbt utility. It is currently 171incomplete in functionality. I use dbu to test out the rou- 172tines: it takes (from stdin) tab separated key/value pairs 173for commands like build or insert or takes keys for commands 174like delete or look. 175 176 dbu <build|creat|look|insert|cat|delete> dbmfile 177 178 dba is a crude analyzer of dbm/sdbm/ndbm page files. It 179scans the entire page file, reporting page level statistics, 180and totals at the end. 181 182 dbd is a crude dump program for dbm/ndbm/sdbm data- 183bases. It ignores the bitmap, and dumps the data pages in 184sequence. It can be used to create input for the dbu util- 185ity. Note that dbd will skip any NULLs in the key and data 186fields, thus is unsuitable to convert some peculiar 187_________________________ 188 189 [4] The dbd, dba, dbu utilities are quick hacks and 190are not fit for production use. They were developed 191late one night, just to test out sdbm, and convert some 192databases. 193 194 195 196 197 198 199 200 201 202 - 4 - 203 204 205databases that insist in including the terminating null. 206 207 I have also included a copy of the dbe (ndbm DataBase 208Editor) by Janick Bergeron [janick@bnr.ca] for your pleas- 209ure. You may find it more useful than the little dbu util- 210ity. 211 212 dbm.[ch] is a dbm library emulation on top of ndbm (and 213hence suitable for sdbm). Written by Robert Elz. 214 215 The sdbm library has been around in beta test for quite 216a long time, and from whatever little feedback I received 217(maybe no news is good news), I believe it has been func- 218tioning without any significant problems. I would, of 219course, appreciate all fixes and/or improvements. Portabil- 220ity enhancements would especially be useful. 221 222Implementation Issues 223 224 Hash functions: The algorithm behind sdbm implementa- 225tion needs a good bit-scrambling hash function to be effec- 226tive. I ran into a set of constants for a simple hash func- 227tion that seem to help sdbm perform better than ndbm for 228various inputs: 229 230 /* 231 * polynomial conversion ignoring overflows 232 * 65599 nice. 65587 even better. 233 */ 234 long 235 dbm_hash(char *str, int len) { 236 unsigned long n = 0; 237 238 while (len--) 239 n = n * 65599 + *str++; 240 return n; 241 } 242 243 There may be better hash functions for the purposes of 244dynamic hashing. Try your favorite, and check the pagefile. 245If it contains too many pages with too many holes, (in rela- 246tion to this one for example) or if sdbm simply stops work- 247ing (fails after SPLTMAX attempts to split) when you feed 248your NEWS history file to it, you probably do not have a 249good hashing function. If you do better (for different 250types of input), I would like to know about the function you 251use. 252 253 Block sizes: It seems (from various tests on a few 254machines) that a page file block size PBLKSIZ of 1024 is by 255far the best for performance, but this also happens to limit 256the size of a key/value pair. Depending on your needs, you 257may wish to increase the page size, and also adjust PAIRMAX 258(the maximum size of a key/value pair allowed: should always 259 260 261 262 263 264 265 266 267 268 - 5 - 269 270 271be at least three words smaller than PBLKSIZ.) accordingly. 272The system-wide version of the library should probably be 273configured with 1024 (distribution default), as this appears 274to be sufficient for most common uses of sdbm. 275 276Portability 277 278 This package has been tested in many different UN*Xes 279even including minix, and appears to be reasonably portable. 280This does not mean it will port easily to non-UN*X systems. 281 282Notes and Miscellaneous 283 284 The sdbm is not a very complicated package, at least 285not after you familiarize yourself with the literature on 286external hashing. There are other interesting algorithms in 287existence that ensure (approximately) single-read access to 288a data value associated with any key. These are directory- 289less schemes such as linear hashing [Lit80] (+ Larson varia- 290tions), spiral storage [Mar79] or directory schemes such as 291extensible hashing [Fag79] by Fagin et al. I do hope these 292sources provide a reasonable playground for experimentation 293with other algorithms. See the June 1988 issue of ACM Com- 294puting Surveys [Enb88] for an excellent overview of the 295field. 296 297References 298 299 300[Lar78] 301 P.-A. Larson, "Dynamic Hashing", BIT, vol. 18, pp. 302 184-201, 1978. 303 304[Tho90] 305 Ken Thompson, private communication, Nov. 1990 306 307[Lit80] 308 W. Litwin, "Linear Hashing: A new tool for file and 309 table addressing", Proceedings of the 6th Conference on 310 Very Large Dabatases (Montreal), pp. 212-223, Very 311 Large Database Foundation, Saratoga, Calif., 1980. 312 313[Fag79] 314 R. Fagin, J. Nievergelt, N. Pippinger, and H. R. 315 Strong, "Extendible Hashing - A Fast Access Method for 316 Dynamic Files", ACM Trans. Database Syst., vol. 4, 317 no.3, pp. 315-344, Sept. 1979. 318 319[Wal84] 320 Rich Wales, "Discussion of 'dbm' data base system", 321 USENET newsgroup unix.wizards, Jan. 1984. 322 323[Tor87] 324 Chris Torek, "Re: dbm.a and ndbm.a archives", 325 326 327 328 329 330 331 332 333 334 - 6 - 335 336 337 USENET newsgroup comp.unix, 1987. 338 339[Mar79] 340 G. N. Martin, "Spiral Storage: Incrementally Augment- 341 able Hash Addressed Storage", Technical Report #27, 342 University of Varwick, Coventry, U.K., 1979. 343 344[Enb88] 345 R. J. Enbody and H. C. Du, "Dynamic Hashing 346 Schemes",ACM Computing Surveys, vol. 20, no. 2, pp. 347 85-113, June 1988. 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397
README.too
1This version of sdbm merely has all the dbm_* names translated to sdbm_* 2so that we can link ndbm and sdbm into the same executable. (It also has 3the bad() macro redefined to allow a zero-length key.) 4 5 6Fri Apr 15 10:15:30 EDT 1994. 7 8Additional portability/configuration changes for libsdbm by Andy Dougherty 9doughera@lafayette.edu. 10 11 12Mon Mar 22 03:24:47 PST 1999. 13 14sdbm_exists added to the library by Russ Allbery <rra@stanford.edu>. 15
readme.ms
note the "C" (courier) and "CB" fonts: you will probably have to
change these.
$Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $
.nr dT 4
.nr t \\n(dT*\\w'x'u .... CW uses the typewriter/courier font.
\\$1\\\$2 .. Footnote numbering [by Henry Spencer]
<text>\*f for a footnote number..
.FS
\*F <footnote text>
.FE
.nr f 0 1 .nr F 0 1 .ND
sdbm \(em Substitute DBM
or
Berkeley ndbm for Every UN*X\** Made Simple .AU Ozan (oz) Yigit .AI The Guild of PD Software Toolmakers Toronto - Canada oz@nexus.yorku.ca
.FS UN*X is not a trademark of any (dis)organization. .FE Implementation is the sincerest form of flattery. \(em L. Peter Deutsch
A The Clone of the ndbm libraryThe sources accompanying this notice \(em sdbm \(em constitute the first public release (Dec. 1990) of a complete clone of the Berkeley UN*X ndbm library. The sdbm library is meant to clone the proven functionality of ndbm as closely as possible, including a few improvements. It is practical, easy to understand, and compatible. The sdbm library is not derived from any licensed, proprietary or copyrighted software.
The sdbm implementation is based on a 1978 algorithm [Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". In the course of searching for a substitute for ndbm, I prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] and ultimately chose Larson's algorithm as a basis of the sdbm implementation. The Bell Labs dbm (and therefore ndbm) is based on an algorithm invented by Ken Thompson, [Tho90, Tor87] and predates Larson's work.
The sdbm programming interface is totally compatible with ndbm and includes a slight improvement in database initialization. It is also expected to be binary-compatible under most UN*X versions that support the ndbm library.
The sdbm implementation shares the shortcomings of the ndbm library, as a side effect of various simplifications to the original Larson algorithm. It does produce holes in the page file as it writes pages past the end of file. (Larson's paper include a clever solution to this problem that is a result of using the hash value directly as a block address.) On the other hand, extensive tests seem to indicate that sdbm creates fewer holes in general, and the resulting pagefiles are smaller. The sdbm implementation is also faster than ndbm in database creation. Unlike the ndbm, the sdbm store operation will not "wander away" trying to split its data pages to insert a datum that cannot (due to elaborate worst-case situations) be inserted. (It will fail after a pre-defined number of attempts.)
Important Compatibility WarningThe sdbm and ndbm libraries cannot share databases: one cannot read the (dir/pag) database created by the other. This is due to the differences between the ndbm and sdbm algorithms\**, .FS Torek's discussion [Tor87] indicates that dbm/ndbm implementations use the hash value to traverse the radix trie differently than sdbm and as a result, the page indexes are generated in different order. For more information, send e-mail to the author. .FE and the hash functions used. It is easy to convert between the dbm/ndbm databases and sdbm by ignoring the index completely: see dbd , dbu etc. .R
Notice of Intellectual Property
The entire sdbm library package, as authored by me, Ozan S. Yigit, is hereby placed in the public domain. As such, the author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from defects in it. There is no expressed or implied warranty for the sdbm library.
Since the sdbm library package is in the public domain, this original release or any additional public-domain releases of the modified original cannot possibly (by definition) be withheld from you. Also by definition, You (singular) have all the rights to this code (including the right to sell without permission, the right to hoard\** .FS You cannot really hoard something that is available to the public at large, but try if it makes you feel any better. .FE and the right to do other icky things as you see fit) but those rights are also granted to everyone else.
Please note that all previous distributions of this software contained a copyright (which is now dropped) to protect its origins and its current public domain status against any possible claims and/or challenges.
AcknowledgmentsMany people have been very helpful and supportive. A partial list would necessarily include Rayan Zacherissen (who contributed the man page, and also hacked a MMAP version of sdbm), Arnold Robbins, Chris Lewis, Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started in the first place), Johannes Ruschein (who did the minix port) and David Tilbrook. I thank you all.
Distribution Manifest and NotesThis distribution of sdbm includes (at least) the following:
1 CHANGES change log README this file. biblio a small bibliography on external hashing dba.c a crude (n/s)dbm page file analyzer dbd.c a crude (n/s)dbm page file dumper (for conversion) dbe.1 man page for dbe.c dbe.c Janick's database editor dbm.c a dbm library emulation wrapper for ndbm/sdbm dbm.h header file for the above dbu.c a crude db management utility hash.c hashing function makefile guess. pair.c page-level routines (posted earlier) pair.h header file for the above readme.ms troff source for the README file sdbm.3 man page sdbm.c the real thing sdbm.h header file for the above tune.h place for tuning & portability thingies util.c miscellaneous
2
dbu is a simple database manipulation program\** that tries to look .FS The dbd , dba , dbu utilities are quick hacks and are not fit for production use. They were developed late one night, just to test out sdbm, and convert some databases. .FE like Bell Labs' cbt utility. It is currently incomplete in functionality. I use dbu to test out the routines: it takes (from stdin) tab separated key/value pairs for commands like build or insert or takes keys for commands like delete or look .
1 dbu <build|creat|look|insert|cat|delete> dbmfile
2
dba is a crude analyzer of dbm/sdbm/ndbm page files. It scans the entire page file, reporting page level statistics, and totals at the end.
dbd is a crude dump program for dbm/ndbm/sdbm databases. It ignores the bitmap, and dumps the data pages in sequence. It can be used to create input for the dbu utility. Note that dbd will skip any NULLs in the key and data fields, thus is unsuitable to convert some peculiar databases that insist in including the terminating null.
I have also included a copy of the dbe (ndbm DataBase Editor) by Janick Bergeron [janick@bnr.ca] for your pleasure. You may find it more useful than the little dbu utility.
dbm.[ch] is a dbm library emulation on top of ndbm (and hence suitable for sdbm). Written by Robert Elz.
The sdbm library has been around in beta test for quite a long time, and from whatever little feedback I received (maybe no news is good news), I believe it has been functioning without any significant problems. I would, of course, appreciate all fixes and/or improvements. Portability enhancements would especially be useful.
Implementation IssuesHash functions: The algorithm behind sdbm implementation needs a good bit-scrambling hash function to be effective. I ran into a set of constants for a simple hash function that seem to help sdbm perform better than ndbm for various inputs:
1 /* * polynomial conversion ignoring overflows * 65599 nice. 65587 even better. */ long dbm_hash(char *str, int len) { unsigned long n = 0; while (len--) n = n * 65599 + *str++; return n; }
2
There may be better hash functions for the purposes of dynamic hashing. Try your favorite, and check the pagefile. If it contains too many pages with too many holes, (in relation to this one for example) or if sdbm simply stops working (fails after SPLTMAX attempts to split) when you feed your NEWS history file to it, you probably do not have a good hashing function. If you do better (for different types of input), I would like to know about the function you use.
Block sizes: It seems (from various tests on a few machines) that a page file block size PBLKSIZ of 1024 is by far the best for performance, but this also happens to limit the size of a key/value pair. Depending on your needs, you may wish to increase the page size, and also adjust PAIRMAX (the maximum size of a key/value pair allowed: should always be at least three words smaller than PBLKSIZ .) accordingly. The system-wide version of the library should probably be configured with 1024 (distribution default), as this appears to be sufficient for most common uses of sdbm.
PortabilityThis package has been tested in many different UN*Xes even including minix, and appears to be reasonably portable. This does not mean it will port easily to non-UN*X systems.
Notes and MiscellaneousThe sdbm is not a very complicated package, at least not after you familiarize yourself with the literature on external hashing. There are other interesting algorithms in existence that ensure (approximately) single-read access to a data value associated with any key. These are directory-less schemes such as linear hashing [Lit80] (+ Larson variations), spiral storage [Mar79] or directory schemes such as extensible hashing [Fag79] by Fagin et al. I do hope these sources provide a reasonable playground for experimentation with other algorithms. See the June 1988 issue of ACM Computing Surveys [Enb88] for an excellent overview of the field.
G
References