• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..11-Oct-2021-

t/H11-Oct-2021-211175

.gitignoreH A D11-Oct-202131 43

CHANGESH A D11-Oct-2021900 1915

Makefile.PLH A D11-Oct-2021548 1814

READMEH A D11-Oct-202111.2 KiB397232

README.tooH A D11-Oct-2021436 158

SDBM_File.pmH A D11-Oct-20213.7 KiB14012

SDBM_File.xsH A D11-Oct-20212.8 KiB143122

biblioH A D11-Oct-20211,013 6552

dba.cH A D11-Oct-20211.8 KiB8870

dbd.cH A D11-Oct-20212.5 KiB11493

dbe.1H A D11-Oct-20211.4 KiB4745

dbe.cH A D11-Oct-202115 KiB432353

dbu.cH A D11-Oct-20216.5 KiB249215

grindH A D11-Oct-2021201 107

hash.cH A D11-Oct-20211.2 KiB4826

pair.cH A D11-Oct-20217.5 KiB314191

pair.hH A D11-Oct-2021747 2823

readme.msH A D11-Oct-202111.4 KiB354338

sdbm.3H A D11-Oct-20219 KiB296294

sdbm.cH A D11-Oct-202114.9 KiB566364

sdbm.hH A D11-Oct-20215 KiB206135

tune.hH A D11-Oct-2021469 248

typemapH A D11-Oct-20211,016 4744

util.cH A D11-Oct-2021965 4840

README

1
2
3
4
5
6
7                   sdbm - Substitute DBM
8                             or
9        Berkeley ndbm for Every UN*X[1] Made Simple
10
11                      Ozan (oz) Yigit
12
13            The Guild of PD Software Toolmakers
14                      Toronto - Canada
15
16                     oz@nexus.yorku.ca
17
18
19
20Implementation is the sincerest form of flattery. - L. Peter
21Deutsch
22
23A The Clone of the ndbm library
24
25     The sources accompanying this notice - sdbm  -  consti-
26tute  the  first  public  release  (Dec. 1990) of a complete
27clone of the Berkeley UN*X ndbm library. The sdbm library is
28meant  to  clone the proven functionality of ndbm as closely
29as possible, including a few improvements. It is  practical,
30easy to understand, and compatible.  The sdbm library is not
31derived  from  any  licensed,  proprietary  or   copyrighted
32software.
33
34     The sdbm implementation is based on  a  1978  algorithm
35[Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing".
36In the course of searching for a substitute for ndbm, I pro-
37totyped  three different external-hashing algorithms [Lar78,
38Fag79, Lit80] and ultimately chose Larson's algorithm  as  a
39basis  of  the  sdbm  implementation. The Bell Labs dbm (and
40therefore ndbm) is based on an  algorithm  invented  by  Ken
41Thompson, [Tho90, Tor87] and predates Larson's work.
42
43     The sdbm programming interface  is  totally  compatible
44with ndbm and includes a slight improvement in database ini-
45tialization.  It is also expected  to  be  binary-compatible
46under most UN*X versions that support the ndbm library.
47
48     The sdbm implementation shares the shortcomings of  the
49ndbm library, as a side effect of various simplifications to
50the original Larson algorithm. It does produce holes in  the
51page file as it writes pages past the end of file. (Larson's
52paper include a clever solution to this problem  that  is  a
53result of using the hash value directly as a block address.)
54On the other hand, extensive tests  seem  to  indicate  that
55sdbm creates fewer holes in general, and the resulting page-
56files are smaller. The sdbm implementation  is  also  faster
57than  ndbm  in database creation.  Unlike the ndbm, the sdbm
58_________________________
59
60  [1] UN*X is not a trademark of any (dis)organization.
61
62
63
64
65
66
67
68
69
70                           - 2 -
71
72
73store operation will not "wander away" trying to split its
74data  pages  to insert a datum that cannot (due to elaborate
75worst-case situations) be inserted. (It will  fail  after  a
76pre-defined number of attempts.)
77
78Important Compatibility Warning
79
80     The sdbm and ndbm libraries cannot share databases: one
81cannot  read  the  (dir/pag)  database created by the other.
82This is due to the differences between  the  ndbm  and  sdbm
83algorithms[2], and the hash functions used.  It is  easy  to
84convert  between the dbm/ndbm databases and sdbm by ignoring
85the index completely: see dbd, dbu etc.
86
87
88Notice of Intellectual Property
89
90The entire sdbm  library package, as authored by me, Ozan S.
91Yigit,  is  hereby placed in the public domain. As such, the
92author is not responsible for the  consequences  of  use  of
93this  software, no matter how awful, even if they arise from
94defects in it. There is no expressed or implied warranty for
95the sdbm library.
96
97     Since the sdbm library package is in the public domain,
98this   original  release  or  any  additional  public-domain
99releases of the modified original cannot possibly (by defin-
100ition) be withheld from you. Also by definition, You (singu-
101lar) have all the rights to this code (including  the  right
102to sell without permission, the right to  hoard[3]  and  the
103right  to  do  other  icky  things as you see fit) but those
104rights are also granted to everyone else.
105
106     Please note that all  previous  distributions  of  this
107software  contained  a  copyright  (which is now dropped) to
108protect its origins and its  current  public  domain  status
109against any possible claims and/or challenges.
110
111Acknowledgments
112
113     Many people have been very helpful and  supportive.   A
114partial  list  would  necessarily  include Rayan Zacherissen
115(who contributed the  man  page,  and  also  hacked  a  MMAP
116_________________________
117
118  [2] Torek's   discussion   [Tor87]   indicates   that
119dbm/ndbm implementations use the hash value to traverse
120the radix trie differently than sdbm and as  a  result,
121the page indexes are generated in different order.  For
122more information, send e-mail to the author.
123  [3] You  cannot really hoard something that is avail-
124able to the public at large, but try if  it  makes  you
125feel any better.
126
127
128
129
130
131
132
133
134
135
136                           - 3 -
137
138
139version of sdbm), Arnold Robbins, Chris Lewis,  Bill  David-
140sen,  Henry  Spencer,  Geoff  Collyer, Rich Salz (who got me
141started in the first place), Johannes Ruschein (who did  the
142minix port) and David Tilbrook. I thank you all.
143
144Distribution Manifest and Notes
145
146This distribution of sdbm includes (at least) the following:
147
148    CHANGES     change log
149    README      this file.
150    biblio      a small bibliography on external hashing
151    dba.c       a crude (n/s)dbm page file analyzer
152    dbd.c       a crude (n/s)dbm page file dumper (for conversion)
153    dbe.1       man page for dbe.c
154    dbe.c       Janick's database editor
155    dbm.c       a dbm library emulation wrapper for ndbm/sdbm
156    dbm.h       header file for the above
157    dbu.c       a crude db management utility
158    hash.c      hashing function
159    makefile    guess.
160    pair.c      page-level routines (posted earlier)
161    pair.h      header file for the above
162    readme.ms   troff source for the README file
163    sdbm.3      man page
164    sdbm.c      the real thing
165    sdbm.h      header file for the above
166    tune.h      place for tuning & portability thingies
167    util.c      miscellaneous
168
169     dbu is a simple database manipulation  program[4]  that
170tries to look like Bell Labs' cbt utility. It  is  currently
171incomplete in functionality.  I use dbu to test out the rou-
172tines: it takes (from stdin) tab separated  key/value  pairs
173for commands like build or insert or takes keys for commands
174like delete or look.
175
176    dbu <build|creat|look|insert|cat|delete> dbmfile
177
178     dba is a crude analyzer of dbm/sdbm/ndbm page files. It
179scans the entire page file, reporting page level statistics,
180and totals at the end.
181
182     dbd is a crude dump  program  for  dbm/ndbm/sdbm  data-
183bases.  It  ignores  the bitmap, and dumps the data pages in
184sequence. It can be used to create input for the  dbu  util-
185ity.   Note that dbd will skip any NULLs in the key and data
186fields,  thus  is  unsuitable  to  convert   some   peculiar
187_________________________
188
189  [4] The dbd, dba, dbu utilities are quick  hacks  and
190are  not  fit  for  production use. They were developed
191late one night, just to test out sdbm, and convert some
192databases.
193
194
195
196
197
198
199
200
201
202                           - 4 -
203
204
205databases that insist in including the terminating null.
206
207     I have also included a copy of the dbe  (ndbm  DataBase
208Editor)  by  Janick Bergeron [janick@bnr.ca] for your pleas-
209ure. You may find it more useful than the little  dbu  util-
210ity.
211
212     dbm.[ch] is a dbm library emulation on top of ndbm (and
213hence suitable for sdbm). Written by Robert Elz.
214
215     The sdbm library has been around in beta test for quite
216a  long  time,  and from whatever little feedback I received
217(maybe no news is good news), I believe it  has  been  func-
218tioning  without  any  significant  problems.  I  would,  of
219course, appreciate all fixes and/or improvements.  Portabil-
220ity enhancements would especially be useful.
221
222Implementation Issues
223
224     Hash functions: The algorithm behind  sdbm  implementa-
225tion  needs a good bit-scrambling hash function to be effec-
226tive. I ran into a set of constants for a simple hash  func-
227tion  that  seem  to  help sdbm perform better than ndbm for
228various inputs:
229
230    /*
231     * polynomial conversion ignoring overflows
232     * 65599 nice. 65587 even better.
233     */
234    long
235    dbm_hash(char *str, int len) {
236        unsigned long n = 0;
237
238        while (len--)
239            n = n * 65599 + *str++;
240        return n;
241    }
242
243     There may be better hash functions for the purposes  of
244dynamic hashing.  Try your favorite, and check the pagefile.
245If it contains too many pages with too many holes, (in rela-
246tion  to this one for example) or if sdbm simply stops work-
247ing (fails after SPLTMAX attempts to split)  when  you  feed
248your  NEWS  history  file  to it, you probably do not have a
249good hashing function.  If  you  do  better  (for  different
250types of input), I would like to know about the function you
251use.
252
253     Block sizes: It seems (from  various  tests  on  a  few
254machines)  that a page file block size PBLKSIZ of 1024 is by
255far the best for performance, but this also happens to limit
256the  size  of a key/value pair. Depending on your needs, you
257may wish to increase the page size, and also adjust  PAIRMAX
258(the maximum size of a key/value pair allowed: should always
259
260
261
262
263
264
265
266
267
268                           - 5 -
269
270
271be at least three words smaller than PBLKSIZ.)  accordingly.
272The  system-wide  version  of the library should probably be
273configured with 1024 (distribution default), as this appears
274to be sufficient for most common uses of sdbm.
275
276Portability
277
278     This package has been tested in many  different  UN*Xes
279even including minix, and appears to be reasonably portable.
280This does not mean it will port easily to non-UN*X systems.
281
282Notes and Miscellaneous
283
284     The sdbm is not a very complicated  package,  at  least
285not  after  you  familiarize yourself with the literature on
286external hashing. There are other interesting algorithms  in
287existence  that ensure (approximately) single-read access to
288a data value associated with any key. These  are  directory-
289less schemes such as linear hashing [Lit80] (+ Larson varia-
290tions), spiral storage [Mar79] or directory schemes such  as
291extensible  hashing  [Fag79] by Fagin et al. I do hope these
292sources provide a reasonable playground for  experimentation
293with  other algorithms.  See the June 1988 issue of ACM Com-
294puting Surveys [Enb88] for  an  excellent  overview  of  the
295field.
296
297References
298
299
300[Lar78]
301    P.-A. Larson, "Dynamic Hashing", BIT, vol.   18,   pp.
302    184-201, 1978.
303
304[Tho90]
305    Ken Thompson, private communication, Nov. 1990
306
307[Lit80]
308    W. Litwin, "Linear Hashing: A new tool  for  file  and
309    table addressing", Proceedings of the 6th Conference on
310    Very Large  Dabatases  (Montreal), pp.   212-223,   Very
311    Large Database Foundation, Saratoga, Calif., 1980.
312
313[Fag79]
314    R. Fagin, J.  Nievergelt,  N.  Pippinger,  and   H.   R.
315    Strong,  "Extendible Hashing - A Fast Access Method for
316    Dynamic Files", ACM  Trans.  Database  Syst.,  vol.  4,
317    no.3, pp. 315-344, Sept. 1979.
318
319[Wal84]
320    Rich Wales, "Discussion of 'dbm'  data  base  system",
321    USENET newsgroup unix.wizards, Jan. 1984.
322
323[Tor87]
324    Chris Torek,  "Re:   dbm.a   and   ndbm.a   archives",
325
326
327
328
329
330
331
332
333
334                           - 6 -
335
336
337    USENET newsgroup comp.unix, 1987.
338
339[Mar79]
340    G. N. Martin, "Spiral Storage: Incrementally   Augment-
341    able   Hash  Addressed  Storage", Technical Report #27,
342    University of Varwick, Coventry, U.K., 1979.
343
344[Enb88]
345    R.  J.  Enbody  and  H.   C.   Du,   "Dynamic   Hashing
346    Schemes",ACM  Computing  Surveys,  vol.  20, no. 2, pp.
347    85-113, June 1988.
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397

README.too

1This version of sdbm merely has all the dbm_* names translated to sdbm_*
2so that we can link ndbm and sdbm into the same executable.  (It also has
3the bad() macro redefined to allow a zero-length key.)
4
5
6Fri Apr 15 10:15:30 EDT 1994.
7
8Additional portability/configuration changes for libsdbm by Andy Dougherty
9doughera@lafayette.edu.
10
11
12Mon Mar 22 03:24:47 PST 1999.
13
14sdbm_exists added to the library by Russ Allbery <rra@stanford.edu>.
15

readme.ms

tbl | readme.ms | [tn]roff -ms | ...
note the "C" (courier) and "CB" fonts: you will probably have to
change these.
$Id: readme.ms,v 1.1 90/12/13 13:09:15 oz Exp Locker: oz $

.nr dT 4

.nr t \\n(dT*\\w'x'u
..

.. CW uses the typewriter/courier font.
\\$1\\\$2 .. Footnote numbering [by Henry Spencer]
<text>\*f for a footnote number..
.FS
\*F <footnote text>
.FE

.nr f 0 1 .nr F 0 1 .ND

sdbm \(em Substitute DBM

or

Berkeley ndbm for Every UN*X\** Made Simple .AU Ozan (oz) Yigit .AI The Guild of PD Software Toolmakers Toronto - Canada oz@nexus.yorku.ca

.FS UN*X is not a trademark of any (dis)organization. .FE Implementation is the sincerest form of flattery. \(em L. Peter Deutsch

A The Clone of the ndbm library

The sources accompanying this notice \(em sdbm \(em constitute the first public release (Dec. 1990) of a complete clone of the Berkeley UN*X ndbm library. The sdbm library is meant to clone the proven functionality of ndbm as closely as possible, including a few improvements. It is practical, easy to understand, and compatible. The sdbm library is not derived from any licensed, proprietary or copyrighted software.

The sdbm implementation is based on a 1978 algorithm [Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing". In the course of searching for a substitute for ndbm, I prototyped three different external-hashing algorithms [Lar78, Fag79, Lit80] and ultimately chose Larson's algorithm as a basis of the sdbm implementation. The Bell Labs dbm (and therefore ndbm) is based on an algorithm invented by Ken Thompson, [Tho90, Tor87] and predates Larson's work.

The sdbm programming interface is totally compatible with ndbm and includes a slight improvement in database initialization. It is also expected to be binary-compatible under most UN*X versions that support the ndbm library.

The sdbm implementation shares the shortcomings of the ndbm library, as a side effect of various simplifications to the original Larson algorithm. It does produce holes in the page file as it writes pages past the end of file. (Larson's paper include a clever solution to this problem that is a result of using the hash value directly as a block address.) On the other hand, extensive tests seem to indicate that sdbm creates fewer holes in general, and the resulting pagefiles are smaller. The sdbm implementation is also faster than ndbm in database creation. Unlike the ndbm, the sdbm store operation will not "wander away" trying to split its data pages to insert a datum that cannot (due to elaborate worst-case situations) be inserted. (It will fail after a pre-defined number of attempts.)

Important Compatibility Warning

The sdbm and ndbm libraries cannot share databases: one cannot read the (dir/pag) database created by the other. This is due to the differences between the ndbm and sdbm algorithms\**, .FS Torek's discussion [Tor87] indicates that dbm/ndbm implementations use the hash value to traverse the radix trie differently than sdbm and as a result, the page indexes are generated in different order. For more information, send e-mail to the author. .FE and the hash functions used. It is easy to convert between the dbm/ndbm databases and sdbm by ignoring the index completely: see dbd , dbu etc. .R

Notice of Intellectual Property

The entire sdbm library package, as authored by me, Ozan S. Yigit, is hereby placed in the public domain. As such, the author is not responsible for the consequences of use of this software, no matter how awful, even if they arise from defects in it. There is no expressed or implied warranty for the sdbm library.

Since the sdbm library package is in the public domain, this original release or any additional public-domain releases of the modified original cannot possibly (by definition) be withheld from you. Also by definition, You (singular) have all the rights to this code (including the right to sell without permission, the right to hoard\** .FS You cannot really hoard something that is available to the public at large, but try if it makes you feel any better. .FE and the right to do other icky things as you see fit) but those rights are also granted to everyone else.

Please note that all previous distributions of this software contained a copyright (which is now dropped) to protect its origins and its current public domain status against any possible claims and/or challenges.

Acknowledgments

Many people have been very helpful and supportive. A partial list would necessarily include Rayan Zacherissen (who contributed the man page, and also hacked a MMAP version of sdbm), Arnold Robbins, Chris Lewis, Bill Davidsen, Henry Spencer, Geoff Collyer, Rich Salz (who got me started in the first place), Johannes Ruschein (who did the minix port) and David Tilbrook. I thank you all.

Distribution Manifest and Notes

This distribution of sdbm includes (at least) the following:

1 CHANGES change log README this file. biblio a small bibliography on external hashing dba.c a crude (n/s)dbm page file analyzer dbd.c a crude (n/s)dbm page file dumper (for conversion) dbe.1 man page for dbe.c dbe.c Janick's database editor dbm.c a dbm library emulation wrapper for ndbm/sdbm dbm.h header file for the above dbu.c a crude db management utility hash.c hashing function makefile guess. pair.c page-level routines (posted earlier) pair.h header file for the above readme.ms troff source for the README file sdbm.3 man page sdbm.c the real thing sdbm.h header file for the above tune.h place for tuning & portability thingies util.c miscellaneous

2

dbu is a simple database manipulation program\** that tries to look .FS The dbd , dba , dbu utilities are quick hacks and are not fit for production use. They were developed late one night, just to test out sdbm, and convert some databases. .FE like Bell Labs' cbt utility. It is currently incomplete in functionality. I use dbu to test out the routines: it takes (from stdin) tab separated key/value pairs for commands like build or insert or takes keys for commands like delete or look .

1 dbu <build|creat|look|insert|cat|delete> dbmfile

2

dba is a crude analyzer of dbm/sdbm/ndbm page files. It scans the entire page file, reporting page level statistics, and totals at the end.

dbd is a crude dump program for dbm/ndbm/sdbm databases. It ignores the bitmap, and dumps the data pages in sequence. It can be used to create input for the dbu utility. Note that dbd will skip any NULLs in the key and data fields, thus is unsuitable to convert some peculiar databases that insist in including the terminating null.

I have also included a copy of the dbe (ndbm DataBase Editor) by Janick Bergeron [janick@bnr.ca] for your pleasure. You may find it more useful than the little dbu utility.

dbm.[ch] is a dbm library emulation on top of ndbm (and hence suitable for sdbm). Written by Robert Elz.

The sdbm library has been around in beta test for quite a long time, and from whatever little feedback I received (maybe no news is good news), I believe it has been functioning without any significant problems. I would, of course, appreciate all fixes and/or improvements. Portability enhancements would especially be useful.

Implementation Issues

Hash functions: The algorithm behind sdbm implementation needs a good bit-scrambling hash function to be effective. I ran into a set of constants for a simple hash function that seem to help sdbm perform better than ndbm for various inputs:

1 /* * polynomial conversion ignoring overflows * 65599 nice. 65587 even better. */ long dbm_hash(char *str, int len) { unsigned long n = 0; while (len--) n = n * 65599 + *str++; return n; }

2

There may be better hash functions for the purposes of dynamic hashing. Try your favorite, and check the pagefile. If it contains too many pages with too many holes, (in relation to this one for example) or if sdbm simply stops working (fails after SPLTMAX attempts to split) when you feed your NEWS history file to it, you probably do not have a good hashing function. If you do better (for different types of input), I would like to know about the function you use.

Block sizes: It seems (from various tests on a few machines) that a page file block size PBLKSIZ of 1024 is by far the best for performance, but this also happens to limit the size of a key/value pair. Depending on your needs, you may wish to increase the page size, and also adjust PAIRMAX (the maximum size of a key/value pair allowed: should always be at least three words smaller than PBLKSIZ .) accordingly. The system-wide version of the library should probably be configured with 1024 (distribution default), as this appears to be sufficient for most common uses of sdbm.

Portability

This package has been tested in many different UN*Xes even including minix, and appears to be reasonably portable. This does not mean it will port easily to non-UN*X systems.

Notes and Miscellaneous

The sdbm is not a very complicated package, at least not after you familiarize yourself with the literature on external hashing. There are other interesting algorithms in existence that ensure (approximately) single-read access to a data value associated with any key. These are directory-less schemes such as linear hashing [Lit80] (+ Larson variations), spiral storage [Mar79] or directory schemes such as extensible hashing [Fag79] by Fagin et al. I do hope these sources provide a reasonable playground for experimentation with other algorithms. See the June 1988 issue of ACM Computing Surveys [Enb88] for an excellent overview of the field.

G

References

[Lar78] 4m
P.-A. Larson, "Dynamic Hashing", BIT, vol. 18, pp. 184-201, 1978.
[Tho90] 4m
Ken Thompson, private communication, Nov. 1990
[Lit80] 4m
W. Litwin, "Linear Hashing: A new tool for file and table addressing", Proceedings of the 6th Conference on Very Large Dabatases (Montreal), pp. 212-223, Very Large Database Foundation, Saratoga, Calif., 1980.
[Fag79] 4m
R. Fagin, J. Nievergelt, N. Pippinger, and H. R. Strong, "Extendible Hashing - A Fast Access Method for Dynamic Files", ACM Trans. Database Syst., vol. 4, no.3, pp. 315-344, Sept. 1979.
[Wal84] 4m
Rich Wales, "Discussion of 'dbm' data base system", USENET newsgroup unix.wizards, Jan. 1984.
[Tor87] 4m
Chris Torek, "Re: dbm.a and ndbm.a archives", USENET newsgroup comp.unix, 1987.
[Mar79] 4m
G. N. Martin, "Spiral Storage: Incrementally Augmentable Hash Addressed Storage", Technical Report #27, University of Varwick, Coventry, U.K., 1979.
[Enb88] 4m
R. J. Enbody and H. C. Du, "Dynamic Hashing Schemes",ACM Computing Surveys, vol. 20, no. 2, pp. 85-113, June 1988.