xref: /openbsd/gnu/usr.bin/perl/ext/SDBM_File/README (revision b8851fcc)
1*b8851fccSafresh1
2*b8851fccSafresh1
3*b8851fccSafresh1
4*b8851fccSafresh1
5*b8851fccSafresh1
6*b8851fccSafresh1
7*b8851fccSafresh1                   sdbm - Substitute DBM
8*b8851fccSafresh1                             or
9*b8851fccSafresh1        Berkeley ndbm for Every UN*X[1] Made Simple
10*b8851fccSafresh1
11*b8851fccSafresh1                      Ozan (oz) Yigit
12*b8851fccSafresh1
13*b8851fccSafresh1            The Guild of PD Software Toolmakers
14*b8851fccSafresh1                      Toronto - Canada
15*b8851fccSafresh1
16*b8851fccSafresh1                     oz@nexus.yorku.ca
17*b8851fccSafresh1
18*b8851fccSafresh1
19*b8851fccSafresh1
20*b8851fccSafresh1Implementation is the sincerest form of flattery. - L. Peter
21*b8851fccSafresh1Deutsch
22*b8851fccSafresh1
23*b8851fccSafresh1A The Clone of the ndbm library
24*b8851fccSafresh1
25*b8851fccSafresh1     The sources accompanying this notice - sdbm  -  consti-
26*b8851fccSafresh1tute  the  first  public  release  (Dec. 1990) of a complete
27*b8851fccSafresh1clone of the Berkeley UN*X ndbm library. The sdbm library is
28*b8851fccSafresh1meant  to  clone the proven functionality of ndbm as closely
29*b8851fccSafresh1as possible, including a few improvements. It is  practical,
30*b8851fccSafresh1easy to understand, and compatible.  The sdbm library is not
31*b8851fccSafresh1derived  from  any  licensed,  proprietary  or   copyrighted
32*b8851fccSafresh1software.
33*b8851fccSafresh1
34*b8851fccSafresh1     The sdbm implementation is based on  a  1978  algorithm
35*b8851fccSafresh1[Lar78] by P.-A. (Paul) Larson known as "Dynamic Hashing".
36*b8851fccSafresh1In the course of searching for a substitute for ndbm, I pro-
37*b8851fccSafresh1totyped  three different external-hashing algorithms [Lar78,
38*b8851fccSafresh1Fag79, Lit80] and ultimately chose Larson's algorithm  as  a
39*b8851fccSafresh1basis  of  the  sdbm  implementation. The Bell Labs dbm (and
40*b8851fccSafresh1therefore ndbm) is based on an  algorithm  invented  by  Ken
41*b8851fccSafresh1Thompson, [Tho90, Tor87] and predates Larson's work.
42*b8851fccSafresh1
43*b8851fccSafresh1     The sdbm programming interface  is  totally  compatible
44*b8851fccSafresh1with ndbm and includes a slight improvement in database ini-
45*b8851fccSafresh1tialization.  It is also expected  to  be  binary-compatible
46*b8851fccSafresh1under most UN*X versions that support the ndbm library.
47*b8851fccSafresh1
48*b8851fccSafresh1     The sdbm implementation shares the shortcomings of  the
49*b8851fccSafresh1ndbm library, as a side effect of various simplifications to
50*b8851fccSafresh1the original Larson algorithm. It does produce holes in  the
51*b8851fccSafresh1page file as it writes pages past the end of file. (Larson's
52*b8851fccSafresh1paper include a clever solution to this problem  that  is  a
53*b8851fccSafresh1result of using the hash value directly as a block address.)
54*b8851fccSafresh1On the other hand, extensive tests  seem  to  indicate  that
55*b8851fccSafresh1sdbm creates fewer holes in general, and the resulting page-
56*b8851fccSafresh1files are smaller. The sdbm implementation  is  also  faster
57*b8851fccSafresh1than  ndbm  in database creation.  Unlike the ndbm, the sdbm
58*b8851fccSafresh1_________________________
59*b8851fccSafresh1
60*b8851fccSafresh1  [1] UN*X is not a trademark of any (dis)organization.
61*b8851fccSafresh1
62*b8851fccSafresh1
63*b8851fccSafresh1
64*b8851fccSafresh1
65*b8851fccSafresh1
66*b8851fccSafresh1
67*b8851fccSafresh1
68*b8851fccSafresh1
69*b8851fccSafresh1
70*b8851fccSafresh1                           - 2 -
71*b8851fccSafresh1
72*b8851fccSafresh1
73*b8851fccSafresh1store operation will not "wander away" trying to split its
74*b8851fccSafresh1data  pages  to insert a datum that cannot (due to elaborate
75*b8851fccSafresh1worst-case situations) be inserted. (It will  fail  after  a
76*b8851fccSafresh1pre-defined number of attempts.)
77*b8851fccSafresh1
78*b8851fccSafresh1Important Compatibility Warning
79*b8851fccSafresh1
80*b8851fccSafresh1     The sdbm and ndbm libraries cannot share databases: one
81*b8851fccSafresh1cannot  read  the  (dir/pag)  database created by the other.
82*b8851fccSafresh1This is due to the differences between  the  ndbm  and  sdbm
83*b8851fccSafresh1algorithms[2], and the hash functions used.  It is  easy  to
84*b8851fccSafresh1convert  between the dbm/ndbm databases and sdbm by ignoring
85*b8851fccSafresh1the index completely: see dbd, dbu etc.
86*b8851fccSafresh1
87*b8851fccSafresh1
88*b8851fccSafresh1Notice of Intellectual Property
89*b8851fccSafresh1
90*b8851fccSafresh1The entire sdbm  library package, as authored by me, Ozan S.
91*b8851fccSafresh1Yigit,  is  hereby placed in the public domain. As such, the
92*b8851fccSafresh1author is not responsible for the  consequences  of  use  of
93*b8851fccSafresh1this  software, no matter how awful, even if they arise from
94*b8851fccSafresh1defects in it. There is no expressed or implied warranty for
95*b8851fccSafresh1the sdbm library.
96*b8851fccSafresh1
97*b8851fccSafresh1     Since the sdbm library package is in the public domain,
98*b8851fccSafresh1this   original  release  or  any  additional  public-domain
99*b8851fccSafresh1releases of the modified original cannot possibly (by defin-
100*b8851fccSafresh1ition) be withheld from you. Also by definition, You (singu-
101*b8851fccSafresh1lar) have all the rights to this code (including  the  right
102*b8851fccSafresh1to sell without permission, the right to  hoard[3]  and  the
103*b8851fccSafresh1right  to  do  other  icky  things as you see fit) but those
104*b8851fccSafresh1rights are also granted to everyone else.
105*b8851fccSafresh1
106*b8851fccSafresh1     Please note that all  previous  distributions  of  this
107*b8851fccSafresh1software  contained  a  copyright  (which is now dropped) to
108*b8851fccSafresh1protect its origins and its  current  public  domain  status
109*b8851fccSafresh1against any possible claims and/or challenges.
110*b8851fccSafresh1
111*b8851fccSafresh1Acknowledgments
112*b8851fccSafresh1
113*b8851fccSafresh1     Many people have been very helpful and  supportive.   A
114*b8851fccSafresh1partial  list  would  necessarily  include Rayan Zacherissen
115*b8851fccSafresh1(who contributed the  man  page,  and  also  hacked  a  MMAP
116*b8851fccSafresh1_________________________
117*b8851fccSafresh1
118*b8851fccSafresh1  [2] Torek's   discussion   [Tor87]   indicates   that
119*b8851fccSafresh1dbm/ndbm implementations use the hash value to traverse
120*b8851fccSafresh1the radix trie differently than sdbm and as  a  result,
121*b8851fccSafresh1the page indexes are generated in different order.  For
122*b8851fccSafresh1more information, send e-mail to the author.
123*b8851fccSafresh1  [3] You  cannot really hoard something that is avail-
124*b8851fccSafresh1able to the public at large, but try if  it  makes  you
125*b8851fccSafresh1feel any better.
126*b8851fccSafresh1
127*b8851fccSafresh1
128*b8851fccSafresh1
129*b8851fccSafresh1
130*b8851fccSafresh1
131*b8851fccSafresh1
132*b8851fccSafresh1
133*b8851fccSafresh1
134*b8851fccSafresh1
135*b8851fccSafresh1
136*b8851fccSafresh1                           - 3 -
137*b8851fccSafresh1
138*b8851fccSafresh1
139*b8851fccSafresh1version of sdbm), Arnold Robbins, Chris Lewis,  Bill  David-
140*b8851fccSafresh1sen,  Henry  Spencer,  Geoff  Collyer, Rich Salz (who got me
141*b8851fccSafresh1started in the first place), Johannes Ruschein (who did  the
142*b8851fccSafresh1minix port) and David Tilbrook. I thank you all.
143*b8851fccSafresh1
144*b8851fccSafresh1Distribution Manifest and Notes
145*b8851fccSafresh1
146*b8851fccSafresh1This distribution of sdbm includes (at least) the following:
147*b8851fccSafresh1
148*b8851fccSafresh1    CHANGES     change log
149*b8851fccSafresh1    README      this file.
150*b8851fccSafresh1    biblio      a small bibliography on external hashing
151*b8851fccSafresh1    dba.c       a crude (n/s)dbm page file analyzer
152*b8851fccSafresh1    dbd.c       a crude (n/s)dbm page file dumper (for conversion)
153*b8851fccSafresh1    dbe.1       man page for dbe.c
154*b8851fccSafresh1    dbe.c       Janick's database editor
155*b8851fccSafresh1    dbm.c       a dbm library emulation wrapper for ndbm/sdbm
156*b8851fccSafresh1    dbm.h       header file for the above
157*b8851fccSafresh1    dbu.c       a crude db management utility
158*b8851fccSafresh1    hash.c      hashing function
159*b8851fccSafresh1    makefile    guess.
160*b8851fccSafresh1    pair.c      page-level routines (posted earlier)
161*b8851fccSafresh1    pair.h      header file for the above
162*b8851fccSafresh1    readme.ms   troff source for the README file
163*b8851fccSafresh1    sdbm.3      man page
164*b8851fccSafresh1    sdbm.c      the real thing
165*b8851fccSafresh1    sdbm.h      header file for the above
166*b8851fccSafresh1    tune.h      place for tuning & portability thingies
167*b8851fccSafresh1    util.c      miscellaneous
168*b8851fccSafresh1
169*b8851fccSafresh1     dbu is a simple database manipulation  program[4]  that
170*b8851fccSafresh1tries to look like Bell Labs' cbt utility. It  is  currently
171*b8851fccSafresh1incomplete in functionality.  I use dbu to test out the rou-
172*b8851fccSafresh1tines: it takes (from stdin) tab separated  key/value  pairs
173*b8851fccSafresh1for commands like build or insert or takes keys for commands
174*b8851fccSafresh1like delete or look.
175*b8851fccSafresh1
176*b8851fccSafresh1    dbu <build|creat|look|insert|cat|delete> dbmfile
177*b8851fccSafresh1
178*b8851fccSafresh1     dba is a crude analyzer of dbm/sdbm/ndbm page files. It
179*b8851fccSafresh1scans the entire page file, reporting page level statistics,
180*b8851fccSafresh1and totals at the end.
181*b8851fccSafresh1
182*b8851fccSafresh1     dbd is a crude dump  program  for  dbm/ndbm/sdbm  data-
183*b8851fccSafresh1bases.  It  ignores  the bitmap, and dumps the data pages in
184*b8851fccSafresh1sequence. It can be used to create input for the  dbu  util-
185*b8851fccSafresh1ity.   Note that dbd will skip any NULLs in the key and data
186*b8851fccSafresh1fields,  thus  is  unsuitable  to  convert   some   peculiar
187*b8851fccSafresh1_________________________
188*b8851fccSafresh1
189*b8851fccSafresh1  [4] The dbd, dba, dbu utilities are quick  hacks  and
190*b8851fccSafresh1are  not  fit  for  production use. They were developed
191*b8851fccSafresh1late one night, just to test out sdbm, and convert some
192*b8851fccSafresh1databases.
193*b8851fccSafresh1
194*b8851fccSafresh1
195*b8851fccSafresh1
196*b8851fccSafresh1
197*b8851fccSafresh1
198*b8851fccSafresh1
199*b8851fccSafresh1
200*b8851fccSafresh1
201*b8851fccSafresh1
202*b8851fccSafresh1                           - 4 -
203*b8851fccSafresh1
204*b8851fccSafresh1
205*b8851fccSafresh1databases that insist in including the terminating null.
206*b8851fccSafresh1
207*b8851fccSafresh1     I have also included a copy of the dbe  (ndbm  DataBase
208*b8851fccSafresh1Editor)  by  Janick Bergeron [janick@bnr.ca] for your pleas-
209*b8851fccSafresh1ure. You may find it more useful than the little  dbu  util-
210*b8851fccSafresh1ity.
211*b8851fccSafresh1
212*b8851fccSafresh1     dbm.[ch] is a dbm library emulation on top of ndbm (and
213*b8851fccSafresh1hence suitable for sdbm). Written by Robert Elz.
214*b8851fccSafresh1
215*b8851fccSafresh1     The sdbm library has been around in beta test for quite
216*b8851fccSafresh1a  long  time,  and from whatever little feedback I received
217*b8851fccSafresh1(maybe no news is good news), I believe it  has  been  func-
218*b8851fccSafresh1tioning  without  any  significant  problems.  I  would,  of
219*b8851fccSafresh1course, appreciate all fixes and/or improvements.  Portabil-
220*b8851fccSafresh1ity enhancements would especially be useful.
221*b8851fccSafresh1
222*b8851fccSafresh1Implementation Issues
223*b8851fccSafresh1
224*b8851fccSafresh1     Hash functions: The algorithm behind  sdbm  implementa-
225*b8851fccSafresh1tion  needs a good bit-scrambling hash function to be effec-
226*b8851fccSafresh1tive. I ran into a set of constants for a simple hash  func-
227*b8851fccSafresh1tion  that  seem  to  help sdbm perform better than ndbm for
228*b8851fccSafresh1various inputs:
229*b8851fccSafresh1
230*b8851fccSafresh1    /*
231*b8851fccSafresh1     * polynomial conversion ignoring overflows
232*b8851fccSafresh1     * 65599 nice. 65587 even better.
233*b8851fccSafresh1     */
234*b8851fccSafresh1    long
235*b8851fccSafresh1    dbm_hash(char *str, int len) {
236*b8851fccSafresh1        unsigned long n = 0;
237*b8851fccSafresh1
238*b8851fccSafresh1        while (len--)
239*b8851fccSafresh1            n = n * 65599 + *str++;
240*b8851fccSafresh1        return n;
241*b8851fccSafresh1    }
242*b8851fccSafresh1
243*b8851fccSafresh1     There may be better hash functions for the purposes  of
244*b8851fccSafresh1dynamic hashing.  Try your favorite, and check the pagefile.
245*b8851fccSafresh1If it contains too many pages with too many holes, (in rela-
246*b8851fccSafresh1tion  to this one for example) or if sdbm simply stops work-
247*b8851fccSafresh1ing (fails after SPLTMAX attempts to split)  when  you  feed
248*b8851fccSafresh1your  NEWS  history  file  to it, you probably do not have a
249*b8851fccSafresh1good hashing function.  If  you  do  better  (for  different
250*b8851fccSafresh1types of input), I would like to know about the function you
251*b8851fccSafresh1use.
252*b8851fccSafresh1
253*b8851fccSafresh1     Block sizes: It seems (from  various  tests  on  a  few
254*b8851fccSafresh1machines)  that a page file block size PBLKSIZ of 1024 is by
255*b8851fccSafresh1far the best for performance, but this also happens to limit
256*b8851fccSafresh1the  size  of a key/value pair. Depending on your needs, you
257*b8851fccSafresh1may wish to increase the page size, and also adjust  PAIRMAX
258*b8851fccSafresh1(the maximum size of a key/value pair allowed: should always
259*b8851fccSafresh1
260*b8851fccSafresh1
261*b8851fccSafresh1
262*b8851fccSafresh1
263*b8851fccSafresh1
264*b8851fccSafresh1
265*b8851fccSafresh1
266*b8851fccSafresh1
267*b8851fccSafresh1
268*b8851fccSafresh1                           - 5 -
269*b8851fccSafresh1
270*b8851fccSafresh1
271*b8851fccSafresh1be at least three words smaller than PBLKSIZ.)  accordingly.
272*b8851fccSafresh1The  system-wide  version  of the library should probably be
273*b8851fccSafresh1configured with 1024 (distribution default), as this appears
274*b8851fccSafresh1to be sufficient for most common uses of sdbm.
275*b8851fccSafresh1
276*b8851fccSafresh1Portability
277*b8851fccSafresh1
278*b8851fccSafresh1     This package has been tested in many  different  UN*Xes
279*b8851fccSafresh1even including minix, and appears to be reasonably portable.
280*b8851fccSafresh1This does not mean it will port easily to non-UN*X systems.
281*b8851fccSafresh1
282*b8851fccSafresh1Notes and Miscellaneous
283*b8851fccSafresh1
284*b8851fccSafresh1     The sdbm is not a very complicated  package,  at  least
285*b8851fccSafresh1not  after  you  familiarize yourself with the literature on
286*b8851fccSafresh1external hashing. There are other interesting algorithms  in
287*b8851fccSafresh1existence  that ensure (approximately) single-read access to
288*b8851fccSafresh1a data value associated with any key. These  are  directory-
289*b8851fccSafresh1less schemes such as linear hashing [Lit80] (+ Larson varia-
290*b8851fccSafresh1tions), spiral storage [Mar79] or directory schemes such  as
291*b8851fccSafresh1extensible  hashing  [Fag79] by Fagin et al. I do hope these
292*b8851fccSafresh1sources provide a reasonable playground for  experimentation
293*b8851fccSafresh1with  other algorithms.  See the June 1988 issue of ACM Com-
294*b8851fccSafresh1puting Surveys [Enb88] for  an  excellent  overview  of  the
295*b8851fccSafresh1field.
296*b8851fccSafresh1
297*b8851fccSafresh1References
298*b8851fccSafresh1
299*b8851fccSafresh1
300*b8851fccSafresh1[Lar78]
301*b8851fccSafresh1    P.-A. Larson, "Dynamic Hashing", BIT, vol.   18,   pp.
302*b8851fccSafresh1    184-201, 1978.
303*b8851fccSafresh1
304*b8851fccSafresh1[Tho90]
305*b8851fccSafresh1    Ken Thompson, private communication, Nov. 1990
306*b8851fccSafresh1
307*b8851fccSafresh1[Lit80]
308*b8851fccSafresh1    W. Litwin, "Linear Hashing: A new tool  for  file  and
309*b8851fccSafresh1    table addressing", Proceedings of the 6th Conference on
310*b8851fccSafresh1    Very Large  Dabatases  (Montreal), pp.   212-223,   Very
311*b8851fccSafresh1    Large Database Foundation, Saratoga, Calif., 1980.
312*b8851fccSafresh1
313*b8851fccSafresh1[Fag79]
314*b8851fccSafresh1    R. Fagin, J.  Nievergelt,  N.  Pippinger,  and   H.   R.
315*b8851fccSafresh1    Strong,  "Extendible Hashing - A Fast Access Method for
316*b8851fccSafresh1    Dynamic Files", ACM  Trans.  Database  Syst.,  vol.  4,
317*b8851fccSafresh1    no.3, pp. 315-344, Sept. 1979.
318*b8851fccSafresh1
319*b8851fccSafresh1[Wal84]
320*b8851fccSafresh1    Rich Wales, "Discussion of 'dbm'  data  base  system",
321*b8851fccSafresh1    USENET newsgroup unix.wizards, Jan. 1984.
322*b8851fccSafresh1
323*b8851fccSafresh1[Tor87]
324*b8851fccSafresh1    Chris Torek,  "Re:   dbm.a   and   ndbm.a   archives",
325*b8851fccSafresh1
326*b8851fccSafresh1
327*b8851fccSafresh1
328*b8851fccSafresh1
329*b8851fccSafresh1
330*b8851fccSafresh1
331*b8851fccSafresh1
332*b8851fccSafresh1
333*b8851fccSafresh1
334*b8851fccSafresh1                           - 6 -
335*b8851fccSafresh1
336*b8851fccSafresh1
337*b8851fccSafresh1    USENET newsgroup comp.unix, 1987.
338*b8851fccSafresh1
339*b8851fccSafresh1[Mar79]
340*b8851fccSafresh1    G. N. Martin, "Spiral Storage: Incrementally   Augment-
341*b8851fccSafresh1    able   Hash  Addressed  Storage", Technical Report #27,
342*b8851fccSafresh1    University of Varwick, Coventry, U.K., 1979.
343*b8851fccSafresh1
344*b8851fccSafresh1[Enb88]
345*b8851fccSafresh1    R.  J.  Enbody  and  H.   C.   Du,   "Dynamic   Hashing
346*b8851fccSafresh1    Schemes",ACM  Computing  Surveys,  vol.  20, no. 2, pp.
347*b8851fccSafresh1    85-113, June 1988.
348*b8851fccSafresh1
349*b8851fccSafresh1
350*b8851fccSafresh1
351*b8851fccSafresh1
352*b8851fccSafresh1
353*b8851fccSafresh1
354*b8851fccSafresh1
355*b8851fccSafresh1
356*b8851fccSafresh1
357*b8851fccSafresh1
358*b8851fccSafresh1
359*b8851fccSafresh1
360*b8851fccSafresh1
361*b8851fccSafresh1
362*b8851fccSafresh1
363*b8851fccSafresh1
364*b8851fccSafresh1
365*b8851fccSafresh1
366*b8851fccSafresh1
367*b8851fccSafresh1
368*b8851fccSafresh1
369*b8851fccSafresh1
370*b8851fccSafresh1
371*b8851fccSafresh1
372*b8851fccSafresh1
373*b8851fccSafresh1
374*b8851fccSafresh1
375*b8851fccSafresh1
376*b8851fccSafresh1
377*b8851fccSafresh1
378*b8851fccSafresh1
379*b8851fccSafresh1
380*b8851fccSafresh1
381*b8851fccSafresh1
382*b8851fccSafresh1
383*b8851fccSafresh1
384*b8851fccSafresh1
385*b8851fccSafresh1
386*b8851fccSafresh1
387*b8851fccSafresh1
388*b8851fccSafresh1
389*b8851fccSafresh1
390*b8851fccSafresh1
391*b8851fccSafresh1
392*b8851fccSafresh1
393*b8851fccSafresh1
394*b8851fccSafresh1
395*b8851fccSafresh1
396*b8851fccSafresh1
397