README

JDB:  Database Functions for Shell Scripting
--------------------------------------------
by John Heidemann <johnh@isi.edu>
5
6
7WHAT'S NEW?
8-----------
91.14,  24-Aug-06
10
11- ENHANCEMENT: README cleanup
12- INCOMPATIBLE CHANGE: dbcolsplit renamed dbcolsplittocols
13- NEW: dbcolsplittorows  split one column into multiple rows
14- NEW: dbcolsregression compute linear regression and correlation for two columns
15- ENHANCEMENT: cvs_to_db: better error handling, normalize field names, skip blank lines
16- ENHANCEMENT: dbjoin now detects (and fails) if non-joined files have duplicate names
17- BUG FIX: minor bug fixed in calculation of Student t-distributions
18	(doesn't change any test output, but may have caused small errors)
19
20
21EXECUTIVE SUMMARY
22-----------------
23
JDB is a package of commands for manipulating flat-ASCII databases from
shell scripts.  JDB is useful for processing medium amounts of data (with
very little data you'd do it by hand, with megabytes you might want a
real database).  JDB is very good at doing things like:

	- extracting measurements from experimental output
	- re-examining data to address different hypotheses
	- joining data from different experiments
	- eliminating/detecting outliers
	- computing statistics on data (mean, confidence intervals,
		correlations, histograms)
	- reformatting data for graphing programs

Rather than hand-code scripts to do each special case, JDB provides
higher-level functions.  Although it's often easy to throw together a
custom script to do any single task, I believe that there are several
advantages to using this library:
41
	- these programs provide a higher-level interface than plain Perl
		=> dbrow '_size == 1024' | dbstats bw
		   rather than:
			while (<>) { @F = split; $sum += $F[2]; $ss += $F[2]**2; $n++; }
			$mean = $sum / $n; $std_dev = ...
			etc.
		   in dozens of places
49
50	- the library uses names for columns
51		=> no more $F[2], use _bw
52		=> new or different order columns?  no changes to your scripts!
53
54	- the library is self-documenting (each program records what it did)
55		=> no more wondering what hacks were used to compute the
56			final data, just look at the comments at the end
57			of the output
58
59	- unusual cases, error checking, and large datasets are already handled
60		=> custom scripts often skimp on error checking
61			and assume everything fits in memory
62
63(The disadvantage is that you need to learn what functions JDB provides.)
64
65JDB is built on flat-ASCII databases.  By storing data in simple text
66files and processing it with pipelines it is easy to experiment (in
67the shell) and look at the output.  The original implementation of
68this idea was /rdb, a commercial product described in the book ``UNIX
69relational database management: application development in the UNIX
70environment'' by Rod Manis, Evan Schaffer, and Robert Jorgensen (and
71also at the web page <http://www.rdb.com/>).  JDB is an incompatible
72re-implementation of their idea without any accelerated indexing or
73forms support.  (But it's free!).
74
75Installation instructions follow at the end of this document.  JDB
76requires Perl 5.003 to run.  There are no man pages currently, but
77each command has a complete description in its usage string.  All
78commands are backed by an automated test suite.
79
80The most recent version of JDB is available on the web at
81<http://www.isi.edu/~johnh/SOFTWARE/JDB/index.html>.
82
83
84README CONTENTS
85---------------
86- what's new
87- executive summary
88- README CONTENTS
89- installation
90- basic data format
91- basic data manipulation
92- list of commands
93- another example
94- a gradebook example
95- a password example
96- history
97- related work
98- release notes
99- copyright
100- comments
101
102
103
104INSTALLATION
105------------
106
107The quick answer to installation is to type:
108	./configure
109	make install
110
111JDB uses autoconf.  You can set where the programs are installed
112with --prefix=/where/you/want/them/without/bin/at/the/end.
113Do ./configure --help for details.
114
115JDB requires perl 5.003 or later.  Some of the commands work on 5.000,
116but several of the test scripts fail, so buyer beware.
117
A test suite is available; run it with ./db_test_suite
or "make test".
120
121In the past there have been some test suite problems due to different
122printf implementations.  I've tried to code around this problem;
123please let me know if you encounter it again.
124
A FreeBSD port of JDB is available; see
<http://www.freshports.org/databases/jdb/>.
127
128
129COMMON INSTALLATION PROBLEMS (FAQ)
130----------------------------------
131
132Q: After installing jdb, I get this error when I run it:
133	Can't locate ~/lib/dblib.pl in @INC (@INC
134	contains: /usr/libdata/perl/5.00503/mach /usr/libdata/perl/5.00503
135	/usr/local/lib/perl5/site_perl/5.005/i386-freebsd
136	/usr/local/lib/perl5/site_perl/5.005 . ~/lib) at
137	/home/netlab1/alefiyah/bin/dbrow line 48.
138(or something like that).  What should I do?
139
140A: You're probably not running the installed version, you're running
141the unpacked version.  Part of the installation process is changing
142the scripts so they know where their libraries are.  After you
143configure and install jdb, run the programs from where they are
144installed.
145
146
147
148
149BASIC DATA FORMAT
150-----------------
151
These programs are based on the idea of storing data in simple ASCII
files.  A database is a file with one header line and then data or
comment lines.  For example:
155
156	#h account passwd uid gid fullname homedir shell
157	johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
158	greg * 2275 134 Greg_Johnson /home/greg /bin/bash
159	root * 0 0 Root /root /bin/bash
160	# this is a simple database
161
162The header line must be first and begins with "#h".
163There are rows (records) and columns (fields),
164just like in a normal database.
165Comment lines begin with "#".
166
167By default, columns are delimited by whitespace.  By default it is
168therefore not possible to have fields which contain whitespace.
169(But see below for alternatives.)
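
As a rough illustration of the format (this is not JDB's own code), the
following shell sketch uses awk to read the "#h" line and print each
column name with its zero-based position.  The function name
sketch_columns is made up for this example, and only the default
whitespace-separated header is handled.

```shell
# Hypothetical helper (not part of JDB): list the columns named on the
# "#h" header line together with their zero-based positions.
sketch_columns() {
	awk '
		NR == 1 && $1 == "#h" {
			n = 0
			for (i = 2; i <= NF; i++) {
				if ($i ~ /^-F/) continue    # skip a separator option such as -FS
				print n++, $i
			}
			exit
		}
		NR == 1 { print "missing #h header" > "/dev/stderr"; exit 1 }
	'
}

sketch_columns <<'EOF'
#h account passwd uid gid fullname homedir shell
johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
# this is a simple database
EOF
```

The positions printed here are the same numbers that can be used in
place of column names, as described under "talking about columns".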
170
171The big advantage of this approach is that it's easy to massage data
172into this format, and it's reasonably easy to take data out of this
173format into other (text-based) programs, like gnuplot, jgraph, and
174LaTeX.  Think Unix.  Think pipes.
175
176Since no-whitespace in columns was a problem for some applications,
177there's an option which relaxes this rule.  You can specify the field
178separator in the table header with -Fx where x is the new field
179separator.  The special value -FS sets a separator of two spaces, thus
180allowing (single) spaces in fields.  An example:
181
182	#h -FS account passwd uid gid fullname homedir shell
183	johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
184	greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
185	root  *  0  0  Root  /root  /bin/bash
186	# this is a simple database
187
188See dbrecolize for more details.  Regardless of what the column
189separator is for the body of the data, it's always whitespace in the
190header.
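
To see why two-space separation allows single spaces inside a field,
here is a small awk sketch.  It assumes that a run of two or more
spaces counts as one separator, which may not match JDB's behavior
exactly, and the name sketch_fs_field is hypothetical.

```shell
# Sketch only: print field N of -FS (double-space separated) data lines.
# awk treats a multi-character -F value as a regular expression, so
# "  +" here means "two or more spaces" (an assumption, see above).
sketch_fs_field() {   # usage: sketch_fs_field N
	awk -F'  +' -v n="$1" '!/^#/ { print $n }'
}

sketch_fs_field 5 <<'EOF'
#h -FS account passwd uid gid fullname homedir shell
johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
EOF
```

Field 5 comes out as "John Heidemann" and "Greg Johnson", each with
its embedded single space intact.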
191
There's also a third format: a "list".  Because it's often hard to see
what's in the columns past the first two, in list format each "column" is on
a separate line.  The programs dblistize and dbcolize convert to and
from this format.  Currently other programs work only on column-format
data, so list data is only for viewing.  Here's a sample of
"dblistize  < DATA/passwd.jdb":
198
199	#L account passwd uid gid fullname homedir shell
200	account:  johnh
201	passwd:   *
202	uid:      2274
203	gid:      134
204	fullname: John_Heidemann
205	homedir:  /home/johnh
206	shell:    /bin/bash
207
208	account:  greg
209	passwd:   *
210	uid:      2275
211	gid:      134
212	fullname: Greg_Johnson
213	homedir:  /home/greg
214	shell:    /bin/bash
215
216	account:  root
217	passwd:   *
218	uid:      0
219	gid:      0
220	fullname: Root
221	homedir:  /root
222	shell:    /bin/bash
223
224	# this is a simple database
225	#  | dblistize
226
227See dbcolize -? and dblistize -? for more details.
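
The column-to-list conversion can be approximated in a few lines of
awk.  This is a simplified sketch, not dblistize itself: it assumes the
default whitespace separator, does not align the values, and does not
append the audit comment.  The name sketch_to_list is hypothetical.

```shell
# Sketch only: turn column-format data into one "name: value" line per field.
sketch_to_list() {
	awk '
		NR == 1 && $1 == "#h" {
			for (i = 2; i <= NF; i++) name[i - 1] = $i   # remember column names
			ncols = NF - 1
			sub(/^#h/, "#L"); print                      # list header
			next
		}
		/^#/ { print; next }                             # pass comments through
		{
			for (i = 1; i <= ncols; i++) printf "%s: %s\n", name[i], $i
			print ""                                     # blank line ends a record
		}
	'
}

sketch_to_list <<'EOF'
#h account passwd uid
johnh * 2274
root * 0
EOF
```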
228
229
230BASIC DATA MANIPULATION
231-----------------------
232
233A number of programs exist to manipulate databases.
234Complex functions can be made by stringing together commands
235with shell pipelines.  For example, to print the home
236directories of everyone with ``john'' in their names,
237you would do:
238
239	cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir
240
241The output:
242	dash> cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir
243	#h      homedir
244	/home/johnh
245	/home/greg
246	# this is a simple database
247	#  | dbrow _fullname =~ /John/
248	#  | dbcol homedir
249
250(Notice that comments are appended to the output listing each command,
251providing an automatic audit log.)
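
For comparison, here is roughly what that pipeline does, written as a
single hand-coded awk script.  It is a sketch under the assumption of
whitespace-separated data; sketch_select is a made-up name, and the
script does not add the audit comments that the real commands append.

```shell
# Hypothetical stand-in for: dbrow '_fullname =~ /John/' | dbcol homedir
sketch_select() {   # usage: sketch_select MATCHCOL REGEX OUTCOL
	awk -v mcol="$1" -v re="$2" -v ocol="$3" '
		NR == 1 {
			# map each column name to its field number in the data rows
			for (i = 2; i <= NF; i++) pos[$i] = i - 1
			print "#h", ocol
			next
		}
		/^#/ { print; next }                      # keep comments
		$(pos[mcol]) ~ re { print $(pos[ocol]) }  # select rows, project one column
	'
}

sketch_select fullname John homedir <<'EOF'
#h account passwd uid gid fullname homedir shell
johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
greg * 2275 134 Greg_Johnson /home/greg /bin/bash
root * 0 0 Root /root /bin/bash
# this is a simple database
EOF
```

Note that greg is selected too, because "Greg_Johnson" also matches
/John/.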
252
253In addition to typical database functions (select, join, etc.) there
254are also a number of statistical functions.
255
256
257TALKING ABOUT COLUMNS
258---------------------
259
260An advantage of JDB is that you can talk about columns by name
261(symbolically) rather than simply by their positions.  So in the above
262example, "dbcol homedir" pulled out the home directory column, and
263"dbrow '_fullname =~ /John/'" matched against column fullname.
264
265In general, you can use the name of the column listed on the #h line
266to identify it in most programs, and _name to identify it in code.
267
268Some alternatives for flexibility:
269
270- numeric values identify columns positionally, so 0 or _0 is the
271first column, 1 is the second, etc.
272
- in code, _last_columnname gets the value of columnname from the last (previous) row
274
275See dbroweval -? for more details about writing code.
276
277
278
279LIST OF COMMANDS
280----------------
281
282Enough said.  I'll summarize the commands, and then you can
283experiment.  For a detailed description of each command, see its usage
284line by running it with the argument ``-?''.  In some shells (csh)
285you'll need to quote this (run ``dbcol -\?'' rather than ``dbcol -?'').
286
287TABLE CREATION
288--------------
289dbcolcreate	add columns to a database
290dbcoldefine	set the column headings for a non-JDB file
291
292TABLE MANIPULATION
293------------------
294dbcol		select columns from a table
295dbrow		select rows from a table
296dbsort		sort rows based on a set of columns
297dbjoin		compute the natural join of two tables
298dbcolrename	rename a column
299dbcolmerge	merge two columns into one
300dbcolsplittocols  split one column into two or more columns
301dbcolsplittorows  split one column into multiple rows
302dbrowsplituniq  split the file into multiple files per unique fields
303dbfilevalidate  check that db file doesn't have some common errors
304dbfilesplit     split a single input file containing multiple tables into
305			several files
306
307COMPUTATION AND STATISTICS
308--------------------------
309dbstats		compute statistics over a column (mean,etc.,optionally median)
310dbmultistats	compute a series of stats (mean, etc.) over a table
311dbcoldiff	compare two samples distributions (mean/conf interval/T-test)
dbcolmovingstats  compute moving statistics over a column of data
dbcolmultiscale compute simple stats (sums and rates) over multiple timescales
314dbcolstats	compute Z-scores and T-scores over one column of data
315dbcolpercentile	compute the rank or percentile of a column
316dbcolhisto	compute histograms over a column of data
317dbcolscorrelate compute the coefficient of correlation over several columns
318dbcolsregression compute linear regression and correlation for two columns
319dbrowaccumulate compute a running sum over a column of data
320dbrowdiff	compute differences between each row of a table
321dbrowenumerate	number each row
322dbroweval	run arbitrary Perl code on each row
323dbrowuniq	count/eliminate identical rows (like Unix uniq(1))
324db2dcliff	find ``cliffs'' in two-dimensional data
325
326OUTPUT CONTROL
327--------------
328dbcolneaten	pretty-print columns
329dbcoltighten	un-pretty-print columns
330dblistize	convert columnar format into a ``list'' format
331dbcolize	undo dblistize
332dbrecolize	change the field separator for a table
333dbstripcomments remove comments from a table
334dbstripextraheaders remove extra headers that occur from table concatenation
335dbstripleadingspace remove leading spaces from (potentially non-JDB) data
336dbformmail	generate a script that sends form mail based on each row
337
338CONVERSIONS
339-----------
340(These programs convert data into jdb.  See their web pages for details.)
341cgi_to_db	http://stein.cshl.org/WWW/software/CGI/
342crl_to_db	http://moat.nlanr.net/Traces/
343dmalloc_to_db	http://www.letters.com/dmalloc/
344kitrace_to_db	http://ficus-www.cs.ucla.edu/ficus-members/geoff/kitrace.html
345ns_to_db	http://mash-www.cs.berkeley.edu/ns/
346tabdelim_to_db	spreadsheet tab-delimited files to db
347tcpdump_to_db   (see man tcpdump(8) on any reasonable system)
348
349(And out of jdb:)
350db_to_html_table   simple conversion of JDB to html tables
351
352
353Standard options:
354
355-?	usage
356-c	confidence interval (dbmultistats)
-C	column separator (dbcolsplittocols, dbcolmerge)
358-d	debug mode
359-a	stats over all data (treating non-numerics as zeros)
360	(by default, non-numerics are ignored for stats purposes)
361-S      assume the data is pre-sorted
362
363When giving Perl code (in dbrow and dbroweval)
364column names can be embedded if preceded by underscores.
365(Try dbrow -? and dbroweval -? for examples.)
366
367Most programs run in constant memory and use temporary files if necessary.
368Exceptions are dbcolneaten, dbcolpercentile, dbmultistats, dbrowsplituniq.
369
370
371ANOTHER EXAMPLE
372---------------
373
Take the raw data in DATA/http_bandwidth,
put a header on it (dbcoldefine size bw),
take statistics of each category (dbmultistats size bw),
pick out the relevant fields (dbcol size mean stddev pct_rsd), and you get:
378	#h      size    mean    stddev  pct_rsd
379	1024    1.4962e+06      2.8497e+05      19.047
380	10240   5.0286e+06      6.0103e+05      11.952
381	102400  4.9216e+06      3.0939e+05      6.2863
382	#  | dbcoldefine size bw
383	#  | /home/johnh/BIN/DB/dbmultistats size bw
384	#  | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd
385(The whole command was:
386	cat DATA/http_bandwidth | dbcoldefine size bw |
387		dbmultistats size bw | dbcol size mean stddev pct_rsd
388all on one line.)
389
390Then post-process them to get rid of the exponential notation
391(dbroweval '_mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev);')
392	#h      size    mean    stddev  pct_rsd
393	1024     1496200          284970        19.047
394	10240    5028600          601030        11.952
395	102400   4921600          309390        6.2863
396	#  | dbcoldefine size bw
397	#  | /home/johnh/BIN/DB/dbmultistats size bw
398	#  | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd
399	#  | /home/johnh/BIN/DB/dbroweval   { _mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev); }
400(The whole command is as before, with the dbroweval tacked on the end.)
401
402In a few lines, raw data is transformed to processed output.
403
404
Suppose you suspect that the results for one data point (here, size
102400) are oddly distributed.  JDB can easily produce a CDF (cumulative
distribution function) of the data, suitable for graphing:
408
409cat DB/DATA/http_bandwidth | dbcoldefine size bw | \
410	dbrow '_size == 102400' | \
411	dbcol bw | dbsort -n bw | \
412	dbrowenumerate | dbcolpercentile count | \
413	dbcol bw percentile | xgraph
414
The steps, per line:
	1. get the raw input data and turn it into jdb format
	2. select only the rows for the size of interest
	3. pick out just the relevant column (for efficiency) and sort it
	4. for each data point, assign a CDF percentage to it
	5. pick out the two columns to graph and show them
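
The sort-then-rank idea behind the steps above can be sketched with
standard tools.  This version assumes the CDF value of a point is its
rank divided by the number of points; dbcolpercentile may define the
percentile differently, and sketch_cdf is a hypothetical name.  Unlike
most JDB programs, it holds all values in memory.

```shell
# Sketch only: read one number per line, print "value fraction" pairs
# where fraction = rank / count after a numeric sort.
sketch_cdf() {
	grep -v '^#' | sort -n | awk '
		{ v[NR] = $1 }
		END { for (i = 1; i <= NR; i++) printf "%s %.3f\n", v[i], i / NR }
	'
}

printf '%s\n' 30 10 20 40 | sketch_cdf
```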
420
421
422A GRADEBOOK EXAMPLE
423-------------------
424
425The first commercial program I wrote was a gradebook,
426so here's how to do it with JDB.
427
428Format your data like DATA/grades.
429	#h name email id test1
430	a a@ucla.edu 1 80
431	b b@usc.edu 2 70
432	c c@isi.edu 3 65
433	d d@lmu.edu 4 90
434	e e@caltech.edu 5 70
435	f f@oxy.edu 6 90
436
437Or if your students have spaces in their names, use -FS and two spaces
438to separate each column:
439
440	#h -FS name email id test1
441	a x  a@ucla.edu  1  80
442	b x  b@usc.edu  2  70
443	c x  c@isi.edu  3  65
444	d x  d@lmu.edu  4  90
445	e x  e@caltech.edu  5  70
446	f x  f@oxy.edu  6  90
447
448To compute statistics on an exam, do
449	cat DATA/grades | dbstats test1 |dblistize
450
451	#L  ...
452	mean:        77.5
453	stddev:      10.84
454	pct_rsd:     13.987
455	conf_range:  11.377
456	conf_low:    66.123
457	conf_high:   88.877
458	conf_pct:    0.95
459	sum:         465
460	sum_squared: 36625
461	min:         65
462	max:         90
463	n:           6
464	...
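
The first few of these numbers can be checked by hand.  The sketch
below is not dbstats itself; it assumes the sample standard deviation
(dividing by n-1) and pct_rsd = 100 * stddev / mean, which agree with
the figures shown above.  The name sketch_basic_stats is made up, and
the argument is the 1-based field number of the column.

```shell
# Sketch only: mean, sample standard deviation and relative standard
# deviation of one whitespace-separated column.
sketch_basic_stats() {   # usage: sketch_basic_stats FIELDNUMBER
	awk -v c="$1" '
		/^#/ { next }                                 # skip header and comments
		{ n++; sum += $c; ss += $c * $c }
		END {
			mean = sum / n
			stddev = sqrt((ss - n * mean * mean) / (n - 1))
			printf "mean: %.5g\nstddev: %.5g\npct_rsd: %.5g\n", mean, stddev, 100 * stddev / mean
		}
	'
}

sketch_basic_stats 4 <<'EOF'
#h name email id test1
a a@ucla.edu 1 80
b b@usc.edu 2 70
c c@isi.edu 3 65
d d@lmu.edu 4 90
e e@caltech.edu 5 70
f f@oxy.edu 6 90
EOF
```

On the grades data this prints a mean of 77.5, a stddev of 10.84 and a
pct_rsd of 13.987.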
465
466To do a histogram:
467	cat DATA/grades | dbcolhisto -n 5 -g test1
468
469	#h low histogram
470	65      *
471	70      **
472	75
473	80      *
474	85
475	90      **
476	#  | /home/johnh/BIN/DB/dbhistogram -n 5 -g test1
477
478Now you want to send out grades to the students by e-mail.
479Create a form-letter (in the file test1.txt):
480	To: _email (_name)
481	From: J. Random Professor <jrp@usc.edu>
482	Subject: test1 scores
483
484	_name, your score on test1 was _test1.
485	86+   A
486	75-85 B
487	70-74 C
488	0-69  F
489
490Generate the shell script that will send the mail out:
491	cat DATA/grades | dbformmail test1.txt > test1.sh
492And run it:
493	sh <test1.sh
494
495The last two steps can be combined:
496	cat DATA/grades | dbformmail test1.txt | sh
497(but I like to keep a copy of exactly what I send).
498
499
500At the end of the semester you'll want to compute grade totals and
501assign letter grades.  Both fall out of dbroweval.
502For example, to compute weighted total grades with a 40% midterm/60%
503final where the midterm is 84 possible points and the final 100:
504
505	dbcol -rv total |
506	dbcolcreate total - |
507	dbroweval '
508		_total = .40 * _midterm/84.0 + .60 * _final/100.0;
509		_total = sprintf("%4.2f", _total);
510		if (_final eq "-" || ( _name =~ /^_/)) { _total = "-"; };' |
511	dbcolneaten
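
To check the weighting by hand for a single student, the same
arithmetic can be run directly.  This is only a sketch of the formula
in the dbroweval code above; sketch_weighted_total is a made-up name,
and the rule for names beginning with "_" is left out.

```shell
# Sketch only: 40% midterm (out of 84) plus 60% final (out of 100),
# printed with two decimals; a final of "-" yields "-".
sketch_weighted_total() {   # usage: sketch_weighted_total MIDTERM FINAL
	awk -v m="$1" -v f="$2" 'BEGIN {
		if (f == "-") { print "-"; exit }
		printf "%4.2f\n", 0.40 * m / 84.0 + 0.60 * f / 100.0
	}'
}

sketch_weighted_total 63 80
```

With a midterm of 63 and a final of 80 the two terms are 0.30 and
0.48, so the total is 0.78.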
512
513
514If you got the data originally from a spreadsheet, save it in
515"tab-delimited" format and convert it with tabdelim_to_db
516(run tabdelim_to_db -? for examples).
517
518
519A PASSWORD EXAMPLE
520------------------
521
522To convert the Unix password file to db:
523
524	cat /etc/passwd | sed 's/:/  /g'| \
525		dbcoldefine -F S login password uid gid gecos home shell \
526		>passwd.jdb
527
528To convert the group file
529
530	cat /etc/group | sed 's/:/  /g' | \
531		dbcoldefine -F S group password gid members \
532		>group.jdb
533
534To show the names of the groups that div7-members are in
535(assuming DIV7 is in the gecos field):
536
537	cat passwd.jdb | dbrow '_gecos =~ /DIV7/' | dbcol login gid | \
538		dbjoin - group.jdb gid | dbcol login group
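
What the join does can be sketched in awk by loading the group table
into an array keyed on gid and looking each selected user up in it.
This is an illustration, not dbjoin: it holds the whole group file in
memory, assumes two or more spaces separate fields, and uses the
column order from the dbcoldefine commands above.  The name
sketch_login_groups is hypothetical.

```shell
# Sketch only: print "login group" for users whose gecos field matches DIV7.
sketch_login_groups() {   # usage: sketch_login_groups PASSWD_JDB GROUP_JDB
	awk -F'  +' '
		/^#/ { next }                           # skip headers and comments
		FNR == NR { group[$3] = $1; next }      # group file: gid -> group name
		$5 ~ /DIV7/ && ($4 in group) {          # passwd file: gecos is field 5, gid is field 4
			print $1, group[$4]
		}
	' "$2" "$1"
}
```

A call such as "sketch_login_groups passwd.jdb group.jdb" reads the
group file first and the password file second.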
539
540
541SHORT EXAMPLES
542--------------
543
544Which db programs are the most complicated (based on number of test cases)?
545
546        ls TEST/*.cmd | \
547                dbcoldefine test | \
548                dbroweval '_test =~ s@^TEST/([^_]+).*$@$1@' | \
549                dbrowuniq -c | \
550                dbsort -nr count | \
551                dbcolneaten
552
553(Answer: dbstats, then dbjoin.)
554
555
556Stats on an exam (in FILE, with COLUMN==the name of the exam)?
	cat $FILE | dbstats -q 4 $COLUMN | dblistize | dbstripcomments
558	cat $FILE | dbcolhisto -g -n 20 $COLUMN | dbcolneaten | dbstripcomments
559
560
Merging the hw1 column from file hw1.jdb into grades.jdb, assuming
there's a common student id in column "id":
563	dbcol id hw1 <hw1.jdb >t.jdb
564	dbjoin -i -e - grades.jdb t.jdb id |dbsort name|dbcolneaten >new_grades.jdb
565
566
567Merging two jdb files with the same rows:
568	cat file1.jdb file2.jdb >output.jdb
569
570or if you want to clean things up a bit
571	cat file1.jdb file2.jdb | dbstripextraheaders >output.jdb
572
573or if you want to know where the data came from
574	for i in 1 2
575	do
576		dbcolcreate source $i < file$i.jdb
577	done | dbstripextraheaders >output.jdb
578
579(assumes you're using a Bourne-shell compatible shell, not csh).
580
581
582
583
584HISTORY
585-------
586
587There have been two versions of JDB;
588the current is a complete re-write of the first.
589
590JDB (in its various forms) has been used extensively by its author
591since 1991.  Since 1995 it's been used by two other researchers at
592UCLA and several at ISI.  In February 1998 it was announced to the
593Internet.
594
595JDB includes code ported from Geoff Kuenning (DbTDistr.pm).
596
597JDB contributors:  Ashvin Goel <goel@cse.oge.edu>, Geoff Kuenning
598<geoff@fmg.cs.ucla.edu>, Vikram Visweswariah <visweswa@isi.edu>,
599Kannan Varadahan <kannan@isi.edu>, Lars Eggert <larse@isi.edu>, Arkadi
600Gelfond <arkadig@dyna.com>, Haobo Yu <haoboy@packetdesign.com>, Pavlin
601Radoslavov <pavlin@catarina.usc.edu>, Fabio Silva <fabio@isi.edu>,
602Jerry Zhao <zhaoy@isi.edu>, Ning Xu <nxu@aludra.usc.edu>.
603
604
605
606RELATED WORK
607------------
608
609As stated in the introduction, JDB is an incompatible reimplementation
610of the ideas found in /rdb.  By storing data in simple text files and
611processing it with pipelines it is easy to experiment (in the shell)
612and look at the output.  The original implementation of this idea was
613/rdb, a commercial product described in the book ``UNIX relational
614database management: application development in the UNIX environment''
615by Rod Manis, Evan Schaffer, and Robert Jorgensen (and also at the web
616page <http://www.rdb.com/>).
617
618In August, 2002 I found out Carlo Strozzi extended RDB with his
619package NoSQL <http://www.linux.it/~carlos/nosql/>.  According to
620Mr. Strozzi, he implemented NoSQL in awk to avoid the Perl start-up of
621RDB.  Although I haven't found Perl startup overhead to be a big
622problem on my platforms (from old Sparcstation IPCs to 2GHz
623Pentium-4s), you may want to evaluate his system.  (At some point I'll
624try to do a comparison of JDB and NoSQL.)
625
626
627RELEASE NOTES
628-------------
629
630Versions prior to 1.0 were released informally on my web page
631but were not announced.
632
6331.0, 22-Jul-97:  adds autoconf support and a test script.
634
6351.1, 20-Jan-98:  support for double space field separators, better tests
636
6371.2, 11-Feb-98: minor changes and release on comp.lang.perl.announce
638
6391.3, 17-Mar-98
640	- adds median and quartile options to dbstats
641	- adds dmalloc_to_db converter
642	- fixes some warnings
643	- dbjoin now can run on unsorted input
644	- fixes a dbjoin bug
645	- some more tests in the test suite
646
6471.4, 27-Mar-98
648	- improves error messages
649		(all should now report the program that makes the error)
650	- fixed a bug in dbstats output when the mean is zero
651
6521.5, 25-Jun-98
653	- BUG FIX: dbcolhisto, dbcolpercentile now handles non-numeric
654		values like dbstats
655	- NEW: dbcolstats computes zscores and tscores over a column
656	- NEW: dbcolscorrelate computes correlation coefficients
657		between two columns
658	- INTERNAL: ficus_getopt.pl has been replaced by DbGetopt.pm
659	- BUG FIX: all tests are now ``portable'' (previously some tests
660		ran only on my system)
661	- BUG FIX: you no longer need to have the db programs in your path
662		(fix arose from a discussion with Arkadi Gelfond)
663	- BUG FIX: installation no longer uses cp -f (to work on SunOS 4)
664
6651.6, 24-May-99
666
667	- NEW: dbsort, dbstats, dbmultistats now run in constant memory
668		(using tmp files if necessary)
669	- NEW: dbcolmovingstats does moving means over a series of data
670	- NEW: dbcol has a -v option to get all columns except those listed
	- NEW: dbmultistats does quartiles and medians
	- NEW: dbstripextraheaders now also cleans up bogus comments
		before the first header
674	- BUG FIX: dbcolneaten works better with double-space-separated data
675
6761.7,  5-Jan-00
677
678- NEW: dbcolize now detects and rejects lines that contain embedded
679	copies of the field separator
680
681- NEW: configure tries harder to prevent people from improperly
682	configuring/installing jdb
683
684- NEW: tcpdump_to_db converter (incomplete)
685
686- NEW: tabdelim_to_db converter:  from spreadsheet tab-delimited files to db
687
688- NEW: mailing lists for jdb are
689	jdb-announce@heidemann.la.ca.us and
690	jdb-talk@heidemann.la.ca.us
691     To subscribe to either, send mail to
692	jdb-announce-request@heidemann.la.ca.us
693	or jdb-talk-request@heidemann.la.ca.us.
694     with "subscribe" in the BODY of the message.
695
696- BUG FIX:  dbjoin used to produce incorrect output if there
697	were extra, unmatched values in the 2nd table.
698	Thanks to Graham Phillips for providing a test case.
699
700- BUG FIX:  the sample commands in the usage strings
701	now all should explicitly include the source of data
702	(typically from "cat foo.jdb |").  Thanks to Ya Xu
703	for pointing out this documentation deficiency.
704
705- BUG FIX (DOCUMENTATION): dbcolmovingstats had incorrect sample output.
706
7071.8, 28-Jun-00
708
709- BUG FIX:  header options are now preserved when writing with dblistize
710
711- NEW:  dbrowuniq now optionally checks for uniqueness only on certain fields
712
713- NEW: dbrowsplituniq makes one pass through a file and splits it into
714	separate files based on the given fields
715
716- NEW:  converter for "crl" format network traces
717
718- NEW:  anywhere you use arbitrary code (like dbroweval),
719	_last_foo now maps to the last row's value for field _foo.
720
721- OPTIMIZATION: comment processing slightly changed so that
722	dbmultistats now is much faster on files with lots of comments
723	(for example, ~100k lines of comments and 700 lines of data!)
724	(Thanks to Graham Phillips for pointing out this performance
725	problem.)
726
727- BUG FIX: dbstats with median/quartiles now correctly handles singleton
728	data points
729
7301.9,  6-Nov-00
731
732- NEW: dbfilesplit, split a single input file into multiple output files
733	(based on code contributed by Pavlin Radoslavov).
734
735- BUG FIX: dbsort now works with perl-5.6
736
7371.10, 10-Apr-01
738
739- BUG FIX: dbstats now handles the case where there are more n-tiles
740	than data
741- NEW: dbstats now includes a -S option to optimize work on
742	pre-sorted data (inspired by code contributed by Haobo Yu)
743- BUG FIX: dbsort now has a better estimate of memory usage when
744	run on data with very short records (problem detected by Haobo Yu)
745- BUG FIX: cleanup of temporary files is slightly better
746
7471.11,  2-Nov-01
748
749- BUG FIX: dbcolneaten now runs in constant memory
750- NEW: dbcolneaten now supports "field specifiers" that
751	allow some control over how wide columns should be
752- OPTIMIZATION: dbsort now tries hard to be filesystem cache-friendly
753	(inspired by "Information and Control in Gray-box Systems" by
754	the Arpaci-Dusseau's at SOSP 2001)
755- INTERNAL: t_distr now ported to perl5 module DbTDistr
756
7571.12,  30-Oct-02
758
759- BUG FIX: dbmultistats documentation typo fixed
760- NEW: dbcolmultiscale
761- NEW: dbcol has -r option for "relaxed error checking"
762- NEW: dbcolneaten has new -e option to strip end-of-line spaces
763- NEW: dbrow finally has a -v option to negate the test
764- BUG FIX: math bug in dbcoldiff fixed by Ashvin Goel
765	*** need to check Scheaffer test cases
766- BUG FIX: some patches to run with Perl 5.8
767	Note: some programs (dbcolmultiscale, dbmultistats, dbrowsplituniq)
768	generate warnings like:
769		Use of uninitialized value in concatenation (.)
770		or string at /usr/lib/perl5/5.8.0/FileCache.pm line 98,
771		<STDIN> line 2.
772	Please ignore this until I figure out how to suppress it.
773	(Thanks to Jerry Zhao for noticing perl-5.8 problems.)
774- BUG FIX: fixed an autoconf problem where configure would fail
775	to find a reasonable prefix (thanks to Fabio Silva
776	for reporting the problem)
777- NEW: db_to_html_table: simple conversion to html tables
778	(NO fancy stuff)
779- NEW: dblib now has a function dblib_text2html() that will
780	do simple conversion of iso-8859-1 to HTML
781
782
7831.13,  4-Feb-04
784
785- NEW: jdb added to the freebsd ports tree
786	<http://www.freshports.org/databases/jdb/>
787	maintainer: larse@isi.edu
788- BUG FIX:  properly handle trailing spaces when data must be numeric
789	(ex. dbstats with -FS, see test dbstats_trailing_spaces)
790	Fix from Ning Xu <nxu@aludra.usc.edu>.
791- NEW: dbcolize error message improved (bug report from Terrence
792	Brannon), and list format documented in the README.
793- NEW: cgi_to_db converts CGI.pm-format storage to jdb list format
794- BUG FIX: handle numeric synonyms for column names in dbcol properly
795- ENHANCEMENT: "talking about columns" section added to README.
796	Lack of documentation pointed out by Lars Eggert.
797- CHANGE: dbformmail now defaults to using Mail ("Berkeley Mail")
798	to send mail, rather than sendmail (sendmail is still an option,
799	but mail doesn't require running as root)
800- NEW: on platforms that support it (i.e., with perl 5.8), jdb works
801	fine with unicode
802- NEW: dbfilevalidate: check a db file for some common errors
803
804
805
806MISSING FEATURES
807----------------
808
809Some features that have been requested but not yet provided:
810
811- handling null values  From mike_schulz@csgsystems.com, 29-Mar-01.
812
813
829
830COPYRIGHT
831---------
832
833JDB is Copyright (C) 1991-2002 by John Heidemann <johnh@isi.edu>.
834
835This program is free software; you can redistribute it and/or modify
836it under the terms of version 2 of the GNU General Public License as
837published by the Free Software Foundation.
838
839This program is distributed in the hope that it will be useful, but
840WITHOUT ANY WARRANTY; without even the implied warranty of
841MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
842General Public License for more details.
843
844You should have received a copy of the GNU General Public License
845along with this program; if not, write to the Free Software
846Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
847
848A copy of the GNU General Public License can be found in the file
849``COPYING''.
850
851
852COMMENTS
853--------
854
855Any comments about these programs should be sent to John Heidemann
856<johnh@isi.edu>.
857
858At ISI, these programs can be run directly out of /home/johnh/BIN/DB.
859
860   -John Heidemann
861
862