1
2JDB: Database Functions for Shell Scripting
3--------------------------------------------
4by John Heidemann <johnh@isi.edu>
5
6
7WHAT'S NEW?
8-----------
91.14, 24-Aug-06
10
11- ENHANCEMENT: README cleanup
12- INCOMPATIBLE CHANGE: dbcolsplit renamed dbcolsplittocols
13- NEW: dbcolsplittorows split one column into multiple rows
14- NEW: dbcolsregression compute linear regression and correlation for two columns
15- ENHANCEMENT: cvs_to_db: better error handling, normalize field names, skip blank lines
16- ENHANCEMENT: dbjoin now detects (and fails) if non-joined files have duplicate names
17- BUG FIX: minor bug fixed in calculation of Student t-distributions
18 (doesn't change any test output, but may have caused small errors)
19
20
21EXECUTIVE SUMMARY
22-----------------
23
JDB is a package of commands for manipulating flat-ASCII databases from
shell scripts. JDB is useful to process medium amounts of data (with
very little data you'd do it by hand, with megabytes you might want a
real database). JDB is very good at doing things like:
28
29 - extracting measurements from experimental output
30 - re-examining data to address different hypotheses
31 - joining data from different experiments
32 - eliminating/detecting outliers
33 - computing statistics on data (mean, confidence intervals,
34 correlations, histograms)
35 - reformatting data for graphing programs
36
Rather than hand-code scripts to do each special case, JDB provides
higher-level functions. Although it's often easy to throw together a
custom script to do any single task, I believe that there are several
advantages to using this library:
41
 - these programs provide a higher-level interface than plain Perl
   => dbrow '_size == 1024' | dbstats bw
   rather than:
     while (<>) { @F = split; $sum += $F[2]; $ss += $F[2]**2; $n++; }
     $mean = $sum / $n; $std_dev = ...
   etc.
   in dozens of places
49
50 - the library uses names for columns
51 => no more $F[2], use _bw
52 => new or different order columns? no changes to your scripts!
53
54 - the library is self-documenting (each program records what it did)
55 => no more wondering what hacks were used to compute the
56 final data, just look at the comments at the end
57 of the output
58
59 - unusual cases, error checking, and large datasets are already handled
60 => custom scripts often skimp on error checking
61 and assume everything fits in memory
62
63(The disadvantage is that you need to learn what functions JDB provides.)
64
65JDB is built on flat-ASCII databases. By storing data in simple text
66files and processing it with pipelines it is easy to experiment (in
67the shell) and look at the output. The original implementation of
68this idea was /rdb, a commercial product described in the book ``UNIX
69relational database management: application development in the UNIX
70environment'' by Rod Manis, Evan Schaffer, and Robert Jorgensen (and
also at the web page <http://www.rdb.com/>). JDB is an incompatible
re-implementation of their idea without any accelerated indexing or
forms support. (But it's free!)
74
Installation instructions follow below. JDB requires Perl 5.003 or
later to run. There are no man pages currently, but each command has
a complete description in its usage string. All commands are backed
by an automated test suite.
79
80The most recent version of JDB is available on the web at
81<http://www.isi.edu/~johnh/SOFTWARE/JDB/index.html>.
82
83
84README CONTENTS
85---------------
- what's new
- executive summary
- README contents
- installation
- common installation problems (FAQ)
- basic data format
- basic data manipulation
- talking about columns
- list of commands
- another example
- a gradebook example
- a password example
- short examples
- history
- related work
- release notes
- missing features
- copyright
- comments
101
102
103
104INSTALLATION
105------------
106
107The quick answer to installation is to type:
108 ./configure
109 make install
110
111JDB uses autoconf. You can set where the programs are installed
112with --prefix=/where/you/want/them/without/bin/at/the/end.
113Do ./configure --help for details.
114
115JDB requires perl 5.003 or later. Some of the commands work on 5.000,
116but several of the test scripts fail, so buyer beware.
117
A test suite is available; run it with ./db_test_suite
or "make test".
120
121In the past there have been some test suite problems due to different
122printf implementations. I've tried to code around this problem;
123please let me know if you encounter it again.
124
A FreeBSD port of JDB is available; see
<http://www.freshports.org/databases/jdb/>.
127
128
129COMMON INSTALLATION PROBLEMS (FAQ)
130----------------------------------
131
132Q: After installing jdb, I get this error when I run it:
133 Can't locate ~/lib/dblib.pl in @INC (@INC
134 contains: /usr/libdata/perl/5.00503/mach /usr/libdata/perl/5.00503
135 /usr/local/lib/perl5/site_perl/5.005/i386-freebsd
136 /usr/local/lib/perl5/site_perl/5.005 . ~/lib) at
137 /home/netlab1/alefiyah/bin/dbrow line 48.
138(or something like that). What should I do?
139
A: You're probably running the unpacked version rather than the
installed version. Part of the installation process is changing
the scripts so they know where their libraries are. After you
configure and install jdb, run the programs from where they are
installed.
145
146
147
148
149BASIC DATA FORMAT
150-----------------
151
These programs are based on the idea of storing data in simple ASCII
files. A database is a file with one header line and then data or
comment lines. For example:
155
156 #h account passwd uid gid fullname homedir shell
157 johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
158 greg * 2275 134 Greg_Johnson /home/greg /bin/bash
159 root * 0 0 Root /root /bin/bash
160 # this is a simple database
161
162The header line must be first and begins with "#h".
163There are rows (records) and columns (fields),
164just like in a normal database.
165Comment lines begin with "#".
166
167By default, columns are delimited by whitespace. By default it is
168therefore not possible to have fields which contain whitespace.
169(But see below for alternatives.)
170
171The big advantage of this approach is that it's easy to massage data
172into this format, and it's reasonably easy to take data out of this
173format into other (text-based) programs, like gnuplot, jgraph, and
174LaTeX. Think Unix. Think pipes.
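
Because the format is just text, a script that knows nothing about JDB
can still read it. As a rough sketch (jdb_named_col is a made-up
helper, not a JDB command, and it assumes whitespace-separated columns
with the #h header on the first line), plain awk can pull out a column
by name:

```shell
# Print one named column from JDB-format input on stdin.
# jdb_named_col is a hypothetical helper, not part of JDB.
jdb_named_col() {
    awk -v want="$1" '
        NR == 1 && $1 == "#h" {          # header: find the column position
            for (i = 2; i <= NF; i++)
                if ($i == want) col = i - 1
            if (!col) exit 1             # no such column
            next
        }
        /^#/ { next }                    # skip comment lines
        { print $col }
    '
}

jdb_named_col homedir <<'EOF'
#h account passwd uid gid fullname homedir shell
johnh * 2274 134 John_Heidemann /home/johnh /bin/bash
greg * 2275 134 Greg_Johnson /home/greg /bin/bash
root * 0 0 Root /root /bin/bash
# this is a simple database
EOF
```

In practice dbcol does this job (and records what it did); the sketch
only shows that nothing about the format is hidden.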
175
176Since no-whitespace in columns was a problem for some applications,
177there's an option which relaxes this rule. You can specify the field
178separator in the table header with -Fx where x is the new field
179separator. The special value -FS sets a separator of two spaces, thus
180allowing (single) spaces in fields. An example:
181
 #h -FS account passwd uid gid fullname homedir shell
 johnh  *  2274  134  John Heidemann  /home/johnh  /bin/bash
 greg  *  2275  134  Greg Johnson  /home/greg  /bin/bash
 root  *  0  0  Root  /root  /bin/bash
186 # this is a simple database
187
188See dbrecolize for more details. Regardless of what the column
189separator is for the body of the data, it's always whitespace in the
190header.
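
To see why two spaces are enough to allow single spaces inside a
field, here is a small awk sketch that splits -FS data by hand.
jdb_fs_field is a made-up name for illustration, not a JDB command,
and it assumes fields are separated by exactly two spaces:

```shell
# Print field number $1 (1-based) from two-space-separated data on stdin.
# jdb_fs_field is a hypothetical helper, not part of JDB.
jdb_fs_field() {
    awk -v n="$1" '
        /^#/ { next }                        # skip header and comments
        {
            count = split($0, f, /[ ][ ]/)   # split on exactly two spaces
            if (n <= count) print f[n]
        }
    '
}

printf '%s\n' '#h -FS name id' 'John Heidemann  1' 'Greg Johnson  2' |
    jdb_fs_field 1
```

The single space in each name survives because only a pair of spaces
counts as a separator.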
191
There's also a third format: a "list". Because it's often hard to tell
which column is which past the first two, in list format each "column"
is on a separate line. The programs dblistize and dbcolize convert to
and from this format. Currently other programs work only on
column-format data, so list data is only for viewing. Here's a sample of
"dblistize < DATA/passwd.jdb":
198
199 #L account passwd uid gid fullname homedir shell
200 account: johnh
201 passwd: *
202 uid: 2274
203 gid: 134
204 fullname: John_Heidemann
205 homedir: /home/johnh
206 shell: /bin/bash
207
208 account: greg
209 passwd: *
210 uid: 2275
211 gid: 134
212 fullname: Greg_Johnson
213 homedir: /home/greg
214 shell: /bin/bash
215
216 account: root
217 passwd: *
218 uid: 0
219 gid: 0
220 fullname: Root
221 homedir: /root
222 shell: /bin/bash
223
224 # this is a simple database
225 # | dblistize
226
227See dbcolize -? and dblistize -? for more details.
228
229
230BASIC DATA MANIPULATION
231-----------------------
232
A number of programs exist to manipulate databases.
Complex functions can be made by stringing together commands
with shell pipelines. For example, to print the home
directories of everyone with ``John'' in their names,
you would do:
238
239 cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir
240
241The output:
242 dash> cat DATA/passwd | dbrow '_fullname =~ /John/' | dbcol homedir
243 #h homedir
244 /home/johnh
245 /home/greg
246 # this is a simple database
247 # | dbrow _fullname =~ /John/
248 # | dbcol homedir
249
250(Notice that comments are appended to the output listing each command,
251providing an automatic audit log.)
252
253In addition to typical database functions (select, join, etc.) there
254are also a number of statistical functions.
255
256
257TALKING ABOUT COLUMNS
258---------------------
259
260An advantage of JDB is that you can talk about columns by name
261(symbolically) rather than simply by their positions. So in the above
262example, "dbcol homedir" pulled out the home directory column, and
263"dbrow '_fullname =~ /John/'" matched against column fullname.
264
265In general, you can use the name of the column listed on the #h line
266to identify it in most programs, and _name to identify it in code.
267
268Some alternatives for flexibility:
269
270- numeric values identify columns positionally, so 0 or _0 is the
271first column, 1 is the second, etc.
272
- in code, _last_columnname gets the value from columnname's previous row
274
275See dbroweval -? for more details about writing code.
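
The name-or-number rule can be sketched in a few lines of awk. This is
only an illustration of the idea, not how the JDB programs resolve
columns internally: jdb_colnum is a made-up helper, it reads just the
#h line, and it assumes that header words starting with "-" (such as
-FS) are options rather than column names.

```shell
# Print the 0-based position of a column given by name or by number.
# jdb_colnum is a hypothetical helper, not part of JDB.
jdb_colnum() {
    awk -v spec="$1" '
        NR == 1 && $1 == "#h" {
            numeric = (spec ~ /^[0-9]+$/)
            n = 0
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^-/) continue      # header option such as -FS
                if ((numeric && n == spec + 0) || (!numeric && $i == spec)) {
                    print n; found = 1; exit
                }
                n++
            }
        }
        { exit }                             # only the header matters
        END { exit !found }                  # non-zero status if not found
    '
}

echo '#h account passwd uid gid fullname homedir shell' | jdb_colnum homedir
echo '#h account passwd uid gid fullname homedir shell' | jdb_colnum 1
```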
276
277
278
279LIST OF COMMANDS
280----------------
281
282Enough said. I'll summarize the commands, and then you can
283experiment. For a detailed description of each command, see its usage
284line by running it with the argument ``-?''. In some shells (csh)
285you'll need to quote this (run ``dbcol -\?'' rather than ``dbcol -?'').
286
287TABLE CREATION
288--------------
289dbcolcreate add columns to a database
290dbcoldefine set the column headings for a non-JDB file
291
292TABLE MANIPULATION
293------------------
294dbcol select columns from a table
295dbrow select rows from a table
296dbsort sort rows based on a set of columns
297dbjoin compute the natural join of two tables
298dbcolrename rename a column
299dbcolmerge merge two columns into one
300dbcolsplittocols split one column into two or more columns
301dbcolsplittorows split one column into multiple rows
302dbrowsplituniq split the file into multiple files per unique fields
303dbfilevalidate check that db file doesn't have some common errors
304dbfilesplit split a single input file containing multiple tables into
305 several files
306
307COMPUTATION AND STATISTICS
308--------------------------
309dbstats compute statistics over a column (mean,etc.,optionally median)
310dbmultistats compute a series of stats (mean, etc.) over a table
dbcoldiff compare two sample distributions (mean/conf interval/T-test)
dbcolmovingstats compute moving statistics over a column of data
dbcolmultiscale compute simple stats (sums and rates) over multiple timescales
314dbcolstats compute Z-scores and T-scores over one column of data
315dbcolpercentile compute the rank or percentile of a column
316dbcolhisto compute histograms over a column of data
317dbcolscorrelate compute the coefficient of correlation over several columns
318dbcolsregression compute linear regression and correlation for two columns
319dbrowaccumulate compute a running sum over a column of data
320dbrowdiff compute differences between each row of a table
321dbrowenumerate number each row
322dbroweval run arbitrary Perl code on each row
323dbrowuniq count/eliminate identical rows (like Unix uniq(1))
324db2dcliff find ``cliffs'' in two-dimensional data
325
326OUTPUT CONTROL
327--------------
328dbcolneaten pretty-print columns
329dbcoltighten un-pretty-print columns
330dblistize convert columnar format into a ``list'' format
331dbcolize undo dblistize
332dbrecolize change the field separator for a table
333dbstripcomments remove comments from a table
334dbstripextraheaders remove extra headers that occur from table concatenation
335dbstripleadingspace remove leading spaces from (potentially non-JDB) data
336dbformmail generate a script that sends form mail based on each row
337
338CONVERSIONS
339-----------
340(These programs convert data into jdb. See their web pages for details.)
341cgi_to_db http://stein.cshl.org/WWW/software/CGI/
342crl_to_db http://moat.nlanr.net/Traces/
343dmalloc_to_db http://www.letters.com/dmalloc/
344kitrace_to_db http://ficus-www.cs.ucla.edu/ficus-members/geoff/kitrace.html
345ns_to_db http://mash-www.cs.berkeley.edu/ns/
346tabdelim_to_db spreadsheet tab-delimited files to db
347tcpdump_to_db (see man tcpdump(8) on any reasonable system)
348
349(And out of jdb:)
350db_to_html_table simple conversion of JDB to html tables
351
352
353Standard options:
354
355-? usage
356-c confidence interval (dbmultistats)
-C column separator (dbcolsplittocols, dbcolmerge)
358-d debug mode
359-a stats over all data (treating non-numerics as zeros)
360 (by default, non-numerics are ignored for stats purposes)
361-S assume the data is pre-sorted
362
363When giving Perl code (in dbrow and dbroweval)
364column names can be embedded if preceded by underscores.
365(Try dbrow -? and dbroweval -? for examples.)
366
367Most programs run in constant memory and use temporary files if necessary.
368Exceptions are dbcolneaten, dbcolpercentile, dbmultistats, dbrowsplituniq.
369
370
371ANOTHER EXAMPLE
372---------------
373
Take the raw data in DATA/http_bandwidth,
put a header on it (dbcoldefine size bw),
take statistics of each category (dbmultistats size bw),
pick out the relevant fields (dbcol size mean stddev pct_rsd), and you get:
378 #h size mean stddev pct_rsd
379 1024 1.4962e+06 2.8497e+05 19.047
380 10240 5.0286e+06 6.0103e+05 11.952
381 102400 4.9216e+06 3.0939e+05 6.2863
382 # | dbcoldefine size bw
383 # | /home/johnh/BIN/DB/dbmultistats size bw
384 # | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd
385(The whole command was:
386 cat DATA/http_bandwidth | dbcoldefine size bw |
387 dbmultistats size bw | dbcol size mean stddev pct_rsd
388all on one line.)
389
390Then post-process them to get rid of the exponential notation
391(dbroweval '_mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev);')
392 #h size mean stddev pct_rsd
393 1024 1496200 284970 19.047
394 10240 5028600 601030 11.952
395 102400 4921600 309390 6.2863
396 # | dbcoldefine size bw
397 # | /home/johnh/BIN/DB/dbmultistats size bw
398 # | /home/johnh/BIN/DB/dbcol size mean stddev pct_rsd
399 # | /home/johnh/BIN/DB/dbroweval { _mean = sprintf("%8.0f", _mean); _stddev = sprintf("%8.0f", _stddev); }
400(The whole command is as before, with the dbroweval tacked on the end.)
401
402In a few lines, raw data is transformed to processed output.
403
404
Suppose you suspect that the results for one data point have an odd
distribution. JDB can easily produce a CDF (cumulative distribution
function) of the data, suitable for graphing:
408
409cat DB/DATA/http_bandwidth | dbcoldefine size bw | \
410 dbrow '_size == 102400' | \
411 dbcol bw | dbsort -n bw | \
412 dbrowenumerate | dbcolpercentile count | \
413 dbcol bw percentile | xgraph
414
The steps, per line:
 1. get the raw input data and turn it into jdb format
 2. select only the rows for the size of interest
 3. pick out just the relevant column (for efficiency) and sort it
 4. for each data point, assign a CDF percentage to it
 5. pick out the two columns to graph and show them
420
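For intuition, the sort-then-rank part of that pipeline can be
imitated with sort and awk. This is a sketch under assumptions, not
the JDB implementation: cdf_points is a made-up name, and it defines
the percentile of the i-th smallest of n values as 100*i/n, which may
differ from what dbcolpercentile computes.

```shell
# Read one numeric value per line; print "value percentile" pairs.
# cdf_points is a hypothetical helper; the rank/n definition of
# percentile is an assumption and may differ from dbcolpercentile.
cdf_points() {
    sort -n | awk '
        { v[NR] = $1 }                       # holds all values in memory
        END {
            for (i = 1; i <= NR; i++)
                printf "%s %.1f\n", v[i], 100 * i / NR
        }
    '
}

printf '%s\n' 30 10 20 40 | cdf_points
```

Unlike most JDB programs, this sketch keeps every value in memory, so
it is only reasonable for small inputs.
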
421
422A GRADEBOOK EXAMPLE
423-------------------
424
425The first commercial program I wrote was a gradebook,
426so here's how to do it with JDB.
427
428Format your data like DATA/grades.
429 #h name email id test1
430 a a@ucla.edu 1 80
431 b b@usc.edu 2 70
432 c c@isi.edu 3 65
433 d d@lmu.edu 4 90
434 e e@caltech.edu 5 70
435 f f@oxy.edu 6 90
436
437Or if your students have spaces in their names, use -FS and two spaces
438to separate each column:
439
440 #h -FS name email id test1
 a x  a@ucla.edu  1  80
 b x  b@usc.edu  2  70
 c x  c@isi.edu  3  65
 d x  d@lmu.edu  4  90
 e x  e@caltech.edu  5  70
 f x  f@oxy.edu  6  90
447
448To compute statistics on an exam, do
449 cat DATA/grades | dbstats test1 |dblistize
450
451 #L ...
452 mean: 77.5
453 stddev: 10.84
454 pct_rsd: 13.987
455 conf_range: 11.377
456 conf_low: 66.123
457 conf_high: 88.877
458 conf_pct: 0.95
459 sum: 465
460 sum_squared: 36625
461 min: 65
462 max: 90
463 n: 6
464 ...
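
As a sanity check, the mean, stddev, and pct_rsd above can be
reproduced from the six scores with plain awk. This is a sketch, not
how dbstats is implemented: simple_stats is a made-up name, and it
assumes the sample (n-1) standard deviation, which is what matches the
numbers shown.

```shell
# Mean, sample standard deviation, and relative standard deviation
# (stddev as a percentage of the mean) for one number per line.
# simple_stats is a hypothetical helper; it needs at least two values.
simple_stats() {
    awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            mean = sum / n
            stddev = sqrt((sumsq - sum * sum / n) / (n - 1))
            printf "mean: %.1f\nstddev: %.2f\npct_rsd: %.3f\n",
                mean, stddev, 100 * stddev / mean
        }
    '
}

printf '%s\n' 80 70 65 90 70 90 | simple_stats
```

Here sum is 465 and sum_squared is 36625, so the variance is
(36625 - 465*465/6) / 5 = 117.5 and the standard deviation is its
square root, about 10.84.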
465
466To do a histogram:
467 cat DATA/grades | dbcolhisto -n 5 -g test1
468
469 #h low histogram
470 65 *
471 70 **
472 75
473 80 *
474 85
475 90 **
476 # | /home/johnh/BIN/DB/dbhistogram -n 5 -g test1
477
478Now you want to send out grades to the students by e-mail.
479Create a form-letter (in the file test1.txt):
480 To: _email (_name)
481 From: J. Random Professor <jrp@usc.edu>
482 Subject: test1 scores
483
484 _name, your score on test1 was _test1.
485 86+ A
486 75-85 B
487 70-74 C
488 0-69 F
489
490Generate the shell script that will send the mail out:
491 cat DATA/grades | dbformmail test1.txt > test1.sh
492And run it:
493 sh <test1.sh
494
495The last two steps can be combined:
496 cat DATA/grades | dbformmail test1.txt | sh
497(but I like to keep a copy of exactly what I send).
498
499
500At the end of the semester you'll want to compute grade totals and
501assign letter grades. Both fall out of dbroweval.
502For example, to compute weighted total grades with a 40% midterm/60%
503final where the midterm is 84 possible points and the final 100:
504
505 dbcol -rv total |
506 dbcolcreate total - |
507 dbroweval '
508 _total = .40 * _midterm/84.0 + .60 * _final/100.0;
509 _total = sprintf("%4.2f", _total);
510 if (_final eq "-" || ( _name =~ /^_/)) { _total = "-"; };' |
511 dbcolneaten
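
To check the weighting by hand for one student, the same arithmetic
can be run in awk. weighted_total is a made-up helper and the scores
are invented for illustration; note the result is a fraction between
0 and 1, not a percentage.

```shell
# Weighted total for one student: 40% midterm (out of 84) plus
# 60% final (out of 100). weighted_total is a hypothetical helper.
weighted_total() {
    awk -v midterm="$1" -v final="$2" 'BEGIN {
        printf "%4.2f\n", .40 * midterm / 84.0 + .60 * final / 100.0
    }'
}

weighted_total 63 80    # .40 * 0.75 + .60 * 0.80
```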
512
513
514If you got the data originally from a spreadsheet, save it in
515"tab-delimited" format and convert it with tabdelim_to_db
516(run tabdelim_to_db -? for examples).
517
518
519A PASSWORD EXAMPLE
520------------------
521
To convert the Unix password file to db (the sed replacement is two
spaces, to match the -F S separator):

 cat /etc/passwd | sed 's/:/  /g' | \
 dbcoldefine -F S login password uid gid gecos home shell \
 >passwd.jdb

To convert the group file:

 cat /etc/group | sed 's/:/  /g' | \
 dbcoldefine -F S group password gid members \
 >group.jdb
533
534To show the names of the groups that div7-members are in
535(assuming DIV7 is in the gecos field):
536
537 cat passwd.jdb | dbrow '_gecos =~ /DIV7/' | dbcol login gid | \
538 dbjoin - group.jdb gid | dbcol login group
539
540
541SHORT EXAMPLES
542--------------
543
544Which db programs are the most complicated (based on number of test cases)?
545
546 ls TEST/*.cmd | \
547 dbcoldefine test | \
548 dbroweval '_test =~ s@^TEST/([^_]+).*$@$1@' | \
549 dbrowuniq -c | \
550 dbsort -nr count | \
551 dbcolneaten
552
553(Answer: dbstats, then dbjoin.)
554
555
Stats on an exam (in FILE, with COLUMN==the name of the exam)?
 cat $FILE | dbstats -q 4 $COLUMN | dblistize | dbstripcomments
 cat $FILE | dbcolhisto -g -n 20 $COLUMN | dbcolneaten | dbstripcomments
559
560
Merging the hw1 column from file hw1.jdb into grades.jdb, assuming
there's a common student id in column "id":
563 dbcol id hw1 <hw1.jdb >t.jdb
564 dbjoin -i -e - grades.jdb t.jdb id |dbsort name|dbcolneaten >new_grades.jdb
565
566
Merging two jdb files with the same columns:
568 cat file1.jdb file2.jdb >output.jdb
569
570or if you want to clean things up a bit
571 cat file1.jdb file2.jdb | dbstripextraheaders >output.jdb
572
573or if you want to know where the data came from
574 for i in 1 2
575 do
576 dbcolcreate source $i < file$i.jdb
577 done | dbstripextraheaders >output.jdb
578
579(assumes you're using a Bourne-shell compatible shell, not csh).
580
581
582
583
584HISTORY
585-------
586
587There have been two versions of JDB;
588the current is a complete re-write of the first.
589
590JDB (in its various forms) has been used extensively by its author
591since 1991. Since 1995 it's been used by two other researchers at
592UCLA and several at ISI. In February 1998 it was announced to the
593Internet.
594
595JDB includes code ported from Geoff Kuenning (DbTDistr.pm).
596
597JDB contributors: Ashvin Goel <goel@cse.oge.edu>, Geoff Kuenning
598<geoff@fmg.cs.ucla.edu>, Vikram Visweswariah <visweswa@isi.edu>,
599Kannan Varadahan <kannan@isi.edu>, Lars Eggert <larse@isi.edu>, Arkadi
600Gelfond <arkadig@dyna.com>, Haobo Yu <haoboy@packetdesign.com>, Pavlin
601Radoslavov <pavlin@catarina.usc.edu>, Fabio Silva <fabio@isi.edu>,
602Jerry Zhao <zhaoy@isi.edu>, Ning Xu <nxu@aludra.usc.edu>.
603
604
605
606RELATED WORK
607------------
608
609As stated in the introduction, JDB is an incompatible reimplementation
610of the ideas found in /rdb. By storing data in simple text files and
611processing it with pipelines it is easy to experiment (in the shell)
612and look at the output. The original implementation of this idea was
613/rdb, a commercial product described in the book ``UNIX relational
614database management: application development in the UNIX environment''
615by Rod Manis, Evan Schaffer, and Robert Jorgensen (and also at the web
616page <http://www.rdb.com/>).
617
618In August, 2002 I found out Carlo Strozzi extended RDB with his
619package NoSQL <http://www.linux.it/~carlos/nosql/>. According to
620Mr. Strozzi, he implemented NoSQL in awk to avoid the Perl start-up of
621RDB. Although I haven't found Perl startup overhead to be a big
622problem on my platforms (from old Sparcstation IPCs to 2GHz
623Pentium-4s), you may want to evaluate his system. (At some point I'll
624try to do a comparison of JDB and NoSQL.)
625
626
627RELEASE NOTES
628-------------
629
630Versions prior to 1.0 were released informally on my web page
631but were not announced.
632
6331.0, 22-Jul-97: adds autoconf support and a test script.
634
6351.1, 20-Jan-98: support for double space field separators, better tests
636
6371.2, 11-Feb-98: minor changes and release on comp.lang.perl.announce
638
6391.3, 17-Mar-98
640 - adds median and quartile options to dbstats
641 - adds dmalloc_to_db converter
642 - fixes some warnings
643 - dbjoin now can run on unsorted input
644 - fixes a dbjoin bug
645 - some more tests in the test suite
646
6471.4, 27-Mar-98
648 - improves error messages
649 (all should now report the program that makes the error)
650 - fixed a bug in dbstats output when the mean is zero
651
6521.5, 25-Jun-98
653 - BUG FIX: dbcolhisto, dbcolpercentile now handles non-numeric
654 values like dbstats
655 - NEW: dbcolstats computes zscores and tscores over a column
656 - NEW: dbcolscorrelate computes correlation coefficients
657 between two columns
658 - INTERNAL: ficus_getopt.pl has been replaced by DbGetopt.pm
659 - BUG FIX: all tests are now ``portable'' (previously some tests
660 ran only on my system)
661 - BUG FIX: you no longer need to have the db programs in your path
662 (fix arose from a discussion with Arkadi Gelfond)
663 - BUG FIX: installation no longer uses cp -f (to work on SunOS 4)
664
6651.6, 24-May-99
666
667 - NEW: dbsort, dbstats, dbmultistats now run in constant memory
668 (using tmp files if necessary)
669 - NEW: dbcolmovingstats does moving means over a series of data
670 - NEW: dbcol has a -v option to get all columns except those listed
 - NEW: dbmultistats does quartiles and medians
 - NEW: dbstripextraheaders now also cleans up bogus comments
 before the first header
674 - BUG FIX: dbcolneaten works better with double-space-separated data
675
6761.7, 5-Jan-00
677
678- NEW: dbcolize now detects and rejects lines that contain embedded
679 copies of the field separator
680
681- NEW: configure tries harder to prevent people from improperly
682 configuring/installing jdb
683
684- NEW: tcpdump_to_db converter (incomplete)
685
686- NEW: tabdelim_to_db converter: from spreadsheet tab-delimited files to db
687
688- NEW: mailing lists for jdb are
689 jdb-announce@heidemann.la.ca.us and
690 jdb-talk@heidemann.la.ca.us
691 To subscribe to either, send mail to
692 jdb-announce-request@heidemann.la.ca.us
693 or jdb-talk-request@heidemann.la.ca.us.
694 with "subscribe" in the BODY of the message.
695
696- BUG FIX: dbjoin used to produce incorrect output if there
697 were extra, unmatched values in the 2nd table.
698 Thanks to Graham Phillips for providing a test case.
699
700- BUG FIX: the sample commands in the usage strings
701 now all should explicitly include the source of data
702 (typically from "cat foo.jdb |"). Thanks to Ya Xu
703 for pointing out this documentation deficiency.
704
705- BUG FIX (DOCUMENTATION): dbcolmovingstats had incorrect sample output.
706
7071.8, 28-Jun-00
708
709- BUG FIX: header options are now preserved when writing with dblistize
710
711- NEW: dbrowuniq now optionally checks for uniqueness only on certain fields
712
713- NEW: dbrowsplituniq makes one pass through a file and splits it into
714 separate files based on the given fields
715
716- NEW: converter for "crl" format network traces
717
718- NEW: anywhere you use arbitrary code (like dbroweval),
719 _last_foo now maps to the last row's value for field _foo.
720
721- OPTIMIZATION: comment processing slightly changed so that
722 dbmultistats now is much faster on files with lots of comments
723 (for example, ~100k lines of comments and 700 lines of data!)
724 (Thanks to Graham Phillips for pointing out this performance
725 problem.)
726
727- BUG FIX: dbstats with median/quartiles now correctly handles singleton
728 data points
729
7301.9, 6-Nov-00
731
732- NEW: dbfilesplit, split a single input file into multiple output files
733 (based on code contributed by Pavlin Radoslavov).
734
735- BUG FIX: dbsort now works with perl-5.6
736
7371.10, 10-Apr-01
738
739- BUG FIX: dbstats now handles the case where there are more n-tiles
740 than data
741- NEW: dbstats now includes a -S option to optimize work on
742 pre-sorted data (inspired by code contributed by Haobo Yu)
743- BUG FIX: dbsort now has a better estimate of memory usage when
744 run on data with very short records (problem detected by Haobo Yu)
745- BUG FIX: cleanup of temporary files is slightly better
746
7471.11, 2-Nov-01
748
749- BUG FIX: dbcolneaten now runs in constant memory
750- NEW: dbcolneaten now supports "field specifiers" that
751 allow some control over how wide columns should be
752- OPTIMIZATION: dbsort now tries hard to be filesystem cache-friendly
753 (inspired by "Information and Control in Gray-box Systems" by
754 the Arpaci-Dusseau's at SOSP 2001)
755- INTERNAL: t_distr now ported to perl5 module DbTDistr
756
7571.12, 30-Oct-02
758
759- BUG FIX: dbmultistats documentation typo fixed
760- NEW: dbcolmultiscale
761- NEW: dbcol has -r option for "relaxed error checking"
762- NEW: dbcolneaten has new -e option to strip end-of-line spaces
763- NEW: dbrow finally has a -v option to negate the test
764- BUG FIX: math bug in dbcoldiff fixed by Ashvin Goel
765 *** need to check Scheaffer test cases
766- BUG FIX: some patches to run with Perl 5.8
767 Note: some programs (dbcolmultiscale, dbmultistats, dbrowsplituniq)
768 generate warnings like:
769 Use of uninitialized value in concatenation (.)
770 or string at /usr/lib/perl5/5.8.0/FileCache.pm line 98,
771 <STDIN> line 2.
772 Please ignore this until I figure out how to suppress it.
773 (Thanks to Jerry Zhao for noticing perl-5.8 problems.)
774- BUG FIX: fixed an autoconf problem where configure would fail
775 to find a reasonable prefix (thanks to Fabio Silva
776 for reporting the problem)
777- NEW: db_to_html_table: simple conversion to html tables
778 (NO fancy stuff)
779- NEW: dblib now has a function dblib_text2html() that will
780 do simple conversion of iso-8859-1 to HTML
781
782
7831.13, 4-Feb-04
784
785- NEW: jdb added to the freebsd ports tree
786 <http://www.freshports.org/databases/jdb/>
787 maintainer: larse@isi.edu
788- BUG FIX: properly handle trailing spaces when data must be numeric
789 (ex. dbstats with -FS, see test dbstats_trailing_spaces)
790 Fix from Ning Xu <nxu@aludra.usc.edu>.
791- NEW: dbcolize error message improved (bug report from Terrence
792 Brannon), and list format documented in the README.
793- NEW: cgi_to_db converts CGI.pm-format storage to jdb list format
794- BUG FIX: handle numeric synonyms for column names in dbcol properly
795- ENHANCEMENT: "talking about columns" section added to README.
796 Lack of documentation pointed out by Lars Eggert.
797- CHANGE: dbformmail now defaults to using Mail ("Berkeley Mail")
798 to send mail, rather than sendmail (sendmail is still an option,
799 but mail doesn't require running as root)
800- NEW: on platforms that support it (i.e., with perl 5.8), jdb works
801 fine with unicode
802- NEW: dbfilevalidate: check a db file for some common errors
803
804
805
806MISSING FEATURES
807----------------
808
809Some features that have been requested but not yet provided:
810
811- handling null values From mike_schulz@csgsystems.com, 29-Mar-01.
812
813
830COPYRIGHT
831---------
832
833JDB is Copyright (C) 1991-2002 by John Heidemann <johnh@isi.edu>.
834
835This program is free software; you can redistribute it and/or modify
836it under the terms of version 2 of the GNU General Public License as
837published by the Free Software Foundation.
838
839This program is distributed in the hope that it will be useful, but
840WITHOUT ANY WARRANTY; without even the implied warranty of
841MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
842General Public License for more details.
843
844You should have received a copy of the GNU General Public License
845along with this program; if not, write to the Free Software
846Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
847
848A copy of the GNU General Public License can be found in the file
849``COPYING''.
850
851
852COMMENTS
853--------
854
855Any comments about these programs should be sent to John Heidemann
856<johnh@isi.edu>.
857
858At ISI, these programs can be run directly out of /home/johnh/BIN/DB.
859
860 -John Heidemann
861
862