README
This is version 2.04 of agrep - a new tool for fast
text searching allowing errors.
agrep is similar to egrep (or grep or fgrep), but it is much more general
(and usually faster).
The main changes from version 1.1 are 1) incorporating Boyer-Moore
type filtering to speed up search considerably, 2) allowing multiple patterns
via the -f option; this is similar to fgrep, but in our experience
agrep is much faster, 3) searching for the "best match" without having to
specify the number of errors allowed, and 4) no longer requiring ASCII.
Several more options were also added.

To compile, simply run make in the agrep directory after untar'ing
the tar file (tar -xf agrep-2.04.tar will do it).

The three most significant features of agrep that are not supported by
the grep family are
1) the ability to search for approximate patterns;
   for example, "agrep -2 homogenos foo" will find homogeneous as well
   as any other word that can be obtained from homogenos with at most
   2 substitutions, insertions, or deletions.
   "agrep -B homogenos foo" will generate a message of the form
   best match has 2 errors, there are 5 matches, output them? (y/n)
2) record-oriented rather than just line-oriented search; a record
   is by default a line, but it can be user defined;
   for example, "agrep -d '^From ' 'pizza' mbox"
   outputs all mail messages that contain the keyword "pizza".
   Another example: "agrep -d '$$' pattern foo" will output all
   paragraphs (separated by an empty line) that contain pattern.
3) multiple patterns with AND (or OR) logic queries.
   For example, "agrep -d '^From ' 'burger,pizza' mbox"
   outputs all mail messages containing at least one of the
   two keywords (, stands for OR).
   "agrep -d '^From ' 'good;pizza' mbox" outputs all mail messages
   containing both keywords (; stands for AND).
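To make "at most 2 substitutions, insertions, or deletions" concrete, here is
a minimal Python sketch of approximate matching. It uses plain dynamic
programming and is only an illustration of the idea; it is not claimed to be
how agrep itself is implemented (see agrep.algorithms for that). The names
min_errors, approx_grep, and best_match are made up for this example.

```python
def min_errors(pattern, text):
    """Fewest substitutions, insertions, or deletions needed to turn
    `pattern` into some substring of `text`."""
    # prev[i] = fewest errors for pattern[:i] against a substring of the
    # text ending at the current position (before any text: i deletions).
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start anywhere in the text
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (pc != ch),  # match or substitution
                prev[i] + 1,               # insertion: extra character in the text
                cur[i - 1] + 1,            # deletion: pattern character missing
            ))
        best = min(best, cur[-1])
        prev = cur
    return best


def approx_grep(pattern, lines, k):
    """Lines containing `pattern` with at most k errors (in the spirit of -k)."""
    return [line for line in lines if min_errors(pattern, line) <= k]


def best_match(pattern, lines):
    """Smallest error count over all lines, and the lines achieving it
    (in the spirit of -B, without the prompt)."""
    scored = [(min_errors(pattern, line), line) for line in lines]
    if not scored:
        return 0, []
    fewest = min(score for score, _ in scored)
    return fewest, [line for score, line in scored if score == fewest]
```

With these definitions, approx_grep("homogenos", lines, 2) keeps a line that
contains homogeneous, since two insertions turn homogenos into homogeneous.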

Putting these options together one can ask queries like

agrep -d '$$' -2 '<CACM>;TheAuthor;Curriculum;<198[5-9]>' bib

which outputs all paragraphs referencing articles in CACM between
1985 and 1989 by TheAuthor dealing with curriculum.
Two errors are allowed, but they cannot be in either CACM or the year
(the <> brackets forbid errors in the pattern between them).
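The record and AND/OR parts of such a query can be pictured with a small
Python sketch. It is a simplified model, not agrep's implementation: terms
are matched exactly (no errors, no <> brackets, no regular expressions), a
query is assumed to use only one kind of operator, and the helper names
split_records, matches, and search are invented for this illustration.

```python
def split_records(text, is_delimiter):
    """Group lines into records; a line for which is_delimiter(line) is
    true starts a new record (like a -d delimiter such as '^From ')."""
    records, current = [], []
    for line in text.splitlines():
        if is_delimiter(line) and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def matches(query, record):
    """';' between terms means AND, ',' means OR (exact substring terms)."""
    if ";" in query:
        return all(term in record for term in query.split(";"))
    return any(term in record for term in query.split(","))


def search(query, text, is_delimiter):
    """Return every whole record that satisfies the query."""
    return [r for r in split_records(text, is_delimiter) if matches(query, r)]


mbox = (
    "From alice\nLet's get pizza\n"
    "From bob\nburger tonight?\n"
    "From carol\ngood pizza place\n"
)
starts_message = lambda line: line.startswith("From ")
either = search("burger,pizza", mbox, starts_message)  # all three messages
both = search("good;pizza", mbox, starts_message)      # only carol's message
```

The point of the sketch is that the unit of output is the whole record, so a
query can combine keywords that appear on different lines of one message.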

Other features include searching for regular expressions (with or
without errors), unlimited wild cards, limiting the errors to only
insertions or only substitutions or any combination,
weighting the error types (so that each deletion, for example,
counts as 2 substitutions or 3 insertions), restricting parts of the
query to be exact and parts to be approximate, and many more.
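One way to picture counting a deletion as several substitutions is to give
each error type its own cost. The Python sketch below does this with plain
dynamic programming; it is an illustration under that assumption, not
agrep's code, and the function and parameter names (weighted_errors,
sub_cost, ins_cost, del_cost) are hypothetical.

```python
def weighted_errors(pattern, text, sub_cost=1, ins_cost=1, del_cost=1):
    """Cheapest way to turn `pattern` into some substring of `text`,
    where each kind of error has its own cost."""
    # Before any text is read, pattern[:i] can only be matched by
    # deleting all i of its characters.
    prev = [i * del_cost for i in range(len(pattern) + 1)]
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start anywhere in the text
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(
                prev[i - 1] + (0 if pc == ch else sub_cost),  # match/substitution
                prev[i] + ins_cost,      # insertion: extra character in the text
                cur[i - 1] + del_cost,   # deletion: pattern character missing
            ))
        best = min(best, cur[-1])
        prev = cur
    return best
```

For instance, matching abcd against xxabdxx costs 1 with unit costs (delete
the c), but 2 once a deletion costs 3, because two substitutions become the
cheaper explanation. Making a cost larger than the allowed error total
effectively rules that error type out.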

agrep is available by anonymous ftp from cs.arizona.edu (IP 192.12.69.5)
as agrep/agrep-2.04.tar.Z (or in uncompressed form as agrep/agrep-2.04.tar).
The tar file contains the source code (in C), man pages (agrep.1),
and two additional files, agrep.algorithms and agrep.chronicle,
giving more information.
The agrep directory also includes two PostScript files:
agrep.ps.1 is a technical report from June 1991
describing the design and implementation of agrep;
agrep.ps.2 is a copy of the paper as it appeared at the 1992
Winter USENIX conference.

Please mail bug reports (or any other comments)
to sw@cs.arizona.edu or to udi@cs.arizona.edu.

We would appreciate it if users would notify us (at the addresses above)
of any extensions, improvements, or interesting uses of this software.

January 17, 1992


BUGS_fixed/option_update

1. remove multiple definitions of some global variables.
2. fix a bug in -G option.
3. fix a bug in -w option.
January 23, 1992

4. fix a bug in pipeline input.
5. make the definition of word-delimiter consistent.
March 16, 1992

6. add option '-y' which, if specified with -B option, will always
output the best matches without a prompt.
April 10, 1992

7. fix a bug regarding exit status.
April 15, 1992