This is a version of Henry Spencer's famous regexp implementation. I've
modified it to meet my needs; this is what I've done:

    2) added a new function, regsublen(), which performs a dry run of the
       regsub() function and returns the length of the string needed to
       hold the output from regsub().
    3) changed regexec(prog,str) to regexec2(prog,str,eflags), with a macro
       for regexec(). This is so I can have the flag REG_NOTBOL, which
       signifies that the string passed to regexec[2]() is not actually the
       start of a line.
    4) support for case-insensitive matching (with the flag REG_NOCASE)
    5) split the definition of a compiled regexp from regexp.c into
       a new file regprog.h
    6) created a new file regjade.c which uses the regexec() structure to
       match regexp against editor buffers in place.
    7) Altered the regexp structure to allow storing of subexpressions as
       positions in a Jade buffer. Also altered calling conventions of
       regsub() and regsublen() to support this.
    8) support \w, \W, \s, \S, \d, \D, \b, \B, *?, +?, ?? syntax (as in Perl)
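
The dry-run idea behind regsublen() in item 2 can be sketched as follows.
This is only an illustration of the measure-then-fill pattern, not the code
in regsub.c: the names expand() and expand_alloc(), their arguments, and the
choice to count the terminating NUL are all assumptions made for this
example. The template syntax follows regsub() in that `&' stands for the
whole match and `\1' to `\9' for numbered subexpressions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not librep's API.  Expands `tmpl` using the
   captured strings in `groups` (groups[0] is the whole match, so
   ngroups must be at least 1).  `&` inserts groups[0], `\1`..`\9`
   insert the numbered group (empty if there is no such group), and
   `\` before any other character inserts that character literally.
   When `out` is NULL nothing is written: that is the dry run.
   Returns the number of bytes needed, including the terminating NUL. */
static size_t expand(const char *tmpl, const char *const groups[],
                     size_t ngroups, char *out)
{
    size_t len = 0;

    assert(tmpl != NULL && groups != NULL && ngroups >= 1);
    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p;      /* default: one literal character */
        size_t n = 1;

        if (*p == '&') {
            piece = groups[0];
            n = strlen(piece);
        } else if (*p == '\\' && p[1] != '\0') {
            p++;
            if (*p >= '1' && *p <= '9') {
                size_t idx = (size_t)(*p - '0');
                piece = idx < ngroups ? groups[idx] : "";
                n = strlen(piece);
            } else {
                piece = p;          /* escaped literal, e.g. \& or \\ */
            }
        }
        if (out != NULL)
            memcpy(out + len, piece, n);
        len += n;
    }
    if (out != NULL)
        out[len] = '\0';
    return len + 1;
}

/* Two-pass use: measure first, allocate exactly, then fill.
   Returns NULL if the allocation fails; the caller frees the result. */
static char *expand_alloc(const char *tmpl, const char *const groups[],
                          size_t ngroups)
{
    size_t need = expand(tmpl, groups, ngroups, NULL);
    char *buf = malloc(need);

    if (buf != NULL)
        expand(tmpl, groups, ngroups, buf);
    return buf;
}
```

Running the same routine twice keeps the length calculation and the real
substitution from drifting apart, at the cost of walking the template a
second time.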

And probably some other things as well. Obviously all errors are my
responsibility. The original README follows.

John

--

This is a nearly-public-domain reimplementation of the V8 regexp(3) package.
It gives C programs the ability to use egrep-style regular expressions, and
does it in a much cleaner fashion than the analogous routines in SysV.

	Copyright (c) 1986 by University of Toronto.
	Written by Henry Spencer.  Not derived from licensed software.

	Permission is granted to anyone to use this software for any
	purpose on any computer system, and to redistribute it freely,
	subject to the following restrictions:

	1. The author is not responsible for the consequences of use of
		this software, no matter how awful, even if they arise
		from defects in it.

	2. The origin of this software must not be misrepresented, either
		by explicit claim or by omission.

	3. Altered versions must be plainly marked as such, and must not
		be misrepresented as being the original software.

Barring a couple of small items in the BUGS list, this implementation is
believed 100% compatible with V8.  It should even be binary-compatible,
sort of, since the only fields in a "struct regexp" that other people have
any business touching are declared in exactly the same way at the same
location in the struct (the beginning).

This implementation is *NOT* AT&T/Bell code, and is not derived from licensed
software.  Even though U of T is a V8 licensee.  This software is based on
a V8 manual page sent to me by Dennis Ritchie (the manual page enclosed
here is a complete rewrite and hence is not covered by AT&T copyright).
The software was nearly complete at the time of arrival of our V8 tape.
I haven't even looked at V8 yet, although a friend elsewhere at U of T has
been kind enough to run a few test programs using the V8 regexp(3) to resolve
a few fine points.  I admit to some familiarity with regular-expression
implementations of the past, but the only one that this code traces any
ancestry to is the one published in Kernighan & Plauger (from which this
one draws ideas but not code).

Simplistically:  put this stuff into a source directory, copy regexp.h into
/usr/include, inspect Makefile for compilation options that need changing
to suit your local environment, and then do "make r".  This compiles the
regexp(3) functions, compiles a test program, and runs a large set of
regression tests.  If there are no complaints, then put regexp.o, regsub.o,
and regerror.o into your C library, and regexp.3 into your manual-pages
directory.

Note that if you don't put regexp.h into /usr/include *before* compiling,
you'll have to add "-I." to CFLAGS before compiling.

The files are:

Makefile	instructions to make everything
regexp.3	manual page
regexp.h	header file, for /usr/include
regexp.c	source for regcomp() and regexec()
regsub.c	source for regsub()
regerror.c	source for default regerror()
regmagic.h	internal header file
try.c		source for test program
timer.c		source for timing program
tests		test list for try and timer

This implementation uses nondeterministic automata rather than the
deterministic ones found in some other implementations, which makes it
simpler, smaller, and faster at compiling regular expressions, but slower
at executing them.  In theory, anyway.  This implementation does employ
some special-case optimizations to make the simpler cases (which do make
up the bulk of regular expressions actually used) run quickly.  In general,
if you want blazing speed you're in the wrong place.  Replacing the insides
of egrep with this stuff is probably a mistake; if you want your own egrep
you're going to have to do a lot more work.  But if you want to use regular
expressions a little bit in something else, you're in luck.  Note that many
existing text editors use nondeterministic regular-expression implementations,
so you're in good company.
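
To make the trade-off above concrete, here is a minimal sketch of a
backtracking matcher, the style of implementation that paragraph calls
nondeterministic. It is not taken from regexp.c; it is a light adaptation
of the well-known small matcher by Rob Pike published in Kernighan and
Pike's "The Practice of Programming", and it handles only literal
characters, `.', `*', `^' and `$'. The pattern is interpreted directly, so
there is essentially no compile step, but a failed attempt has to back up
and try the next possibility.

```c
#include <assert.h>
#include <stddef.h>

static int matchhere(const char *regexp, const char *text);
static int matchstar(int c, const char *regexp, const char *text);

/* match: search for regexp anywhere in text; returns 1 on a match. */
int match(const char *regexp, const char *text)
{
    assert(regexp != NULL && text != NULL);
    if (regexp[0] == '^')
        return matchhere(regexp + 1, text);
    do {    /* must try even when text is empty */
        if (matchhere(regexp, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}

/* matchhere: does regexp match at the very beginning of text? */
static int matchhere(const char *regexp, const char *text)
{
    if (regexp[0] == '\0')
        return 1;
    if (regexp[1] == '*')
        return matchstar(regexp[0], regexp + 2, text);
    if (regexp[0] == '$' && regexp[1] == '\0')
        return *text == '\0';
    if (*text != '\0' && (regexp[0] == '.' || regexp[0] == *text))
        return matchhere(regexp + 1, text + 1);
    return 0;
}

/* matchstar: match c*regexp at the beginning of text.  Each pass of
   the loop is one more guess at how many c's the star consumes; when
   the rest of the pattern fails, the next guess is tried. */
static int matchstar(int c, const char *regexp, const char *text)
{
    do {
        if (matchhere(regexp, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}
```

For example, match("a*b", "xaaab") succeeds after the search fails at the
leading `x' and restarts one character later, while match("^b", "abc")
fails at once because the anchor permits only one starting position.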

This stuff should be pretty portable, given appropriate option settings.
If your chars have fewer than 8 bits, you're going to have to change the
internal representation of the automaton, although knowledge of the details
of this is fairly localized.  There are no "reserved" char values except for
NUL, and no special significance is attached to the top bit of chars.
The string(3) functions are used a fair bit, on the grounds that they are
probably faster than coding the operations in line.  Some attempts at code
tuning have been made, but this is invariably a bit machine-specific.