Name          Date         Size       Lines     LOC
regexp/       12-Dec-2000  -          2,425   1,815
save/         12-Dec-2000  -          3,531   2,835
test/         07-May-2022  -         10,907  10,737
COPYING       16-Sep-1999  17.6 KiB     341     281
ChangeLog     12-Dec-2000  3.8 KiB      121      84
INSTALL       12-Dec-2000  13.7 KiB     411     328
Makefile.in   03-May-2022  21.1 KiB     721     457
README        10-Dec-2000  1,010 B       29      23
README.AWK    11-Nov-1996  8.9 KiB      242     201
README.QUAD   31-Mar-2000  6.2 KiB      172     140
acconfig.h    25-Feb-2000  483 B         17       2
awklib.c      10-Dec-2000  28.9 KiB   1,074     583
awklib.h      10-Dec-2000  2.8 KiB       96      74
config.hin    25-Feb-2000  766 B         31      20
configure     12-Dec-2000  75.2 KiB   2,775   2,365
configure.in  12-Dec-2000  5.2 KiB      189     164
confix.h      21-Feb-2000  418 B         18      12
mkdir.sh      03-Jan-2000  825 B         47      30
ndiff.awk     29-Mar-2000  13.7 KiB     477     254
ndiff.c       12-Dec-2000  51.8 KiB   1,716   1,100
ndiff.h       07-Mar-2000  676 B         42      31
ndiff.man     10-Dec-2000  25.2 KiB     931     854
ndiff.sin     22-Feb-2000  10.7 KiB     323     206
ndiff.sok     10-Dec-2000  587 B        105     104
store.c       10-Dec-2000  45 B           8       5

README

%% /u/sy/beebe/src/ndiff/ndiff-2.00/README, Sun Dec 10 08:47:09 2000
%% Edit by Nelson H. F. Beebe <beebe@math.utah.edu>

%% Author:
%% 	Nelson H. F. Beebe
%% 	Center for Scientific Computing
%% 	University of Utah
%% 	Department of Mathematics, 322 INSCC
%% 	155 S 1400 E RM 233
%% 	Salt Lake City, UT 84112-0090
%% 	USA
%% 	Email: beebe@math.utah.edu, beebe@acm.org,
%%	       beebe@computer.org, beebe@ieee.org (Internet)
%% 	WWW URL: http://www.math.utah.edu/~beebe
%% 	Telephone: +1 801 581 5254
%% 	FAX: +1 801 585 1640, +1 801 581 4148

This directory contains a utility, ndiff, for comparing putatively
similar files, ignoring small numeric differences.

See "man ndiff" for program documentation.

That documentation is also provided with the distribution in HTML,
Adobe Acrobat Portable Document Format (PDF), PostScript, and ASCII
text formats, all automatically derived from the original source file
in nroff/troff markup, ndiff.man.

See the companion INSTALL file for installation instructions.


README.AWK

%% /u/sy/beebe/tex/bibsort/README.AWK, Sat Nov  9 15:35:00 1996
%% Edit by Nelson H. F. Beebe <beebe@plot79.math.utah.edu>

These notes provide some information about awk, in case you are
unfamiliar with it, or want to learn more about it.

I use the awk programming language for implementing many of my
software tools (I have written more than 114,000 lines of awk code as
of [09-Nov-1996]), and I use it in teaching as an example of a little
language that every computer user who does text processing can benefit
from learning.

While awk is an interpreted language which suffers a runtime
performance penalty compared to natively compiled languages such as
Ada, C, C++, Fortran, and Pascal, for many text processing problems it
is almost perfect.  A C implementation of my bibcheck utility ran 3.5
times faster than the awk version, but took 22.4 times as many lines
of code!  And, of course, the awk version was much easier to write,
and required very little debugging.

awk is standardized by POSIX, though I don't yet have on hand the POSIX
awk language description.  This means that you can expect your
computer vendor to provide it, and that it should be widely available
for a long time.

awk is a clean, simple language, with few blemishes.  This is in stark
contrast to perl, which I find so ugly that I refuse to learn it, even
though I deeply appreciate what it is trying to do.

The official description of awk is found in the book

@String{pub-AW                  = "Ad{\-d}i{\-s}on-Wes{\-l}ey"}
@String{pub-AW:adr              = "Reading, MA, USA"}

@Book{Aho:1987:APL,
  author =       "Alfred V. Aho and Brian W. Kernighan and Peter J.
                 Weinberger",
  title =        "The {AWK} Programming Language",
  publisher =    pub-AW,
  address =      pub-AW:adr,
  pages =        "x + 210",
  year =         "1988",
  ISBN =         "0-201-07981-X",
  LCCN =         "QA76.73.A95 A35 1988",
  bibdate =      "Tue Dec 14 22:33:46 1993",
}

Another book which you may find useful (though I much prefer the above
one) is

@String{pub-ORA                 = "O'Reilly \& {Associates, Inc.}"}
@String{pub-ORA:adr             = "981 Chestnut Street, Newton, MA 02164, USA"}

@Book{Dougherty:SA91,
  author =       "Dale Dougherty",
  title =        "sed {\&} awk",
  publisher =    pub-ORA,
  address =      pub-ORA:adr,
  pages =        "xxii + 394",
  year =         "1991",
  ISBN =         "0-937175-59-5",
  LCCN =         "QA76.76.U84 D69 1991",
}

There is also a recent book (based on the GNU awk implementation) that
I have not yet seen:

@String{pub-SSC                 = "Specialized Systems Consultants"}
@String{pub-SSC:adr             = "P.O. Box 55549, Seattle, WA 98155"}

@Book{Robbins:1996:EAP,
  author =       "Arnold Robbins",
  title =        "Effective {AWK} Programming",
  publisher =    pub-SSC,
  address =      pub-SSC:adr,
  year =         "1996",
  URL =          "http://www.ssc.com/ssc/eap/",
  ISBN =         "0-916151-88-3",
  LCCN =         "",
  acknowledgement = ack-nhfb,
  pages =        "321",
  price =        "US\$27.00",
  bibdate =      "Fri Jun 14 17:24:04 1996",
  libnote =      "Not yet in my library.",
}

Some other publications on, and suppliers of, awk are:

@String{j-SUNEXPERT             = "SunExpert"}
@Article{Collinson:awk,
  author =       "Peter Collinson",
  title =        "Awk",
  journal =      j-SUNEXPERT,
  volume =       "2",
  number =       "1",
  pages =        "33--36",
  month =        jan,
  year =         "1991",
}

@String{pub-FSF                 = "{Free Software Foundation}"}
@String{pub-FSF:adr             = "675 Mass Ave, Cambridge, MA 02139,
                                  USA, Tel: (617) 876-3296"}

@Misc{FSF:gawk,
  key =          "GAWK",
  title =        "The {GAWK} Manual",
  howpublished = pub-FSF # " " # pub-FSF:adr,
  year =         "1987",
  note =         "Also available via ANONYMOUS FTP to
                 \path|prep.ai.mit.edu|. See also \cite{Aho:APL87}.",
}

@Misc{MKS:awk,
  author =       "{Mortice Kern Systems, Inc.}",
  title =        "{MKSAWK}",
  year =         "1987",
  note =         "35 King Street North, Waterloo, Ontario, Canada, Tel:
                 (519) 884-2251. See also \cite{Aho:APL87}.",
}

@Misc{ONW:awk,
  author =       "{OpenNetwork}",
  title =        "{The Berkeley Utilities}",
  year =         "1991",
  note =         "215 Berkeley Place, Brooklyn, NY 11217, USA, Tel:
                 (718) 398-3838.",
  altnote =      "See ad on p. 108 of April 1991 UNIX Review.",
}

@Misc{Polytron:polyawk,
  author =       "Polytron Corporation",
  title =        "{Poly{\-}AWK}",
  year =         "1987",
  note =         "170 NW 167th Place, Beaverton, OR 97006. See also
                 \cite{Aho:APL87}.",
}

@String{j-SPE                   = "Soft{\-}ware\emdash Prac{\-}tice
                                  and Experience"}

@Article{VanWyk:awk,
  author =       "Christopher J. Van Wyk",
  title =        "{AWK} as Glue for Programs",
  journal =      j-SPE,
  volume =       "16",
  number =       "4",
  pages =        "369--388",
  month =        apr,
  year =         "1986",
}

These entries are all taken from

	ftp://ftp.math.utah.edu/pub/tex/bib/index.html#master

which records books in my library, and other selected references; by
the time you read this, there may be more awk-related entries in that
bibliography.

At the time that the Aho, Kernighan, and Weinberger book appeared, awk
was only available on UNIX systems, and it took a few years for UNIX
vendors to incorporate the new, and much enhanced, version of the
language described in the book.  Most UNIX vendors retain the name
`awk' for the old original function-less language from 1978, and call
the 1987 one `nawk' (for `new awk').  An important exception is IBM,
which supplies the new implementation on RS/6000 AIX systems, but
calls it just awk.

Unfortunately, several vendors have not kept up with Brian Kernighan's
further development of awk, with the result that some nawk
implementations lack features that were added after the 1987 book was
published, notably the ENVIRON[] array for access to environment
variables.  Also, some of the vendor implementations have not
incorporated bug fixes which Kernighan introduced.

Fortunately, this situation has improved through three important
developments:

	(1) Arnold Robbins's gawk, the GNU Project implementation of
	awk, available at
		ftp://prep.ai.mit.edu/pub/gnu/gawk-x.yy.tar.gz
	The gawk distribution includes ports for the Amiga, the IBM
	PC, the Atari, and DEC OpenVMS.

	(2) Brian Kernighan's awk has been released by AT&T Bell Labs,
	and is available at
		http://cm.bell-labs.com/who/bwk/awk.sh

	(3) Mike Brennan's mawk, available at
		ftp://ftp.whidbey.net/pub/brennan/mawkx.y.z.tar.gz

Besides these freely distributable implementations (for non-commercial
purposes, as detailed in licenses included with their distributions),
there are commercially supported versions of awk for the IBM PC world
and other machines, recorded in the BibTeX entries above.

Robbins, Kernighan, and Brennan are in contact with one another, so
their implementations support the same features, although gawk has
added a number of (well-documented) extensions that the others have
not yet incorporated.  With a few exceptions, I've tried hard in my
awk programs to stick to the standard language as documented in the
1987 book.

	(1) gawk and recent AT&T awk have the IGNORECASE extension,
	which I only rarely use.  That feature is difficult to
	simulate in a portable awk program.

	(2) gawk and mawk have toupper() and tolower() for efficient
	lettercase conversion; it is possible to implement these in
	awk itself, but only very inefficiently.

	(3) gawk, mawk, and recent AT&T awk support the ENVIRON[]
	array for efficient access to environment variables.

	(4) gawk, mawk, and recent AT&T awk support the names
	/dev/stderr and /dev/stdout for the standard UNIX devices
	(which, sadly, UNIX never got around to naming), and gawk
	and AT&T awk support /dev/stdin.  The alternative to
	/dev/stderr is /dev/tty (which fails if the process is
	running without a controlling terminal, as happens for
	batch jobs and background processes), or a horrid
	contortion to invoke the shell and cat.  Since any realistic
	program will require the ability to write error messages, use
	of /dev/stderr is the one feature that is likely to cause
	portability problems.

	(5) Only gawk is 8-bit clean, capable of processing all
	256 8-bit byte values, including NUL, and accepting 8-bit
	characters in regexp patterns.  Recent AT&T awk loses NUL
	during processing (because it uses C-style strings internally,
	which reserve NUL as a string terminator), and it rejects
	characters 128..255 in regexp patterns.  mawk gets thoroughly
	confused by NUL in its input stream, and terminates; it
	handles the other 255 byte values (1..255) correctly, at least
	for I/O.

For further discussion of awk implementation differences and language
evolution, see the
	(gawk.info)Language History
node in the GNU Emacs info system.

README.QUAD

     Comments on quadruple-precision machine epsilon computation

				  by

			  Nelson H. F. Beebe
		   Center for Scientific Computing
			  University of Utah
		 Department of Mathematics, 322 INSCC
			 155 S 1400 E RM 233
		    Salt Lake City, UT 84112-0090
				 USA
 Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet)
	       WWW URL: http://www.math.utah.edu/~beebe
		      Telephone: +1 801 581 5254
		FAX: +1 801 585 1640, +1 801 581 4148

		Last revised: Mon Feb 21 16:31:12 2000


Several vendors have implemented a 128-bit quadruple-precision
floating-point format, among them Compaq/DEC, HP, SGI, and Sun, all
implemented as a 1-bit sign, 15-bit exponent, and 113-bit significand
(including one hidden bit).  This makes the machine epsilon 2^{-112} =
1.93e-34.  [If the significand is 1.fffff...fff, with p-1 fraction bits
f after the binary point, then the machine epsilon is 0.000...001,
which is 2^{-(p-1)}.]

Intel and Motorola offer IEEE 754 80-bit temporary real (1-bit sign,
15-bit exponent, 64-bit significand with no hidden bit), which C
compilers provide as long double, padding to 96 bits (12 bytes) or 128
bits (16 bytes) for storage access efficiency.  The gcc compilers in
GNU/Linux on Intel x86 use 12 bytes for long double, and the machine
epsilon is 2^{-63} = 1.08e-19.

IBM RS/6000 systems also have a 128-bit quadruple-precision format, but
unlike the others, this is a pair of normal IEEE 754 64-bit values.
The IBM compilers require an extra option, -qlongdouble, to enable
code generation for it; without that option, long double is treated
like double.

This `double-double' representation, while convenient for the
implementation of quadruple-precision operations using only the
double-precision hardware instructions, and particularly the fused
multiply-add instruction, has a serious side effect: given
double-precision values a and b, the quadruple-precision value a + b
represents a significand of the form

	pppppp...ppp?????....?????qqqqq....qqqq

in which there can be unknown bits in the middle.  For example, one
can have a = 1 and b = 1e-323: this does not imply that their sum has
323 accurate decimal digits, even though 1 + 1e-323 differs from 1.

On this system, code of the form

	do
	{
		tolerance = ...
	}
	until ((x + tolerance) != x)

which works perfectly well in IEEE 754 single, double, and
temporary-real formats, as well as Sun-style quadruple-precision
arithmetic, and also Cray, IBM, and Compaq/DEC VAX mainframe formats,
will not work as expected, because tolerance must underflow to zero
before the sum x + tolerance finally differs from x, when x == 1.

This affects the computation of the machine epsilon, which normally
takes the form

typedef long double fp_t;	/* or float, or double */

fp_t machine_epsilon(void)
{
    fp_t x = 1;		/* halve x until 1 + x/2 rounds to 1 */

    while ((1 + x/2) != 1)
	x /= 2;

    return (x);
}

(writing it in a form that doesn't require length modifiers on the
constants, allowing typedefs of fp_t to float, double, and long double
without changing the code, and assuming that longer registers are not
involved in the computation in the while condition).

On the IBM RS/6000, this code produces the incorrect answer
4.9406564584124654e-324 (== 2^(-1074)).

Since there are only 53 bits in each of the two significands, the
expected precision is only (2*53 =) 106 bits, corresponding to just
under 32 decimal digits.  The machine epsilon should therefore be
2^{-105} = 2.47e-32.
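The arithmetic behind these figures, taking p = 106 significand bits and the epsilon rule 2^{-(p-1)} used earlier:

```latex
\[
  p = 2 \times 53 = 106, \qquad
  p \log_{10} 2 \approx 106 \times 0.30103 \approx 31.9
  \ \text{decimal digits},
\]
\[
  \epsilon = 2^{-(p-1)} = 2^{-105} \approx 2.47 \times 10^{-32},
  \qquad \text{versus the loop's result} \qquad
  2^{-1074} \approx 4.94 \times 10^{-324}.
\]
```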

Despite several experiments, and considerable thought over the course
of a day, I haven't yet been able to program a MACHINE-INDEPENDENT way
to write the machine_epsilon() function for the quadruple-precision
case.  I've therefore temporarily taken the easy way out, and put in a
machine-dependent conditional that checks for the IBM RS/6000 and
makes long double revert to double.  During my Web searching, I came
across a reference to an Apple floating-point implementation that uses
a similar double + double representation; thus, this problem may be
endemic on all systems based on the Power and PowerPC chips.

On Apple Rhapsody (also known as MacOS 10), the C compiler (gcc
2.7.2.1) treats long double like double, so this problem isn't
exhibited.

On GNU/Linux on Apple Macintosh, the C and C++ compilers also
treat long double like double; they define the symbols
	__powerpc__
	__PPC
	__PPC__
	powerpc
	PPC

The IBM RS/6000 compilers on AIX 4.x define the symbols
	_IBMR2
	_POWER
independent of which architecture subset (com, pwr, pwr2,
pwrx, ppc, ppcgr) is selected by a -qarch=xxx flag.

The Apple Rhapsody compiler defines

    __ARCHITECTURE__="ppc"
    __ppc
    __ppc__
    ppc

I've therefore modified confix.h to test for some of these symbols,
and if any are found, to disable long double arithmetic.  There was
already a similar disabling there for the NeXT, where the compiler
handles long double, but the run-time library doesn't.

Fri Mar 31 15:24:16 2000
I tracked down an IBM Web page that discusses implications of their
peculiar 128-bit floating-point format:

http://www.rs6000.ibm.com/doc_link/en_US/a_doc_lib/aixprggd/genprogc/128bit_long_double_floating-point_datatype.htm#CE7AE41923raga

Just in case that page disappears, or moves, here is what IBM has to
say about the machine epsilon:

>> ...
>> Epsilon
>>
>> The ANSI C standard defines the value of epsilon as the difference
>> between 1.0 and the least representable value greater than 1.0, that
>> is, b**(1-p), where b is the radix (2) and p is the number of base b
>> digits in the number. This definition requires that the number of base
>> b digits is fixed, which is not true for 128-bit long double numbers.
>>
>> The smallest representable value greater than 1.0 is this number:
>>
>> 0x3FF0000000000000, 0x0000000000000001
>>
>> The difference between this value and 1.0 is this number:
>>
>> 0x0000000000000001, 0x0000000000000000
>> 0.4940656458412465441765687928682213E-323
>>
>> Because 128-bit numbers usually provide at least 106 bits of
>> precision, an appropriate minimum value for p is 106. Thus, b**(1-p)
>> and 2**(-105) yield this value:
>>
>> 0x3960000000000000, 0x0000000000000000
>> 0.24651903288156618919116517665087070E-31
>>
>> Both values satisfy the definition of epsilon according to standard
>> C. The long double subroutines use the second value because it better
>> characterizes the accuracy provided by the 128-bit implementation.
>> ...