/****************************************************************
Copyright (C) Lucent Technologies 1997
All Rights Reserved

Permission to use, copy, modify, and distribute this software and
its documentation for any purpose and without fee is hereby
granted, provided that the above copyright notice appear in all
copies and that both that the copyright notice and this
permission notice and warranty disclaimer appear in supporting
documentation, and that the name Lucent Technologies or any of
its entities not be used in advertising or publicity pertaining
to distribution of the software without specific, written prior
permission.

LUCENT DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
IN NO EVENT SHALL LUCENT OR ANY OF ITS ENTITIES BE LIABLE FOR ANY
SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF
THIS SOFTWARE.
****************************************************************/

This file lists all bug fixes, changes, etc., made since the AWK book
was sent to the printers in August, 1987.

June 12, 2020:
	Clear errno before calling errcheck to avoid any spurious errors
	left over from previous calls that may have set it. Thanks to
	Todd Miller for the fix, from PR #80.
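
	The pattern behind this fix can be sketched in a few lines of C.
	This is a minimal illustration, not the awk source:
	convert_checked is a hypothetical helper, and strtod stands in
	for the library calls whose errors errcheck inspects.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper (not awk's errcheck): convert s and report
 * whether this particular conversion went out of range. */
static int convert_checked(const char *s, double *out)
{
	assert(s != NULL && out != NULL);
	errno = 0;		/* drop anything left over from earlier calls */
	*out = strtod(s, NULL);
	return errno == ERANGE;	/* 1 only if this call set it */
}
```

	Without the errno = 0 line, an ERANGE left behind by some
	earlier call would be reported against a conversion that
	actually succeeded.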

	Fix Issue #78 by allowing \r to follow floating point numbers in
	lib.c:is_number. Thanks to GitHub user ajcarr for the report
	and to Arnold Robbins for the fix.

June 5, 2020:
	In fldbld(), make sure that inputFS is set before trying to
	use it. Thanks to Steffen Nurpmeso <steffen@sdaoden.eu>
	for the report.

May 5, 2020:
	Fix checks for compilers that can handle noreturn. Thanks to
	GitHub user enh-google for pointing it out. Closes Issue #79.

April 16, 2020:
	Handle old compilers that don't support C11 (for noreturn).
	Thanks to Arnold Robbins.

April 5, 2020:
	Use <stdnoreturn.h> and noreturn instead of GCC attributes.
	Thanks to GitHub user awkfan77. Closes PR #77.

February 28, 2020:
	More cleanups from Christos Zoulas: notably backslash continuation
	inside strings removes the newline and a fix for RS = "^a".
	Fix for address sanitizer-found problem. Thanks to GitHub user
	enh-google.

February 19, 2020:
	More small cleanups from Christos Zoulas.

February 18, 2020:
	Additional cleanups from Christos Zoulas. It's no longer necessary
	to use the -y flag to bison.

February 6, 2020:
	Additional small cleanups from Christos Zoulas. awk is now
	a little more robust about reporting I/O errors upon exit.

January 31, 2020:
	Merge PR #70, which avoids use of variable length arrays. Thanks
	to GitHub user michaelforney.  Fix issue #60 ({0} in interval
	expressions doesn't work).  Also get all tests working again.
	Thanks to Arnold Robbins.

January 24, 2020:
	A number of small cleanups from Christos Zoulas.  Add the close
	on exec flag to files/pipes opened for redirection; courtesy of
	Arnold Robbins.

January 19, 2020:
	If POSIXLY_CORRECT is set in the environment, then sub and gsub
	use POSIX rules for multiple backslashes.  This fixes Issue #66,
	while maintaining backwards compatibility.

January 9, 2020:
	Input/output errors on closing files are now fatal instead of
	mere warnings. Thanks to Martijn Dekker <martijn@inlv.org>.

January 5, 2020:
	Fix a bug in the concatenation of two string constants into
	one done in the grammar.  Fixes GitHub issue #61.  Thanks
	to GitHub user awkfan77 for pointing out the direction for
	the fix.  New test T.concat added to the test suite.
	Fix a few memory leaks reported by valgrind, as well.

December 27, 2019:
	Fix a bug whereby a{0,3} could match four a's.  Thanks to
	"Anonymous AWK fan" for the report.

December 11, 2019:
	Further printf-related fixes for 32 bit systems.
	Thanks again to Christos Zoulas.

December 8, 2019:
	Fix the return value of sprintf("%d") on 32 bit systems.
	Thanks to Jim Lowe for the report and to Christos Zoulas
	for the fix.

November 10, 2019:
	Convert a number of Boolean integer variables into
	actual bools. Convert compile_time variable into an
	enum and simplify some of the related code.  Thanks
	to Arnold Robbins.

November 8, 2019:
	Fix from Ori Bernstein to get UTF-8 characters instead of
	bytes when FS = "".  This is currently the only bit of
	the One True Awk that understands multibyte characters.
	From Arnold Robbins, apply some cleanups in the test suite.

October 25, 2019:
	More fixes and cleanups from NetBSD, courtesy of Christos
	Zoulas. Merges PRs 54 and 55.

October 24, 2019:
	Import second round of code cleanups from NetBSD. Much thanks
	to Christos Zoulas (GitHub user zoulasc). Merges PR 53.
	Add an optimization for string concatenation, also from
	Christos.

October 17, 2019:
	Import code cleanups from NetBSD. Much thanks to Christos
	Zoulas (GitHub user zoulasc). Merges PR 51.

October 6, 2019:
	Import code from NetBSD awk that implements RS as a regular
	expression.

September 10, 2019:
	Fixes for various array / memory overruns found via gcc's
	-fsanitize=unknown. Thanks to Alexander Richardson (GitHub
	user arichardson). Merges PRs 47 and 48.

July 28, 2019:
	Import grammar optimization from NetBSD: Two string constants
	concatenated together get turned into a single string.

July 26, 2019:
	Support POSIX-specified C-style escape sequences "\a" (alarm)
	and "\v" (vertical tab) in command line arguments and regular
	expressions, further to the support for them in strings added on
	Apr 9, 1989. These now no longer match as literal "a" and "v"
	characters (as they don't on other awk implementations).
	Thanks to Martijn Dekker.

July 17, 2019:
	Pull in a number of code cleanups and minor fixes from
	Warner Losh's bsd-ota branch.  The only user visible change
	is the use of random(3) as the random number generator.
	Thanks to Warner Losh for collecting all these fixes in
	one easy place to get them from.

July 16, 2019:
	Fix field splitting to use FS value as of the time a record
	was read or assigned to.  Thanks to GitHub user Cody Mello (melloc)
	for the fix. (Merged from his branch, via PR #42.) Updated
	testdir/T.split per said PR as well.

June 24, 2019:
	Extract awktest.tar into testdir directory. Add some very
	simple mechanics to the makefile for running the tests and
	for cleaning up. No changes to awk itself.

June 17, 2019:
	Disallow deleting SYMTAB and its elements, which creates
	use-after-free bugs. Thanks to GitHub user Cody Mello (melloc)
	for the fix. (Merged from PR #43.)

June 5, 2019:
	Allow unmatched right parenthesis in a regular expression to
	be treated literally. Fixes Issue #40. Thanks to GitHub user
	Warner Losh (bsdimp) for the report. Thanks to Arnold Robbins
	for the fix.

May 29, 2019:
	Fix check for command line arguments to no longer require that
	first character after '=' not be another '='. Reverts change of
	August 11, 1989. Thanks to GitHub user Jamie Landeg Jones for
	pointing out the issue; from Issue #38.

Apr 7, 2019:
	Update awktest.tar (p.50) to use modern options to sort. Needed
	for Android development. Thanks to GitHub user mohd-akram (Mohamed
	Akram).  From Issue #33.

Mar 12, 2019:
	Added very simplistic support for cross-compiling in the
	makefile.  We are NOT going to go in the direction of the
	autotools, though.  Thanks to GitHub user nee-san for
	the basic change. (Merged from PR #34.)

Mar 5, 2019:
	Added support for POSIX-standard interval expressions (a.k.a.
	bounds, a.k.a. repetition expressions) in regular expressions,
	backported (via NetBSD) from Apple awk-24 (20070501).
	Thanks to Martijn Dekker <martijn@inlv.org> for the port.
	(Merged from PR #30.)
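
	One rough way to see the semantics expected of interval
	expressions is to try them against the C library's POSIX
	regcomp(3), as sketched here. awk uses its own matcher in b.c
	rather than regcomp, so this only illustrates the intended
	behavior; ere_matches is a hypothetical helper.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: 1 if s matches the POSIX extended regular
 * expression pat, else 0.  Anchor pat with ^ and $ to test the
 * whole string. */
static int ere_matches(const char *pat, const char *s)
{
	regex_t re;
	int rc;

	assert(pat != NULL && s != NULL);
	if (regcomp(&re, pat, REG_EXTENDED | REG_NOSUB) != 0)
		return 0;	/* treat a bad pattern as "no match" */
	rc = regexec(&re, s, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

	With this helper, "^a{0,3}$" should accept "aaa" and reject
	"aaaa", which is the bound fixed on December 27, 2019.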

Mar 3, 2019:
	Merge PRs as follows:
	#12: Avoid undefined behaviour when using ctype(3) functions in
	     relex(). Thanks to GitHub user iamleot.
	#31: Make getline handle numeric strings, and update FIXES. Thanks
	     to GitHub user arnoldrobbins.
	#32: maketab: support build systems with read-only source. Thanks
	     to GitHub user enh.

Jan 25, 2019:
	Make getline handle numeric strings properly in all cases.
	(Thanks, Arnold.)

Jan 21, 2019:
	Merged a number of small fixes from GitHub pull requests.
	Thanks to GitHub users Arnold Robbins (arnoldrobbins),
	Cody Mello (melloc) and Christoph Junghans (junghans).
	PR numbers: 13-21, 23, 24, 27.

Oct 25, 2018:
	Added test in maketab.c to prevent generating a proctab entry
	for YYSTYPE_IS_DEFINED.  It was harmless but some gcc settings
	generated a warning message.  Thanks to Nan Xiao for report.

Aug 27, 2018:
	Disallow '$' in printf formats; arguments evaluated in order
	and printed in order.

	Added some casts to silence warnings on debugging printfs.
	(Thanks, Arnold.)

Aug 23, 2018:
        A long list of fixes courtesy of Arnold Robbins,
        to whom profound thanks.

        1. ofs-rebuild: OFS value used to rebuild the record was incorrect.
        Fixed August 19, 2014. Revised fix August 2018.

        2. system-status: Instead of a floating-point division by 256, use
        the wait(2) macros to create a reasonable exit status.
        Fixed March 12, 2016.

        3. space: Use provided xisblank() function instead of isspace() for
        matching [[:blank:]].

        4. a-format: Add POSIX standard %a and %A to supported formats. Check
        at runtime that this format is available.

        5. decr-NF: Decrementing NF did not change $0. This is a decades-old
        bug. There are interactions with the old and new value of OFS as well.
        Most of the fix came from the NetBSD awk.

        6. string-conv: String conversions of scalars were sticky.  Once a
        conversion to string happened, even with OFMT, that value was used until
        a new numeric value was assigned, even if OFMT differed from CONVFMT,
        and also if CONVFMT changed.

        7. unary-plus: Unary plus on a string constant returned the string.
        Instead, it should convert the value to numeric and give that value.

	Also added Arnold's tests for these to awktest.tar as T.arnold.
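
	Item 2 (system-status) can be illustrated with a short sketch
	that decodes the value returned by system(3) using the wait(2)
	macros, assuming a POSIX system. The function names and the
	256 offset used for signals are assumptions of this sketch,
	not a statement of what run.c does.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical decoder: turn a wait-style status into a small integer.
 * Normal exits give the exit code; deaths by signal give 256 + signal
 * number so the two ranges cannot collide; anything else gives -1. */
static int decode_status(int status)
{
	if (status == -1)
		return -1;	/* system() itself failed */
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	if (WIFSIGNALED(status))
		return 256 + WTERMSIG(status);
	return -1;
}

/* Run a shell command and return its decoded status. */
static int run_status(const char *cmd)
{
	return decode_status(system(cmd));
}
```

	Dividing the raw status by 256 gives the exit code only for a
	normal exit on systems that encode it in the high byte; the
	macros avoid depending on that layout.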

Aug 15, 2018:
	fixed mangled awktest.tar (thanks, Arnold), posted all
	current (very minor) fixes to github / onetrueawk

Jun 7, 2018:
	(yes, a long layoff)
	Updated some broken tests (beebe.tar, T.lilly)
	[thanks to Arnold Robbins]

Mar 26, 2015:
	buffer overflow in error reporting; thanks to tobias ulmer
	and john-mark gurney for spotting it and the fix.

Feb 4, 2013:
	cleaned up a handful of tests that didn't seem to actually
	test for correct behavior: T.latin1, T.gawk.

Jan 5, 2013:
	added ,NULL initializer to static Cells in run.c; not really
	needed but cleaner.  Thanks to Michael Bombardieri.

Dec 20, 2012:
	fiddled makefile to get correct yacc and bison flags.  pick yacc
	(linux) or bison (mac) as necessary.

	added  __attribute__((__noreturn__)) to a couple of lines in
	proto.h, to silence someone's enthusiastic checker.

	fixed obscure call by value bug in split(a[1],a) reported on
	9fans.  the management of temporary values is just a mess; i
	took a shortcut by making an extra string copy.  thanks
	to paul patience and arnold robbins for passing it on and for
	proposed patches.

	tiny fiddle in setfval to eliminate -0 results in T.expr, which
	has irritated me for 20+ years.

Aug 10, 2011:
	another fix to avoid core dump with delete(ARGV); again, many thanks
	to ruslan ermilov.

Aug 7, 2011:
	split(s, a, //) now behaves the same as split(s, a, "")

Jun 12, 2011:
	/pat/, \n /pat/ {...} is now legal, though bad style to use.

	added checks to new -v code that permits -vnospace; thanks to
	ruslan ermilov for spotting this and providing the patch.

	removed fixed limit on number of open files; thanks to aleksey
	cheusov and christos zoulas.

	fixed day 1 bug that resurrected deleted elements of ARGV when
	used as filenames (in lib.c).

	minor type fiddles to make gcc -Wall -pedantic happier (but not
	totally so); turned on -fno-strict-aliasing in makefile.

May 6, 2011:
	added #ifdef for isblank.
	now allows -ffoo as well as -f foo arguments.
	(thanks, ruslan)

May 1, 2011:
	after advice from todd miller, kevin lo, ruslan ermilov,
	and arnold robbins, changed srand() to return the previous
	seed (which is 1 on the first call of srand).  the seed is
	an Awkfloat internally though converted to unsigned int to
	pass to the library srand().  thanks, everyone.

	fixed a subtle (and i hope low-probability) overflow error
	in fldbld, by adding space for one extra \0.  thanks to
	robert bassett for spotting this one and providing a fix.

	removed the files related to compilation on windows.  i no
	longer have anything like a current windows environment, so
	i can't test any of it.

May 23, 2010:
	fixed long-standing overflow bug in run.c; many thanks to
	nelson beebe for spotting it and providing the fix.

	fixed bug that didn't parse -vd=1 properly; thanks to santiago
	vila for spotting it.

Feb 8, 2010:
	i give up.  replaced isblank with isspace in b.c; there are
	no consistent header files.

Nov 26, 2009:
	fixed a long-standing issue with when FS takes effect.  a
	change to FS is now noticed immediately for subsequent splits.

	changed the name getline() to awkgetline() to avoid yet another
	name conflict somewhere.

Feb 11, 2009:
	temporarily defined HAS_ISBLANK, since that seems to
	be the best way through the thicket.  isblank arrived in C99,
	but seems to be arriving at different systems at different
	times.

Oct 8, 2008:
	fixed typo in b.c that set tmpvec wrongly.  no one had ever
	run into the problem, apparently.  thanks to alistair crooks.

Oct 23, 2007:
	minor fix in lib.c: increase inputFS to 100, change malloc
	for fields to n+1.

	fixed memory fault caused by out of order test in setsval.

	thanks to david o'brien, freebsd, for both fixes.

May 1, 2007:
	fiddle in makefile to fix for BSD make; thanks to igor sobrado.

Mar 31, 2007:
	fixed some null pointer refs calling adjbuf.

Feb 21, 2007:
	fixed a bug in matching the null RE in sub and gsub.  thanks to al aho
	who actually did the fix (in b.c), and to wolfgang seeberg for finding
	it and providing a very compact test case.

	fixed quotation in b.c; thanks to Hal Pratt and the Princeton Dante
	Project.

	removed some no-effect asserts in run.c.

	fiddled maketab.c to not complain about bison-generated values.

	removed the obsolete -V argument; fixed --version to print the
	version and exit.

	fixed wording and an outright error in the usage message; thanks to igor
	sobrado and jason mcintyre.

	fixed a bug in -d that caused core dump if no program followed.

Jan 1, 2007:
	dropped mac.code from makefile; there are few non-MacOSX
	mac's these days.

Jan 17, 2006:
	system() not flagged as unsafe in the unadvertised -safe option.
	found it while enhancing tests before shipping the ;login: article.
	practice what you preach.

	removed the 9-years-obsolete -mr and -mf flags.

	added -version and --version options.

	core dump on linux with BEGIN {nextfile}, now fixed.

	removed some #ifdef's in run.c and lex.c that appear to no
	longer be necessary.

Apr 24, 2005:
	modified lib.c so that values of $0 et al are preserved in the END
	block, apparently as required by posix.  thanks to havard eidnes
	for the report and code.

Jan 14, 2005:
	fixed infinite loop in parsing, originally found by brian tsang.
	thanks to arnold robbins for a suggestion that started me
	rethinking it.

Dec 31, 2004:
	prevent overflow of -f array in main, head off potential error in
	call of SYNTAX(), test malloc return in lib.c, all with thanks to
	todd miller.

Dec 22, 2004:
	cranked up size of NCHARS; coverity thinks it can be overrun with
	smaller size, and i think that's right.  added some assertions to b.c
	to catch places where it might overrun.  the RE code is still fragile.

Dec 5, 2004:
	fixed a couple of overflow problems with ridiculous field numbers:
	e.g., print $(2^32-1).  thanks to ruslan ermilov, giorgos keramidas
	and david o'brien at freebsd.org for patches.  this really should
	be re-done from scratch.
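
	the kind of check involved can be sketched as follows: a field
	number arrives as a floating-point value and has to be range
	checked before it is narrowed to an int.  field_index is a
	hypothetical helper written for illustration, not the code
	from the patches.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: store v in *out as a field index and return 1,
 * or return 0 if v is negative, too large for an int, or NaN. */
static int field_index(double v, int *out)
{
	assert(out != NULL);
	if (!(v >= 0 && v <= INT_MAX))	/* written so that NaN fails too */
		return 0;
	*out = (int)v;
	return 1;
}
```

	converting an out-of-range double straight to int is undefined
	in C, which is why the test has to come before the cast.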

Nov 21, 2004:
	fixed another 25-year-old RE bug, in split.  it's another failure
	to (re-)initialize.  thanks to steve fisher for spotting this and
	providing a good test case.

Nov 22, 2003:
	fixed a bug in regular expressions that dates (so help me) from 1977;
	it's been there from the beginning.  an anchored longest match that
	was longer than the number of states triggered a failure to initialize
	the machine properly.  many thanks to moinak ghosh for not only finding
	this one but for providing a fix, in some of the most mysterious
	code known to man.

	fixed a storage leak in call() that appears to have been there since
	1983 or so -- a function without an explicit return that assigns a
	string to a parameter leaked a Cell.  thanks to moinak ghosh for
	spotting this very subtle one.

Jul 31, 2003:
	fixed, thanks to andrey chernov and ruslan ermilov, a bug in lex.c
	that mis-handled the character 255 in input.  (it was being compared
	to EOF with a signed comparison.)
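
	this class of bug is easy to reproduce in isolation.  the
	sketch below is not the lex.c code; both function names are
	hypothetical, and it assumes EOF is -1, as it is on common
	systems.

```c
#include <stdio.h>

/* Buggy pattern: the value from getc() is narrowed to a signed char
 * first, so byte 0xFF becomes -1 and compares equal to EOF. */
static int is_eof_buggy(int byte_from_getc)
{
	signed char c = (signed char)byte_from_getc;
	return c == EOF;
}

/* Fixed pattern: keep the full int that getc() returned, so 255 and
 * EOF stay distinct. */
static int is_eof_fixed(int byte_from_getc)
{
	return byte_from_getc == EOF;
}
```

	with the buggy version, input containing byte 255 looks like
	end of file; the fixed version only stops at the real EOF.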

Jul 29, 2003:
	fixed (i think) the long-standing botch that included the beginning of
	line state ^ for RE's in the set of valid characters; this led to a
	variety of odd problems, including failure to properly match certain
	regular expressions in non-US locales.  thanks to ruslan for keeping
	at this one.

Jul 28, 2003:
	n-th try at getting internationalization right, with thanks to volker
	kiefel, arnold robbins and ruslan ermilov for advice, though they
	should not be blamed for the outcome.  according to posix, "."  is the
	radix character in programs and command line arguments regardless of
	the locale; otherwise, the locale should prevail for input and output
	of numbers.  so it's intended to work that way.

	i have rescinded the attempt to use strcoll in expanding shorthands in
	regular expressions (cclenter).  its properties are much too
	surprising; for example [a-c] matches aAbBc in locale en_US but abBcC
	in locale fr_CA.  i can see how this might arise by implementation
	but i cannot explain it to a human user.  (this behavior can be seen
	in gawk as well; we're leaning on the same library.)

	the issue appears to be that strcoll is meant for sorting, where
	merging upper and lower case may make sense (though note that unix
	sort does not do this by default either).  it is not appropriate
	for regular expressions, where the goal is to match specific
	patterns of characters.  in any case, the notations [:lower:], etc.,
	are available in awk, and they are more likely to work correctly in
	most locales.

	a moratorium is hereby declared on internationalization changes.
	i apologize to friends and colleagues in other parts of the world.
	i would truly like to get this "right", but i don't know what
	that is, and i do not want to keep making changes until it's clear.

Jul 4, 2003:
	fixed bug that permitted non-terminated RE, as in "awk /x".

Jun 1, 2003:
	subtle change to split: if source is empty, number of elems
	is always 0 and the array is not set.

Mar 21, 2003:
	added some parens to isblank, in another attempt to make things
	internationally portable.

Mar 14, 2003:
	the internationalization changes, somewhat modified, are now
	reinstated.  in theory awk will now do character comparisons
	and case conversions in national language, but "." will always
	be the decimal point separator on input and output regardless
	of national language.  isblank(){} has an #ifndef.

	this no longer compiles on windows: LC_MESSAGES isn't defined
	in vc6++.

	fixed subtle behavior in field and record splitting: if FS is
	a single character and RS is not empty, \n is NOT a separator.
	this tortuous reading is found in the awk book; behavior now
	matches gawk and mawk.

Dec 13, 2002:
	for the moment, the internationalization changes of nov 29 are
	rolled back -- programs like x = 1.2 don't work in some locales,
	because the parser is expecting x = 1,2.  until i understand this
	better, this will have to wait.

Nov 29, 2002:
	modified b.c (with tiny changes in main and run) to support
	locales, using strcoll and iswhatever tests for posix character
	classes.  thanks to ruslan ermilov (ru@freebsd.org) for code.
	the function isblank doesn't seem to have propagated to any
	header file near me, so it's there explicitly.  not properly
	tested on non-ascii character sets by me.

Jun 28, 2002:
	modified run/format() and tran/getsval() to do a slightly better
	job on using OFMT for output from print and CONVFMT for other
	number->string conversions, as promised by posix and done by
	gawk and mawk.  there are still places where it doesn't work
	right if CONVFMT is changed; by then the STR attribute of the
	variable has been irrevocably set.  thanks to arnold robbins for
	code and examples.

	fixed subtle bug in format that could get core dump.  thanks to
	Jaromir Dolecek <jdolecek@NetBSD.org> for finding and fixing.
	minor cleanup in run.c / format() at the same time.

	added some tests for null pointers to debugging printf's, which
	were never intended for external consumption.  thanks to dave
	kerns (dkerns@lucent.com) for pointing this out.

	GNU compatibility: an empty regexp matches anything (thanks to
	dag-erling smorgrav, des@ofug.org).  subject to reversion if
	this does more harm than good.

	pervasive small changes to make things more const-correct, as
	reported by gcc's -Wwrite-strings.  as it says in the gcc manual,
	this may be more nuisance than useful.  provoked by a suggestion
	and code from arnaud desitter, arnaud@nimbus.geog.ox.ac.uk

	minor documentation changes to note that this now compiles out
	of the box on Mac OS X.

Feb 10, 2002:
	changed types in posix chars structure to quiet solaris cc.

Jan 1, 2002:
	fflush() or fflush("") flushes all files and pipes.

	length(arrayname) returns number of elements; thanks to
	arnold robbins for suggestion.

	added a makefile.win to make it easier to build on windows.
	based on dan allen's buildwin.bat.

Nov 16, 2001:
	added support for posix character class names like [:digit:],
	which are not exactly shorter than [0-9] and perhaps no more
	portable.  thanks to dag-erling smorgrav for code.

Feb 16, 2001:
	removed -m option; no longer needed, and it was actually
	broken (noted thanks to volker kiefel).

Feb 10, 2001:
	fixed an appalling bug in gettok: any sequence of digits, +,-, E, e,
	and period was accepted as a valid number if it started with a period.
	this would never have happened with the lex version.

	other 1-character botches, now fixed, include a bare $ and a
	bare " at the end of the input.

Feb 7, 2001:
	more (const char *) casts in b.c and tran.c to silence warnings.

Nov 15, 2000:
	fixed a bug introduced in august 1997 that caused expressions
	like $f[1] to be syntax errors.  thanks to arnold robbins for
	noticing this and providing a fix.

Oct 30, 2000:
	fixed some nextfile bugs: not handling all cases.  thanks to
	arnold robbins for pointing this out.  new regressions added.

	close() is now a function.  it returns whatever the library
	fclose returns, and -1 for closing a file or pipe that wasn't
	opened.

Sep 24, 2000:
	permit \n explicitly in character classes; won't work right
	if comes in as "[\n]" but ok as /[\n]/, because of multiple
	processing of \'s.  thanks to arnold robbins.

July 5, 2000:
	minor fiddles in tran.c to keep compilers happy about uschar.
	thanks to norman wilson.

May 25, 2000:
	yet another attempt at making 8-bit input work, with another
	band-aid in b.c (member()), and some (uschar) casts to head
	off potential errors in subscripts (like isdigit).  also
	changed HAT to NCHARS-2.  thanks again to santiago vila.

	changed maketab.c to ignore apparently out of range definitions
	instead of halting; new freeBSD generates one.  thanks to
	jon snader <jsnader@ix.netcom.com> for pointing out the problem.
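
	the reason for the (uschar) casts mentioned in this entry can
	be shown with a small sketch.  the ctype functions are only
	defined for EOF and values representable as unsigned char, so
	a plain char holding an 8-bit byte may be passed as a negative
	number.  count_digits is a hypothetical function, not code
	from b.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical example: count the decimal digits in s.  The cast keeps
 * bytes above 127 from reaching isdigit() as negative values. */
static size_t count_digits(const char *s)
{
	size_t n = 0;

	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (isdigit((unsigned char)*s))
			n++;
	return n;
}
```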

May 2, 2000:
	fixed an 8-bit problem in b.c by making several char*'s into
	unsigned char*'s.  not clear i have them all yet.  thanks to
	Santiago Vila <sanvila@unex.es> for the bug report.

Apr 21, 2000:
	finally found and fixed a memory leak in function call; it's
	been there since functions were added ~1983.  thanks to
	jon bentley for the test case that found it.

	added test in envinit to catch environment "variables" with
	names beginning with '='; thanks to Berend Hasselman.

Jul 28, 1999:
	added test in defn() to catch function foo(foo), which
	otherwise recurses until core dump.  thanks to arnold
	robbins for noticing this.

Jun 20, 1999:
	added *bp in gettok in lex.c; appears possible to exit function
	without terminating the string.  thanks to russ cox.

Jun 2, 1999:
	added function stdinit() to run to initialize files[] array,
	in case stdin, etc., are not constants; some compilers care.

May 10, 1999:
	replaced the ERROR ... FATAL, etc., macros with functions
	based on vprintf, to avoid problems caused by overrunning
	fixed-size errbuf array.  thanks to ralph corderoy for the
	impetus, and for pointing out a string termination bug in
	qstring as well.
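
	a variadic function in this style might look like the sketch
	below.  to keep it testable it formats into a caller-supplied
	buffer with vsnprintf rather than printing with vprintf, so it
	is a related technique, not the one in the source; format_msg
	is a hypothetical name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical formatter: writes at most size-1 characters plus a
 * terminating NUL into buf and returns the length the full message
 * would have had, so the caller can detect truncation. */
static int format_msg(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && size > 0 && fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return n;
}
```

	because the size is passed along with the buffer, a long
	message is truncated instead of overrunning the array.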

Apr 21, 1999:
	fixed bug that caused occasional core dumps with commandline
	variable with value ending in \.  (thanks to nelson beebe for
	the test case.)

Apr 16, 1999:
	with code kindly provided by Bruce Lilly, awk now parses
	/=/ and similar constructs more sensibly in more places.
	Bruce also provided some helpful test cases.

Apr 5, 1999:
	changed true/false to True/False in run.c to make it
	easier to compile with C++.  Added some casts on malloc
	and realloc to be honest about casts; ditto.  changed
	ltype int to long in struct rrow to reduce some 64-bit
	complaints; other changes scattered throughout for the
	same purpose.  thanks to Nelson Beebe for these portability
	improvements.

	removed some horrible pointer-int casting in b.c and elsewhere
	by adding ptoi and itonp to localize the casts, which are
	all benign.  fixed one incipient bug that showed up on sgi
	in 64-bit mode.

	reset lineno for new source file; include filename in error
	message.  also fixed line number error in continuation lines.
	(thanks to Nelson Beebe for both of these.)

Mar 24, 1999:
	Nelson Beebe notes that irix 5.3 yacc dies with a bogus
	error; use a newer version or switch to bison, since sgi
	is unlikely to fix it.

Mar 5, 1999:
	changed isnumber to is_number to avoid the problem caused by
	versions of ctype.h that include the name isnumber.

	distribution now includes a script for building on a Mac,
	thanks to Dan Allen.

Feb 20, 1999:
	fixed memory leaks in run.c (call) and tran.c (setfval).
	thanks to Stephen Nutt for finding these and providing the fixes.

Jan 13, 1999:
	replaced srand argument by (unsigned int) in run.c;
	avoids problem on Mac and potentially on Unix & Windows.
	thanks to Dan Allen.

	added a few (int) casts to silence useless compiler warnings.
	e.g., errorflag= in run.c jump().

	added proctab.c to the bundle output; one less thing
	to have to compile out of the box.

	added calls to _popen and _pclose to the win95 stub for
	pipes (thanks to Steve Adams for this helpful suggestion).
	seems to work, though properties are not well understood
	by me, and it appears that under some circumstances the
	pipe output is truncated.  Be careful.

Oct 19, 1998:
	fixed a couple of bugs in getrec: could fail to update $0
	after a getline var; because inputFS wasn't initialized,
	could split $0 on every character, a misleading diversion.

	fixed caching bug in makedfa: LRU was actually removing
	least often used.

	thanks to ross ridge for finding these, and for providing
	great bug reports.
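
	the difference between the two eviction policies named in the
	makedfa fix is small in code but large in effect.  the sketch
	below uses a hypothetical struct entry with a last-use stamp
	and a use count; it does not reproduce the cache in b.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cache entry: when it was last used and how often. */
struct entry {
	int key;
	unsigned long last_used;	/* larger means more recent */
	unsigned long hits;		/* total number of uses */
};

/* Least recently used: evict the entry with the oldest stamp. */
static size_t lru_victim(const struct entry *e, size_t n)
{
	size_t i, v = 0;

	assert(e != NULL && n > 0);
	for (i = 1; i < n; i++)
		if (e[i].last_used < e[v].last_used)
			v = i;
	return v;
}

/* Least often used: evict the entry with the fewest uses. */
static size_t lfu_victim(const struct entry *e, size_t n)
{
	size_t i, v = 0;

	assert(e != NULL && n > 0);
	for (i = 1; i < n; i++)
		if (e[i].hits < e[v].hits)
			v = i;
	return v;
}
```

	a newly added entry has few uses, so the least-often-used rule
	tends to throw out exactly the entry that was just built.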

May 12, 1998:
	fixed potential bug in readrec: might fail to update record
	pointer after growing.  thanks to dan levy for spotting this
	and suggesting the fix.

Mar 12, 1998:
	added -V to print version number and die.

[notify dave kerns, dkerns@dacsoup.ih.lucent.com]

Feb 11, 1998:
	subtle silent bug in lex.c: if the program ended with a number
	longer than 1 digit, part of the input would be pushed back and
	parsed again because token buffer wasn't terminated right.
	example:  awk 'length($0) > 10'.  blush.  at least i found it
	myself.

Aug 31, 1997:
	s/adelete/awkdelete/: SGI uses this in malloc.h.
	thanks to nelson beebe for pointing this one out.

Aug 21, 1997:
	fixed some bugs in sub and gsub when replacement includes \\.
	this is a dark, horrible corner, but at least now i believe that
	the behavior is the same as gawk and the intended posix standard.
	thanks to arnold robbins for advice here.

Aug 9, 1997:
	somewhat regretfully, replaced the ancient lex-based lexical
	analyzer with one written in C.  it's longer, generates less code,
	and is more portable; the old one depended too much on mysterious
	properties of lex that were not preserved in other environments.
	in theory these recognize the same language.

	now using strtod to test whether a string is a number, instead of
	the convoluted original function.  should be more portable and
	reliable if strtod is implemented right.
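
	a strtod-based test of this kind can be sketched as follows.
	looks_numeric is a hypothetical stand-in, not awk's is_number:
	in particular, strtod also accepts forms such as "inf", "nan"
	and hexadecimal constants, which a real implementation may
	want to treat differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-in for a number test: 1 if all of s is a number,
 * allowing white space before and after it, else 0. */
static int looks_numeric(const char *s)
{
	char *end;

	assert(s != NULL);
	(void)strtod(s, &end);	/* skips leading white space itself */
	if (end == s)
		return 0;	/* nothing was converted */
	while (isspace((unsigned char)*end))
		end++;		/* trailing blanks, \n and \r are fine */
	return *end == '\0';
}
```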
791
792	removed now-pointless optimization in makefile that tries to avoid
793	recompilation when awkgram.y is changed but symbols are not.
794
795	removed most fixed-size arrays, though a handful remain, some
796	of which are unchecked.  you have been warned.
797
798Aug 4, 1997:
799	with some trepidation, replaced the ancient code that managed
800	fields and $0 in fixed-size arrays with arrays that grow on
801	demand.  there is still some tension between trying to make this
802	run fast and making it clean; not sure it's right yet.
803
804	the ill-conceived -mr and -mf arguments are now useful only
805	for debugging.  previous dynamic string code removed.
806
807	numerous other minor cleanups along the way.
808
809Jul 30, 1997:
810	using code provided by dan levy (to whom profuse thanks), replaced
811	fixed-size arrays and awkward kludges by a fairly uniform mechanism
812	to grow arrays as needed for printf, sub, gsub, etc.
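
	a rough sketch of what such a mechanism might look like; this is
	an illustration under assumed names (grow_buf and its parameters
	are hypothetical), not the code that was actually contributed.
	the point is that any pointer into the old buffer has to be
	re-based after the buffer moves.

```c
#include <assert.h>
#include <stdlib.h>

/* hypothetical sketch: make sure *buf holds at least need bytes,
 * growing in multiples of quantum.  if pos is non-NULL, *pos points
 * into the old buffer and is moved to the same offset in the new one.
 * returns 1 on success, 0 if realloc fails (old buffer still valid).
 * size arithmetic overflow is not checked here. */
int grow_buf(char **buf, size_t *size, size_t need, size_t quantum, char **pos)
{
	assert(buf != NULL && size != NULL && quantum > 0);
	if (need > *size) {
		size_t newsize = (need / quantum + 1) * quantum;
		size_t off = (pos != NULL && *buf != NULL) ? (size_t)(*pos - *buf) : 0;
		char *nb = realloc(*buf, newsize);

		if (nb == NULL)
			return 0;
		*buf = nb;
		*size = newsize;
		if (pos != NULL)
			*pos = nb + off;	/* re-base the caller's pointer */
	}
	return 1;
}
```

	a caller would start with a NULL buffer and size 0 and call
	grow_buf before each write that might not fit.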
813
814Jul 23, 1997:
815	falling off the end of a function returns "" and 0, not 0.
816	thanks to arnold robbins.
817
818Jun 17, 1997:
819	replaced several fixed-size arrays by dynamically-created ones
820	in run.c; added overflow tests to some previously unchecked cases.
821	getline, toupper, tolower.
822
823	getline code is still broken in that recursive calls may wind
824	up using the same space.  [fixed later]
825
826	increased RECSIZE to 8192 to push problems further over the horizon.
827
828	added \r to \n as input line separator for programs, not data.
829	damn CRLFs.
830
831	modified format() to permit explicit printf("%c", 0) to include
832	a null byte in output.  thanks to ken stailey for the fix.
833
834	added a "-safe" argument that disables file output (print >,
835	print >>), process creation (cmd|getline, print |, system), and
836	access to the environment (ENVIRON).  this is a first approximation
837	to a "safe" version of awk, but don't rely on it too much.  thanks
838	to joan feigenbaum and matt blaze for the inspiration long ago.
839
840Jul 8, 1996:
841	fixed long-standing bug in sub, gsub(/a/, "\\\\&"); thanks to
842	ralph corderoy.
843
844Jun 29, 1996:
845	fixed awful bug in new field splitting; didn't get all the places
846	where input was done.
847
848Jun 28, 1996:
849	changed field-splitting to conform to posix definition: fields are
850	split using the value of FS at the time of input; it used to be
851	the value when the field or NF was first referred to, a much less
852	predictable definition.  thanks to arnold robbins for encouragement
853	to do the right thing.
854
855May 28, 1996:
856	fixed appalling but apparently unimportant bug in parsing octal
857	numbers in reg exprs.
858
859	explicit hex in reg exprs now limited to 2 chars: \xa, \xaa.
860
861May 27, 1996:
862	cleaned up some declarations so gcc -Wall is now almost silent.
863
864	makefile now includes backup copies of ytab.c and lexyy.c in case
865	one makes before looking; it also avoids recreating lexyy.c unless
866	really needed.
867
868	s/aprintf/awkprintf/, s/asprintf/awksprintf/ to avoid some name clashes
869	with unwisely-written header files.
870
871	thanks to jeffrey friedl for several of these.
872
873May 26, 1996:
874	an attempt to rationalize the (unsigned) char issue.  almost all
875	instances of unsigned char have been removed; the handful of places
876	in b.c where chars are used as table indices have been hand-crafted.
877	added some latin-1 tests to the regression, but i'm not confident;
878	none of my compilers seem to care much.  thanks to nelson beebe for
879	pointing out some others that do care.
880
881May 2, 1996:
882	removed all register declarations.
883
884	enhanced split(), as in gawk, etc:  split(s, a, "") splits s into
885	a[1]...a[length(s)] with each character a single element.
886
887	made the same changes for field-splitting if FS is "".
888
889	added nextfile, as in gawk: causes immediate advance to next
890	input file. (thanks to arnold robbins for inspiration and code).
891
892	small fixes to regexpr code:  can now handle []], [[], and
893	variants;  [] is now a syntax error, rather than matching
894	everything;  [z-a] is now empty, not z.  far from complete
895	or correct, however.  (thanks to jeffrey friedl for pointing out
896	some awful behaviors.)
897
898Apr 29, 1996:
899	replaced uchar by uschar everywhere; apparently some compilers
900	usurp this name and this causes conflicts.
901
902	fixed call to time in run.c (bltin); arg is time_t *.
903
904	replaced horrible pointer/long punning in b.c by a legitimate
905	union.  should be safer on 64-bit machines and cleaner everywhere.
906	(thanks to nelson beebe for pointing out some of these problems.)
907
908	replaced nested comments by #if 0...#endif in run.c, lib.c.
909
910	removed getsval, setsval, execute macros from run.c and lib.c.
911	machines are 100x faster than they were when these macros were
912	first used.
913
914	revised filenames: awk.g.y => awkgram.y, awk.lx.l => awklex.l,
915	y.tab.[ch] => ytab.[ch], lex.yy.c => lexyy.c, all in the aid of
916	portability to nameless systems.
917
918	"make bundle" now includes yacc and lex output files for recipients
919	who don't have yacc or lex.
920
921Aug 15, 1995:
922	initialized Cells in setsymtab more carefully; some fields
923	were not set.  (thanks to purify, all of whose complaints i
924	think i now understand.)
925
926	fixed at least one error in gsub that looked at -1-th element
927	of an array when substituting for a null match (e.g., $).
928
929	delete arrayname is now legal; it clears the elements but leaves
930	the array, which may not be the right behavior.
931
932	modified makefile: my current make can't cope with the test used
933	to avoid unnecessary yacc invocations.
934
935Jul 17, 1995:
936	added dynamically growing strings to awk.lx.l and b.c
937	to permit regular expressions to be much bigger.
938	the state arrays can still overflow.
939
940Aug 24, 1994:
941	detect duplicate arguments in function definitions (mdm).
942
943May 11, 1994:
944	trivial fix to printf to limit string size in sub().
945
946Apr 22, 1994:
947	fixed yet another subtle self-assignment problem:
948	$1 = $2; $1 = $1 clobbered $1.
949
950	Regression tests now use private echo, to avoid quoting problems.
951
952Feb 2, 1994:
953	changed error() to print line number as %d, not %g.
954
955Jul 23, 1993:
956	cosmetic changes: increased sizes of some arrays,
957	reworded some error messages.
958
959	added CONVFMT as in posix (just replaced OFMT in getsval)
960
961	FILENAME is now "" until the first thing that causes a file
962	to be opened.
963
964Nov 28, 1992:
965	deleted yyunput and yyoutput from proto.h;
966	different versions of lex give these different declarations.
967
968May 31, 1992:
969	added -mr N and -mf N options: more records and fields.
970	these really ought to adjust automatically.
971
972	cleaned up some error messages; "out of space" now means
973	malloc returned NULL in all cases.
974
975	changed rehash so that if it runs out, it just returns;
976	things will continue to run slow, but maybe a bit longer.
977
978Apr 24, 1992:
979	remove redundant close of stdin when using -f -.
980
981	got rid of core dump with -d; awk -d just prints date.
982
983Apr 12, 1992:
984	added explicit check for /dev/std(in,out,err) in redirection.
985	unlike gawk, no /dev/fd/n yet.
986
987	added (file/pipe) builtin.  hard to test satisfactorily.
988	not posix.
989
990Feb 20, 1992:
991	recompile after abortive changes;  should be unchanged.
992
993Dec 2, 1991:
994	die-casting time:  converted to ansi C, installed that.
995
996Nov 30, 1991:
997	fixed storage leak in freefa, failing to recover [N]CCL.
998	thanks to Bill Jones (jones@cs.usask.ca)
999
1000Nov 19, 1991:
1001	use RAND_MAX instead of literal in builtin().
1002
1003Nov 12, 1991:
1004	cranked up some fixed-size arrays in b.c, and added a test for
1005	overflow in penter.  thanks to mark larsen.
1006
1007Sep 24, 1991:
1008	increased buffer in gsub.  a very crude fix to a general problem.
1009	and again on Sep 26.
1010
1011Aug 18, 1991:
1012	enforce variable name syntax for commandline variables: has to
1013	start with letter or _.
1014
1015Jul 27, 1991:
1016	allow newline after ; in for statements.
1017
1018Jul 21, 1991:
1019	fixed so that in self-assignment like $1=$1, side effects
1020	like recomputing $0 take place.  (this is getting subtle.)
1021
1022Jun 30, 1991:
1023	better test for detecting too-long output record.
1024
1025Jun 2, 1991:
1026	better defense against very long printf strings.
1027	made break and continue illegal outside of loops.
1028
1029May 13, 1991:
1030	removed extra arg on gettemp, tempfree.  minor error message rewording.
1031
1032May 6, 1991:
1033	fixed silly bug in hex parsing in hexstr().
1034	removed an apparently unnecessary test in isnumber().
1035	warn about weird printf conversions.
1036	fixed unchecked array overwrite in relex().
1037
1038	changed for (i in array) to access elements in sorted order.
1039	then unchanged it -- it really does run slower in too many cases.
1040	left the code in place, commented out.
1041
1042Feb 10, 1991:
1043	check error status on all writes, to avoid banging on full disks.
1044
1045Jan 28, 1991:
1046	awk -f - reads the program from stdin.
1047
1048Jan 11, 1991:
1049	failed to set numeric state on $0 in cmd|getline context in run.c.
1050
1051Nov 2, 1990:
1052	fixed sleazy test for integrality in getsval;  use modf.
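
	a sketch of the kind of test meant here, assuming the usual
	number-to-string rule (integral values print as integers, others
	use the default "%.6g" output format).  num_to_str is a made-up
	name; this is not the code in getsval.

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* hypothetical sketch: write f into buf, as an integer if modf
 * reports no fractional part, otherwise with "%.6g".
 * returns what snprintf returns. */
int num_to_str(char *buf, size_t n, double f)
{
	double ipart;

	assert(buf != NULL);
	if (modf(f, &ipart) == 0.0)
		return snprintf(buf, n, "%.30g", f);	/* integral value */
	return snprintf(buf, n, "%.6g", f);		/* has a fraction */
}
```

	modf avoids casting to an integer type, so large integral values
	that would not fit in a long are still recognized as integral.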
1053
1054Oct 29, 1990:
1055	fixed sleazy buggy code in lib.c that looked (incorrectly) for
1056	too long input lines.
1057
1058Oct 14, 1990:
1059	fixed the bug on p. 198 in which it couldn't deduce that an
1060	argument was an array in some contexts.  replaced the error
1061	message in intest() by code that damn well makes it an array.
1062
1063Oct 8, 1990:
1064	fixed horrible bug:  types and values were not preserved in
1065	some kinds of self-assignment. (in assign().)
1066
1067Aug 24, 1990:
1068	changed NCHARS to 256 to handle 8-bit characters in strings
1069	presented to match(), etc.
1070
1071Jun 26, 1990:
1072	changed struct rrow (awk.h) to use long instead of int for lval,
1073	since cfoll() stores a pointer in it.  now works better when int's
1074	are smaller than pointers!
1075
1076May 6, 1990:
1077	AVA fixed the grammar so that ! is uniformly of the same precedence as
1078	unary + and -.  This renders illegal some constructs like !x=y, which
1079	now has to be parenthesized as !(x=y), and makes others work properly:
1080	!x+y is (!x)+y, and x!y is x !y, not two pattern-action statements.
1081	(These problems were pointed out by Bob Lenk of Posix.)
1082
1083	Added \x to regular expressions (already in strings).
1084	Limited octal to octal digits; \8 and \9 are not octal.
1085	Centralized the code for parsing escapes in regular expressions.
1086	Added a bunch of tests to T.re and T.sub to verify some of this.
1087
1088Feb 9, 1990:
1089	fixed null pointer dereference bug in main.c:  -F[nothing].  sigh.
1090
1091	restored srand behavior:  it returns the current seed.
1092
1093Jan 18, 1990:
1094	srand now returns previous seed value (0 to start).
1095
1096Jan 5, 1990:
1097	fix potential problem in tran.c -- something was freed,
1098	then used in freesymtab.
1099
1100Oct 18, 1989:
1101	another try to get the max number of open files set with
1102	relatively machine-independent code.
1103
1104	small fix to input() in case of multiple reads after EOF.
1105
1106Oct 11, 1989:
1107	FILENAME is now defined in the BEGIN block -- too many old
1108	programs broke.
1109
1110	"-" means stdin in getline as well as on the commandline.
1111
1112	added a bunch of casts to the code to tell the truth about
1113	char * vs. unsigned char *, a right royal pain.  added a
1114	setlocale call to the front of main, though probably no one
1115	has it usefully implemented yet.
1116
1117Aug 24, 1989:
1118	removed redundant relational tests against nullnode if parse
1119	tree already had a relational at that point.
1120
1121Aug 11, 1989:
1122	fixed bug:  commandline variable assignment has to look like
1123	var=something.  (consider the man page for =, in file =.1)
1124
1125	changed number of arguments to functions to static arrays
1126	to avoid repeated malloc calls.
1127
1128Aug 2, 1989:
1129	restored -F (space) separator
1130
1131Jul 30, 1989:
1132	added -v x=1 y=2 ... for immediate commandline variable assignment;
1133	done before the BEGIN block for sure.  they have to precede the
1134	program if the program is on the commandline.
1135	Modified Aug 2 to require a separate -v for each assignment.
1136
1137Jul 10, 1989:
1138	fixed ref-thru-zero bug in environment code in tran.c
1139
1140Jun 23, 1989:
1141	add newline to usage message.
1142
1143Jun 14, 1989:
1144	added some missing ansi printf conversion letters: %i %X %E %G.
1145	no sensible meaning for h or L, so they may not do what one expects.
1146
1147	made %* conversions work.
1148
1149	changed x^y so that if y is a positive integer, it's done
1150	by explicit multiplication, thus achieving maximum accuracy.
1151	(this should be done by pow() but it seems not to be locally.)
1152	done to x ^= y as well.
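
	one way to do integer powers by explicit multiplication is
	repeated squaring, sketched below.  the name ipow and the exact
	shape are assumptions for illustration; the entry above does not
	say which multiplication scheme was used.

```c
#include <assert.h>

/* hypothetical sketch: x to the power n, n a non-negative integer,
 * using only multiplication (about log2(n) squarings). */
double ipow(double x, int n)
{
	double v;

	assert(n >= 0);
	if (n == 0)
		return 1.0;
	v = ipow(x, n / 2);
	if (n % 2 == 0)
		return v * v;
	return x * v * v;
}
```

	for small integer operands every intermediate product is exactly
	representable, so ipow(2.0, 10) is exactly 1024.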
1153
1154Jun 4, 1989:
1155	ENVIRON array contains environment: if shell variable V=thing,
1156		ENVIRON["V"] is "thing"
1157
1158	multiple -f arguments permitted.  error reporting is naive.
1159	(they were permitted before, but only the last was used.)
1160
1161	fixed a really stupid botch in the debugging macro dprintf
1162
1163	fixed order of evaluation of commandline assignments to match
1164	what the book claims:  an argument of the form x=e is evaluated
1165	at the time it would have been opened if it were a filename (p 63).
1166	this invalidates the suggested answer to ex 4-1 (p 195).
1167
1168	removed some code that permitted -F (space) fieldseparator,
1169	since it didn't quite work right anyway.  (restored aug 2)
1170
1171Apr 27, 1989:
1172	Line number now accumulated correctly for comment lines.
1173
1174Apr 26, 1989:
1175	Debugging output now includes a version date,
1176	if one compiles it into the source each time.
1177
1178Apr 9, 1989:
1179	Changed grammar to prohibit constants as 3rd arg of sub and gsub;
1180	prevents class of overwriting-a-constant errors.  (Last one?)
1181	This invalidates the "banana" example on page 43 of the book.
1182
1183	Added \a ("alert"), \v (vertical tab), \xhhh (hexadecimal),
1184	as in ANSI, for strings.  Rescinded the sloppiness that permitted
1185	non-octal digits in \ooo.  Warning:  not all compilers and libraries
1186	will be able to deal with \x correctly.
1187
1188Jan 9, 1989:
1189	Fixed bug that caused tempcell list to contain a duplicate.
1190	The fix is kludgy.
1191
1192Dec 17, 1988:
1193	Catches some more commandline errors in main.
1194	Removed redundant decl of modf in run.c (confuses some compilers).
1195	Warning:  there's no single declaration of malloc, etc., in awk.h
1196	that seems to satisfy all compilers.
1197
1198Dec 7, 1988:
1199	Added a bit of code to error printing to avoid printing nulls.
1200	(Not clear that it actually would.)
1201
1202Nov 27, 1988:
1203	With fear and trembling, modified the grammar to permit
1204	multiple pattern-action statements on one line without
1205	an explicit separator.  By definition, this capitulation
1206	to the ghost of ancient implementations remains undefined
1207	and thus subject to change without notice or apology.
1208	DO NOT COUNT ON IT.
1209
1210Oct 30, 1988:
1211	Fixed bug in call() that failed to recover storage.
1212
1213	A warning is now generated if there are more arguments
1214	in the call than in the definition (in lieu of fixing
1215	another storage leak).
1216
1217Oct 20, 1988:
1218	Fixed %c:  if expr is numeric, use numeric value;
1219	otherwise print 1st char of string value.  still
1220	doesn't work if the value is 0 -- won't print \0.
1221
1222	Added a few more checks for running out of malloc.
1223
1224Oct 12, 1988:
1225	Fixed bug in call() that freed local arrays twice.
1226
1227	Fixed to handle deletion of non-existent array right;
1228	complains about attempt to delete non-array element.
1229
1230Sep 30, 1988:
1231	Now guarantees to evaluate all arguments of built-in
1232	functions, as in C;  the appearance is that arguments
1233	are evaluated before the function is called.  Places
1234	affected are sub (gsub was ok), substr, printf, and
1235	all the built-in arithmetic functions in bltin().
1236	A warning is generated if a bltin() is called with
1237	the wrong number of arguments.
1238
1239	This requires changing makeprof on p167 of the book.
1240
1241Aug 23, 1988:
1242	setting FILENAME in BEGIN caused core dump, apparently
1243	because it was freeing space not allocated by malloc.
1244
1245July 24, 1988:
1246	fixed egregious error in toupper/tolower functions.
1247	still subject to rescinding, however.
1248
1249July 2, 1988:
1250	flush stdout before opening file or pipe
1251
1252July 2, 1988:
1253	performance bug in b.c/cgoto(): not freeing some sets of states.
1254	partial fix only right now, and the number of states increased
1255	to make it less obvious.
1256
1257June 1, 1988:
1258	check error status on close
1259
1260May 28, 1988:
1261	srand returns seed value it's using.
1262	see 1/18/90
1263
1264May 22, 1988:
1265	Removed limit on depth of function calls.
1266
1267May 10, 1988:
1268	Fixed lib.c to permit _ in commandline variable names.
1269
1270Mar 25, 1988:
1271	main.c fixed to recognize -- as terminator of command-
1272	line options.  Illegal options flagged.
1273	Error reporting slightly cleaned up.
1274
1275Dec 2, 1987:
1276	Newer C compilers apply a strict scope rule to extern
1277	declarations within functions.  Two extern declarations in
1278	lib.c and tran.c have been moved to obviate this problem.
1279
1280Oct xx, 1987:
1281	Reluctantly added toupper and tolower functions.
1282	Subject to rescinding without notice.
1283
1284Sep 17, 1987:
1285	Error-message printer had printf(s) instead of
1286	printf("%s",s);  got core dumps when the message
1287	included a %.
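
	the general shape of the fix, sketched with a made-up helper
	(put_message is not a function in awk): the message is passed as
	data to a "%s" conversion and is never itself used as the format.

```c
#include <assert.h>
#include <stdio.h>

/* hypothetical sketch: copy msg into buf as data.
 * returns what snprintf returns (the length of msg). */
int put_message(char *buf, size_t n, const char *msg)
{
	assert(buf != NULL && msg != NULL);
	/* wrong: snprintf(buf, n, msg);  a % in msg would be read as a
	 * conversion and fetch arguments that were never passed. */
	return snprintf(buf, n, "%s", msg);
}
```

	with this form a message such as "100%s done" comes out
	unchanged instead of reading a nonexistent argument.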
1288
1289Sep 12, 1987:
1290	Very long printf strings caused core dump;
1291	fixed aprintf, asprintf, format to catch them.
1292	Can still get a core dump in printf itself.
1293
1294
1295