Fri Apr  2 17:31:59 1993  Jim Blandy  (jimb@totoro.cs.oberlin.edu)

	* Released version 0.12.

	* regex.c (regerror): If errcode is zero, that's not a valid
	error code, according to POSIX, but return "Success."

	* regex.c (regerror): Remember to actually fetch the message
	from re_error_msg.

	* regex.c (regex_compile): Don't use the trick for ".*\n" on
	".+\n".  Since the latter involves laying an extra choice
	point, the backward jump isn't adjusted properly.

Thu Mar 25 21:35:18 1993  Jim Blandy  (jimb@totoro.cs.oberlin.edu)

	* regex.c (regex_compile): In the handle_open and handle_close
	sections, clear pending_exact to zero.

Tue Mar  9 12:03:07 1993  Jim Blandy  (jimb@wookumz.gnu.ai.mit.edu)

	* regex.c (re_search_2): In the loop which searches forward
	using fastmap, don't forget to cast the character from the
	string to an unsigned before using it as an index into the
	translate map.

Thu Jan 14 15:41:46 1993  David J. MacKenzie  (djm@kropotkin.gnu.ai.mit.edu)

	* regex.h: Never define const; let the callers do it.
	configure.in: Don't define USING_AUTOCONF.

Wed Jan  6 20:49:29 1993  Jim Blandy  (jimb@geech.gnu.ai.mit.edu)

	* regex.c (regerror): Abort if ERRCODE is out of range.

Sun Dec 20 16:19:10 1992  Jim Blandy  (jimb@totoro.cs.oberlin.edu)

	* configure.in: Arrange to #define USING_AUTOCONF.
	* regex.h: If USING_AUTOCONF is #defined, don't mess with
	`const' at all; autoconf has taken care of it.

Mon Dec 14 21:40:39 1992  David J. MacKenzie  (djm@kropotkin.gnu.ai.mit.edu)

	* regex.h (RE_SYNTAX_AWK): Fix typo.  From Arnold Robbins.

Sun Dec 13 20:35:39 1992  Jim Blandy  (jimb@totoro.cs.oberlin.edu)

	* regex.c (compile_range): Fetch the range start and end by
	casting the pattern pointer to an `unsigned char *' before
	fetching through it.

Sat Dec 12 09:41:01 1992  Jim Blandy  (jimb@totoro.cs.oberlin.edu)

	* regex.c: Undo change of 12/7/92; it's better for Emacs to
	#define HAVE_CONFIG_H.

Fri Dec 11 22:00:34 1992  Jim Meyering  (meyering@hal.gnu.ai.mit.edu)

	* regex.c: Define and use isascii-protected ctype.h macros.

Fri Dec 11 05:10:38 1992  Jim Blandy  (jimb@totoro.cs.oberlin.edu)

	* regex.c (re_match_2): Undo Karl's November 10th change; it
	keeps the group in :\(.*\) from matching :/ properly.

Mon Dec  7 19:44:56 1992  Jim Blandy  (jimb@wookumz.gnu.ai.mit.edu)

	* regex.c: #include config.h if either HAVE_CONFIG_H or emacs
	is #defined.

Tue Dec  1 13:33:17 1992  David J. MacKenzie  (djm@goldman.gnu.ai.mit.edu)

	* regex.c [HAVE_CONFIG_H]: Include config.h.

Wed Nov 25 23:46:02 1992  David J. MacKenzie  (djm@goldman.gnu.ai.mit.edu)

	* regex.c (regcomp): Add parens around bitwise & for clarity.
	Initialize preg->allocated to prevent segv.

Tue Nov 24 09:22:29 1992  David J. MacKenzie  (djm@goldman.gnu.ai.mit.edu)

	* regex.c: Use HAVE_STRING_H, not USG.
	* configure.in: Check for string.h, not USG.

Fri Nov 20 06:33:24 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.c (SIGN_EXTEND_CHAR) [VMS]: Back out of this change,
	since Roland Roberts now says it was a localism.

Mon Nov 16 07:01:36 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.h (const) [!HAVE_CONST]: Test another cpp symbol (from
	Autoconf) before zapping const.

Sun Nov 15 05:36:42 1992  Jim Blandy  (jimb@wookumz.gnu.ai.mit.edu)

	* regex.c, regex.h: Changes for VMS from Roland B Roberts
	<roberts@nsrl31.nsrl.rochester.edu>.

Thu Nov 12 11:31:15 1992  Karl Berry  (karl@cs.umb.edu)

	* Makefile.in (distfiles): Include INSTALL.

Tue Nov 10 09:29:23 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.c (re_match_2): At maybe_pop_jump, if at end of string
	and pattern, just quit the matching loop.

	* regex.c (LETTER_P): Rename to `WORDCHAR_P'.

	* regex.c (AT_STRINGS_{BEG,END}): Take `d' as an arg; change
	callers.

	* regex.c (re_match_2) [!emacs]: In wordchar and notwordchar
	cases, advance d.

Wed Nov  4 15:43:58 1992  Karl Berry  (karl@hal.gnu.ai.mit.edu)

	* regex.h (const) [!__STDC__]: Don't define if it's already defined.

Sat Oct 17 19:28:19 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.c (bcmp, bcopy, bzero): Only #define if they are not
	already #defined.

	* configure.in: Use AC_CONST.

Thu Oct 15 08:39:06 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.h (const) [!const]: Conditionalize.

Fri Oct  2 13:31:42 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.h (RE_SYNTAX_ED): New definition.

Sun Sep 20 12:53:39 1992  Karl Berry  (karl@cs.umb.edu)

	* regex.[ch]: remove traces of `longest_p' -- dumb idea to put
	this into the pattern buffer, as it means parallelism loses.

	* Makefile.in (config.status): use sh to run configure --no-create.

	* Makefile.in (realclean): OK, don't remove configure.

Sat Sep 19 09:05:08 1992  Karl Berry  (karl@hayley)

	* regex.c (PUSH_FAILURE_POINT, POP_FAILURE_POINT) [DEBUG]: keep
	track of how many failure points we push and pop.
	(re_match_2) [DEBUG]: declare variables for that, and print results.
	(DEBUG_PRINT4): new macro.

	* regex.h (re_pattern_buffer): new field `longest_p' (to
	eliminate backtracking if the user doesn't need it).
	* regex.c (re_compile_pattern): initialize it (to 1).
	(re_search_2): set it to zero if register information is not needed.
	(re_match_2): if it's set, don't backtrack.

	* regex.c (re_search_2): update fastmap only after checking that
	the pattern is anchored.

	* regex.c (re_match_2): do more debugging at maybe_pop_jump.

	* regex.c (re_search_2): cast result of TRANSLATE for use in
	array subscript.

Thu Sep 17 19:47:16 1992  Karl Berry  (karl@geech.gnu.ai.mit.edu)

	* Version 0.11.

Wed Sep 16 08:17:10 1992  Karl Berry  (karl@hayley)

	* regex.c (INIT_FAIL_STACK): rewrite as statements instead of a
	complicated comma expr, to avoid compiler warnings (and also
	simplify).
	(re_compile_fastmap, re_match_2): change callers.

	* regex.c (POP_FAILURE_POINT): cast pop of regstart and regend
	to avoid compiler warnings.

	* regex.h (RE_NEWLINE_ORDINARY): remove this syntax bit, and
	remove uses.
	* regex.c (at_{beg,end}line_loc_p): go the last mile: remove
	the RE_NEWLINE_ORDINARY case which made the ^ in \n^ be an anchor.

Tue Sep 15 09:55:29 1992  Karl Berry  (karl@hayley)

	* regex.c (at_begline_loc_p): new fn.
	(at_endline_loc_p): simplify at_endline_op_p.
	(regex_compile): in ^/$ cases, call the above.

	* regex.c (POP_FAILURE_POINT): rewrite the fn as a macro again,
	as lord's profiling indicates the function is 20% of the time.
	(re_match_2): callers changed.

	* configure.in (AC_MEMORY_H): remove, since we never use memcpy et al.

Mon Sep 14 17:49:27 1992  Karl Berry  (karl@hayley)

	* Makefile.in (makeargs): include MFLAGS.

Sun Sep 13 07:41:45 1992  Karl Berry  (karl@hayley)

	* regex.c (regex_compile): in \1..\9 case, make it always
	invalid to use \<digit> if there is no preceding <digit>th subexpr.
	* regex.h (RE_NO_MISSING_BK_REF): remove this syntax bit.

	* regex.c (regex_compile): remove support for invalid empty groups.
	* regex.h (RE_NO_EMPTY_GROUPS): remove this syntax bit.

	* regex.c (FREE_VARIABLES) [!REGEX_MALLOC]: define as alloca (0),
	to reclaim memory.

	* regex.h (RE_SYNTAX_POSIX_SED): don't bother with this.

Sat Sep 12 13:37:21 1992  Karl Berry  (karl@hayley)

	* README: incorporate emacs.diff.

	* regex.h (_RE_ARGS) [!__STDC__]: define as empty parens.

	* configure.in: add AC_ALLOCA.

	* Put test files in subdir test, documentation in subdir doc.
	Adjust Makefile.in and configure.in accordingly.

Thu Sep 10 10:29:11 1992  Karl Berry  (karl@hayley)

	* regex.h (RE_SYNTAX_{POSIX_,}SED): new definitions.

Wed Sep  9 06:27:09 1992  Karl Berry  (karl@hayley)

	* Version 0.10.

Tue Sep  8 07:32:30 1992  Karl Berry  (karl@hayley)

	* xregex.texinfo: put the day of month into the date.

	* Makefile.in (realclean): remove Texinfo-generated files.
	(distclean): remove empty sorted index files.
	(clean): remove dvi files, etc.

	* configure.in: test for more Unix variants.

	* fileregex.c: new file.
	Makefile.in (fileregex): new target.

	* iregex.c (main): move variable decls to smallest scope.

	* regex.c (FREE_VARIABLES): free reg_{,info_}dummy.
	(re_match_2): check that the allocation for those two succeeded.

	* regex.c (FREE_VAR): replace FREE_NONNULL with this.
	(FREE_VARIABLES): call it.
	(re_match_2) [REGEX_MALLOC]: initialize all our vars to NULL.

	* tregress.c (do_match): generalize simple_match.
	(SIMPLE_NONMATCH): new macro.
	(SIMPLE_MATCH): change from routine.

	* Makefile.in (regex.texinfo): make file readonly, so we don't
	edit it by mistake.

	* many files (re_default_syntax): rename to `re_syntax_options';
	call re_set_syntax instead of assigning to the variable where
	possible.

Mon Sep  7 10:12:16 1992  Karl Berry  (karl@hayley)

	* syntax.skel: don't use prototypes.

	* {configure,Makefile}.in: new files.

	* regex.c: include <string.h> `#if USG || STDC_HEADERS'; remove
	obsolete test for `POSIX', and test for BSTRING.
	Include <strings.h> if we are not USG or STDC_HEADERS.
	Do not include <unistd.h>.  What did we ever need that for?

	* regex.h (RE_NO_EMPTY_ALTS): remove this.
	(RE_SYNTAX_AWK): remove from here, too.
	* regex.c (regex_compile): remove the check.
	* xregex.texinfo (Alternation Operator): update.
	* other.c (test_others): remove tests for this.

	* regex.h (RE_DUP_MAX): undefine if already defined.

	* regex.h (RE_SYNTAX_POSIX*): redo to allow more operators, and
	define new syntaxes with the minimal set.

	* syntax.skel (main): use sscanf instead of scanf.

	* regex.h (RE_SYNTAX_*GREP): new definitions from mike.

	* regex.c (regex_compile): initialize the upper bound of
	intervals at the beginning of the interval, not the end.
	(From pclink@qld.tne.oz.au.)

	* regex.c (handle_bar): rename to `handle_alt', for consistency.

	* regex.c ({store,insert}_{op1,op2}): new routines (except the last).
	({STORE,INSERT}_JUMP{,2}): macros to replace the old routines,
	which took arguments in different orders, and were generally weird.

	* regex.c (PAT_PUSH*): rename to `BUF_PUSH*' -- we're not
	appending info to the pattern!

Sun Sep  6 11:26:49 1992  Karl Berry  (karl@hayley)

	* regex.c (regex_compile): delete the variable
	`following_left_brace', since we never use it.

	* regex.c (print_compiled_pattern): don't print the fastmap if
	it's null.

	* regex.c (re_compile_fastmap): handle
	`on_failure_keep_string_jump' like `on_failure_jump'.

	* regex.c (re_match_2): in `charset{,_not}' case, cast the bit
	count to unsigned, not unsigned char, in case we have a full
	32-byte bit list.

	* tregress.c (simple_match): remove.
	(simple_test): rename as `simple_match'.
	(simple_compile): print the error string if the compile failed.

	* regex.c (DO_RANGE): rewrite as a function, `compile_range', so
	we can debug it.  Change pattern characters to
	unsigned char *'s, and change the range variable to an unsigned.
	(regex_compile): change calls.

Sat Sep  5 17:40:49 1992  Karl Berry  (karl@hayley)

	* regex.h (_RE_ARGS): new macro to put in argument lists (if
	ANSI) or omit them (if K&R); don't declare routines twice.

	* many files (obscure_syntax): rename to `re_default_syntax'.

Fri Sep  4 09:06:53 1992  Karl Berry  (karl@hayley)

	* GNUmakefile (extraclean): new target.
	(realclean): delete the info files.

Wed Sep  2 08:14:42 1992  Karl Berry  (karl@hayley)

	* regex.h: doc fix.

Sun Aug 23 06:53:15 1992  Karl Berry  (karl@hayley)

	* regex.[ch] (re_comp): no const in the return type (from djm).

Fri Aug 14 07:25:46 1992  Karl Berry  (karl@hayley)

	* regex.c (DO_RANGE): declare variables as unsigned chars, not
	signed chars (from jimb).

Wed Jul 29 18:33:53 1992  Karl Berry  (karl@claude.cs.umb.edu)

	* Version 0.9.

	* GNUmakefile (distclean): do not remove regex.texinfo.
	(realclean): remove it here.

	* tregress.c (simple_test): initialize buf.buffer.

Sun Jul 26 08:59:38 1992  Karl Berry  (karl@hayley)

	* regex.c (push_dummy_failure): new opcode and corresponding
	case in the various routines.  Pushed at the end of
	alternatives.

	* regex.c (jump_past_next_alt): rename to `jump_past_alt', for
	brevity.
	(no_pop_jump): rename to `jump'.

	* regex.c (regex_compile) [DEBUG]: terminate printing of pattern
	with a newline.

	* NEWS: new file.

	* tregress.c (simple_{compile,match,test}): routines to simplify all
	these little tests.

	* tregress.c: test for matching as much as possible.

Fri Jul 10 06:53:32 1992  Karl Berry  (karl@hayley)

	* Version 0.8.

Wed Jul  8 06:39:31 1992  Karl Berry  (karl@hayley)

	* regex.c (SIGN_EXTEND_CHAR): #undef any previous definition, as
	ours should always work properly.

Mon Jul  6 07:10:50 1992  Karl Berry  (karl@hayley)

	* iregex.c (main) [DEBUG]: conditionalize the call to
	print_compiled_pattern.

	* iregex.c (main): initialize buf.buffer to NULL.
	* tregress.c (test_regress): likewise.

	* regex.c (alloca) [sparc]: #if on HAVE_ALLOCA_H instead.

	* tregress.c (test_regress): didn't have jla's test quite right.

Sat Jul  4 09:02:12 1992  Karl Berry  (karl@hayley)

	* regex.c (re_match_2): only REGEX_ALLOCATE all the register
	vectors if the pattern actually has registers.
	(match_end): new variable to avoid having to use best_regend[0].

	* regex.c (IS_IN_FIRST_STRING): rename to FIRST_STRING_P.

	* regex.c: doc fixes.

	* tregress.c (test_regress): new fastmap test forwarded by rms.

	* tregress.c (test_regress): initialize the fastmap field.

	* tregress.c (test_regress): new test from jla that aborted
	in re_search_2.

Fri Jul  3 09:10:05 1992  Karl Berry  (karl@hayley)

	* tregress.c (test_regress): add tests for translating charsets,
	from kaoru.

	* GNUmakefile (common): add alloca.o.
	* alloca.c: new file, copied from bison.

	* other.c (test_others): remove var `buf', since it's no longer used.

	* Below changes from ro@TechFak.Uni-Bielefeld.DE.

	* tregress.c (test_regress): initialize buf.allocated.

	* regex.c (re_compile_fastmap): initialize `succeed_n_p'.

	* GNUmakefile (regex): depend on $(common).

Wed Jul  1 07:12:46 1992  Karl Berry  (karl@hayley)

	* Version 0.7.

	* regex.c: doc fixes.

Mon Jun 29 08:09:47 1992  Karl Berry  (karl@fosse)

	* regex.c (pop_failure_point): change string vars to
	`const char *' from `unsigned char *'.

	* regex.c: consolidate debugging stuff.
	(print_partial_compiled_pattern): avoid enum clash.

Mon Jun 29 07:50:27 1992  Karl Berry  (karl@hayley)

	* xmalloc.c: new file.
	* GNUmakefile (common): add it.

	* iregex.c (print_regs): new routine (from jimb).
	(main): call it.

Sat Jun 27 10:50:59 1992  Jim Blandy  (jimb@pogo.cs.oberlin.edu)

	* xregex.c (re_match_2): When we have accepted a match and
	restored d from best_regend[0], we need to set dend
	appropriately as well.

Sun Jun 28 08:48:41 1992  Karl Berry  (karl@hayley)

	* tregress.c: rename from regress.c.

	* regex.c (print_compiled_pattern): improve charset case to ease
	byte-counting.
	Also, don't distinguish between Emacs and non-Emacs
	{not,}wordchar opcodes.

	* regex.c (print_fastmap): move here.
	* test.c: from here.
	* regex.c (print_{{partial,}compiled_pattern,double_string}):
	rename from ..._printer.  Change calls here and in test.c.

	* regex.c: create from xregex.c and regexinc.c for once and for
	all, and change the debug fns to be extern, instead of static.
	* GNUmakefile: remove traces of xregex.c.
	* test.c: put in externs, instead of including regexinc.c.

	* xregex.c: move interactive main program and scanstring to iregex.c.
	* iregex.c: new file.
	* upcase.c, printchar.c: new files.

	* various doc fixes and other cosmetic changes throughout.

	* regexinc.c (compiled_pattern_printer): change variable name,
	for consistency.
	(partial_compiled_pattern_printer): print other info about the
	compiled pattern, besides just the opcodes.
	* xregex.c (regex_compile) [DEBUG]: print the compiled pattern
	when we're done.

	* xregex.c (re_compile_fastmap): in the duplicate case, set
	`can_be_null' and return.
	Also, set `bufp->can_be_null' according to a new variable,
	`path_can_be_null'.
	Also, rewrite main while loop to not test `p != NULL', since
	we never set it that way.
	Also, eliminate special `can_be_null' value for the endline case.
	(re_search_2): don't test for the special value.
	* regex.h (struct re_pattern_buffer): remove the definition.

Sat Jun 27 15:00:40 1992  Karl Berry  (karl@hayley)

	* xregex.c (re_compile_fastmap): remove the `RE_' from
	`REG_RE_MATCH_NULL_AT_END'.
	Also, assert the fastmap in the pattern buffer is non-null.
	Also, reset `succeed_n_p' after we've
	paid attention to it, instead of every time through the loop.
	Also, in the `anychar' case, only clear fastmap['\n'] if the
	syntax says to, and don't return prematurely.
	Also, rearrange cases in some semblance of a rational order.
	* regex.h (REG_RE_MATCH_NULL_AT_END): remove the `RE_' from the name.

	* other.c: take bug reports from here.
	* regress.c: new file for them.
	* GNUmakefile (test): add it.
	* main.c (main): new possible test.
	* test.h (test_type): new value in enum.

Thu Jun 25 17:37:43 1992  Karl Berry  (karl@hayley)

	* xregex.c (scanstring) [test]: new function from jimb to allow some
	escapes.
	(main) [test]: call it (on the string, not the pattern).

	* xregex.c (main): make return type `int'.

Wed Jun 24 10:43:03 1992  Karl Berry  (karl@hayley)

	* xregex.c (pattern_offset_t): change to `int', for the benefit
	of patterns which compile to more than 2^15 bytes.

	* xregex.c (GET_BUFFER_SPACE): remove spurious braces.

	* xregex.texinfo (Using Registers): put in a stub to ``document''
	the new function.
	* regex.h (re_set_registers) [!__STDC__]: declare.
	* xregex.c (re_set_registers): declare K&R style (also move to a
	different place in the file).

Mon Jun  8 18:03:28 1992  Jim Blandy  (jimb@pogo.cs.oberlin.edu)

	* regex.h (RE_NREGS): Doc fix.

	* xregex.c (re_set_registers): New function.
	* regex.h (re_set_registers): Declaration for new function.

Fri Jun  5 06:55:18 1992  Karl Berry  (karl@hayley)

	* main.c (main): `return 0' instead of `exit (0)'.
	(From Paul Eggert)

	* regexinc.c (SIGN_EXTEND_CHAR): cast to unsigned char.
	(extract_number, EXTRACT_NUMBER): don't bother to cast here.

Tue Jun  2 07:37:53 1992  Karl Berry  (karl@hayley)

	* Version 0.6.

	* Change copyrights to `1985, 89, ...'.

	* regex.h (REG_RE_MATCH_NULL_AT_END): new macro.
	* xregex.c (re_compile_fastmap): initialize `can_be_null' to
	`p==pend', instead of in the test at the top of the loop (as
	it was, it was always being set).
	Also, set `can_be_null'=1 if we would jump to the end of the
	pattern in the `on_failure_jump' cases.
	(re_search_2): check if `can_be_null' is 1, not nonzero.  This
	was the original test in rms' regex; why did we change this?

	* xregex.c (re_compile_fastmap): rename `is_a_succeed_n' to
	`succeed_n_p'.

Sat May 30 08:09:08 1992  Karl Berry  (karl@hayley)

	* xregex.c (re_compile_pattern): declare `regnum' as `unsigned',
	not `regnum_t', for the benefit of those patterns with more
	than 255 groups.

	* xregex.c: rename `failure_stack' to `fail_stack', for brevity;
	likewise for `match_nothing' to `match_null'.

	* regexinc.c (REGEX_REALLOCATE): take both the new and old
	sizes, and copy only the old bytes.
	* xregex.c (DOUBLE_FAILURE_STACK): pass both old and new.
	* This change from Thorsten Ohl.

Fri May 29 11:45:22 1992  Karl Berry  (karl@hayley)

	* regexinc.c (SIGN_EXTEND_CHAR): define as `(signed char) c'
	instead of relying on __CHAR_UNSIGNED__, to work with
	compilers other than GCC.  From Per Bothner.

	* main.c (main): change return type to `int'.

Mon May 18 06:37:08 1992  Karl Berry  (karl@hayley)

	* regex.h (RE_SYNTAX_AWK): typo in RE_RE_UNMATCHED...

Fri May 15 10:44:46 1992  Karl Berry  (karl@hayley)

	* Version 0.5.

Sun May  3 13:54:00 1992  Karl Berry  (karl@hayley)

	* regex.h (struct re_pattern_buffer): now it's just `regs_allocated'.
	(REGS_UNALLOCATED, REGS_REALLOCATE, REGS_FIXED): new constants.
	* xregex.c (regexec, re_compile_pattern): set the field appropriately.
	(re_match_2): and use it.  bufp can't be const any more.

Fri May  1 15:43:09 1992  Karl Berry  (karl@hayley)

	* regexinc.c: unconditionally include <sys/types.h>, first.

	* regex.h (struct re_pattern_buffer): rename
	`caller_allocated_regs' to `regs_allocated_p'.
	* xregex.c (re_compile_pattern): same change here.
	(regexec): and here.
	(re_match_2): reallocate registers if necessary.

Fri Apr 10 07:46:50 1992  Karl Berry  (karl@hayley)

	* regex.h (RE_SYNTAX{_POSIX,}_AWK): new definitions from Arnold.

Sun Mar 15 07:34:30 1992  Karl Berry  (karl at hayley)

	* GNUmakefile (dist): versionize regex.{c,h,texinfo}.

Tue Mar 10 07:05:38 1992  Karl Berry  (karl at hayley)

	* Version 0.4.

	* xregex.c (PUSH_FAILURE_POINT): always increment the failure id.
	(DEBUG_STATEMENT) [DEBUG]: execute the statement even if `debug'==0.

	* xregex.c (pop_failure_point): if the saved string location is
	null, keep the current value.
	(re_match_2): at fail, test for a dummy failure point by
	checking the restored pattern value, not string value.
	(re_match_2): new case, `on_failure_keep_string_jump'.
	(regex_compile): output this opcode in the .*\n case.
	* regexinc.c (re_opcode_t): define the opcode.
	(partial_compiled_pattern_printer): add the new case.

Mon Mar  9 09:09:27 1992  Karl Berry  (karl at hayley)

	* xregex.c (regex_compile): optimize .*\n to output an
	unconditional jump to the ., instead of pushing failure points
	each time through the loop.

	* xregex.c (DOUBLE_FAILURE_STACK): compute the maximum size
	ourselves (and correctly); change callers.

Sun Mar  8 17:07:46 1992  Karl Berry  (karl at hayley)

	* xregex.c (failure_stack_elt_t): change to `const char *', to
	avoid warnings.

	* regex.h (re_set_syntax): declare this.

	* xregex.c (pop_failure_point) [DEBUG]: conditionally pass the
	original strings and sizes; change callers.

Thu Mar  5 16:35:35 1992  Karl Berry  (karl at claude.cs.umb.edu)

	* xregex.c (regnum_t): new type for register/group numbers.
	(compile_stack_elt_t, regex_compile): use it.

	* xregex.c (regexec): declare len as `int' to match re_search.

	* xregex.c (re_match_2): don't declare p1 twice.

	* xregex.c: change `while (1)' to `for (;;)' to avoid silly
	compiler warnings.

	* regex.h [__STDC__]: use #if, not #ifdef.

	* regexinc.c (REGEX_REALLOCATE): cast the result of alloca to
	(char *), to avoid warnings.

	* xregex.c (regerror): declare variable as const.

	* xregex.c (re_compile_pattern, re_comp): define as returning a
	const char *.
	* regex.h (re_compile_pattern, re_comp): likewise.

Thu Mar  5 15:57:56 1992  Karl Berry  (karl@hal)

	* xregex.c (regcomp): declare `syntax' as unsigned.

	* xregex.c (re_match_2): try to avoid compiler warnings about
	unsigned comparisons.

	* GNUmakefile (test-xlc): new target.

	* regex.h (reg_errcode_t): remove trailing comma from definition.
	* regexinc.c (re_opcode_t): likewise.

Thu Mar  5 06:56:07 1992  Karl Berry  (karl at hayley)

	* GNUmakefile (dist): add version numbers automatically.
	(versionfiles): new variable.
	(regex.{c,texinfo}): don't add version numbers here.
	* regex.h: put in placeholder instead of the version number.

Fri Feb 28 07:11:33 1992  Karl Berry  (karl at hayley)

	* xregex.c (re_error_msg): declare const, since it is.

Sun Feb 23 05:41:57 1992  Karl Berry  (karl at fosse)

	* xregex.c (PAT_PUSH{,_2,_3}, ...): cast args to avoid warnings.
	(regex_compile, regexec): return REG_NOERROR, instead
	of 0, on success.
	(boolean): define as char, and #define false and true.
	* regexinc.c (STREQ): cast the result.

Sun Feb 23 07:45:38 1992  Karl Berry  (karl at hayley)

	* GNUmakefile (test-cc, test-hc, test-pcc): new targets.

	* regexinc.c (extract_number, extract_number_and_incr) [DEBUG]:
	only define if we are debugging.

	* xregex.c [_AIX]: do #pragma alloca first if necessary.
	* regexinc.c [_AIX]: remove the #pragma from here.

	* regex.h (reg_syntax_t): declare as unsigned, and redo the enum
	as #define's again.  Some compilers do stupid things with enums.

Thu Feb 20 07:19:47 1992  Karl Berry  (karl at hayley)

	* Version 0.3.

	* xregex.c, regex.h (newline_anchor_match_p): rename to
	`newline_anchor'; dumb idea to change the name.

Tue Feb 18 07:09:02 1992  Karl Berry  (karl at hayley)

	* regexinc.c: go back to original, i.e., don't include
	<string.h> or define strchr.
	* xregex.c (regexec): don't bother with adding characters after
	newlines to the fastmap; instead, just don't use a fastmap.
	* xregex.c (regcomp): set the buffer and fastmap fields to zero.

	* xregex.texinfo (GNU r.e. compiling): have to initialize more
	than two fields.

	* regex.h (struct re_pattern_buffer): rename `newline_anchor' to
	`newline_anchor_match_p', as we're back to two cases.
	* xregex.c (regcomp, re_compile_pattern, re_comp): change
	accordingly.
	(re_match_2): at begline and endline, POSIX is not a special
	case anymore; just check newline_anchor_match_p.

Thu Feb 13 16:29:33 1992  Karl Berry  (karl at hayley)

	* xregex.c (*empty_string*): rename to *null_string*, for brevity.

Wed Feb 12 06:36:22 1992  Karl Berry  (karl at hayley)

	* xregex.c (re_compile_fastmap): at endline, don't set fastmap['\n'].
	(re_match_2): rewrite the begline/endline cases to take account
	of the new field newline_anchor.

Tue Feb 11 14:34:55 1992  Karl Berry  (karl at hayley)

	* regexinc.c [!USG etc.]: include <strings.h> and define strchr
	as index.

	* xregex.c (re_search_2): when searching backwards, declare `c'
	as a char and use casts when using it as an array subscript.

	* xregex.c (regcomp): if REG_NEWLINE, set
	RE_HAT_LISTS_NOT_NEWLINE.  Set the `newline_anchor' field
	appropriately.
	(regex_compile): compile [^...] as matching a \n according to
	the syntax bit.
	(regexec): if doing REG_NEWLINE stuff, compile a fastmap and add
	characters after any \n's to the fastmap.
	* regex.h (RE_HAT_LISTS_NOT_NEWLINE): new syntax bit.
	(struct re_pattern_buffer): rename `posix_newline' to
	`newline_anchor', define constants for its values.

Mon Feb 10 07:22:50 1992  Karl Berry  (karl at hayley)

	* xregex.c (re_compile_fastmap): combine the code at the top and
	bottom of the loop, as it's essentially identical.

Sun Feb  9 10:02:19 1992  Karl Berry  (karl at hayley)

	* xregex.texinfo (POSIX Translate Tables): remove this, as it
	doesn't match the spec.

	* xregex.c (re_compile_fastmap): if we finish off a path, go
	back to the top (to set can_be_null) instead of returning
	immediately.

	* xregex.texinfo: changes from bob.

Sat Feb  1 07:03:25 1992  Karl Berry  (karl at hayley)

	* xregex.c (re_search_2): doc fix (from rms).

Fri Jan 31 09:52:04 1992  Karl Berry  (karl at hayley)

	* xregex.texinfo (GNU Searching): clarify the range arg.

	* xregex.c (re_match_2, at_endline_op_p): add extra parens to
	get rid of GCC 2's (silly, IMHO) warning about && within ||.

	* xregex.c (common_op_match_empty_string_p): use
	MATCH_NOTHING_UNSET_VALUE, not -1.

Thu Jan 16 08:43:02 1992  Karl Berry  (karl at hayley)

	* xregex.c (SET_REGS_MATCHED): only set the registers from
	lowest to highest.

	* regexinc.c (MIN): new macro.
	* xregex.c (re_match_2): only check min (num_regs,
	regs->num_regs) when we set the returned regs.

	* xregex.c (re_match_2): set registers after the first
	num_regs to -1 before we return.

Tue Jan 14 16:01:42 1992  Karl Berry  (karl at hayley)

	* xregex.c (re_match_2): initialize max (RE_NREGS, re_nsub + 1)
	registers (from rms).

	* xregex.c, regex.h: don't abbreviate `19xx' to `xx'.

	* regexinc.c [!emacs]: include <sys/types.h> before <unistd.h>.
	(from ro@thp.Uni-Koeln.DE).

Thu Jan  9 07:23:00 1992  Karl Berry  (karl at hayley)

	* xregex.c (*unmatchable): rename to `match_empty_string_p'.
	(CAN_MATCH_NOTHING): rename to `REG_MATCH_EMPTY_STRING_P'.

	* regexinc.c (malloc, realloc): remove prototypes, as they can
	cause clashes (from rms).

Mon Jan  6 12:43:24 1992  Karl Berry  (karl at claude.cs.umb.edu)

	* Version 0.2.

Sun Jan  5 10:50:38 1992  Karl Berry  (karl at hayley)

	* xregex.texinfo: bring more or less up-to-date.
	* GNUmakefile (regex.texinfo): generate from regex.h and
	xregex.texinfo.
	* include.awk: new file.

	* xregex.c: change all calls to the fn extract_number_and_incr
	to the macro.

	* xregex.c (re_match_2) [emacs]: in at_dot, use PTR_CHAR_POS + 1,
	instead of bf_* and sl_*.  Cast d to unsigned char *, to match
	the declaration in Emacs' buffer.h.
	[emacs19]: in before_dot, at_dot, and after_dot, likewise.

	* regexinc.c: unconditionally include <sys/types.h>.

	* regexinc.c (alloca) [!alloca]: Emacs config files sometimes
	define this, so don't define it if it's already defined.

Sun Jan  5 06:06:53 1992  Karl Berry  (karl at fosse)

	* xregex.c (re_comp): fix type conflicts with regex_compile (we
	haven't been compiling this).

	* regexinc.c (SIGN_EXTEND_CHAR): use `__CHAR_UNSIGNED__', not
	`CHAR_UNSIGNED'.

	* regexinc.c (NULL) [!NULL]: define it (as zero).

	* regexinc.c (extract_number): remove the temporaries.

Sun Jan  5 07:50:14 1992  Karl Berry  (karl at hayley)

	* regex.h (regerror) [!__STDC__]: return a size_t, not a size_t *.

	* xregex.c (PUSH_FAILURE_POINT, ...): declare `destination' as
	`char *' instead of `void *', to match alloca declaration.

	* xregex.c (regerror): use `size_t' for the intermediate values
	as well as the return type.

	* xregex.c (regexec): cast the result of malloc.

	* xregex.c (regexec): don't initialize `private_preg' in the
	declaration, as old C compilers can't do that.

	* xregex.c (main) [test]: declare printchar void.

	* xregex.c (assert) [!DEBUG]: define this to do nothing, and
	remove #ifdef DEBUG's from around asserts.

	* xregex.c (re_match_2): remove error message when not debugging.

Sat Jan  4 09:45:29 1992  Karl Berry  (karl at hayley)

	* other.c: test the bizarre duplicate case in re_compile_fastmap
	that I just noticed.

	* test.c (general_test): don't test registers beyond the end of
	correct_regs, as well as regs.

	* xregex.c (regex_compile): at handle_close, don't assign to
	*inner_group_loc if we didn't push a start_memory (because the
	group number was too big).  In fact, don't push or pop the
	inner_group_offset in that case.

	* regex.c: rename to xregex.c, since it's not the whole thing.
	* regex.texinfo: likewise.
	* GNUmakefile: change to match.

	* regex.c [DEBUG]: only include <stdio.h> if debugging.

	* regexinc.c (SIGN_EXTEND_CHAR) [CHAR_UNSIGNED]: if it's already
	defined, don't redefine it.

	* regex.c: define _GNU_SOURCE at the beginning.
	* regexinc.c (isblank) [!isblank]: define it.
	(isgraph) [!isgraph]: change conditional to this, and remove the
	sequent stuff.

	* regex.c (regex_compile): add `blank' character class.

	* regex.c (regex_compile): don't use a uchar variable to loop
	through all characters.

	* regex.c (regex_compile): at '[', improve logic for checking
	that we have enough space for the charset.

	* regex.h (struct re_pattern_buffer): declare translate as
	char * again. We only use it as an array subscript once, I think.

	* regex.c (TRANSLATE): new macro to cast the data character
	before subscripting.
	(num_internal_regs): rename to `num_regs'.

Fri Jan 3 07:58:01 1992 Karl Berry (karl at hayley)

	* regex.h (struct re_pattern_buffer): declare `allocated' and
	`used' as unsigned long, since these are never negative.

	* regex.c (compile_stack_element): rename to compile_stack_elt_t.
	(failure_stack_element): similarly.

	* regexinc.c (TALLOC, RETALLOC): new macros to simplify
	allocation of arrays.

	* regex.h (re_*) [__STDC__]: don't declare string args unsigned
	char *; that makes them incompatible with string constants.
	(struct re_pattern_buffer): declare the pattern and translate
	table as unsigned char *.
	* regex.c (most routines): use unsigned char vs. char consistently.

	* regex.h (re_compile_pattern): do not declare the length arg as
	const.
	* regex.c (re_compile_pattern): likewise.

	* regex.c (POINTER_TO_REG): rename to `POINTER_TO_OFFSET'.

	* regex.h (re_registers): declare `start' and `end' as
	`regoff_t', instead of `int'.

	* regex.c (regexec): if either of the malloc's for the register
	information fails, return failure.

	* regex.h (RE_NREGS): define this again, as 30 (from jla).
	(RE_ALLOCATE_REGISTERS): remove this.
	(RE_SYNTAX_*): remove it from definitions.
	(re_pattern_buffer): remove `return_default_num_regs', add
	`caller_allocated_regs'.
	* regex.c (re_compile_pattern): clear no_sub and
	caller_allocated_regs in the pattern.
	(regcomp): set caller_allocated_regs.
	(re_match_2): do all register allocation at the end of the
	match; implement new semantics.

	* regex.c (MAX_REGNUM): new macro.
	(regex_compile): at handle_open and handle_close, if the group
	number is too large, don't push the start/stop memory.

Thu Jan 2 07:56:10 1992 Karl Berry (karl at hayley)

	* regex.c (re_match_2): if the back reference is to a group that
	never matched, then goto fail, not really_fail. Also, don't
	test if the pattern can match the empty string. Why did we
	ever do that?
	(really_fail): this label no longer needed.

	* regexinc.c [STDC_HEADERS]: use only this to test if we should
	include <stdlib.h>.

	* regex.c (DO_RANGE, regex_compile): translate in all cases
	except the single character after a \.

	* regex.h (RE_AWK_CLASS_HACK): rename to
	RE_BACKSLASH_ESCAPE_IN_LISTS.
	* regex.c (regex_compile): change use.

	* regex.c (re_compile_fastmap): do not translate the characters
	again; we already translated them at compilation. (From ylo@ngs.fi.)

	* regex.c (re_match_2): in case for at_dot, invert sense of
	comparison and find the character number properly. (From
	worley@compass.com.)
	(re_match_2) [emacs]: remove the cases for before_dot and
	after_dot, since there's no way to specify them, and the code
	is wrong (judging from this change).

Wed Jan 1 09:13:38 1992 Karl Berry (karl at hayley)

	* psx-{interf,basic,extend}.c, other.c: set `t' as the first
	thing, so that if we run them in succession, general_test's
	kludge to see if we're doing POSIX tests works.

	* test.h (test_type): add `all_test'.
	* main.c: add case for `all_test'.

	* regexinc.c (partial_compiled_pattern_printer,
	double_string_printer): don't print anything if we're passed null.

	* regex.c (PUSH_FAILURE_POINT): do not scan for the highest and
	lowest active registers.
	(re_match_2): compute lowest/highest active regs at start_memory and
	stop_memory.
	(NO_{LOW,HIGH}EST_ACTIVE_REG): new sentinel values.
	(pop_failure_point): return the lowest/highest active reg values
	popped; change calls.

	* regex.c [DEBUG]: include <assert.h>.
	(various routines) [DEBUG]: change conditionals to assertions.

	* regex.c (DEBUG_STATEMENT): new macro.
	(PUSH_FAILURE_POINT): use it to increment num_regs_pushed.
	(re_match_2) [DEBUG]: only declare num_regs_pushed if DEBUG.

	* regex.c (*can_match_nothing): rename to *unmatchable.

	* regex.c (re_match_2): at stop_memory, adjust argument reading.

	* regex.h (re_pattern_buffer): declare `can_be_null' as a 2-bit
	bit field.

	* regex.h (re_pattern_buffer): declare `buffer' unsigned char *;
	no, dumb idea. The pattern can have signed numbers.

	* regex.c (re_match_2): in maybe_pop_jump case, skip over the
	right number of args to the group operators, and don't do
	anything with endline if posix_newline is not set.

	* regex.c, regexinc.c (all the things we just changed): go back
	to putting the inner group count after the start_memory,
	because we need it in the on_failure_jump case in re_match_2.
	But leave it after the stop_memory also, since we need it
	there in re_match_2, and we don't have any way of getting back
	to the start_memory.

	* regexinc.c (partial_compiled_pattern_printer): adjust argument
	reading for start/stop_memory.
	* regex.c (re_compile_fastmap, group_can_match_nothing): likewise.

Tue Dec 31 10:15:08 1991 Karl Berry (karl at hayley)

	* regex.c (bits list routines): remove these.
	(re_match_2): get the number of inner groups from the pattern,
	instead of keeping track of it at start and stop_memory.
	Put the count after the stop_memory, not after the
	start_memory.
	(compile_stack_element): remove `fixup_inner_group' member,
	since we now put it in when we can compute it.
	(regex_compile): at handle_open, don't push the inner group
	offset, and at handle_close, don't pop it.

	* regex.c (level routines): remove these, and their uses in
	regex_compile. This was another manifestation of having to find
	$'s that were endlines.

	* regex.c (regexec): this does searching, not matching (a
	well-disguised part of the standard). So rewrite to use
	`re_search' instead of `re_match'.
	* psx-interf.c (test_regexec): add tests to, uh, match.

	* regex.h (RE_TIGHT_ALT): remove this; nobody uses it.
	* regex.c: remove the code that was supposed to implement it.

	* other.c (test_others): ^ and $ never match newline characters;
	RE_CONTEXT_INVALID_OPS doesn't affect anchors.

	* psx-interf.c (test_regerror): update for new error messages.

	* psx-extend.c: it's now ok to have an alternative be just a $,
	so remove all the tests which supposed that was invalid.

Wed Dec 25 09:00:05 1991 Karl Berry (karl at hayley)

	* regex.c (regex_compile): in handle_open, don't skip over ^ and
	$ when checking for an empty group. POSIX has changed the
	grammar.
	* psx-extend.c (test_posix_extended): thus, move (^$) tests to
	valid section.

	* regexinc.c (boolean): move from here to test.h and regex.c.
	* test files: declare verbose, omit_register_tests, and
	test_should_match as boolean.

	* psx-interf.c (test_posix_c_interface): remove the `c_'.
	* main.c: likewise.

	* psx-basic.c (test_posix_basic): ^ ($) is an anchor after
	(before) an open (close) group.

	* regex.c (re_match_2): in endline, correct precedence of
	posix_newline condition.

Tue Dec 24 06:45:11 1991 Karl Berry (karl at hayley)

	* test.h: incorporate private-tst.h.
	* test files: include test.h, not private-tst.h.

	* test.c (general_test): set posix_newline to zero if we are
	doing POSIX tests (unfortunately, it's difficult to call
	regcomp in this case, which is what we should really be doing).

	* regex.h (reg_syntax_t): make this an enumeration type which
	defines the syntax bits; renames re_syntax_t.

	* regex.c (at_endline_op_p): don't preincrement p; then if it's
	not an empty string op, we lose.

	* regex.h (reg_errcode_t): new enumeration type of the error
	codes.
	* regex.c (regex_compile): return that type.

	* regex.c (regex_compile): in [, initialize
	just_had_a_char_class to false; somehow I had changed this to
	true.

	* regex.h (RE_NO_CONSECUTIVE_REPEATS): remove this, since we
	don't use it, and POSIX doesn't require this behavior anymore.
	* regex.c (regex_compile): remove it from here.

	* regex.c (regex_compile): remove the no_op insertions for
	verify_and_adjust_endlines, since that doesn't exist anymore.

	* regex.c (regex_compile) [DEBUG]: use printchar to print the
	pattern, so unprintable bytes will print properly.

	* regex.c: move re_error_msg back.
	* test.c (general_test): print the compile error if the pattern
	was invalid.

Mon Dec 23 08:54:53 1991 Karl Berry (karl at hayley)

	* regexinc.c: move re_error_msg here.

	* regex.c (re_error_msg): the ``message'' for success must be
	NULL, to keep the interface to re_compile_pattern the same.
	(regerror): if the msg is null, use "Success".

	* rename most test files for consistency. Change Makefile
	correspondingly.

	* test.c (most routines): add casts to (unsigned char *) when we
	call re_{match,search}{,_2}.

Sun Dec 22 09:26:06 1991 Karl Berry (karl at hayley)

	* regex.c (re_match_2): declare string args as unsigned char *
	again; don't declare non-pointer args const; declare the
	pattern buffer const.
	(re_match): likewise.
	(re_search_2, re_search): likewise, except don't declare the
	pattern const, since we make a fastmap.
	* regex.h [__STDC__]: change prototypes.

	* regex.c (regex_compile): return an error code, not a string.
	(re_err_list): new table to map from error codes to strings.
	(re_compile_pattern): return an element of re_err_list.
	(regcomp): don't test all the strings.
	(regerror): just use the list.
	(put_in_buffer): remove this.

	* regex.c (equivalent_failure_points): remove this.

	* regex.c (re_match_2): don't copy the string arguments into
	non-const pointers. We never alter the data.

	* regex.c (re_match_2): move assignment to `is_a_jump_n' out of
	the main loop. Just initialize it right before we do
	something with it.

	* regex.[ch] (re_match_2): don't declare the int parameters const.

Sat Dec 21 08:52:20 1991 Karl Berry (karl at hayley)

	* regex.h (re_syntax_t): new type; declare to be unsigned
	(previously we used int, but since we do bit operations on
	this, unsigned is better, according to H&S).
	(obscure_syntax, re_pattern_buffer): use that type.
	* regex.c (re_set_syntax, regex_compile): likewise.

	* regex.h (re_pattern_buffer): new field `posix_newline'.
	* regex.c (re_comp, re_compile_pattern): set to zero.
	(regcomp): set to REG_NEWLINE.
	* regex.h (RE_HAT_LISTS_NOT_NEWLINE): remove this (we can just
	check `posix_newline' instead).

	* regex.c (op_list_type, op_list, add_op): remove these.
	(verify_and_adjust_endlines): remove this.
	(pattern_offset_list_type, *pattern_offset* routines): and these.
	These things all implemented the nonleading/nontrailing position
	code, which was very long, had a few remaining problems, and
	is no longer needed. So...

	* regexinc.c (STREQ): new macro to abbreviate strcmp(,)==0, for
	brevity. Change various places in regex.c to use it.

	* regex{,inc}.c (enum regexpcode): change to a typedef
	re_opcode_t, for brevity.

	* regex.h (re_syntax_table) [SYNTAX_TABLE]: remove this; it
	should only be in regex.c, I think, since we don't define it
	in this case. Maybe it should be conditional on !SYNTAX_TABLE?

	* regexinc.c (partial_compiled_pattern_printer): simplify and
	distinguish the emacs/not-emacs (not)wordchar cases.

Fri Dec 20 08:11:38 1991 Karl Berry (karl at hayley)

	* regexinc.c (regexpcode) [emacs]: only define the Emacs opcodes
	if we are ifdef emacs.

	* regex.c (BUF_PUSH*): rename to PAT_PUSH*.

	* regex.c (regex_compile): in $ case, go back to essentially the
	original code for deciding endline op vs. normal char.
	(at_endline_op_p): new routine.
	* regex.h (RE_ANCHORS_ONLY_AT_ENDS, RE_CONTEXT_INVALID_ANCHORS,
	RE_REPEATED_ANCHORS_AWAY, RE_NO_ANCHOR_AT_NEWLINE): remove
	these. POSIX has simplified the rules for anchors in draft
	11.2.
	(RE_NEWLINE_ORDINARY): new syntax bit.
	(RE_CONTEXT_INDEP_ANCHORS): change description to be compatible
	with POSIX.
	* regex.texinfo (Syntax Bits): remove the descriptions.

Mon Dec 16 08:12:40 1991 Karl Berry (karl at hayley)

	* regex.c (re_match_2): in jump_past_next_alt, unconditionally
	goto no_pop. The only register we were finding was one which
	enclosed the whole alternative expression, not one around an
	individual alternative. So we were never doing what we
	thought we were doing, and this way makes (|a) against the
	empty string fail.

	* regex.c (regex_compile): remove `highest_ever_regnum', and
	don't restore regnum from the stack; just put it into a
	temporary to put into the stop_memory. Otherwise, groups
	aren't numbered consecutively.

	* regex.c (is_in_compile_stack): rename to
	`group_in_compile_stack'; remove unnecessary test for the
	stack being empty.

	* regex.c (re_match_2): in on_failure_jump, skip no_op's before
	checking for the start_memory, in case we were called from
	succeed_n.

Sun Dec 15 16:20:48 1991 Karl Berry (karl at hayley)

	* regex.c (regex_compile): in duplicate case, use
	highest_ever_regnum instead of regnum, since the latter is
	reverted at stop_memory.

	* regex.c (re_match_2): in on_failure_jump, if the * applied to
	a group, save the information for that group and all inner
	groups (by making it active), even though we're not inside it
	yet.

Sat Dec 14 09:50:59 1991 Karl Berry (karl at hayley)

	* regex.c (PUSH_FAILURE_ITEM, POP_FAILURE_ITEM): new macros.
	Use them instead of copying the stack manipulation code a zillion
	times.

	* regex.c (PUSH_FAILURE_POINT, pop_failure_point) [DEBUG]: save
	and restore a unique identification value for each failure point.

	* regexinc.c (partial_compiled_pattern_printer): don't print an
	extra / after duplicate commands.

	* regex.c (regex_compile): in back-reference case, allow a back
	reference to register `regnum'. Otherwise, even `\(\)\1'
	fails, since regnum is 1 at the back-reference.

	* regex.c (re_match_2): in fail, don't examine the pattern if we
	restored to pend.

	* test_private.h: rename to private_tst.h. Change includes.

	* regex.c (extend_bits_list): compute existing size for realloc
	in bytes, not blocks.

	* regex.c (re_match_2): in jump_past_next_alt, the for loop was
	missing its (empty) statement.
	Even so, some register tests
	still fail, although in a different way than in the previous change.

Fri Dec 13 15:55:08 1991 Karl Berry (karl at hayley)

	* regex.c (re_match_2): in jump_past_next_alt, unconditionally
	goto no_pop, since we weren't properly detecting if the
	alternative matched something anyway. No, we need to not jump
	to keep the register values correct; just change to not look at
	register zero and not test RE_NO_EMPTY_ALTS (which is a
	compile-time thing).

	* regex.c (SET_REGS_MATCHED): start the loop at 1, since we never
	care about register zero until the very end. (I think.)

	* regex.c (PUSH_FAILURE_POINT, pop_failure_point): go back to
	pushing and popping the active registers, instead of only doing
	the registers before a group: (fooq|fo|o)*qbar against fooqbar
	fails, since we restore back into the middle of group 1, yet it
	isn't active, because the previous restore clobbered the active flag.

Thu Dec 12 17:25:36 1991 Karl Berry (karl at hayley)

	* regex.c (PUSH_FAILURE_POINT): do not call
	`equivalent_failure_points' after all; it causes the registers
	to be ``wrong'' (according to POSIX), and an infinite loop on
	`((a*)*)*' against `ab'.

	* regex.c (re_compile_fastmap): don't push `pend' on the failure
	stack.

Tue Dec 10 10:30:03 1991 Karl Berry (karl at hayley)

	* regex.c (PUSH_FAILURE_POINT): if pushing the same failure point
	that is on the top of the stack, fail.
	(equivalent_failure_points): new routine.

	* regex.c (re_match_2): add debug statements for every opcode we
	execute.

	* regex.c (regex_compile/handle_close): restore
	`fixup_inner_group_count' and `regnum' from the stack.

Mon Dec 9 13:51:15 1991 Karl Berry (karl at hayley)

	* regex.c (PUSH_FAILURE_POINT): declare `this_reg' as int, so
	unsigned arithmetic doesn't happen when we don't want to save
	the registers.

Tue Dec 3 08:11:10 1991 Karl Berry (karl at hayley)

	* regex.c (extend_bits_list): divide size by bits/block.

	* regex.c (init_bits_list): remove redundant assignment to
	`bits_list_ptr'.

	* regexinc.c (partial_compiled_pattern_printer): don't do *p++
	twice in the same expr.

	* regex.c (re_match_2): at on_failure_jump, use the correct
	pattern positions for getting the stuff following the start_memory.

	* regex.c (struct register_info): remove the bits_list for the
	inner groups; make that a separate variable.

Mon Dec 2 10:42:07 1991 Karl Berry (karl at hayley)

	* regex.c (PUSH_FAILURE_POINT): don't pass `failure_stack' as an
	arg; change callers.

	* regex.c (PUSH_FAILURE_POINT): print items in the order they are
	pushed.
	(pop_failure_point): likewise.

	* regex.c (main): prompt for the pattern and string.

	* regex.c (FREE_VARIABLES) [!REGEX_MALLOC]: declare as nothing;
	remove #ifdefs from around calls.

	* regex.c (extract_number, extract_number_and_incr): declare static.

	* regex.c: remove the canned main program.
	* main.c: new file.
	* Makefile (COMMON): add main.o.

Tue Sep 24 06:26:51 1991 Kathy Hargreaves (kathy at fosse)

	* regex.c (re_match_2): Made `pend' and `dend' not register variables.
	Only set string2 to string1 if string1 isn't null.
	Send address of p, d, regstart, regend, and reg_info to
	pop_failure_point.
	Put in more debug statements.

	* regex.c [debug]: Added global variable.
	(DEBUG_*PRINT*): Only print if `debug' is true.
	(DEBUG_DOUBLE_STRING_PRINTER): Changed DEBUG_STRING_PRINTER's
	name to this.
	Changed some comments.
	(PUSH_FAILURE_POINT): Moved and added some debugging statements.
	Was saving regstart on the stack twice instead of saving both
	regstart and regend; remedied this.
	[NUM_REGS_ITEMS]: Changed from 3 to 4, as now save lowest and
	highest active registers instead of highest used one.
	[NUM_NON_REG_ITEMS]: Changed name of NUM_OTHER_ITEMS to this.
	(NUM_FAILURE_ITEMS): Use active registers instead of number 0
	through highest used one.
	(re_match_2): Have pop_failure_point put things in the variables.
	(pop_failure_point): Have it do what the fail case in re_match_2
	did with the failure stack, instead of throwing away the stuff
	popped off. re_match_2 can ignore results when it doesn't
	need them.

Thu Sep 5 13:23:28 1991 Kathy Hargreaves (kathy at fosse)

	* regex.c (banner): Changed copyright years to be separate.

	* regex.c [CHAR_UNSIGNED]: Put __ at both ends of this name.
	[DEBUG, debug_count, *debug_p, DEBUG_PRINT_1, DEBUG_PRINT_2,
	DEBUG_COMPILED_PATTERN_PRINTER, DEBUG_STRING_PRINTER]:
	defined these for debugging.
	(extract_number): Added this (debuggable) routine version of
	the macro EXTRACT_NUMBER. Ditto for EXTRACT_NUMBER_AND_INCR.
	(re_compile_pattern): Set return_default_num_regs if the
	syntax bit RE_ALLOCATE_REGISTERS is set.
	[REGEX_MALLOC]: Renamed USE_ALLOCA to this.
	(BUF_POP): Got rid of this, as don't ever use it.
	(regex_compile): Made the type of `pattern' not be register.
	If DEBUG, print the pattern to compile.
	(re_match_2): If had a `$' in the pattern before a `^' then
	don't record the `^' as an anchor.
	Put (enum regexpcode) before references to b, as suggested
	[RE_NO_BK_BRACES]: Changed RE_NO_BK_CURLY_BRACES to this.
	(remove_pattern_offset): Removed this unused routine.
	(PUSH_FAILURE_POINT): Changed to only save active registers.
	Put in debugging statements.
	(re_compile_fastmap): Made `pattern' not a register variable.
	Use routine for extracting numbers instead of macro.
	(re_match_2): Made `p', `mcnt' and `mcnt2' not register variables.
	Added `num_regs_pushed' for debugging.
	Only malloc registers if the syntax bit RE_ALLOCATE_REGISTERS is set.
	Put in debug statements.
	Put the macro NOTE_INNER_GROUP's code inline, as it was
	only called in one place.
	For debugging, extract numbers using routines instead of macros.
	In case fail: only restore pushed active registers, and added
	debugging statements.
	(pop_failure_point): Test for underfull stack.
	(group_can_match_nothing, common_op_can_match_nothing): For
	debugging, extract numbers using routines instead of macros.
	(regexec): Changed formal parameters to not be prototypes.
	Don't initialize `regs' or `private_preg' in their declarations.

Tue Jul 23 18:38:36 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h [RE_CONTEXT_INDEP_OPS]: Moved the anchor stuff out of
	this bit.
	[RE_UNMATCHED_RIGHT_PAREN_ORD]: Defined this bit.
	[RE_CONTEXT_INVALID_ANCHORS]: Defined this bit.
	[RE_CONTEXT_INDEP_ANCHORS]: Defined this bit.
	Added RE_CONTEXT_INDEP_ANCHORS to all syntaxes which had
	RE_CONTEXT_INDEP_OPS.
	Took RE_ANCHORS_ONLY_AT_ENDS out of the POSIX basic syntax.
	Added RE_UNMATCHED_RIGHT_PAREN_ORD to the POSIX extended
	syntax.
	Took RE_REPEATED_ANCHORS_AWAY out of the POSIX extended syntax.
	Defined REG_NOERROR (which will probably have to go away again).
	Changed the type `off_t' to `regoff_t'.

	* regex.c: Changed some comments.
	(regex_compile): Added variable `had_an_endline' to keep track
	of if hit a `$' since the beginning of the pattern or the last
	alternative (if any).
	Changed RE_CONTEXT_INVALID_OPS and RE_CONTEXT_INDEP_OPS to
	RE_CONTEXT_INVALID_ANCHORS and RE_CONTEXT_INDEP_ANCHORS where
	appropriate.
	Put a `no_op' in the pattern if a repeat is only zero or one
	times; in this case and if it is many times (whereupon a jump
	backwards is pushed instead), keep track of the operator for
	verify_and_adjust_endlines.
	If RE_UNMATCHED_RIGHT_PAREN is set, make an unmatched
	close-group operator match `)'.
	Changed all error exits to exit (1).
	(remove_pattern_offset): Added this routine, but don't use it.
	(verify_and_adjust_endlines): At top of routine, if initialize
	routines run out of memory, return true after setting
	enough_memory false.
	At end of endline, et al. case, don't set *p to no_op.
	Repetition operators also set the level and active groups'
	match statuses, unless RE_REPEATED_ANCHORS_AWAY is set.
	(get_group_match_status): Put a return in front of call to get_bit.
	(re_compile_fastmap): Changed is_a_succeed_n to a boolean.
	If at end of pattern, then if the failure stack isn't empty,
	go back to the failure point.
	In *jump* case, only pop the stack if what's on top of it is
	where we've just jumped to.
	(re_search_2): Return -2 instead of val if val is -2.
	(group_can_match_nothing, alternative_can_match_nothing,
	common_op_can_match_nothing): Now pass in reg_info for the
	`duplicate' case.
	(re_match_2): Don't skip over the next alternative also if
	empty alternatives aren't allowed.
	In fail case, if failed to a backwards jump that's part of a
	repetition loop, pop the current failure point and use the
	next one.
	(pop_failure_point): Check that there are as many register items
	on the failure stack as the stack says there are.
	(common_op_can_match_nothing): Added variables `ret' and
	`reg_no' so can set reg_info for the group encountered.
	Also break without doing anything if hit a no_op or the other
	kinds of `endline's.
	If not done already, set reg_info in start_memory case.
	Put in no_pop_jump for an optimized succeed_n of zero repetitions.
	In succeed_n case, if the number isn't zero, then return false.
	Added `duplicate' case.

Sat Jul 13 11:27:38 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (REG_NOERROR): Added this error code definition.

	* regex.c: Took some redundant parens out of macros.
	(enum regexpcode): Added jump_past_next_alt.
	Wrapped some macros in `do..while (0)'.
	Changed some comments.
	(regex_compile): Use `fixup_alt_jump' instead of `fixup_jump'.
	Use `maybe_pop_jump' instead of `maybe_pop_failure_jump'.
	Use `jump_past_next_alt' instead of `no_pop_jump' when at the
	end of an alternative.
	(re_match_2): Used REGEX_ALLOCATE for the registers stuff.
	In stop_memory case: Add more boolean tests to see if the
	group is in a loop.
	Added jump_past_next_alt case, which doesn't jump over the
	next alternative if the last one didn't match anything.
	Unfortunately, to make this work with, e.g., `(a+?*|b)*'
	against `bb', I also had to pop the alternative's failure
	point, which in turn broke backtracking!
	In fail case: Detect a dummy failure point by looking at
	failure_stack.avail - 2, not stack[-2].
	(pop_failure_point): Only pop if the stack isn't empty; don't
	give an error if it is. (Not sure yet this is correct.)
	(group_can_match_nothing): Make it return a boolean instead of int.
	Make it take an argument indicating the end of where it should look.
	If find a group that can match nothing, set the pointer
	argument to past the group in the pattern.
	Took out cases which can share with alternative_can_match_nothing
	and call common_op_can_match_nothing.
	Took ++ out of switch, so could call common_op_can_match_nothing.
	Wrote lots more for on_failure_jump case to handle alternatives.
	Main loop now doesn't look for matching stop_memory, but
	rather the argument END; return true if hit the matching
	stop_memory; this way can call itself for inner groups.
	(alternative_can_match_nothing): Added for alternatives.
	(common_op_can_match_nothing): Added for previous two routines'
	common operators.
	(regerror): Returns a message saying there's no error if gets
	sent REG_NOERROR.

Wed Jul 3 10:43:15 1991 Kathy Hargreaves (kathy at hayley)

	* regex.c: Removed unnecessary enclosing parens from several macros.
	Put `do..while (0)' around a few.
	Corrected some comments.
	(INIT_FAILURE_STACK_SIZE): Deleted in favor of using
	INIT_FAILURE_ALLOC.
	(INIT_FAILURE_STACK, DOUBLE_FAILURE_STACK, PUSH_PATTERN_OP,
	PUSH_FAILURE_POINT): Made routines of the same name (but with all
	lowercase letters) into these macros, so could use `alloca'
	when USE_ALLOCA is defined. The reason is stated below for
	bits lists. Deleted analogous routines.
	(re_compile_fastmap): Added variable void *destination for
	PUSH_PATTERN_OP.
	(re_match_2): Added variable void *destination for REGEX_REALLOCATE.
	Used the failure stack macros in place of the routines.
	Detected a dummy failure point by inspecting the failure stack's
	(avail - 2)th element, not failure_stack.stack[-2]. This bug
	arose when used the failure stack macros instead of the routines.

	* regex.c [USE_ALLOCA]: Put this conditional around previous
	alloca stuff and defined these to work differently depending
	on whether or not USE_ALLOCA is defined:
	(REGEX_ALLOCATE): Uses either `alloca' or `malloc'.
	(REGEX_REALLOCATE): Uses either `alloca' or `realloc'.
	(INIT_BITS_LIST, EXTEND_BITS_LIST, SET_BIT_TO_VALUE): Defined
	macro versions of routines with the same name (only with all
	lowercase letters) so could use `alloca' in re_match_2. This
	is to prevent core leaks when C-g is used in Emacs and to make
	things faster and avoid storage fragmentation. These things
	have to be macros because the memory returned by `alloca' goes
	away when the routine that called it returns.
	(BITS_BLOCK_SIZE, BITS_BLOCK, BITS_MASK): Moved to above the
	above-mentioned macros instead of before the routines defined
	below regex_compile.
	(set_bit_to_value): Compacted some code.
	(reg_info_type): Changed inner_groups field to be bits_list_type
	so could be arbitrarily long and thus handle arbitrary nesting.
	(NOTE_INNER_GROUP): Put `do...while (0)' around it so could
	use as a statement.
	Changed code to use bits lists.
	Added variable void *destination for REGEX_REALLOCATE (whose call
	is several levels in).
	Changed variable name of `this_bit' to `this_reg'.
	(FREE_VARIABLES): Only define and use if USE_ALLOCA is defined.
	(re_match_2): Use REGEX_ALLOCATE instead of malloc.
	Instead of setting INNER_GROUPS of reg_info to zero, have to
	use INIT_BITS_LIST and return -2 (and free variables if
	USE_ALLOCA isn't defined) if it fails.

Fri Jun 28 13:45:07 1991 Karl Berry (karl at hayley)

	* regex.c (re_match_2): set value of `dend' when we restore `d'.

	* regex.c: remove declaration of alloca.

	* regex.c (MISSING_ISGRAPH): rename to `ISGRAPH_MISSING'.

	* regex.h [_POSIX_SOURCE]: remove these conditionals; always
	define POSIX stuff.
	* regex.c (_POSIX_SOURCE): change conditionals to use `POSIX'
	instead.

Sat Jun 1 16:56:50 1991 Kathy Hargreaves (kathy at hayley)

	* regex.*: Changed RE_CONTEXTUAL_* to RE_CONTEXT_*,
	RE_TIGHT_VBAR to RE_TIGHT_ALT, RE_NEWLINE_OR to
	RE_NEWLINE_ALT, and RE_DOT_MATCHES_NEWLINE to RE_DOT_NEWLINE.

Wed May 29 09:24:11 1991 Karl Berry (karl at hayley)

	* regex.texinfo (POSIX Pattern Buffers): cross-reference the
	correct node name (Match-beginning-of-line, not ..._line).
	(Syntax Bits): put @code around all syntax bits.

Sat May 18 16:29:58 1991 Karl Berry (karl at hayley)

	* regex.c (global): add casts to keep broken compilers from
	complaining about malloc and realloc calls.

	* regex.c (isgraph) [MISSING_ISGRAPH]: change test to this,
	instead of `#ifndef isgraph', since broken compilers can't
	have both a macro and a symbol by the same name.

	* regex.c (re_comp, re_exec) [_POSIX_SOURCE]: do not define.
	(regcomp, regfree, regexec, regerror) [_POSIX_SOURCE && !emacs]:
	only define in this case.

Mon May 6 17:37:04 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (re_search, re_search_2): Changed BUFFER to not be const.

	* regex.c (re_compile_pattern): `^' is in a leading position if
	it precedes a newline.
	(various routines): Added or changed header comments.
	(double_pattern_offsets_list): Changed name from
	`extend_pattern_offsets_list'.
	(adjust_pattern_offsets_list): Changed return value from
	unsigned to void.
	(verify_and_adjust_endlines): Now returns `true' and `false'
	instead of 1 and 0.
	`$' is in a leading position if it follows a newline.
	(set_bit_to_value, get_bit_value): Exit with error if POSITION < 0
	so now calling routines don't have to.
	(init_failure_stack, inspect_failure_stack_top,
	pop_failure_stack_top, push_pattern_op, double_failure_stack):
	Now return value unsigned instead of boolean.
	(re_search, re_search_2): Changed BUFP to not be const.
	(re_search_2): Added variable const `private_bufp' to send to
	re_match_2.
	(push_failure_point): Made return value unsigned instead of boolean.

Sat May 4 15:32:22 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (re_compile_fastmap): Added extern for this.
	Changed some comments.

	* regex.c (re_compile_pattern): In case handle_bar: put invalid
	pattern test before levels matching stuff.
	Changed some comments.
	Added optimizing test for detecting an empty alternative that
	ends with a trailing '$' at the end of the pattern.
	(re_compile_fastmap): Moved failure_stack stuff to before this
	so could use it. Made its stack dynamic.
	Made it return an int so that it could return -2 if its stack
	couldn't be allocated.
	Added to header comment (about the return values).
	(init_failure_stack): Wrote so both re_match_2 and
	re_compile_fastmap could use similar stacks.
	(double_failure_stack): Added for above reasons.
	(push_pattern_op): Wrote for re_compile_fastmap.
	(re_search_2): Now return -2 if re_compile_fastmap does.
	(re_match_2): Made regstart and regend type failure_stack_element*.
	(push_failure_point): Made pattern_place and string_place type
	failure_stack_element*.
	Call double_failure_stack now.
	Return true instead of 1.

Wed May 1 12:57:21 1991 Kathy Hargreaves (kathy at hayley)

	* regex.c (remove_intervening_anchors): Avoid erroneously making
	ops into no_op's by making them no_op only when they're beglines.
	(verify_and_adjust_endlines): Don't make '$' a normal character
	if it's before a newline.
	Look for the endline op in *p, not p[1].
	(failure_stack_element): Added this declaration.
	(failure_stack_type): Added this declaration.
	(INIT_FAILURE_STACK_SIZE, FAILURE_STACK_EMPTY,
	FAILURE_STACK_PTR_EMPTY, REMAINING_AVAIL_SLOTS): Added for
	failure stack.
	(FAILURE_ITEM_SIZE, PUSH_FAILURE_POINT): Deleted.
	(FREE_VARIABLES): Now free failure_stack.stack instead of stackb.
	(re_match_2): deleted variables `initial_stack', `stackb',
	`stackp', and `stacke' and added `failure_stack' to replace them.
	Replaced calls to PUSH_FAILURE_POINT with those to
	push_failure_point.
	(push_failure_point): Added for re_match_2.
	(pop_failure_point): Rewrote to use a failure_stack_type of stack.
	(can_match_nothing): Moved definition to below re_match_2.
	(bcmp_translate): Moved definition to below re_match_2.

Mon Apr 29 14:20:54 1991 Kathy Hargreaves (kathy at hayley)

	* regex.c (enum regexpcode): Added codes endline_before_newline
	and repeated_endline_before_newline so could detect these
	types of endlines in the intermediate stages of a compiled
	pattern.
	(INIT_FAILURE_ALLOC): Renamed NFAILURES to this and set it to 5.
	(BUF_PUSH): Put `do {...} while 0' around this.
	(BUF_PUSH_2): Defined this to cut down on expansion of EXTEND_BUFFER.
	(regex_compile): Changed some comments.
	Now push endline_before_newline if find a `$' before a newline
	in the pattern.
	If a `$' might turn into an ordinary character, set laststart
	to point to it.
	In '^' case, if syntax bit RE_TIGHT_VBAR is set, then for `^'
	to be in a leading position, it must be first in the pattern.
	Don't have to check in one of the else clauses that it's not set.
	If RE_CONTEXTUAL_INDEP_OPS isn't set but RE_ANCHORS_ONLY_AT_ENDS
	is, make '^' a normal character if it isn't first in the pattern.
	Can only detect at the end if a '$' after an alternation op is a
	trailing one, so can't immediately detect empty alternatives
	if a '$' follows a vbar.
	Added a picture of the ``success jumps'' in alternatives.
	Have to set bufp->used before calling verify_and_adjust_endlines.
	Also do it before returning all error strings.
	(remove_intervening_anchors): Now replaces the anchor with
	repeated_endline_before_newline if it's an endline_before_newline.
	(verify_and_adjust_endlines): Deleted SYNTAX parameter (could
	use bufp's) and added GROUP_FORWARD_MATCH_STATUS so could
	detect back references referring to empty groups.
	Added variable `bend' to point past the end of the pattern buffer.
	Added variable `previous_p' so wouldn't have to reinspect the
	pattern buffer to see what op we just looked at.
	Added endline_before_newline and repeated_endline_before_newline
	cases.
	When checking if in a trailing position, added case where '$'
	has to be at the pattern's end if either of the syntax bits
	RE_ANCHORS_ONLY_AT_ENDS or RE_TIGHT_VBAR are set.
	Since `endline' can have the intermediate form `endline_in_repeat',
	have to change it to `endline' if RE_REPEATED_ANCHORS_AWAY
	isn't set.
	Now disallow empty alternatives with trailing endlines in them
	if RE_NO_EMPTY_ALTS is set.
	Now don't make '$' an ordinary character if it precedes a newline.
	Don't make it an ordinary character if it's before a newline.
	Back references now affect the level matching something only if
	they refer to nonempty groups.
	(can_match_nothing): Now increment p1 in the switch, which
	changes many of the cases, but makes the code more like what
	it was derived from.
	Adjust the return statement to reflect above.
	(struct register_info): Made `can_match_nothing' field an int
	instead of a bit so could have -1 in it if never set.
	(MAX_FAILURE_ITEMS): Changed name from MAX_NUM_FAILURE_ITEMS.
	(FAILURE_ITEM_SIZE): Defined how much space a failure item uses.
	(PUSH_FAILURE_POINT): Changed variable `last_used_reg's name
	to `highest_used_reg'.
	Added variable `num_stack_items' and changed `len's name to
	`stack_length'.
	Test failure stack limit in terms of number of items in it, not
	in terms of its length. rms' fix tested length against number
	of items, which was a misunderstanding.
	Use `realloc' instead of `alloca' to extend the failure stack.
	Use shifts instead of multiplying by 2.
	(FREE_VARIABLES): Free `stackb' instead of `initial_stack', as
	it may have been reallocated.
	(re_match_2): When mallocing `initial_stack', now multiply
	the number of items wanted (what was there before) by
	FAILURE_ITEM_SIZE.
	(pop_failure_point): Need this procedure form of the macro of
	the same name for debugging, so left it in and deleted the
	macro.
	(regcomp): Don't free the pattern buffer's translate field.

Mon Apr 15 09:47:47 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (RE_DUP_MAX): Moved to outside of #ifdef _POSIX_SOURCE.
	* regex.c (#include <sys/types.h>): Removed #ifdef _POSIX_SOURCE
	condition.
	(malloc, realloc): Made return type void* #ifdef __STDC__.
	(enum regexpcode): Added endline_in_repeat for the compiler's
	use; this never ends up on the final compiled pattern.
	(INIT_PATTERN_OFFSETS_LIST_SIZE): Initial size for
	pattern_offsets_list_type.
	(pattern_offset_type): Type for pattern offsets.
	(pattern_offsets_list_type): Type for keeping a list of
	pattern offsets.
	(anchor_list_type): Changed to above type.
	(PATTERN_OFFSETS_LIST_PTR_FULL): Tests if a pattern offsets
	list is full.
	(ANCHOR_LIST_PTR_FULL): Changed to above.
	(BIT_BLOCK_SIZE): Changed to BITS_BLOCK_SIZE and moved to
	above bits list routines below regex_compile.
	(op_list_type): Defined to be pattern_offsets_list_type.
	(compile_stack_type): Changed offsets to be
	pattern_offset_type instead of unsigned.
	(pointer): Changed the name of all structure fields from this
	to `avail'.
	(COMPILE_STACK_FULL): Changed so the stack is full if `avail'
	is equal to `size' instead of `size' - 1.
	(GET_BUFFER_SPACE): Changed `>=' to `>' in the while statement.
	(regex_compile): Added variable `enough_memory' so could check
	that routine that verifies '$' positions could return an
	allocation error.
	(group_count): Deleted this variable, as `regnum' already does
	this work.
	(op_list): Added this variable to keep track of operations
	needed for verifying '$' positions.
	(anchor_list): Now initialize using routine
	`init_pattern_offsets_list'.
	Consolidated the three bits_list initializations.
	In case '$': Instead of trying to go past constructs which can
	follow '$', merely detect the special case where it has to be
	at the pattern's end, fix up any fixup jumps if necessary,
	record the anchor if necessary and add an `endline' (and
	possibly two `no-op's) to the pattern; will call a routine at
	the end to verify if it's in a valid position or not.
	(init_pattern_offsets_list): Added to initialize pattern
	offsets lists.
	(extend_anchor_list): Renamed this extend_pattern_offsets_list
	and renamed parameters and internal variables appropriately.
	(add_pattern_offset): Added this routine which both
	record_anchor_position and add_op call.
	(adjust_pattern_offsets_list): Add this routine to adjust by
	some increment all the pattern offsets in such a list after a
	given position.
	(record_anchor_position): Now send in offset instead of
	calculating it and just call add_pattern_offset.
	(adjust_anchor_list): Replaced by above routine.
	(remove_intervening_anchors): If the anchor is an `endline'
	then replace it with `endline_in_repeat' instead of `no_op'.
	(add_op): Added this routine to call in regex_compile
	wherever push something relevant to verifying '$' positions.
	(verify_and_adjust_endlines): Added routine to (1) verify that
	'$'s in a pattern buffer (represented by `endline') were in
	valid positions and (2) determine whether or not they were anchors.
	(BITS_BLOCK_SIZE): Renamed from BIT_BLOCK_SIZE and moved to right
	above bits list routines.
	(BITS_BLOCK): Defines which array element of a bits list the
	bit corresponding to a given position is in.
	(BITS_MASK): Has a 1 where the bit (in a bit list array element)
	for a given position is.

Mon Apr 1 12:09:06 1991 Kathy Hargreaves (kathy at hayley)

	* regex.c (BIT_BLOCK_SIZE): Defined this for using with
	bits_list_type, abstracted from level_list_type so could use
	for more things than just the level match status.
	(regex_compile): Renamed `level_list' variable to
	`level_match_status'.
	Added variable `group_match_status' of type bits_list_type.
	Kept track of whether or not for all groups any of them
	matched other than the empty string, so detect if a back
	reference in front of a '^' made it nonleading or not.
	Do this by setting a match status bit for all active groups
	whenever leave a group that matches other than the empty string.
	Could detect which groups are active by going through the
	stack each time, but or-ing a bits list of active groups with
	a bits list of group match status is faster, so make a bits
	list of active groups instead.
	Have to check that '^' isn't in a leading position before
	going to normal_char.
	Whenever set level match status of the current level, also set
	the match status of all active groups.
	Increase the group count and make that group active whenever
	open a group.
	When close a group, only set the next level down if the
	current level matches other than the empty string, and make
	the current group inactive.
	At a back reference, only set a level's match status if the
	group to which the back reference refers matches other than
	the empty string.
	(init_bits_list): Added to initialize a bits list.
	(get_level_value): Deleted this. (Made into
	get_level_match_status.)
	(extend_bits_list): Added to extend a bits list. (Made this
	from deleted routine `extend_level_list'.)
	(get_bit): Added to get a bit value from a bits list. (Made
	this from deleted routine `get_level_value'.)
	(set_bit_to_value): Added to set a bit in a bits list. (Made
	this from deleted routine `set_level_value'.)
	(get_level_match_status): Added this to get the match status
	of a given level. (Made from get_level_value.)
	(set_this_level, set_next_lower_level): Made all routines
	which set bits extend the bits list if necessary, thus they
	now return an unsigned value to indicate whether or not the
	reallocation failed.
	(increase_level): No longer extends the level list.
	(make_group_active): Added to mark as active a given group in
	an active groups list.
	(make_group_inactive): Added to mark as inactive a given group
	in an active groups list.
	(set_match_status_of_active_groups): Added to set the match
	status of all currently active groups.
	(get_group_match_status): Added to get a given group's match status.
	(no_levels_match_anything): Removed the parameter LEVEL.
	(PUSH_FAILURE_POINT): Added rms' bug fix and changed RE_NREGS
	to num_internal_regs.
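
BITS_BLOCK and BITS_MASK (Mon Apr 15), together with the bits-list routines introduced on Mon Apr 1, split a bit position into two parts: an index into an array of `unsigned' blocks and a mask within that block. The sketch below shows one way these pieces could fit together. The struct layout, the doubling growth policy and the 1/0 return codes are assumptions, not the regex.c definitions.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Number of bits held by one array element ("block").  */
#define BITS_BLOCK_SIZE (sizeof (unsigned) * CHAR_BIT)
/* Index of the block that holds the bit for POSITION.  */
#define BITS_BLOCK(position) ((size_t) (position) / BITS_BLOCK_SIZE)
/* Mask with a 1 where POSITION's bit sits inside its block.  */
#define BITS_MASK(position) (1u << ((size_t) (position) % BITS_BLOCK_SIZE))

/* Hypothetical layout: a growable array of blocks.  */
typedef struct
{
  unsigned *bits;
  size_t size;                  /* number of blocks allocated */
} bits_list_type;

/* Allocate one zeroed block; return 1 on success, 0 on failure.  */
static int
init_bits_list (bits_list_type *list)
{
  list->size = 1;
  list->bits = calloc (list->size, sizeof (unsigned));
  return list->bits != NULL;
}

/* Double the list until POSITION fits, zeroing the new blocks.  Return 1
   on success, 0 if realloc fails (the old blocks then stay valid).  */
static int
extend_bits_list (bits_list_type *list, int position)
{
  assert (position >= 0);
  while (BITS_BLOCK (position) >= list->size)
    {
      size_t new_size = list->size << 1;
      unsigned *new_bits = realloc (list->bits, new_size * sizeof (unsigned));

      if (new_bits == NULL)
        return 0;
      memset (new_bits + list->size, 0,
              (new_size - list->size) * sizeof (unsigned));
      list->bits = new_bits;
      list->size = new_size;
    }
  return 1;
}

/* Bits beyond the allocated blocks read as 0.  */
static int
get_bit (const bits_list_type *list, int position)
{
  assert (position >= 0);
  if (BITS_BLOCK (position) >= list->size)
    return 0;
  return (list->bits[BITS_BLOCK (position)] & BITS_MASK (position)) != 0;
}

/* Set POSITION's bit to VALUE, growing the list if needed.
   Return 1 on success, 0 if the list could not be extended.  */
static int
set_bit_to_value (bits_list_type *list, int position, int value)
{
  if (!extend_bits_list (list, position))
    return 0;
  if (value)
    list->bits[BITS_BLOCK (position)] |= BITS_MASK (position);
  else
    list->bits[BITS_BLOCK (position)] &= ~BITS_MASK (position);
  return 1;
}
```

On a machine with a 32-bit `unsigned', position 100 falls in block 3 (100 / 32) with mask `1u << 4' (100 % 32). Growing the list on demand is what lets routines like set_this_level report a failed reallocation instead of writing past the end of the array.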

Sun Mar 31 09:04:30 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (RE_ANCHORS_ONLY_AT_ENDS): Added syntax so could
	constrain '^' and '$' to only be anchors if at the beginning
	and end of the pattern.
	(RE_SYNTAX_POSIX_BASIC): Added the above bit.

	* regex.c (enum regexpcode): Changed `unused' to `no_op'.
	(this_and_lower_levels_match_nothing): Deleted forward reference.
	(regex_compile): case '^': if the syntax bit RE_ANCHORS_ONLY_AT_ENDS
	is set, then '^' is only an anchor if at the beginning of the
	pattern; only record anchor position if the syntax bit
	RE_REPEATED_ANCHORS_AWAY is set; the '^' is a normal char if
	the syntax bit RE_ANCHORS_ONLY_AT_ENDS is set and we're not at
	the beginning of the pattern (and neither RE_CONTEXTUAL_INDEP_OPS
	nor RE_CONTEXTUAL_INDEP_OPS syntax bits are set).
	Only adjust the anchor list if the syntax bit
	RE_REPEATED_ANCHORS_AWAY is set.

	* regex.c (level_list_type): Use to detect when '^' is
	in a leading position.
	(regex_compile): Added level_list_type level_list variable in
	which we keep track of whether or not a grouping level (in its
	current or most recent incarnation) matches anything besides the
	empty string. Set the bit for the i-th level when detect it
	should match something other than the empty string and the bit
	for the (i-1)-th level when leave the i-th group. Clear all
	bits for the i-th and higher levels if none of 0--(i - 1)-th's
	bits are set when encounter an alternation operator on that
	level. If no levels are set when hit a '^', then it is in a
	leading position. We keep track of which level we're at by
	increasing a variable current_level whenever we encounter an
	open-group operator and decreasing it whenever we encounter a
	close-group operator.
	Have to adjust the anchor list contents whenever insert
	something ahead of them (such as on_failure_jump's) in the
	pattern.
	(adjust_anchor_list): Adjusts the offsets in an anchor list by
	a given increment starting at a given start position.
	(get_level_value): Returns the bit setting of a given level.
	(set_level_value): Sets the bit of a given level to a given value.
	(set_this_level): Sets (to 1) the bit of a given level.
	(set_next_lower_level): Sets (to 1) the bit of (LEVEL - 1) for a
	given LEVEL.
	(clear_this_and_higher_levels): Clears the bits for a given
	level and any higher levels.
	(extend_level_list): Adds sizeof(unsigned) more bits to a level list.
	(increase_level): Increases by 1 the value of a given level variable.
	(decrease_level): Decreases by 1 the value of a given level variable.
	(lower_levels_match_nothing): Checks if any levels lower than
	the given one match anything.
	(no_levels_match_anything): Checks if any levels match anything.
	(re_match_2): At case wordbeg: before looking at d-1, check that
	we're not at the string's beginning.
	At case wordend: Added some illuminating parentheses.

Mon Mar 25 13:58:51 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (RE_NO_ANCHOR_AT_NEWLINE): Changed syntax bit name
	from RE_ANCHOR_NOT_NEWLINE because an anchor never matches the
	newline itself, just the empty string either before or after it.
	(RE_REPEATED_ANCHORS_AWAY): Added this syntax bit for ignoring
	anchors inside groups which are operated on by repetition
	operators.
	(RE_DOT_MATCHES_NEWLINE): Added this bit so the match-any-character
	operator could match a newline when it's set.
	(RE_SYNTAX_POSIX_BASIC): Set RE_DOT_MATCHES_NEWLINE in this.
	(RE_SYNTAX_POSIX_EXTENDED): Set RE_DOT_MATCHES_NEWLINE and
	RE_REPEATED_ANCHORS_AWAY in this.
	(regerror): Changed prototypes to new POSIX spec.

	* regex.c (anchor_list_type): Added so could null out anchors inside
	repeated groups.
	(ANCHOR_LIST_PTR_FULL): Added for above type.
	(compile_stack_element): Changed name from stack_element.
	(compile_stack_type): Changed name from compile_stack.
	(INIT_COMPILE_STACK_SIZE): Changed name from INIT_STACK_SIZE.
	(COMPILE_STACK_EMPTY): Changed name from STACK_EMPTY.
	(COMPILE_STACK_FULL): Changed name from STACK_FULL.
	(regex_compile): Changed SYNTAX parameter to non-const.
	Changed variable name `stack' to `compile_stack'.
	If syntax bit RE_REPEATED_ANCHORS_AWAY is set, then naively put
	anchors in a list when encounter them and then set them to
	`unused' when detect they are within a group operated on by a
	repetition operator. Need something more sophisticated than
	this, as they should only get set to `unused' if they are in
	positions where they would be anchors. Also need a better way to
	detect contextually invalid anchors.
	Changed some comments.
	(is_in_compile_stack): Changed name from `is_in_stack'.
	(extend_anchor_list): Added to do anchor stuff.
	(record_anchor_position): Added to do anchor stuff.
	(remove_intervening_anchors): Added to do anchor stuff.
	(re_match_2): Now match a newline with the match-any-character
	operator if RE_DOT_MATCHES_NEWLINE is set.
	Compacted some code.
	(regcomp): Added new POSIX newline information to the header
	comment.
	If REG_NEWLINE cflag is set, then now unset RE_DOT_MATCHES_NEWLINE
	in syntax.
	(put_in_buffer): Added to do new POSIX regerror spec. Called
	by regerror.
	(regerror): Changed to take a pattern buffer, an error buffer and
	its size; it now has return type `size_t', returns the size of the
	full error message, and puts the first ERRBUF_SIZE - 1 characters
	of the full error message in the error buffer.
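
The reworked regerror follows the POSIX contract. It returns the size needed to hold the whole message, including the terminating null. It also stores at most ERRBUF_SIZE - 1 characters plus a null in the caller's buffer. The following is a minimal sketch of a helper in the role of put_in_buffer. The name comes from the entry above, but the signature here is an assumption, since the entry does not give it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical put_in_buffer: copy MESSAGE into ERRBUF, truncated to
   ERRBUF_SIZE - 1 characters plus a terminating null, and return the
   size needed for the whole message (terminating null included).
   When ERRBUF_SIZE is 0, ERRBUF is ignored and may be NULL.  */
static size_t
put_in_buffer (const char *message, char *errbuf, size_t errbuf_size)
{
  size_t needed;

  assert (message != NULL);
  needed = strlen (message) + 1;
  if (errbuf != NULL && errbuf_size > 0)
    {
      size_t n = needed <= errbuf_size ? needed - 1 : errbuf_size - 1;

      memcpy (errbuf, message, n);
      errbuf[n] = '\0';
    }
  return needed;
}
```

A caller can pass an ERRBUF_SIZE of 0 to learn the required size first and then allocate exactly that much. POSIX explicitly permits this calling pattern for regerror.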

Wed Feb 27 16:38:33 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (#include <sys/types.h>): Removed this as new POSIX
	standard has the user include it.
	(RE_SYNTAX_POSIX_BASIC and RE_SYNTAX_POSIX_EXTENDED): Removed
	RE_HAT_LISTS_NOT_NEWLINE as new POSIX standard has the cflag
	REG_NEWLINE now set this. Similarly, added syntax bit
	RE_ANCHOR_NOT_NEWLINE as this is now unset by REG_NEWLINE.
	(RE_SYNTAX_POSIX_BASIC): Removed syntax bit
	RE_NO_CONSECUTIVE_REPEATS as POSIX now allows them.

	* regex.c (#include <sys/types.h>): Added this as new POSIX
	standard has the user include it instead of us putting it in
	regex.h.
	(extern char *re_syntax_table): Made into an extern so the
	user could allocate it.
	(DO_RANGE): If don't find a range end, now goto invalid_range_end
	instead of unmatched_left_bracket.
	(regex_compile): Made variable SYNTAX non-const.????
	Reformatted some code.
	(re_compile_fastmap): Moved is_a_succeed_n's declaration to
	inner braces.
	Compacted some code.
	(SET_NEWLINE_FLAG): Removed and put inline.
	(regcomp): Made variable `syntax' non-const so can unset
	RE_ANCHOR_NOT_NEWLINE syntax bit if cflag REG_NEWLINE is set.
	If cflag REG_NEWLINE is set, set the RE_HAT_LISTS_NOT_NEWLINE
	syntax bit and unset the RE_ANCHOR_NOT_NEWLINE bit of `syntax'.

Wed Feb 20 16:33:38 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (RE_NO_CONSECUTIVE_REPEATS): Changed name from
	RE_NO_CONSEC_REPEATS.
	(REG_ENESTING): Deleted this POSIX return value, as the stack
	is now unbounded.
	(struct re_pattern_buffer): Changed some comments.
	(re_compile_pattern): Changed a comment.
	Deleted check on stack upper bound and corresponding error.
	Now when there's no interval contents and it's the end of the
	pattern, go to unmatched_left_curly_brace instead of end_of_pattern.
	Removed nesting_too_deep error, as the stack is now unbounded.
	(regcomp): Removed REG_ENESTING case, as the stack is now unbounded.
	(regerror): Removed REG_ENESTING case, as the stack is now unbounded.

	* regex.c (MAX_STACK_SIZE): Deleted because don't need upper
	bound on array indexed with an unsigned number.

Sun Feb 17 15:50:24 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h: Changed and added some comments.

	* regex.c (init_syntax_once): Made `_' a word character.
	(re_compile_pattern): Added a comment.
	(re_match_2): Redid header comment.
	(regexec): In header comment about PMATCH, corrected and
	removed details found in regex.h, adding a reference.

Fri Feb 15 09:21:31 1991 Kathy Hargreaves (kathy at hayley)

	* regex.c (DO_RANGE): Removed argument parentheses.
	Now get untranslated range start and end characters and set
	list bits for the translated (if at all) versions of them and
	all characters between them.
	(re_match_2): Now use regs->num_regs instead of num_regs_wanted
	wherever possible.
	(regcomp): Now build case-fold translate table using isupper
	and tolower facilities so will work on foreign language characters.

Sat Feb 9 16:40:03 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (RE_HAT_LISTS_NOT_NEWLINE): Changed syntax bit name
	from RE_LISTS_NOT_NEWLINE as it only affects nonmatching lists.
	Changed all references to the match-beginning-of-string
	operator to match-beginning-of-line operator, as this is what
	it does.
	(RE_NO_CONSEC_REPEATS): Added this syntax bit.
	(RE_SYNTAX_POSIX_BASIC): Added above bit to this.
	(REG_PREMATURE_END): Changed name to REG_EEND.
	(REG_EXCESS_NESTING): Changed name to REG_ENESTING.
	(REG_TOO_BIG): Changed name to REG_ESIZE.
	(REG_INVALID_PREV_RE): Deleted this POSIX return value.
	Added and changed some comments.

	* regex.c (re_compile_pattern): Now sets the pattern buffer's
	`return_default_num_regs' field.
	(typedef struct stack_element, stack_type, INIT_STACK_SIZE,
	MAX_STACK_SIZE, STACK_EMPTY, STACK_FULL): Added for regex_compile.
	(INIT_BUF_SIZE): Changed value from 28 to 32.
	(BUF_PUSH): Changed name from BUFPUSH.
	(MAX_BUF_SIZE): Added so could use in many places.
	(IS_CHAR_CLASS_STRING): Replaced is_char_class with this.
	(regex_compile): Added a stack which could grow dynamically
	and which has struct elements.
	Go back to initializing `zero_times_ok' and `many_times_ok' to
	0 and |=ing them inside the loop.
	Now disallow consecutive repetition operators if the syntax
	bit RE_NO_CONSEC_REPEATS is set.
	Now detect trailing backslash when the compiler is expecting a
	`?' or a `+'.
	Changed calls to GET_BUFFER_SPACE which asked for 6 to ask for
	3, as that's all they needed.
	Now check for trailing backslash inside lists.
	Now disallow an empty alternative right before an end-of-line
	operator.
	Now get buffer space before leaving space for a fixup jump.
	Now check if at pattern end when at open-interval operator.
	Added some comments.
	Now check if non-interval repetition operators follow an
	interval one if the syntax bit RE_NO_CONSEC_REPEATS is set.
	Now only check if what precedes an interval repetition
	operator isn't a regular expression which matches one
	character if the syntax bit RE_NO_CONSEC_REPEATS is set.
	Now return "Unmatched [ or [^" instead of "Unmatched [".
	(is_in_stack): Added to check if a given register number is in
	the stack.
	(re_match_2): If initial variable allocations fail, return -2,
	instead of -1.
	Now set regs' `num_regs' field when allocating regs.
	Now before allocating them, free regs->start and end if they
	aren't NULL and return -2 if either allocation fails.
	Now use regs->num_regs instead of num_regs_wanted to control
	regs loops.
	Now increment past the newline when matching it with an
	end-of-line operator.
	(regcomp): Added to the header comment.
	Now return REG_ESUBREG if regex_compile returns "Unmatched [
	or [^" instead of doing so if it returns "Unmatched [".
	Now return REG_BADRPT if, in addition to returning "Missing
	preceding regular expression", regex_compile returns "Invalid
	preceding regular expression".
	Now return new return value names (see regex.h changes).
	(regexec): Added to header comment.
	Initialize regs structure.
	Now match whole string.
	Now always free regs.start and regs.end instead of just when
	the string matched.
	(regerror): Now return "Regex error: Unmatched [ or [^.\n"
	instead of "Regex error: Unmatched [.\n".
	Now return "Regex error: Preceding regular expression either
	missing or not simple.\n" instead of "Regex error: Missing
	preceding regular expression.\n".
	Removed REG_INVALID_PREV_RE case (it got subsumed into the
	REG_BADRPT case).

Thu Jan 17 09:52:35 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h: Changed a comment.

	* regex.c: Changed and added large header comments.
	(re_compile_pattern): Now if detect that `laststart' for an
	interval points to a byte code for a regular expression which
	matches more than one character, make it an internal error.
	(regerror): Return error message, don't print it.

Tue Jan 15 15:32:49 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (regcomp return codes): Added GNU ones.
	Updated some comments.

	* regex.c (DO_RANGE): Changed `obscure_syntax' to `syntax'.
	(regex_compile): Added `following_left_brace' to keep track of
	where pseudo interval following a valid interval starts.
	Changed some instances that returned "Invalid regular
	expression" to instead return error strings coinciding with
	POSIX error codes.
	Changed some comments.
	Now consider only things between `[:' and `:]' to be possible
	character class names.
	Now a character class expression can't end a pattern; at
	least a `]' must close the list.
	Now if the syntax bit RE_NO_BK_CURLY_BRACES is set, then a
	valid interval must be followed by yet another to get an error
	for preceding an interval (in this case, the second one) with
	a regular expression that matches more than one character.
	Now if what follows a valid interval begins with an open
	interval operator but doesn't begin a valid interval, then set
	following_left_brace to it, put it in C and go to
	normal_char label.
	Added some comments.
	Return "Invalid character class name" instead of "Invalid
	character class".
	(regerror): Return messages for all POSIX error codes except
	REG_ECOLLATE and REG_ENEWLINE, along with all GNU error codes.
	Added `break's after all cases.
	(main): Call re_set_syntax instead of setting `obscure_syntax'
	directly.

Sat Jan 12 13:37:59 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (Copyright): Updated date.
	(#include <sys/types.h>): Include unconditionally.
	(RE_CANNOT_MATCH_NEWLINE): Deleted this syntax bit.
	(RE_SYNTAX_POSIX_BASIC, RE_SYNTAX_POSIX_EXTENDED): Removed
	setting the RE_ANCHOR_NOT_NEWLINE syntax bit from these.
	Changed and added some comments.
	(struct re_pattern_buffer): Changed some flags from chars to bits.
	Added field `syntax'; holds which syntax pattern was compiled with.
	Added bit flag `return_default_num_regs'.
	(externs for GNU and Berkeley UNIX routines): Added `const's to
	parameter types to be compatible with POSIX.
	(#define const): Added to support old C compilers.

	* regex.c (Copyright): Updated date.
	(enum regexpcode): Deleted `newline'.
	(regex_compile): Renamed re_compile_pattern to this, added a
	syntax parameter so it can set the pattern buffer's `syntax'
	field.
	Made `pattern' and `size' `const's so could pass to POSIX
	interface routines; also made `const' whatever interval
	variables had to be to make this work.
	Changed references to `obscure_syntax' to new parameter `syntax'.
	Deleted putting `newline' in buffer when see `\n'.
	Consider invalid character classes which have nothing wrong
	except the character class name; if so, return character-class error.
	(is_char_class): Added routine for regex_compile.
	(re_compile_pattern): Added a new one which calls
	regex_compile with `obscure_syntax' as the actual parameter
	for the formal `syntax'.
	Gave this the old routine's header comments.
	Made `pattern' and `size' `const's so could use POSIX interface
	routine parameters.
	(re_search, re_search_2, re_match, re_match_2): Changed
	`pbufp' to `bufp'.
	(re_search_2, re_match_2): Changed `mstop' to `stop'.
	(re_search, re_search_2): Made all parameters except `regs'
	`const's so could use POSIX interface routines' parameters.
	(re_search_2): Added private copies of `const' parameters so
	could change their values.
	(re_match_2): Made all parameters except `regs' `const's so
	could use POSIX interface routines' parameters.
	Changed `size1' and `size2' parameters to `size1_arg' and
	`size2_arg' so could change them; added local `size1' and
	`size2' and set to these.
	Added some comments.
	Deleted `newline' case.
	`begline' can also possibly match if `d' contains a newline;
	if it does, we have to increment d to point past the newline.
	Replaced references to `obscure_syntax' with `bufp->syntax'.
	(re_comp, re_exec): Made parameter `s' a `const' so could use POSIX
	interface routines' parameters.
	Now call regex_compile, passing `obscure_syntax' via the
	`syntax' parameter.
	(re_exec): Made local `len' a `const' so could pass to re_search.
	(regcomp): Added header comment.
	Added local `syntax' to set and pass to regex_compile rather
	than setting global `obscure_syntax' and passing it.
	Call regex_compile with its `syntax' parameter rather than
	re_compile_pattern.
	Return REG_ECTYPE if character-class error.
	(regexec): Don't initialize `regs' to anything.
	Made `private_preg' a nonpointer so could set to what the
	constant `preg' points to.
	Initialize `private_preg's `return_default_num_regs' field to
	zero because want to return `nmatch' registers, not however
	many subexpressions there are in the pattern.
	Also test if `nmatch' > 0 to see if should pass re_match `regs'.

Tue Jan 8 15:57:17 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (struct re_pattern_buffer): Reworded comment.

	* regex.c (EXTEND_BUFFER): Also reset beg_interval.
	(re_search_2): Return val if val = -2.
	(NUM_REG_ITEMS): Listed items in comment.
	(NUM_OTHER_ITEMS): Defined this for using in > 1 definition.
	(MAX_NUM_FAILURE_ITEMS): Replaced `+ 2' with NUM_OTHER_ITEMS.
	(NUM_FAILURE_ITEMS): As with definition above and added to
	comment.
	(PUSH_FAILURE_POINT): Replaced `* 2's with `<< 1's.
	(re_match_2): Test for equality with 1 to see if pbufp->bol and
	pbufp->eol are set.

Fri Jan 4 15:07:22 1991 Kathy Hargreaves (kathy at hayley)

	* regex.h (struct re_pattern_buffer): Reordered some fields.
	Updated some comments.
	Added not_bol and not_eol fields.
	(extern regcomp, regexec, regerror): Added return types.
	(extern regfree): Added `extern'.

	* regex.c (min): Deleted unused macro.
	(re_match_2): Compacted some code.
	Removed call to macro `min' from `for' loop.
	Fixed so unused registers get filled with -1's.
	Fail if the pattern buffer's `not_bol' field is set and
	encounter a `begline'.
	Fail if the pattern buffer's `not_eol' field is set and
	encounter an `endline'.
	Deleted redundant check for empty stack in fail case.
	Don't free pattern buffer's components in re_comp.
	(regexec): Initialize variable regs.
	Added `private_preg' pattern buffer so could set `not_bol' and
	`not_eol' fields and hand to re_match.
	Deleted naive attempt to detect anchors.
	Set private pattern buffer's `not_bol' and `not_eol' fields
	according to eflags value.
	`nmatch' must also be > 0 for us to bother allocating
	registers to send to re_match and filling pmatch
	with their results after the call to re_match.
	Send private pattern buffer instead of argument to re_match.
	If use the registers, always free them and then set them to NULL.
	(regerror): Added this Posix routine.
	(regfree): Added this Posix routine.

Tue Jan  1 15:02:45 1991  Kathy Hargreaves  (kathy at hayley)

	* regex.h (RE_NREGS): Deleted this definition, as now the user
	can choose how many registers to have.
	(REG_NOTBOL, REG_NOTEOL): Defined these Posix eflag bits.
	(REG_NOMATCH, REG_BADPAT, REG_ECOLLATE, REG_ECTYPE,
	REG_EESCAPE, REG_ESUBREG, REG_EBRACK, REG_EPAREN, REG_EBRACE,
	REG_BADBR, REG_ERANGE, REG_ESPACE, REG_BADRPT, REG_ENEWLINE):
	Defined these return values for Posix's regcomp and regexec.
	Updated some comments.
	(struct re_pattern_buffer): Now typedef this as regex_t
	instead of the other way around.
	(struct re_registers): Added num_regs field. Made start and
	end fields pointers to char instead of fixed size arrays.
	(regmatch_t): Added this Posix register type.
	(regcomp, regexec, regerror, regfree): Added externs for these
	Posix routines.

	* regex.c (enum boolean): Typedefed this.
	(re_pattern_buffer): Reformatted some comments.
	(re_compile_pattern): Updated some comments.
	Always push start_memory and its attendant number whenever
	encounter a group, not just when its number is less than the
	previous maximum number of registers; same for stop_memory.
	Get 4 bytes of buffer space instead of 2 when pushing a
	set_number_at.
	(can_match_nothing): Added this to elaborate on and replace
	code in re_match_2.
	(reg_info_type): Made can_match_nothing field a bit instead of int.
	(MIN): Added for re_match_2.
	(re_match_2 macros): Changed all `for' loops which used
	RE_NREGS to now use num_internal_regs as upper bounds.
	(MAX_NUM_FAILURE_ITEMS): Use num_internal_regs instead of RE_NREGS.
	(POP_FAILURE_POINT): Added check for empty stack.
	(FREE_VARIABLES): Added this to free (and set to NULL)
	variables allocated in re_match_2.
	(re_match_2): Rearranged parameters to be in order.
	Added variables num_regs_wanted (how many registers the user wants)
	and num_internal_regs (how many groups there are).
	Allocated initial_stack, regstart, regend, old_regstart,
	old_regend, reginfo, best_regstart, and best_regend---all of
	which used to be fixed size arrays. Free them all and return
	-1 if any fail.
	Free above variables if starting position pos isn't valid.
	Changed all `for' loops which used RE_NREGS to now use
	num_internal_regs as upper bounds---except for the loops which
	fill regs; then use num_regs_wanted.
	Allocate regs if the user has passed it and wants more than 0
	registers filled.
	Set regs->start[i] and regs->end[i] to -1 if either
	regstart[i] or regend[i] equals -1, not just the first.
	Free allocated variables before returning.
	Updated some comments.
	(regcomp): Return REG_ESPACE, REG_BADPAT, REG_EPAREN when
	appropriate.
	Free translate array.
	(regexec): Added this Posix interface routine.

Mon Dec 24 14:21:13 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h: If _POSIX_SOURCE is defined then #include <sys/types.h>.
	Added syntax bit RE_CANNOT_MATCH_NEWLINE.
	Defined Posix cflags: REG_EXTENDED, REG_NEWLINE, REG_ICASE, and
	REG_NOSUB.
	Added fields re_nsub and no_sub to struct re_pattern_buffer.
	Typedefed regex_t to be `struct re_pattern_buffer'.

	* regex.c (CHAR_SET_SIZE): Defined this to be 256 and replaced
	occurrences of this value with this constant.
	(re_compile_pattern): Added switch case for `\n' and put
	`newline' into the pattern buffer when encounter this.
	Increment the pattern_buffer's `re_nsub' field whenever open a
	group.
	(re_match_2): Match a newline with `newline'---provided the
	syntax bit RE_CANNOT_MATCH_NEWLINE isn't set.
	(regcomp): Added this Posix interface routine.
	(enum test_type): Added interface_test tag.
	(main): Added Posix interface test.

Tue Dec 18 12:58:12 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (struct re_pattern_buffer): Reformatted so would fit
	in texinfo documentation.

Thu Nov 29 15:49:16 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (RE_NO_EMPTY_ALTS): Added this bit.
	(RE_SYNTAX_POSIX_EXTENDED): Added above bit.

	* regex.c (re_compile_pattern): Disallow empty alternatives only
	when RE_NO_EMPTY_ALTS is set, not when RE_CONTEXTUAL_INVALID_OPS is.
	Changed RE_NO_BK_CURLY_BRACES to RE_NO_BK_PARENS when testing
	for empty groups at label handle_open.
	At label handle_bar: disallow empty alternatives if RE_NO_EMPTY_ALTS
	is set.
	Rewrote some comments.

	(re_compile_fastmap): Cleaned up code.

	(re_search_2): Rewrote comment.

	(struct register_info): Added field `inner_groups'; it records
	which groups are inside of the current one.
	Added field can_match_nothing; it's set if the current group
	can match nothing.
	Added field ever_matched_something; it's set if current group
	ever matched something.

	(INNER_GROUPS): Added macro to access inner_groups field of
	struct register_info.

	(CAN_MATCH_NOTHING): Added macro to access can_match_nothing
	field of struct register_info.

	(EVER_MATCHED_SOMETHING): Added macro to access
	ever_matched_something field of struct register_info.

	(NOTE_INNER_GROUP): Defined macro to record that a given group
	is inside of all currently active groups.

	(re_match_2): Added variables *p1 and mcnt2 (multipurpose).
	Added old_regstart and old_regend arrays to hold previous
	register values if they need to be restored.
	Initialize added fields and variables.
	case start_memory: Find out if the group can match nothing.
	Save previous register values in old_regstart and old_regend.
	Record that current group is inside of all currently active
	groups.
	If the group is inside a loop and it ever matched anything,
	restore its registers to values before the last failed match.
	Restore the registers for the inner groups, too.
	case duplicate: Can back-reference a group that never
	matched if it can match nothing.

Thu Nov 29 11:12:54 1990  Karl Berry  (karl at hayley)

	* regex.c (bcopy, ...): Define these if either _POSIX_SOURCE or
	STDC_HEADERS is defined; same for including <stdlib.h>.

Sat Oct  6 16:04:55 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (struct re_pattern_buffer): Changed field comments.

	* regex.c (re_compile_pattern): Allow a `$' to precede an
	alternation operator (`|' or `\|').
	Disallow `^' and/or `$' in empty groups if the syntax bit
	RE_NO_EMPTY_GROUPS is set.
	Wait until have parsed a valid `\{...\}' interval expression
	before testing RE_CONTEXTUAL_INVALID_OPS to see if it's
	invalidated by that.
	Don't use RE_NO_BK_CURLY_BRACES to test whether or not a validly
	parsed interval expression is invalid if it has no preceding re;
	rather, use RE_CONTEXTUAL_INVALID_OPS.
	If an interval parses, but there is no preceding regular
	expression, yet the syntax bit RE_CONTEXTUAL_INDEP_OPS is set,
	then that interval can match the empty regular expression; if
	the bit isn't set, then the characters in the interval
	expression are parsed as themselves (sans the backslashes).
	In unfetch_interval case: Moved PATFETCH to above the test for
	RE_NO_BK_CURLY_BRACES being set, which would force a goto
	normal_backslash; the code at both normal_backslash and
	normal_char expects a character in `c.'

Sun Sep 30 11:13:48 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h: Changed some comments to use the terms used in the
	documentation.
	(RE_CONTEXTUAL_INDEP_OPS): Changed name from `RE_CONTEXT_INDEP_OPS'.
	(RE_LISTS_NOT_NEWLINE): Changed name from `RE_HAT_NOT_NEWLINE.'
	(RE_ANCHOR_NOT_NEWLINE): Added this syntax bit.
	(RE_NO_EMPTY_GROUPS): Added this syntax bit.
	(RE_NO_HYPHEN_RANGE_END): Deleted this syntax bit.
	(RE_SYNTAX_...): Reformatted.
	(RE_SYNTAX_POSIX_BASIC, RE_SYNTAX_POSIX_EXTENDED): Added syntax bits
	RE_ANCHOR_NOT_NEWLINE and RE_NO_EMPTY_GROUPS, and deleted
	RE_NO_HYPHEN_RANGE_END.
	(RE_SYNTAX_POSIX_EXTENDED): Added syntax bit RE_DOT_NOT_NULL.

	* regex.c (bcopy, bcmp, bzero): Define if _POSIX_SOURCE is defined.
	(_POSIX_SOURCE): ifdef this, #include <stdlib.h>.
	(#ifdef emacs): Changed comment of the #endif for its #else
	clause to be `not emacs', not `emacs.'
	(no_pop_jump): Changed name from `jump'.
	(pop_failure_jump): Changed name from `finalize_jump.'
	(maybe_pop_failure_jump): Changed name from `maybe_finalize_jump'.
	(no_pop_jump_n): Changed name from `jump_n.'
	(EXTEND_BUFFER): Use shift instead of multiplication to double
	buf->allocated.
	(DO_RANGE, re_compile_pattern): Added macro to set the list bits
	for a range.
	(re_compile_pattern): Fixed grammar problems in some comments.
	Checked that RE_NO_BK_VBAR is set to make `$' valid before a `|'
	and not set to make it valid before a `\|'.
	Checked that RE_NO_BK_PARENS is set to make `$' valid before a ')'
	and not set to make it valid before a `\)'.
	Disallow ranges starting with `-', unless the range is the
	first item in a list, rather than disallowing ranges which end
	with `-'.
	Disallow empty groups if the syntax bit RE_NO_EMPTY_GROUPS is set.
	Disallow `{' and `\{' with nothing preceding them if they
	represent the open-interval operator and RE_CONTEXTUAL_INVALID_OPS
	is set.
	(register_info_type): typedef-ed this using `struct register_info.'
	(SET_REGS_MATCHED): Compacted the code.
	(re_match_2): Made it fail if back-reference a group which we've
	never matched.
	Made `^' not match a newline if the syntax bit
	RE_ANCHOR_NOT_NEWLINE is set.
	(really_fail): Added this label so could force a final fail that
	would not try to use the failure stack to recover.

Sat Aug 25 14:23:01 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (RE_CONTEXTUAL_OPS): Changed name from RE_CONTEXT_OPS.
	(global): Rewrote comments and rebroke some syntax #define lines.

	* regex.c (isgraph): Added definition for Sequents.
	(global): Now refer to character set lists as ``lists.''
	Rewrote comments containing ``\('' or ``\)'' to now refer to
	``groups.''
	(RE_CONTEXTUAL_OPS): Changed name from RE_CONTEXT_OPS.

	(re_compile_pattern): Expanded header comment.

Sun Jul 15 14:50:25 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (RE_CONTEXT_INDEP_OPS): The comment's sense got turned
	around when we changed how it read; changed it to be correct.

Sat Jul 14 16:38:06 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (RE_NO_EMPTY_BK_REF): Changed name to
	RE_NO_MISSING_BK_REF, as this describes it better.

	* regex.c (re_compile_pattern): Changed RE_NO_EMPTY_BK_REF
	to RE_NO_MISSING_BK_REF, as above.

Thu Jul 12 11:45:05 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h (RE_NO_EMPTY_BRACKETS): Removed this syntax bit, as
	bracket expressions should *never* be empty regardless of the
	syntax. Removed this bit from RE_SYNTAX_POSIX_BASIC and
	RE_SYNTAX_POSIX_EXTENDED.

	* regex.c (SET_LIST_BIT): In the comment, now refer to character
	sets as (non)matching sets, as bracket expressions can now match
	other things in addition to characters.
	(re_compile_pattern): Refer to groups as such instead of `\(...\)'
	or somesuch, because groups can now be enclosed in either plain
	parens or backslashed ones, depending on the syntax.
	In the '[' case, added a boolean just_had_a_char_class to detect
	whether or not a character class begins a range (which is invalid).
	Restore way of breaking out of a bracket expression to original way.
	Add way to detect a range if the last thing in a bracket
	expression was a character class.
	Took out check for c != ']' at the end of a character class in
	the else clause, as it had already been checked in the if part
	that also checked the validity of the string.
	Set or clear just_had_a_char_class as appropriate.
	Added some comments.
	Changed references to character sets to
	``(non)matching lists.''

Sun Jul  1 12:11:29 1990  Karl Berry  (karl at hayley)

	* regex.h (BYTEWIDTH): moved back to regex.c.

	* regex.h (re_compile_fastmap): removed declaration; this
	shouldn't be advertised.

Mon May 28 15:27:53 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c (ifndef Sword): Made comments more specific.
	(global): include <stdio.h> so can write fatal messages on
	standard error. Replaced calls to assert with fprintfs to
	stderr and exit (1)'s.
	(PREFETCH): Reformatted to make more readable.
	(AT_STRINGS_BEG): Defined to test if we're at the beginning of
	the virtual concatenation of string1 and string2.
	(AT_STRINGS_END): Defined to test if at the end of the virtual
	concatenation of string1 and string2.
	(AT_WORD_BOUNDARY): Defined to test if are at a word boundary.
	(IS_A_LETTER(d)): Defined to test if the contents of the pointer D
	is a letter.
	(re_match_2): Rewrote the wordbound, notwordbound, wordbeg, wordend,
	begbuf, and endbuf cases in terms of the above four new macros.
	Called SET_REGS_MATCHED in the matchsyntax, matchnotsyntax,
	wordchar, and notwordchar cases.

Mon May 14 14:49:13 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c (re_search_2): Fixed RANGE to not ever take STARTPOS
	outside of virtual concatenation of STRING1 and STRING2.
	Updated header comment as to this.
	(re_match_2): Clarified comment about MSTOP in header.

Sat May 12 15:39:00 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c (re_search_2): Checked for out-of-range STARTPOS.
	Added comments.
	When searching backwards, not only get the character with which
	to compare to the fastmap from string2 if the starting position
	>= size1, but also if size1 is zero; this is so won't get a
	segmentation fault if string1 is null.
	Reformatted code at label advance.

Thu Apr 12 20:26:21 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h: Added #pragma once and #ifdef...endif __REGEXP_LIBRARY.
	(RE_EXACTN_VALUE): Added for search.c to use.
	Reworded some comments.

	regex.c: Punctuated some comments correctly.
	(NULL): Removed this.
	(RE_EXACTN_VALUE): Added for search.c to use.
	(<ctype.h>): Moved this include to top of file.
	(<assert.h>): Added this include.
	(enum regexpcode): Assigned 0 to unused and 1 to exactn
	because of RE_EXACTN_VALUE.
	Added comment.
	(various macros): Lined up backslashes near end of line.
	(insert_jump): Cleaned up the header comment.
	(re_search): Corrected the header comment.
	(re_search_2): Cleaned up and completed the header comment.
	(re_max_failures): Updated comment.
	(struct register_info): Constructed as bits so as to save space
	on the stack when pushing register information.
	(IS_ACTIVE): Macro for struct register_info.
	(MATCHED_SOMETHING): Macro for struct register_info.
	(NUM_REG_ITEMS): How many register information items for each
	register we have to push on the stack at each failure.
	(MAX_NUM_FAILURE_ITEMS): If push all the registers on failure,
	this is how many items we push on the stack.
	(PUSH_FAILURE_POINT): Now pushes whether or not the register is
	currently active, and whether or not it matched something.
	Checks that there's enough space allocated to accommodate all the
	items we currently want to push. (Before, a test for an empty
	stack sufficed because we always pushed and popped the same
	number of items.)
	Replaced ``2'' with MAX_NUM_FAILURE_ITEMS when ``2'' refers
	to how many things get pushed on the stack each time.
	When copy the stack into the newly allocated storage, now only copy
	the area in use.
	Clarified comment.
	(POP_FAILURE_POINT): Defined to use in places where we put the
	number of registers on the stack into a variable before using it
	to decrement the stack, so as to not confuse the compiler.
	(IS_IN_FIRST_STRING): Defined to check if a pointer points into
	the first string.
	(SET_REGS_MATCHED): Changed to use the struct register_info
	bits; also set the matched-something bit to false if the
	register isn't currently active. (This is a redundant setting.)
	(re_match_2): Cleaned up and completed the header comment.
	Updated the failure stack comment.
	Replaced the ``2'' with MAX_NUM_FAILURE_ITEMS in the static
	allocation of initial_stack, because now more than two (now up
	to MAX_NUM_FAILURE_ITEMS) items get pushed on the failure stack
	each time.
	Ditto for stackb.
	Trashed regstart_seg1, regend_seg1, best_regstart_seg1, and
	best_regend_seg1 because they could have erroneous information
	in them, such as when matching ``a'' (in string1) and ``ab'' (in
	string2) with ``(a)*ab''; before using IS_IN_FIRST_STRING to see
	whether or not the register starts or ends in string1,
	regstart[1] pointed past the end of string1, yet regstart_seg1
	was 0!
	Added variable reg_info of type struct register_info to keep
	track of currently active registers and whether or not they
	currently match anything.
	Commented best_regs_set.
	Trashed reg_active and reg_matched_something and put the
	information they held into reg_info; saves space on the stack.
	Replaced NULL with '\000'.
	In begline case, compacted the code.
	Used assert to exit if had an internal error.
	In begbuf case, because now force the string we're working on
	into string2 if there aren't two strings, now allow d == string2
	if there is no string1 (and the check for that is size1 == 0!);
	also now succeeds if there aren't any strings at all.
	(main, ifdef canned): Put test type into a variable so could
	change it while debugging.

Sat Mar 24 12:24:13 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c (GET_UNSIGNED_NUMBER): Deleted references to num_fetches.
	(re_compile_pattern): Deleted num_fetches because could keep
	track of the number of fetches done by saving a pointer into the
	pattern.
	Added variable beg_interval to be used as a pointer, as above.
	Assert that beg_interval points to something when it's used as above.
	Initialize succeed_n's to lower_bound because re_compile_fastmap
	needs to know it.
	(re_compile_fastmap): Deleted unnecessary variable is_a_jump_n.
	Added comment.
	(re_match_2): Put number of registers on the stack into a
	variable before using it to decrement the stack, so as to not
	confuse the compiler.
	Updated comments.
	Used error routine instead of printf and exit.
	In exactn case, restored longer code from ``original'' regex.c
	which doesn't test translate inside a loop.

	* regex.h: Moved #define NULL and the enum regexpcode definition
	to regex.c. Changed some comments.

	regex.c (global): Updated comments about compiling and for the
	re_compile_pattern jump routines.
	Added #define NULL and the enum regexpcode definition (from
	regex.h).
	(enum regexpcode): Added set_number_at to reset the n's of
	succeed_n's and jump_n's.
	(re_set_syntax): Updated its comment.
	(re_compile_pattern): Moved its heading comment to after its macros.
	Moved its include statement to the top of the file.
	Commented or added to comments of its macros.
	In start_memory case: Push laststart value before adding
	start_memory and its register number to the buffer, as they
	might not get added.
	Added code to put a set_number_at before each succeed_n and one
	after each jump_n; rewrote code in what seemed a more
	straightforward manner to put all these things in the pattern so
	the succeed_n's would correctly jump to the set_number_at's of
	the matching jump_n's, and so the jump_n's would correctly jump
	to after the set_number_at's of the matching succeed_n's.
	Initialize succeed_n n's to -1.
	(insert_op_2): Added this to insert an operation followed by
	two integers.
	(re_compile_fastmap): Added set_number_at case.
	(re_match_2): Moved heading comment to after macros.
	Added mention of REGS to heading comment.
	No longer turn a succeed_n with n = 0 into an on_failure_jump,
	because n needs to be reset each time through a loop.
	Check to see if a succeed_n's n is set by its set_number_at.
	Added set_number_at case.
	Updated some comments.
	(main): Added another main to run posix tests, which is compiled
	ifdef both test and canned. (Old main is still compiled ifdef
	test only.)

Tue Mar 19 09:22:55 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.[hc]: Changed all instances of the word ``legal'' to
	``valid'' and all instances of ``illegal'' to ``invalid.''

Sun Mar  4 12:11:31 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h: Added syntax bit RE_NO_EMPTY_RANGES which is set if
	an ending range point has to collate higher than or equal to the
	starting range point.
	Added syntax bit RE_NO_HYPHEN_RANGE_END which is set if a hyphen
	can't be an ending range point.
	Set the two above bits in RE_SYNTAX_POSIX_BASIC and
	RE_SYNTAX_POSIX_EXTENDED.

	regex.c: (re_compile_pattern): Don't allow empty ranges if the
	RE_NO_EMPTY_RANGES syntax bit is set.
	Don't let a hyphen be a range end if the RE_NO_HYPHEN_RANGE_END
	syntax bit is set.
	(ESTACK_PUSH_2): renamed this PUSH_FAILURE_POINT and made it
	push all the used registers on the stack, as well as the number
	of the highest numbered register used, and (as before) the two
	failure points.
	(re_match_2): Fixed up comments.
	Added arrays best_regstart[], best_regstart_seg1[], best_regend[],
	and best_regend_seg1[] to keep track of the best match so far
	whenever reach the end of the pattern but not the end of the
	string, and there are still failure points on the stack with
	which to backtrack; if so, do the saving and force a fail.
	If reach the end of the pattern but not the end of the string,
	but there are no more failure points to try, restore the best
	match so far, set the registers and return.
	Compacted some code.
	In stop_memory case, if the subexpression we've just left is in
	a loop, push onto the stack the loop's on_failure_jump failure
	point along with the current pointer into the string (d).
	In finalize_jump case, in addition to popping the failure
	points, pop the saved registers.
	In the fail case, restore the registers, as well as the failure
	points.

Sun Feb 18 15:08:10 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c: (global): Defined a macro GET_BUFFER_SPACE which
	makes sure you have a specified number of buffer bytes
	allocated.
	Redefined the macro BUFPUSH to use this.
	Added comments.

	(re_compile_pattern): Call GET_BUFFER_SPACE before storing or
	inserting any jumps.

	(re_match_2): Set d to string1 + pos and dend to end_match_1
	only if string1 isn't null.
	Force exit from a loop if it's around empty parentheses.
	In stop_memory case, if found some jumps, increment p2 before
	extracting address to which to jump. Also, don't need to know
	how many more times can jump_n.
	In begline case, d must equal string1 or string2, in that order,
	only if they are not null.
	In maybe_finalize_jump case, skip over start_memorys' and
	stop_memorys' register numbers, too.

Thu Feb 15 15:53:55 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c (BUFPUSH): Off-by-one goof in deciding whether to
	EXTEND_BUFFER.

Wed Jan 24 17:07:46 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h: Moved definition of NULL to here.
	Got rid of ``In other words...'' comment.
	Added to some comments.

	regex.c: (re_compile_pattern): Tried to bulletproof some code,
	i.e., checked if backward references (e.g., p[-1]) were within
	the range of pattern.

	(re_compile_fastmap): Fixed a bug in succeed_n part where was
	getting the amount to jump instead of how many times to jump.

	(re_search_2): Changed the name of the variable ``total'' to
	``total_size.''
	Condensed some code.

	(re_match_2): Moved the comment about duplicate from above the
	start_memory case to above duplicate case.

	(global): Rewrote some comments.
	Added command-line arguments to testing.

Wed Jan 17 11:47:27 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.c: (global): Defined a macro STORE_NUMBER which stores a
	number into two contiguous bytes. Also defined STORE_NUMBER_AND_INCR
	which does the same thing and then increments the pointer to the
	storage place to point after the number.
	Defined a macro EXTRACT_NUMBER which extracts a number from two
	contiguous bytes. Also defined EXTRACT_NUMBER_AND_INCR which
	does the same thing and then increments the pointer to the
	source to point to after where the number was.

Tue Jan 16 12:09:19 1990  Kathy Hargreaves  (kathy at hayley)

	* regex.h: Incorporated rms' changes.
	Defined RE_NO_BK_REFS syntax bit which is set when want to
	interpret back reference patterns as literals.
	Defined RE_NO_EMPTY_BRACKETS syntax bit which is set when want
	empty bracket expressions to be illegal.
	Defined RE_CONTEXTUAL_ILLEGAL_OPS syntax bit which is set when want
	it to be illegal for *, +, ? and { to be first in an re or come
	immediately after a | or a (, and for ^ not to appear in a
	nonleading position and $ in a nontrailing position (outside of
	bracket expressions, that is).
	Defined RE_LIMITED_OPS syntax bit which is set when want +, ?
	and | to always be literals instead of ops.
	Fixed up the Posix syntax.
	Changed the syntax bit comments from saying, e.g., ``0 means...''
	to ``If this bit is set, it means...''.
	Changed the syntax bit defines to use shifts instead of integers.

	* regex.c: (global): Incorporated rms' changes.

	(re_compile_pattern): Incorporated rms' changes.
	Made it illegal for a $ to appear anywhere but inside a bracket
	expression or at the end of an re when RE_CONTEXTUAL_ILLEGAL_OPS
	is set. Made the same hold for ^ except it has to be at the
	beginning of an re instead of the end.
	Made the re "[]" illegal if RE_NO_EMPTY_BRACKETS is set.
	Made it illegal for | to be first or last in an re, or immediately
	follow another | or a (.
	Added and embellished some comments.
	Allowed \{ to be interpreted as a literal if RE_NO_BK_CURLY_BRACES
	is set.
	Made it illegal for *, +, ?, and { to appear first in an re, or
	immediately follow a | or a ( when RE_CONTEXTUAL_ILLEGAL_OPS is set.
	Made back references be interpreted as literals if RE_NO_BK_REFS
	is set.
	Made recursive intervals either illegal (if RE_NO_BK_CURLY_BRACES
	isn't set) or interpreted as literals (if it is set), if RE_INTERVALS
	is set.
	Made it treat +, ? and | as literals if RE_LIMITED_OPS is set.
	Cleaned up some code.
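The Jan 17 1990 entry describes STORE_NUMBER and EXTRACT_NUMBER, which keep a number in two contiguous bytes of the compiled pattern, plus _AND_INCR variants that also step the pointer past those bytes. The C below is a minimal sketch of that idea, not the regex.c macros themselves: store_number, extract_number and their _and_incr forms are hypothetical names, and the sketch assumes the low-order byte is stored first and the high-order byte is sign-extended when read back, so values from -32768 to 32767 round-trip.

```c
#include <assert.h>

/* Hypothetical stand-ins for STORE_NUMBER / EXTRACT_NUMBER.
   Assumption: low-order byte first, high-order byte sign-extended.  */

/* Store NUMBER, which must fit in 16 signed bits, into DEST[0..1].  */
static void
store_number (unsigned char *dest, int number)
{
  assert (number >= -32768 && number <= 32767);
  dest[0] = (unsigned) number & 0377;          /* low-order byte */
  dest[1] = ((unsigned) number >> 8) & 0377;   /* high-order byte */
}

/* Read back a number written by store_number.  */
static int
extract_number (const unsigned char *source)
{
  int high = source[1];
  if (high > 127)               /* sign-extend the high-order byte */
    high -= 256;
  return source[0] + high * 256;
}

/* Like store_number, then advance *DEST past the two bytes.  */
static void
store_number_and_incr (unsigned char **dest, int number)
{
  store_number (*dest, number);
  *dest += 2;
}

/* Like extract_number, then advance *SOURCE past the two bytes.  */
static int
extract_number_and_incr (const unsigned char **source)
{
  int value = extract_number (*source);
  *source += 2;
  return value;
}
```

For example, storing 300 and then -3 through store_number_and_incr fills four bytes with 44, 1, 253 and 255, and two calls to extract_number_and_incr return 300 and then -3. Sign-extending the high byte is what lets a negative value, such as the relative offset of a backward jump, survive the round trip.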

Thu Dec 21 15:31:32 1989  Kathy Hargreaves  (kathy at hayley)

	* regex.c: (global): Moved RE_DUP_MAX to regex.h and made it
	equal 2^15 - 1 instead of 1000.
	Defined NULL to be zero.
	Moved the definition of BYTEWIDTH to regex.h.
	Made the global variable obscure_syntax nonstatic so the tests in
	another file could use it.

	(re_compile_pattern): Defined a maximum length (CHAR_CLASS_MAX_LENGTH)
	for character class strings (i.e., what's between the [: and the
	:]'s).
	Defined a macro SET_LIST_BIT(c) which sets the bit for C in a
	character set list.
	Took out comments that EXTEND_BUFFER clobbers C.
	Made the string "^" match itself, if not RE_CONTEXT_INDEP_OPS.
	Added character classes to bracket expressions.
	Changed the laststart pointer saved with the start of each
	subexpression to point to start_memory instead of after the
	following register number. This is because the subexpression
	might be in a loop.
	Added comments and compacted some code.
	Made intervals only work if preceded by an re matching a single
	character or a subexpression.
	Made back references to nonexistent subexpressions illegal if
	using POSIX syntax.
	Made intervals work on the last preceding character of a
	concatenation of characters, e.g., ab{0,} matches abbb, not abab.
	Moved macro PREFETCH to outside the routine.

	(re_compile_fastmap): Added succeed_n to work analogously to
	on_failure_jump if n is zero and jump_n to work analogously to
	the other backward jumps.

	(re_match_2): Defined macro SET_REGS_MATCHED to set which
	current subexpressions had matches within them.
	Changed some comments.
	Added reg_active and reg_matched_something arrays to keep track
	of which subexpressions have currently matched something.
	Defined MATCHING_IN_FIRST_STRING and replaced ``dend == end_match_1''
	with it to make code easier to understand.
	Fixed so can apply * and intervals to arbitrarily nested
	subexpressions. (Lots of previous bugs here.)
	Changed so won't match a newline if syntax bit RE_DOT_NOT_NULL is set.
	Made the upcase array nonstatic so the testing file could use it also.

	(main.c): Moved the tests out to another file.

	(tests.c): Moved all the testing stuff here.

Sat Nov 18 19:30:30 1989  Kathy Hargreaves  (kathy at hayley)

	* regex.c: (re_compile_pattern): Defined RE_DUP_MAX, the maximum
	number of times an interval can match a pattern.
	Added macro GET_UNSIGNED_NUMBER (used to get below):
	Added variables lower_bound and upper_bound for upper and lower
	bounds of intervals.
	Added variable num_fetches so intervals could do backtracking.
	Added code to handle '{' and "\{" and intervals.
	Added to comments.

	(store_jump_n): (Added) Stores a jump with a number following the
	relative address (for intervals).

	(insert_jump_n): (Added) Inserts a jump_n.

	(re_match_2): Defined a macro ESTACK_PUSH_2 for the error stack;
	it checks for overflow and reallocates if necessary.

	* regex.h: Added bits (RE_INTERVALS and RE_NO_BK_CURLY_BRACES)
	to obscure syntax to indicate whether or not
	a syntax handles intervals and recognizes either \{ and
	\} or { and } as operators. Also added two syntaxes
	RE_SYNTAX_POSIX_BASIC and RE_SYNTAX_POSIX_EXTENDED and two command
	codes to the enumeration regexpcode; they are succeed_n and jump_n.

Sat Nov 18 19:30:30 1989  Kathy Hargreaves  (kathy at hayley)

	* regex.c: (re_compile_pattern): Defined INIT_BUFF_SIZE to get rid
	of repeated constants in code. Tested with value 1.
	Renamed PATPUSH as BUFPUSH, since it pushes things onto the
	buffer, not the pattern.
	Also made this macro extend the buffer
	if it's full (so could do the following):
	Took out code at top of loop that checks to see if buffer is going
	to be full after 10 additions (and reallocates if necessary).

	(insert_jump): Rearranged declaration lines so comments would read
	better.

	(re_match_2): Compacted exactn code and added more comments.

	(main): Defined macros TEST_MATCH and MATCH_SELF to do
	testing; took out loop so could use these instead.

Tue Oct 24 20:57:18 1989  Kathy Hargreaves  (kathy at hayley)

	* regex.c (re_set_syntax): Gave argument `syntax' a type.
	(store_jump, insert_jump): made them void functions.

Local Variables:
mode: indented-text
left-margin: 8
version-control: never
End: