.xx meta.keywords="regex implementation categorization"
.MT 4
regex implementation categorization
.AF "AT&T Research - Florham Park NJ"
.AU "Glenn Fowler <gsf@research.att.com>"
.H 1
The regex tests in
.xx link=categorize.dat
attempt to categorize regex implementations.
The tests do not address internationalization.
All implementations report the leftmost match; this is omitted from the table.
.so re-categorize.tab

The categories are:
.VL 6
.LI LABEL
The implementation label from
.xx link="./ testregex."
.LI ASSOC
Subpattern (or atom) associativity: either left or right.
The subexpression match rule in the rationale requires right for
expressions where each concatenated part is a subexpression.
There is no definition for subpattern, but it would be inconsistent for
any definition to require different associativity than that for subexpressions.
Some claim that the BRE and ERE grammars specify left associativity,
but this interpretation disregards the subexpression match rule in the rationale.
The grammar can also be interpreted to support right associativity,
and this interpretation is in accord with the rationale.
.LI SUBEXPR
Subexpression semantics: precedence if subexpressions can override the
default associativity; grouping if subexpressions are for repetition and
regmatch_t grouping only.
The subexpression match rule in the rationale requires precedence.
.LI REP_LONGEST
How repeated subexpressions that match more than once are handled:
first if the longest possible matches occur first;
last if the longest possible matches occur last;
unknown otherwise.
The subexpression match rule in the rationale requires first.
.LI BUGS
Miscellaneous bugs (see
.xx link=categorize.dat
for specific examples):
.VL 6
.LI alternation-order
A change in the order of subexpression alternation operands,
"not involved in a tie", changes regmatch_t values.
Some implementations with this bug can be coaxed into missing the
overall longest match.
.LI first-match
The first of the leftmost matches, instead of the longest of the
leftmost matches, is returned.
.LI nomatch-match
A back-reference to a regmatch_t (-1,-1) value is treated as matching.
.LI range-null
A range-repeated subexpression that matches null does not report the
match at offset (0,0).
.LI repeat-artifact
A regmatch_t value is reported for a repeated match that is not the last match.
.LI repeat-artifact-nomatch
To prevent not matching, a regmatch_t value is reported for a repeated
match that is not the last match.
.LI repeat-null
A repeated subexpression matches the null string even though it is not
the only match and is not necessary to satisfy the exact or minimum
number of occurrences for an interval expression.
.LI repeat-short
Incorrect regmatch_t values for a repeated subexpression.
This may be a variant of repeat-artifact.
.LI subexpression-first
A subexpression match takes precedence over a subpattern to its left.
.LE
.LE
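As an illustration of how one of these categories can be probed, the sketch below checks for the first-match bug through the POSIX regcomp/regexec interface. It is not taken from testregex or categorize.dat: the helper names probe and probe_first_match, the ERE a|ab, and the subject string "ab" are all assumptions chosen for the example. The leftmost-longest rule gives (0,2) for this input, while an implementation that stops at the first alternative that succeeds gives (0,1).

```c
#include <regex.h>

/*
 * Hypothetical helper (not part of testregex): compile `pattern` with
 * `cflags`, run it against `subject`, and store up to `nmatch`
 * regmatch_t entries in `match`.
 * Returns 0 on a match, REG_NOMATCH on no match, -1 if regcomp fails.
 */
int probe(const char *pattern, int cflags, const char *subject,
          size_t nmatch, regmatch_t *match)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, subject, nmatch, match, 0);
    regfree(&re);
    return rc;
}

/*
 * Probe for the first-match bug using an assumed test case:
 * ERE a|ab against "ab".
 *   (0,2) -> the longest of the leftmost matches was reported
 *   (0,1) -> the first of the leftmost matches was reported
 */
const char *probe_first_match(void)
{
    regmatch_t m[1];

    if (probe("a|ab", REG_EXTENDED, "ab", 1, m) != 0)
        return "error";
    if (m[0].rm_so == 0 && m[0].rm_eo == 2)
        return "longest";
    if (m[0].rm_so == 0 && m[0].rm_eo == 1)
        return "first-match";
    return "unexpected";
}
```

Calling probe_first_match() from a small driver and printing the returned label is enough to classify the linked regex library for this one case. The other categories need subexpression offsets as well, so a driver for them would pass a larger regmatch_t array to probe and compare each (rm_so,rm_eo) pair.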