/*man-start*********************************************************************


========================================================================
APPENDIX 7 - REGULAR EXPRESSIONS IN THE
========================================================================

This appendix contains details on regular expression usage in THE. There are
two places where THE uses regular expressions: in targets in commands like
<LOCATE> and <ALL>, and in the specification of patterns in THE Language
Definition files used for syntax highlighting.

THE uses the GNU Regular Expression Library to implement regular expressions.
This library has several different regular expression syntaxes that can be used
when specifying targets.

Note that all pattern specifications used for syntax highlighting always use
the EMACS regular expression syntax.

The following table lists the features of each of the regular expression
syntaxes that can be set via the <SET REGEXP> command. Each feature in the
table is explained later.

This appendix is not intended to explain everything about regular expressions.
If you want to find out more about GNU Regular Expressions, then view the on-line
documentation at <http://hessling-editor.sf.net/doc/regex/>.
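
As a rough illustration of why the choice of syntax matters, the sketch below
compiles the same pattern under two syntaxes and reports where it first
matches. It does not use THE's own code: it assumes the standard POSIX
regcomp() interface, whose basic and extended modes correspond approximately
to the POSIX_BASIC and POSIX_EXTENDED syntaxes in the table. The helper name
first_match is hypothetical.

```c
#include <regex.h>

// Hypothetical helper (not part of THE): compile `pattern` with the POSIX
// regcomp() flags in `cflags` and return the offset of the first match in
// `text`, -1 if nothing matches, or -2 if the pattern fails to compile.
// cflags == 0 selects basic syntax; REG_EXTENDED selects extended syntax.
static int first_match(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int result = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -2;                      // pattern invalid in this syntax
    if (regexec(&re, text, 1, m, 0) == 0)
        result = (int) m[0].rm_so;      // start offset of the whole match
    regfree(&re);
    return result;
}
```

Under these assumptions, the pattern (ab) searched in the text x(ab) matches
at offset 1 in basic syntax, where the parentheses are literal, and at offset
2 in extended syntax, where they form a group around ab.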

     +------------------------+----------------------------+
     | Syntax                 | Features                   |
     +------------------------+----------------------------+
     | EMACS                  | None set                   |
     +------------------------+----------------------------+
     | AWK                    | BACKSLASH_ESCAPE_IN_LISTS  |
     |                        | DOT_NOT_NULL               |
     |                        | NO_BACKSLASH_PARENS        |
     |                        | NO_BACKSLASH_REFS          |
     |                        | NO_BACKSLASH_VBAR          |
     |                        | NO_EMPTY_RANGES            |
     |                        | UNMATCHED_RIGHT_PAREN_ORD  |
     +------------------------+----------------------------+
     | POSIX_AWK              | CHAR_CLASSES               |
     |                        | DOT_NEWLINE                |
     |                        | DOT_NOT_NULL               |
     |                        | INTERVALS                  |
     |                        | NO_EMPTY_RANGES            |
     |                        | CONTEXT_INDEP_ANCHORS      |
     |                        | CONTEXT_INDEP_OPS          |
     |                        | NO_BACKSLASH_BRACES        |
     |                        | NO_BACKSLASH_PARENS        |
     |                        | NO_BACKSLASH_VBAR          |
     |                        | UNMATCHED_RIGHT_PAREN_ORD  |
     |                        | BACKSLASH_ESCAPE_IN_LISTS  |
     +------------------------+----------------------------+
     | GREP                   | BACKSLASH_PLUS_QM          |
     |                        | CHAR_CLASSES               |
     |                        | HAT_LISTS_NOT_NEWLINE      |
     |                        | INTERVALS                  |
     |                        | NEWLINE_ALT                |
     +------------------------+----------------------------+
     | EGREP                  | CHAR_CLASSES               |
     |                        | HAT_LISTS_NOT_NEWLINE      |
     |                        | NEWLINE_ALT                |
     |                        | CONTEXT_INDEP_ANCHORS      |
     |                        | CONTEXT_INDEP_OPS          |
     |                        | NO_BACKSLASH_PARENS        |
     |                        | NO_BACKSLASH_VBAR          |
     +------------------------+----------------------------+
     | POSIX_EGREP            | CHAR_CLASSES               |
     |                        | HAT_LISTS_NOT_NEWLINE      |
     |                        | NEWLINE_ALT                |
     |                        | CONTEXT_INDEP_ANCHORS      |
     |                        | CONTEXT_INDEP_OPS          |
     |                        | NO_BACKSLASH_PARENS        |
     |                        | NO_BACKSLASH_VBAR          |
     |                        | NO_BACKSLASH_BRACES        |
     |                        | INTERVALS                  |
     +------------------------+----------------------------+
     | SED                    | CHAR_CLASSES               |
     |                        | DOT_NEWLINE                |
     |                        | DOT_NOT_NULL               |
     |                        | INTERVALS                  |
     |                        | NO_EMPTY_RANGES            |
     |                        | BACKSLASH_PLUS_QM          |
     +------------------------+----------------------------+
     | POSIX_BASIC            | CHAR_CLASSES               |
     |                        | DOT_NEWLINE                |
     |                        | DOT_NOT_NULL               |
     |                        | INTERVALS                  |
     |                        | NO_EMPTY_RANGES            |
     |                        | BACKSLASH_PLUS_QM          |
     +------------------------+----------------------------+
     | POSIX_MINIMAL_BASIC    | CHAR_CLASSES               |
     |                        | DOT_NEWLINE                |
     |                        | DOT_NOT_NULL               |
     |                        | INTERVALS                  |
     |                        | NO_EMPTY_RANGES            |
     |                        | LIMITED_OPS                |
     +------------------------+----------------------------+
     | POSIX_EXTENDED         | CHAR_CLASSES               |
     |                        | DOT_NEWLINE                |
     |                        | DOT_NOT_NULL               |
     |                        | INTERVALS                  |
     |                        | NO_EMPTY_RANGES            |
     |                        | CONTEXT_INDEP_ANCHORS      |
     |                        | CONTEXT_INDEP_OPS          |
     |                        | NO_BACKSLASH_BRACES        |
     |                        | NO_BACKSLASH_PARENS        |
     |                        | NO_BACKSLASH_VBAR          |
     |                        | UNMATCHED_RIGHT_PAREN_ORD  |
     +------------------------+----------------------------+
     | POSIX_MINIMAL_EXTENDED | CHAR_CLASSES               |
     |                        | DOT_NEWLINE                |
     |                        | DOT_NOT_NULL               |
     |                        | INTERVALS                  |
     |                        | NO_EMPTY_RANGES            |
     |                        | CONTEXT_INDEP_ANCHORS      |
     |                        | CONTEXT_INVALID_OPS        |
     |                        | NO_BACKSLASH_BRACES        |
     |                        | NO_BACKSLASH_PARENS        |
     |                        | NO_BACKSLASH_REFS          |
     |                        | NO_BACKSLASH_VBAR          |
     |                        | UNMATCHED_RIGHT_PAREN_ORD  |
     +------------------------+----------------------------+

------------
BACKSLASH_ESCAPE_IN_LISTS
------------
If this feature is not set, then \ inside a bracket expression is
literal.
If set, then such a \ quotes the following character.

------------
BACKSLASH_PLUS_QM
------------
If this feature is not set, then + and ? are operators, and \+ and \? are
literals.
If set, then \+ and \? are operators and + and ? are literals.

------------
CHAR_CLASSES
------------
If this feature is set, then character classes are supported.
They are:
     [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
     [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
If not set, then character classes are not supported.

------------
CONTEXT_INDEP_ANCHORS
------------
If this feature is set, then ^ and $ are always anchors (outside bracket
expressions, of course).
If this feature is not set, then it depends:
     ^ is an anchor if it is at the beginning of a regular
       expression or after an open-group or an alternation operator;
     $ is an anchor if it is at the end of a regular expression, or
       before a close-group or an alternation operator.

This feature could be (re)combined with CONTEXT_INDEP_OPS, because
POSIX draft 11.2 says that * etc. in leading positions is undefined.

------------
CONTEXT_INDEP_OPS
------------
If this feature is set, then special characters are always special regardless
of where they are in the pattern.
If this feature is not set, then special characters are special only in some
contexts; otherwise they are ordinary. Specifically, * + ? and intervals
are special only when they do not follow the beginning of the pattern, an
open-group operator, or an alternation operator.

------------
CONTEXT_INVALID_OPS
------------
If this feature is set, then *, +, ?, and { cannot be first in an RE or immediately
after an alternation or begin-group operator.

------------
DOT_NEWLINE
------------
If this feature is set, then . matches newline. If not set, then it does not.

------------
DOT_NOT_NULL
------------
If this feature is set, then . does not match NUL. If not set, then it does.

------------
HAT_LISTS_NOT_NEWLINE
------------
If this feature is set, nonmatching lists [^...] do not match newline.
If not set, they do.

------------
INTERVALS
------------
If this feature is set, either \{...\} or {...} defines an interval, depending
on NO_BACKSLASH_BRACES.
If not set, \{, \}, {, and } are literals.

------------
LIMITED_OPS
------------
If this feature is set, +, ? and | are not recognized as operators.
If not set, they are.

------------
NEWLINE_ALT
------------
If this feature is set, newline is an alternation operator.
If not set, newline is literal.
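
The interaction between INTERVALS and NO_BACKSLASH_BRACES can be sketched as
follows. This is an illustration only, not THE's own code: it assumes the
standard POSIX regcomp() interface, where basic syntax behaves like
POSIX_BASIC (INTERVALS set, NO_BACKSLASH_BRACES not set) and extended syntax
like POSIX_EXTENDED (both set). The helper name first_match_length is
hypothetical.

```c
#include <regex.h>

// Hypothetical helper (not part of THE): return the length of the first
// match of `pattern` in `text`, -1 if nothing matches, or -2 if the pattern
// fails to compile. cflags == 0 selects basic syntax, REG_EXTENDED extended.
static int first_match_length(const char *pattern, int cflags, const char *text)
{
    regex_t re;
    regmatch_t m[1];
    int result = -1;

    if (regcomp(&re, pattern, cflags) != 0)
        return -2;
    if (regexec(&re, text, 1, m, 0) == 0)
        result = (int) (m[0].rm_eo - m[0].rm_so);   // end minus start offset
    regfree(&re);
    return result;
}
```

With these assumptions, a{2} matches two characters of baaad in extended
syntax but nothing in basic syntax, where the braces are literal; the basic
syntax spelling of the same interval is a\{2\}.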

------------
NO_BACKSLASH_BRACES
------------
If this feature is set, then `{...}' defines an interval, and \{ and \} are literals.
If not set, then `\{...\}' defines an interval.

------------
NO_BACKSLASH_PARENS
------------
If this feature is set, (...) defines a group, and \( and \) are literals.
If not set, \(...\) defines a group, and ( and ) are literals.

------------
NO_BACKSLASH_REFS
------------
If this feature is set, then \<digit> matches <digit>. If not set, then \<digit>
is a back-reference.

------------
NO_BACKSLASH_VBAR
------------
If this feature is set, then | is an alternation operator, and \| is literal.
If not set, then \| is an alternation operator, and | is literal.

------------
NO_EMPTY_RANGES
------------
If this feature is set, then a range whose ending point collates lower than its
starting point, as in [z-a], is invalid.
If not set, then such a range is ignored.

------------
UNMATCHED_RIGHT_PAREN_ORD
------------
If this feature is set, then an unmatched ) is ordinary.
If not set, then an unmatched ) is invalid.

**man-end**********************************************************************/