/*man-start*********************************************************************


========================================================================
APPENDIX 7 - REGULAR EXPRESSIONS IN THE
========================================================================

This appendix contains details on regular expression usage in THE.  There are
two places where THE uses regular expressions: in targets in commands like
<LOCATE> and <ALL>, and in the specification of patterns in THE Language
Definition files used for syntax highlighting.

THE uses the GNU Regular Expression Library to implement regular expressions.
This library has several different regular expression syntaxes that can be used
when specifying targets.
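
As a rough illustration of what selecting a syntax means at the library level,
the following sketch calls the GNU regex interface directly.  It is not taken
from THE's source; it assumes a glibc-style <regex.h> that exposes
re_set_syntax(), re_compile_pattern(), re_search() and the RE_SYNTAX_*
constants when _GNU_SOURCE is defined.  The helper name search_with_syntax is
hypothetical.

```c
#define _GNU_SOURCE
#include <regex.h>
#include <string.h>

// Hypothetical helper, not part of THE: compile `pattern` under the given GNU
// syntax and return the offset of the first match in `text`, -1 if there is
// no match, or -2 if the pattern does not compile or the search fails.
int search_with_syntax(reg_syntax_t syntax, const char *pattern, const char *text)
{
    struct re_pattern_buffer buf;
    regoff_t len = (regoff_t) strlen(text);
    regoff_t pos;

    memset(&buf, 0, sizeof buf);   // zeroed buffer: the library allocates storage
    re_set_syntax(syntax);         // process-wide setting used by the next compile
    if (re_compile_pattern(pattern, strlen(pattern), &buf) != NULL)
        return -2;                 // a non-NULL return is an error message
    pos = re_search(&buf, text, len, 0, len, NULL);
    regfree(&buf);
    return (int) pos;
}
```

With these assumptions, the pattern (ab)+ finds "abab" in "xxabab" under
RE_SYNTAX_POSIX_EXTENDED, whereas under RE_SYNTAX_EMACS the same text is only
found when the group is written as \(ab\)+, because ( and ) are literals there.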

Note that pattern specifications used for syntax highlighting always use
the EMACS regular expression syntax.

The following table lists the features of each of the regular expression
syntaxes that can be set via the <SET REGEXP> command.  Each feature in the
table is explained later.

This appendix is not intended to explain everything about regular expressions.
If you want to find out more about GNU Regular Expressions, then view the on-line
documentation at <http://hessling-editor.sf.net/doc/regex/>.

 +------------------------+----------------------------+
 | Syntax                 | Features                   |
 +------------------------+----------------------------+
 | EMACS                  | None set                   |
 +------------------------+----------------------------+
 | AWK                    | BACKSLASH_ESCAPE_IN_LISTS  |
 |                        | DOT_NOT_NULL               |
 |                        | NO_BACKSLASH_PARENS        |
 |                        | NO_BACKSLASH_REFS          |
 |                        | NO_BACKSLASH_VBAR          |
 |                        | NO_EMPTY_RANGES            |
 |                        | UNMATCHED_RIGHT_PAREN_ORD  |
 +------------------------+----------------------------+
 | POSIX_AWK              | CHAR_CLASSES               |
 |                        | DOT_NEWLINE                |
 |                        | DOT_NOT_NULL               |
 |                        | INTERVALS                  |
 |                        | NO_EMPTY_RANGES            |
 |                        | CONTEXT_INDEP_ANCHORS      |
 |                        | CONTEXT_INDEP_OPS          |
 |                        | NO_BACKSLASH_BRACES        |
 |                        | NO_BACKSLASH_PARENS        |
 |                        | NO_BACKSLASH_VBAR          |
 |                        | UNMATCHED_RIGHT_PAREN_ORD  |
 |                        | BACKSLASH_ESCAPE_IN_LISTS  |
 +------------------------+----------------------------+
 | GREP                   | BACKSLASH_PLUS_QM          |
 |                        | CHAR_CLASSES               |
 |                        | HAT_LISTS_NOT_NEWLINE      |
 |                        | INTERVALS                  |
 |                        | NEWLINE_ALT                |
 +------------------------+----------------------------+
 | EGREP                  | CHAR_CLASSES               |
 |                        | HAT_LISTS_NOT_NEWLINE      |
 |                        | NEWLINE_ALT                |
 |                        | CONTEXT_INDEP_ANCHORS      |
 |                        | CONTEXT_INDEP_OPS          |
 |                        | NO_BACKSLASH_PARENS        |
 |                        | NO_BACKSLASH_VBAR          |
 +------------------------+----------------------------+
 | POSIX_EGREP            | CHAR_CLASSES               |
 |                        | HAT_LISTS_NOT_NEWLINE      |
 |                        | NEWLINE_ALT                |
 |                        | CONTEXT_INDEP_ANCHORS      |
 |                        | CONTEXT_INDEP_OPS          |
 |                        | NO_BACKSLASH_PARENS        |
 |                        | NO_BACKSLASH_VBAR          |
 |                        | NO_BACKSLASH_BRACES        |
 |                        | INTERVALS                  |
 +------------------------+----------------------------+
 | SED                    | CHAR_CLASSES               |
 |                        | DOT_NEWLINE                |
 |                        | DOT_NOT_NULL               |
 |                        | INTERVALS                  |
 |                        | NO_EMPTY_RANGES            |
 |                        | BACKSLASH_PLUS_QM          |
 +------------------------+----------------------------+
 | POSIX_BASIC            | CHAR_CLASSES               |
 |                        | DOT_NEWLINE                |
 |                        | DOT_NOT_NULL               |
 |                        | INTERVALS                  |
 |                        | NO_EMPTY_RANGES            |
 |                        | BACKSLASH_PLUS_QM          |
 +------------------------+----------------------------+
 | POSIX_MINIMAL_BASIC    | CHAR_CLASSES               |
 |                        | DOT_NEWLINE                |
 |                        | DOT_NOT_NULL               |
 |                        | INTERVALS                  |
 |                        | NO_EMPTY_RANGES            |
 |                        | LIMITED_OPS                |
 +------------------------+----------------------------+
 | POSIX_EXTENDED         | CHAR_CLASSES               |
 |                        | DOT_NEWLINE                |
 |                        | DOT_NOT_NULL               |
 |                        | INTERVALS                  |
 |                        | NO_EMPTY_RANGES            |
 |                        | CONTEXT_INDEP_ANCHORS      |
 |                        | CONTEXT_INDEP_OPS          |
 |                        | NO_BACKSLASH_BRACES        |
 |                        | NO_BACKSLASH_PARENS        |
 |                        | NO_BACKSLASH_VBAR          |
 |                        | UNMATCHED_RIGHT_PAREN_ORD  |
 +------------------------+----------------------------+
 | POSIX_MINIMAL_EXTENDED | CHAR_CLASSES               |
 |                        | DOT_NEWLINE                |
 |                        | DOT_NOT_NULL               |
 |                        | INTERVALS                  |
 |                        | NO_EMPTY_RANGES            |
 |                        | CONTEXT_INDEP_ANCHORS      |
 |                        | CONTEXT_INVALID_OPS        |
 |                        | NO_BACKSLASH_BRACES        |
 |                        | NO_BACKSLASH_PARENS        |
 |                        | NO_BACKSLASH_REFS          |
 |                        | NO_BACKSLASH_VBAR          |
 |                        | UNMATCHED_RIGHT_PAREN_ORD  |
 +------------------------+----------------------------+

------------
BACKSLASH_ESCAPE_IN_LISTS
------------
If this feature is not set, then \ inside a bracket expression is
literal.
If set, then such a \ quotes the following character.

------------
BACKSLASH_PLUS_QM
------------
If this feature is not set, then + and ? are operators, and \+ and \? are
literals.
If set, then \+ and \? are operators and + and ? are literals.

------------
CHAR_CLASSES
------------
If this feature is set, then character classes are supported.
They are:
  [:alpha:], [:upper:], [:lower:], [:digit:], [:alnum:], [:xdigit:],
  [:space:], [:print:], [:punct:], [:graph:], and [:cntrl:].
If not set, then character classes are not supported.
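
As a small illustration of CHAR_CLASSES, the sketch below uses the POSIX
regcomp() interface rather than THE itself.  It assumes a C library such as
glibc, where REG_EXTENDED corresponds to the POSIX_EXTENDED syntax in the table
above; the helper name is_all_digits is hypothetical.  Note that a class name
is written inside a bracket expression, so the digit class appears as
[[:digit:]].

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: 1 if `text` is a non-empty run of decimal digits,
// 0 if it is not, -1 if the pattern fails to compile.
int is_all_digits(const char *text)
{
    regex_t re;
    int rc;

    // [:digit:] is only recognized inside a bracket expression.
    if (regcomp(&re, "^[[:digit:]]+$", REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, is_all_digits("2024") reports a match, while
is_all_digits("20a4") and is_all_digits("") do not.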

------------
CONTEXT_INDEP_ANCHORS
------------
If this feature is set, then ^ and $ are always anchors (outside bracket
expressions, of course).
If this feature is not set, then it depends on position:
     ^  is an anchor if it is at the beginning of a regular
        expression or after an open-group or an alternation operator;
     $  is an anchor if it is at the end of a regular expression, or
        before a close-group or an alternation operator.

This feature could be (re)combined with CONTEXT_INDEP_OPS, because
POSIX draft 11.2 says that the behavior of * and similar operators in leading
positions is undefined.
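
The effect of CONTEXT_INDEP_ANCHORS can be seen by compiling the same pattern
under two syntaxes.  This sketch assumes glibc's regcomp(), where the default
(basic) mode behaves like POSIX_BASIC (feature not set) and REG_EXTENDED
behaves like POSIX_EXTENDED (feature set).  caret_is_literal is a hypothetical
helper, not part of THE.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: compile the pattern a^b with the given regcomp() flags
// and report whether it matches the literal text "a^b".
// Returns 1 on match, 0 on no match, -1 if the pattern does not compile.
int caret_is_literal(int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, "a^b", cflags) != 0)
        return -1;
    rc = regexec(&re, "a^b", 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With these assumptions, caret_is_literal(0) reports a match, because ^ in the
middle of a basic pattern is an ordinary character.  caret_is_literal(REG_EXTENDED)
reports no match, because ^ is always an anchor there and cannot match after
the a.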

------------
CONTEXT_INDEP_OPS
------------
If this feature is set, then special characters are always special regardless
of where they are in the pattern.
If this feature is not set, then special characters are special only in some
contexts; otherwise they are ordinary.  Specifically, *, +, ? and intervals
are special only when they do not immediately follow the beginning of the
pattern, an open-group operator, or an alternation operator.

------------
CONTEXT_INVALID_OPS
------------
If this feature is set, then *, +, ?, and { cannot be first in an RE or immediately
after an alternation or begin-group operator.

------------
DOT_NEWLINE
------------
If this feature is set, then . matches newline. If not set, then it does not.

------------
DOT_NOT_NULL
------------
If this feature is set, then . does not match NUL. If not set, then it does.

------------
HAT_LISTS_NOT_NEWLINE
------------
If this feature is set, nonmatching lists [^...] do not match newline.
If not set, they do.

------------
INTERVALS
------------
If this feature is set, either \{...\} or {...} defines an interval, depending
on NO_BACKSLASH_BRACES.
If not set, \{, \}, {, and } are literals.

------------
LIMITED_OPS
------------
If this feature is set, +, ? and | are not recognized as operators.
If not set, they are.

------------
NEWLINE_ALT
------------
If this feature is set, newline is an alternation operator.
If not set, newline is literal.

------------
NO_BACKSLASH_BRACES
------------
If this feature is set, then {...} defines an interval, and \{ and \} are literals.
If not set, then \{...\} defines an interval.
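
The interaction of INTERVALS and NO_BACKSLASH_BRACES can be sketched with the
POSIX regcomp() interface.  The example assumes glibc, where basic mode
corresponds to POSIX_BASIC (INTERVALS set, NO_BACKSLASH_BRACES not set) and
REG_EXTENDED corresponds to POSIX_EXTENDED (both set).  The helper name
interval_match is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: 1 if `pattern` matches somewhere in `text`,
// 0 if it does not, -1 if the pattern fails to compile.
// Pass 0 for basic syntax or REG_EXTENDED for extended syntax.
int interval_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, ^a{2}$ matches "aa" in extended mode, while in basic
mode the interval must be written ^a\{2\}$ and the unescaped form ^a{2}$ only
matches the literal text "a{2}".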

------------
NO_BACKSLASH_PARENS
------------
If this feature is set, (...) defines a group, and \( and \) are literals.
If not set, \(...\) defines a group, and ( and ) are literals.

------------
NO_BACKSLASH_REFS
------------
If this feature is set, then \<digit> matches <digit>. If not set, then \<digit>
is a back-reference.

------------
NO_BACKSLASH_VBAR
------------
If this feature is set, then | is an alternation operator, and \| is literal.
If not set, then \| is an alternation operator, and | is literal.
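
The three features above (NO_BACKSLASH_PARENS, NO_BACKSLASH_REFS and
NO_BACKSLASH_VBAR) decide whether grouping, back-references and alternation
need a backslash.  The sketch below again uses POSIX regcomp() and assumes
glibc, where basic mode stands in for a syntax with none of these features set
and REG_EXTENDED for one with NO_BACKSLASH_PARENS and NO_BACKSLASH_VBAR set.
The helper name group_match is hypothetical.

```c
#include <regex.h>
#include <stddef.h>

// Hypothetical helper: 1 if `pattern` matches somewhere in `text`,
// 0 if it does not, -1 if the pattern fails to compile.
// Pass 0 for basic syntax or REG_EXTENDED for extended syntax.
int group_match(const char *pattern, const char *text, int cflags)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, cflags) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Under these assumptions, ^(cat|dog)$ matches "dog" in extended mode, whereas in
basic mode cat|dog only matches the literal text "cat|dog".  In basic mode the
group and back-reference in ^\(ab\)\1$ match "abab", and ^(ab)$ matches the
literal text "(ab)".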

------------
NO_EMPTY_RANGES
------------
If this feature is set, then a range whose ending point collates lower than its
starting point, as in [z-a], is invalid.
If not set, then such a range is ignored.
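
A reversed range can be checked at compile time.  This sketch assumes glibc's
regcomp() in extended mode, which corresponds to POSIX_EXTENDED and therefore
has NO_EMPTY_RANGES set.  The helper name pattern_compiles is hypothetical.

```c
#include <regex.h>

// Hypothetical helper: 1 if `pattern` compiles as an extended regular
// expression, 0 if regcomp() rejects it.
int pattern_compiles(const char *pattern)
{
    regex_t re;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    regfree(&re);
    return 1;
}
```

Under these assumptions, pattern_compiles("[a-z]") succeeds and
pattern_compiles("[z-a]") fails, because the range end collates before the
range start.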

------------
UNMATCHED_RIGHT_PAREN_ORD
------------
If this feature is set, then an unmatched ) is ordinary.
If not set, then an unmatched ) is invalid.

**man-end**********************************************************************/