This material is derived from The GNU Emacs Manual, 9th edition.
Copyright 1985, 1986, 1987, 1993 Free Software Foundation, Inc.
and from the GNU regex library version 0.11.
Copyright 1992 Free Software Foundation, Inc.

The Free Software Foundation gives permission for the use of this
material in e93, with different distribution terms from those stated
in the Emacs manual, because e93 is free software, and thus worthy
of cooperation.


NOTE: this has been modified to reflect e93's syntax.
1/22/00 TMS (squirest@e93.org)

Regular Expression Syntax

Common Operators

* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                *  +  ? {}
* Alternation Operator::                |
* List Operators::                      [...]  [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^  $  <  >

Repetition Operators

* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           {}

List Operators (`[' ... `]' and `[^' ... `]')

* Range Operator::          start-end

Anchoring Operators

* Match-beginning-of-word Operator::  <
* Match-end-of-word Operator::        >
* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $


Overview
********

  A "regular expression" (or "regexp", or "pattern") is a text string
that describes some (mathematical) set of strings.  A regexp R
"matches" a string S if S is in the set of strings described by R.

  Some regular expressions match only one string, i.e., the set they
describe has only one member.  For example, the regular expression
`foo' matches the string `foo' and no others.  Other regular
expressions match more than one string, i.e., the set they describe has
more than one member.  For example, the regular expression `f*' matches
the set of strings made up of any number (including zero) of `f's.  As
you can see, some characters in regular expressions match themselves
(such as `f') and some don't (such as `*'); the ones that don't match
themselves instead let you specify patterns that describe many
different strings.
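
  The set view above can be tried out with a short sketch.  It uses
Python's standard `re' module as a stand-in for e93's matcher, on the
assumption that the two agree for simple patterns such as `foo' and
`f*' (they differ elsewhere).  The helper name `in_language' is made
up for this example.

```python
import re

def in_language(pattern, s):
    # True if the whole string s is a member of the set described by pattern.
    return re.fullmatch(pattern, s) is not None

# `foo' describes a set with exactly one member.
print(in_language("foo", "foo"))
print(in_language("foo", "fooo"))

# `f*' describes the empty string, `f', `ff', and so on.
print([s for s in ["", "f", "ff", "fg"] if in_language("f*", s)])
```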

Regular Expression Syntax
*************************

  "Characters" are things you can type.  "Operators" are things in a
regular expression that match one or more characters.  You compose
regular expressions from operators, which in turn you specify using one
or more characters.

  Most characters represent what we call the match-self operator, i.e.,
they match themselves; we call these characters "ordinary".  Other
characters represent either all or parts of fancier operators; e.g.,
`.' represents what we call the match-any-character operator (which, no
surprise, matches any character except newline); we call these
characters "special".

  In the following sections, we describe these things in more detail.

The Backslash Character
=======================

  The `\' character quotes the next character: it makes a special
character ordinary, and may make an ordinary character special, as in
the sequences below.

        `\' sequences:

        \n          -- stands for newline (0x0A).
        \b          -- stands for backspace (0x08).
        \r          -- stands for return (0x0D).
        \t          -- stands for tab (0x09).
        \x##        -- specifies an arbitrary character in hex;
                       for example, \x0A is equivalent to \n.
        \0 to \9    -- back reference to a register (not valid within []).
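
  As an informal illustration of the table above, the following
Python sketch decodes one backslash sequence into the character it
denotes.  It is not e93's implementation: the names `SIMPLE_ESCAPES'
and `decode_escape' are hypothetical, and the sketch assumes that
`\x' is always followed by exactly two hex digits.

```python
# Hypothetical table mirroring the sequences listed above.
SIMPLE_ESCAPES = {"n": "\x0a", "b": "\x08", "r": "\x0d", "t": "\x09"}

def decode_escape(seq):
    # Decode one backslash sequence into the single character it denotes.
    if len(seq) < 2 or seq[0] != "\\":
        raise ValueError("not a backslash sequence")
    body = seq[1:]
    if body in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[body]
    if body[0] == "x":
        if len(body) != 3:
            # Assumption: \x takes exactly two hex digits.
            raise ValueError("expected two hex digits after \\x")
        return chr(int(body[1:], 16))
    if len(body) == 1 and body in "0123456789":
        raise ValueError("back reference, not a character")
    if len(body) == 1:
        return body  # a quoted character stands for itself
    raise ValueError("unknown sequence")

print(decode_escape("\\x0A") == decode_escape("\\n"))  # True
print(repr(decode_escape("\\.")))                      # '.'
```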


Common Operators
****************

  You compose regular expressions from operators.  In the following
sections, we describe the regular expression operators.


* Match-self Operator::                 Ordinary characters.
* Match-any-character Operator::        .
* Concatenation Operator::              Juxtaposition.
* Repetition Operators::                *  +  ? {}
* Alternation Operator::                |
* List Operators::                      [...]  [^...]
* Grouping Operators::                  (...)
* Back-reference Operator::             \digit
* Anchoring Operators::                 ^  $  <  >


The Match-self Operator (ORDINARY CHARACTER)
============================================

  This operator matches the character itself.  All ordinary characters
represent this operator.  For example, `f' is always an ordinary character,
so the regular expression `f' matches only the string `f'.
In particular, it does *not* match the string `ff'.


The Match-any-character Operator (`.')
======================================

  This operator matches any single printing or nonprinting character
except newline (it is equivalent to `[^\n]').

  NOTE: to match any single character at all, including newline, use
`[-]' or `[^]'.

  The `.' (period) character represents this operator.  For example,
`a.b' matches any three-character string beginning with `a' and ending
with `b', provided the middle character is not a newline.
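
  A small sketch of this behavior, again using Python's `re' module
as a stand-in (an assumption; by default its `.' also excludes
newline).  Python has no `[^]' or `[-]' list, so the `re.DOTALL' flag
is used here to imitate e93's `a[^]b'.  The helper `full' is a
made-up name.

```python
import re

def full(pattern, s, flags=0):
    # True if pattern matches the whole of s.
    return re.fullmatch(pattern, s, flags) is not None

print(full("a.b", "axb"))              # any one character between a and b
print(full("a.b", "ab"))               # nothing in the middle: no match
print(full("a.b", "a\nb"))             # `.' does not match newline
print(full("a.b", "a\nb", re.DOTALL))  # imitates e93's a[^]b
```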

The Concatenation Operator
==========================

  This operator concatenates two regular expressions A and B. No
character represents this operator; you simply put B after A.  The
result is a regular expression that will match a string if A matches
its first part and B matches the rest.  For example, `xy' (two
match-self operators) matches `xy'.

Repetition Operators
====================

  Repetition operators repeat the preceding regular expression a
specified number of times.

* Match-zero-or-more Operator::  *
* Match-one-or-more Operator::   +
* Match-zero-or-one Operator::   ?
* Interval Operators::           {}

The Match-zero-or-more Operator (`*')
-------------------------------------

  This operator repeats the smallest possible preceding regular
expression as many times as necessary (including zero) to match the
pattern. `*' represents this operator.  For example, `o*' matches any
string made up of zero or more `o's.  Since this operator operates on
the smallest preceding regular expression, `fo*' has a repeating `o',
not a repeating `fo'.  So, `fo*' matches `f', `fo', `foo', and so on.

  Since the match-zero-or-more operator is a suffix operator, it may
not be applied when no regular expression precedes it.  This is the
case when it:

   * is first in a regular expression, or

   * follows a match-beginning-of-line, match-end-of-line, open-group,
     or alternation operator.

  The matcher processes a match-zero-or-more operator by first matching
as many repetitions of the smallest preceding regular expression as it
can. Then it continues to match the rest of the pattern.

  If it can't match the rest of the pattern, it backtracks (as many
times as necessary), each time discarding one of the matches until it
can either match the entire pattern or be certain that it cannot get a
match.  For example, when matching `ca*ar' against `caaar', the matcher
first matches all three `a's of the string with the `a*' of the regular
expression.  However, it cannot then match the final `ar' of the
regular expression against the final `r' of the string.  So it
backtracks, discarding the match of the last `a' in the string.  It can
then match the remaining `ar'.
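
  The backtracking just described can be sketched with a toy matcher
written in Python.  This is only an illustration, not e93's code: it
handles nothing but ordinary characters and `x*', and the function
name `match_here' and its `trace' argument are invented here.  The
trace records how many repetitions `*' holds each time the matcher
tries the rest of the pattern.

```python
def match_here(pattern, text, trace):
    # Match pattern (ordinary characters and `x*' only) against all of text.
    if not pattern:
        return not text
    if len(pattern) >= 2 and pattern[1] == "*":
        # Greedy step: take as many repetitions as possible first.
        count = 0
        while count < len(text) and text[count] == pattern[0]:
            count += 1
        while count >= 0:
            trace.append(count)
            if match_here(pattern[2:], text[count:], trace):
                return True
            count -= 1  # backtrack: give back one repetition
        return False
    if text and text[0] == pattern[0]:
        return match_here(pattern[1:], text[1:], trace)
    return False

trace = []
print(match_here("ca*ar", "caaar", trace))  # True
print(trace)                                # [3, 2]
```

  The trace shows `a*' first holding three `a's, then two after one
backtracking step, at which point the remaining `ar' matches.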

The Match-one-or-more Operator (`+')
------------------------------------

  This operator is similar to the match-zero-or-more operator except
that it repeats the preceding regular expression at least once; *note
Match-zero-or-more Operator::., for what it operates on, and how Regex
backtracks to match it.

  `+' represents this operator.  For example, `ca+r' matches, e.g.,
`car' and `caaaar', but not `cr'.

The Match-zero-or-one Operator (`?')
------------------------------------

  This operator is similar to the match-zero-or-more operator except
that it repeats the preceding regular expression once or not at all;
*note Match-zero-or-more Operator::., to see what it operates on, and
how Regex backtracks to match it.

  `?' represents this operator.  For example, `ca?r' matches both
`car' and `cr', but nothing else.
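
  The two examples above can be checked with a short sketch.  Python's
`re' module stands in for e93's matcher here, on the assumption that
`+' and `?' behave the same way in both; `full' is a made-up helper.

```python
import re

def full(pattern, s):
    # True if pattern matches the whole of s.
    return re.fullmatch(pattern, s) is not None

# Each row prints: string, result for ca+r, result for ca?r.
for s in ["cr", "car", "caar", "caaaar"]:
    print(s, full("ca+r", s), full("ca?r", s))
```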

Interval Operators (`{' ... `}')
--------------------------------

  `{' and `}' represent the open-interval and close-interval
operators:

`{COUNT}'
     matches exactly COUNT occurrences of the preceding regular
     expression.

`{MIN,}'
     matches MIN or more occurrences of the preceding regular
     expression.

`{MIN, MAX}'
     matches at least MIN but no more than MAX occurrences of the
     preceding regular expression.

  The interval expression (but not necessarily the regular expression
that contains it) is invalid if MIN is greater than MAX.
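
  A brief sketch of the three interval forms, using Python's `re'
module as a stand-in (an assumption).  The patterns are written
without a space after the comma, which is the form Python expects.
Python also rejects an interval whose MIN exceeds MAX, which the
made-up helper `is_valid' demonstrates.

```python
import re

def full(pattern, s):
    # True if pattern matches the whole of s.
    return re.fullmatch(pattern, s) is not None

def is_valid(pattern):
    # True if the stand-in engine accepts the pattern at all.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(full("a{2}", "aa"), full("a{2}", "aaa"))        # exactly two
print(full("a{2,}", "aaaa"), full("a{2,}", "a"))      # two or more
print(full("a{1,3}", "aaa"), full("a{1,3}", "aaaa"))  # one to three
print(is_valid("a{3,1}"))                             # MIN greater than MAX
```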

The Alternation Operator (`|')
==============================

  Alternatives match one of a choice of regular expressions: if you put
the `|' character between any two regular expressions A and B, the
result matches the union of the strings that A and B match.  For
example, `foo|bar|quux' matches any of `foo', `bar' or `quux'.

  The alternation operator operates on the *largest* possible
surrounding regular expressions.  (Put another way, it has the lowest
precedence of any regular expression operator.) Thus, the only way you
can delimit its arguments is to use grouping.  For example,
`fo(o|b)ar' matches either `fooar' or `fobar'.  (`foo|bar' matches
`foo' or `bar'.)

  The matcher tries each combination of alternatives in order until it
is able to make a match.
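
  The precedence rule is easy to see in a short sketch.  As before,
Python's `re' module is only a stand-in for e93's matcher, and `full'
is a made-up helper.

```python
import re

def full(pattern, s):
    # True if pattern matches the whole of s.
    return re.fullmatch(pattern, s) is not None

# Ungrouped: the alternatives are the whole of `foo' and the whole of `bar'.
print([s for s in ["foo", "bar", "fooar", "fobar"] if full("foo|bar", s)])

# Grouped: only `o' and `b' are alternatives.
print([s for s in ["foo", "bar", "fooar", "fobar"] if full("fo(o|b)ar", s)])
```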

List Operators (`[' ... `]' and `[^' ... `]')
=============================================

  A "list", also called a "bracket expression", is a set of zero or
more items.  An "item" is a character or a range expression.

  A "matching list" matches a single character represented by one of
the list items.  You form a matching list by enclosing one or more items
within an "open-matching-list operator" (represented by `[') and a
"close-list operator" (represented by `]').

  For example, `[ab]' matches either `a' or `b'. `[ad]*' matches the
empty string and any string composed of just `a's and `d's in any
order.  Regex considers a regular expression with a `[' but no
matching `]' invalid.

  "Nonmatching lists" are similar to matching lists except that they
match a single character *not* represented by one of the list items.
You use an "open-nonmatching-list operator" (represented by `[^')
instead of an open-matching-list operator to start a nonmatching list.

  For example, `[^ab]' matches any character except `a' or `b'.

  Most characters lose any special meaning inside a list.  The special
characters inside a list follow.

`]'
     ends the list unless quoted by `\'.

`\'
     quotes the next character.

`-'
     represents the range operator unless quoted by `\'.

All other characters are ordinary.  For example, `[.*]' matches `.' and
`*'.
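
  The list examples above behave the same way in Python's `re'
module, which is used here as a stand-in for e93's matcher (an
assumption that holds only for these simple lists).  The helpers
`full' and `is_valid' are made-up names.

```python
import re

def full(pattern, s):
    # True if pattern matches the whole of s.
    return re.fullmatch(pattern, s) is not None

def is_valid(pattern):
    # True if the stand-in engine accepts the pattern at all.
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(full("[ab]", "a"), full("[ab]", "c"))       # matching list
print(full("[ad]*", ""), full("[ad]*", "adda"))   # list under `*'
print(full("[^ab]", "c"), full("[^ab]", "a"))     # nonmatching list
print(full("[.*]", "."), full("[.*]", "x"))       # `.' and `*' are ordinary
print(is_valid("[ab"))                            # no matching `]'
```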

The Range Operator (`-')
------------------------

  Regex recognizes "range expressions" inside a list. They represent
those characters that fall between two elements in the current
collating sequence.  You form a range expression by putting a "range
operator" between two characters. `-' represents the range operator.
For example, `a-f' within a list represents all the characters from `a'
through `f' inclusively.

  Since `-' represents the range operator, if you want to make a `-'
character itself a list item, you must quote it with `\'.

  A range does not need both a start and an end.  If the start is
omitted, the range covers everything up to the end character: `[-a]'
matches all characters through lowercase `a'.  If the end is omitted,
the range runs through 0xFF: `[a-]' matches lowercase `a' through
0xFF.  With both omitted, `[-]' matches all characters.
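
  Open-ended ranges have no direct equivalent in Python's `re' module
(there `[-a]' means a literal `-' or `a'), so the sketch below models
them with a set instead.  It assumes 8-bit characters ordered by code
value, with an omitted start meaning 0x00 and an omitted end meaning
0xFF.  The function `expand_range' is hypothetical and is not part of
e93.

```python
def expand_range(start=None, end=None):
    # Return the set of characters covered by an e93-style range.
    # None stands for an omitted start or end (assumed 0x00 and 0xFF).
    lo = 0x00 if start is None else ord(start)
    hi = 0xFF if end is None else ord(end)
    return {chr(code) for code in range(lo, hi + 1)}

print(sorted(expand_range("a", "f")))   # a-f
print(len(expand_range(None, "a")))     # -a
print(len(expand_range("a", None)))     # a-
print(len(expand_range()))              # -
```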

Grouping Operators (`(' ... `)')
================================

  A "group", also known as a "subexpression", consists of an
"open-group operator", any number of other operators, and a
"close-group operator".  Regex treats this sequence as a unit, just as
mathematics and programming languages treat a parenthesized expression
as a unit. Groups can be empty.

  Therefore, using "groups", you can:

   * delimit the argument(s) to an alternation operator (*note
     Alternation Operator::.) or a repetition operator (*note
     Repetition Operators::.).

   * keep track of the indices of the substring that matched a given
     group. *Note Using Registers::, for a precise explanation. This
     lets you:

        * use the back-reference operator (*note Back-reference
          Operator::.).

        * use registers (*note Using Registers::.).

The Back-reference Operator (`\DIGIT')
======================================

  A back reference matches a specified preceding group.  The back
reference operator is represented by `\DIGIT' anywhere after the end
of a regular expression's DIGIT-th group (*note Grouping
Operators::.).

  DIGIT must be between `0' and `9'.  The matcher assigns numbers 0
through 9 to the first ten groups it encounters.  By using one of `\0'
through `\9' after the corresponding group's close-group operator, you
can match a substring identical to the one that the group matched.

  Back references match according to the following (in all examples
below, `(' represents the open-group, `)' the close-group, `{' the
open-interval and `}' the close-interval operator):

   * If the group matches a substring, the back reference matches an
     identical substring.  For example, `(a)\0' matches `aa' and
     `(bana)na\0bo\0' matches `bananabanabobana'.  Likewise, `(.*)\0'
     matches any string that is composed of two identical halves; the `(.*)'
     matches the first half and the `\0' matches the second half.

   * If the group matches more than once (as it might if followed by,
     e.g., a repetition operator), then the back reference matches the
     substring the group *last* matched.  For example, `((a*)b)*\0\1'
     matches `aabababa'; first group 0 (the outer one) matches `aab'
     and group 1 (the inner one) matches `aa'.  Then group 0 matches
     `ab' and group 1 matches `a'.  So, `\0' matches `ab' and `\1'
     matches `a'.

   * If the group doesn't participate in a match, i.e., it is part of an
     alternative not taken or a repetition operator allows zero
     repetitions of it, then the back reference makes the whole match
     fail.

  You can use a back reference as an argument to a repetition operator.
For example, `(a(b))\1*' matches `a' followed by one or more `b's.
Similarly, `(a(b))\1{3}' matches `abbbb'.

  If there is no preceding DIGIT-th subexpression, the regular
expression is invalid.
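
  The back-reference rules can be exercised with a sketch built on
Python's `re' module.  Python numbers groups from 1 rather than 0, so
the hypothetical helper `to_python' shifts each `\DIGIT' up by one
before matching; neither helper is part of e93, and the sketch
assumes the two engines otherwise agree on these patterns.

```python
import re

def to_python(pattern):
    # Rewrite e93's zero-based \0..\9 as Python's one-based \1..\10.
    # Every backslash pair is consumed, so an escaped backslash
    # followed by a digit is left alone.
    def shift(m):
        ch = m.group(1)
        if ch in "0123456789":
            # The (?:...) wrapper keeps a following digit from merging in.
            return "(?:\\%d)" % (int(ch) + 1)
        return m.group(0)
    return re.sub(r"\\(.)", shift, pattern)

def full(e93_pattern, s):
    # True if the translated pattern matches the whole of s.
    return re.fullmatch(to_python(e93_pattern), s) is not None

print(full(r"(a)\0", "aa"))
print(full(r"(bana)na\0bo\0", "bananabanabobana"))
print(full(r"((a*)b)*\0\1", "aabababa"))   # uses the *last* match of each group
print(full(r"(a(b))\1{3}", "abbbb"))
print(full(r"((a)|b)\1", "b"))             # group 1 took no part in the match
```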

Anchoring Operators
===================

  These operators can appear anywhere (except lists) within a pattern
and force that point in the pattern to match only at the beginning or
end of a word or line.

* Match-beginning-of-word Operator::  <
* Match-end-of-word Operator::        >
* Match-beginning-of-line Operator::  ^
* Match-end-of-line Operator::        $


The Match-beginning-of-word Operator (`<')
------------------------------------------

  This operator can match the empty string either at the beginning of
the text or the beginning of a word. Thus, it is said to "anchor" the
pattern to the beginning of a word.

The Match-end-of-word Operator (`>')
------------------------------------

  This operator can match the empty string either at the end of the text
or the end of a word. Thus, it is said to "anchor" the pattern to the
end of a word.

The Match-beginning-of-line Operator (`^')
------------------------------------------

  This operator can match the empty string either at the beginning of
the text or after a newline character.  Thus, it is said to "anchor"
the pattern to the beginning of a line.

The Match-end-of-line Operator (`$')
------------------------------------

  This operator can match the empty string either at the end of the
text or before a newline character in the text.  Thus, it is said
to "anchor" the pattern to the end of a line.
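
  The four anchors can be approximated in Python's `re' module, which
has no `<' or `>' operators.  In the sketch below, `WORD_START' and
`WORD_END' are made-up stand-ins built from `\b' and a lookaround,
and `re.MULTILINE' makes `^' and `$' match at line boundaries as
described above.  What e93 counts as a word character is not stated
here, so Python's `\w' is an assumption.

```python
import re

WORD_START = r"\b(?=\w)"   # stand-in for e93's `<'
WORD_END = r"\b(?<=\w)"    # stand-in for e93's `>'

text = "cat concat\ncatalog cat"

def starts(pattern, flags=0):
    # Offsets in text at which pattern matches.
    return [m.start() for m in re.finditer(pattern, text, flags)]

print(starts(WORD_START + "cat"))             # [0, 11, 19]
print(starts(WORD_START + "cat" + WORD_END))  # [0, 19]
print(starts("^cat", re.MULTILINE))           # [0, 11]
print(starts("cat$", re.MULTILINE))           # [7, 19]
```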