=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;

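The binding operators can be tried out with a minimal sketch like the following. The variable names and sample strings are illustrative assumptions, not part of the original reference.

```perl
use strict;
use warnings;

my $var = "seafood";

# =~ applies the pattern to $var; !~ does the same and inverts the result.
my $bound   = $var =~ /foo/ ? 1 : 0;   # "seafood" contains "foo"
my $negated = $var !~ /foo/ ? 1 : 0;   # so the negated form is false

# Without a binding operator the pattern is applied to $_.
my $implicit = 0;
for ("foo bar") {
    $implicit = 1 if /bar/;
}
```
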
C<m/pattern/msixpogc> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.

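As a rough illustration of how several of these modifiers combine, the sketch below uses alternate delimiters with C</gimx>, then C</gc> in scalar context. The sample data and variable names are assumptions made for the example.

```perl
use strict;
use warnings;

my $text = "alpha=1\nBeta=2\ngamma=3\n";

# /g in list context returns every capture; /m lets ^ match after each
# newline; /i ignores case; /x permits the layout and comments.
my @keys = $text =~ m{
    ^ ([a-z]+)    # key at the start of a line
    =             # literal equals sign
}gimx;

# /g in scalar context advances pos(); /c keeps pos() after a failure.
my $digits = "12ab";
my $first  = $digits =~ /\d+/gc;     # matches "12"
my $pos_after_match = pos($digits);
my $second = $digits =~ /\d+/gc;     # fails, but pos() is preserved
my $pos_after_fail  = pos($digits);
```
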
C<qr/pattern/msixpo> lets you store a regex in a variable,
or pass one around. The modifiers are as for C<m//> and are stored
within the regex.

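A small sketch of what "stored within the regex" means in practice follows. The pattern and test strings are made up for illustration.

```perl
use strict;
use warnings;

my $word = qr/perl/i;              # the /i flag travels with the object

my $direct = "I like PERL" =~ $word ? 1 : 0;

# When interpolated into a larger pattern, the stored /i applies only to
# the embedded part; "like" is still matched case-sensitively.
my $embedded = "I like PERL" =~ /like\s+$word/ ? 1 : 0;
my $outer_cs = "I LIKE perl" =~ /like\s+$word/ ? 1 : 0;
```
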
C<s/pattern/replacement/msixpogce> substitutes matches of
'pattern' with 'replacement'. Modifiers as for C<m//>,
with one addition:

    e  Evaluate 'replacement' as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (C<'>) is the delimiter.

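The difference between an evaluated, an interpolated and a single-quoted replacement can be sketched as follows. The strings are illustrative assumptions.

```perl
use strict;
use warnings;

# /e evaluates the replacement as Perl code; /g repeats it for each match.
(my $doubled = "apple 3, pear 5") =~ s/(\d+)/$1 * 2/ge;

# Without /e the replacement is an ordinary double-quoted string.
(my $tagged = "apple 3") =~ s/(\d+)/<$1>/;

# With single-quote delimiters the replacement is not interpolated.
(my $literal = "cost: 5") =~ s'\d+'$1';
```
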
C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used. It must be reset with reset() before it can
match again.

=head2 SYNTAX

   \       Escapes the character immediately following it
   .       Matches any single character except a newline (unless /s is used)
   ^       Matches at the beginning of the string (or line, if /m is used)
   $       Matches at the end of the string (or line, if /m is used)
   *       Matches the preceding element 0 or more times
   +       Matches the preceding element 1 or more times
   ?       Matches the preceding element 0 or 1 times
   {...}   Specifies a range of occurrences for the element preceding it
   [...]   Matches any one of the characters contained within the brackets
   (...)   Groups subexpressions for capturing to $1, $2...
   (?:...) Groups subexpressions without capturing (cluster)
   |       Matches either the subexpression preceding or following it
   \1, \2, \3 ...           Matches the text from the Nth group
   \g1 or \g{1}, \g2 ...    Matches the text from the Nth group
   \g-1 or \g{-1}, \g-2 ... Matches the text from the Nth previous group
   \g{name}     Named backreference
   \k<name>     Named backreference
   \k'name'     Named backreference
   (?P=name)    Named backreference (Python syntax)

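The three backreference styles listed above can be compared with a short sketch that looks for a doubled word. The sample sentence and the group name C<w> are assumptions chosen for the example.

```perl
use strict;
use warnings;

my $text = "this is is a test";

# Numbered, relative and named backreferences to the same captured word.
my ($dup_num)  = $text =~ /\b(\w+) \1\b/;
my ($dup_rel)  = $text =~ /\b(\w+) \g{-1}\b/;
my ($dup_name) = $text =~ /\b(?<w>\w+) \k<w>\b/;
```
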
=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a       Alarm (beep)
   \e       Escape
   \f       Formfeed
   \n       Newline
   \r       Carriage return
   \t       Tab
   \037     Any octal ASCII value
   \x7f     Any hexadecimal ASCII value
   \x{263a} A wide hexadecimal value
   \cx      Control-x
   \N{name} A named character
   \N{U+263D} A Unicode character by hex ordinal

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification

For Titlecase, see L</Titlecase>.

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

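A brief sketch of C<\Q>...C<\E> and the case modifiers follows, assuming a hypothetical piece of untrusted text held in C<$input>.

```perl
use strict;
use warnings;

my $input = "a.b*c";                 # hypothetical untrusted text

# Interpolated raw, "." and "*" act as metacharacters.
my $raw    = "aXbbc" =~ /\A$input\z/     ? 1 : 0;
# \Q ... \E quotes them, so only the literal text matches.
my $quoted = "aXbbc" =~ /\A\Q$input\E\z/ ? 1 : 0;
my $exact  = "a.b*c" =~ /\A\Q$input\E\z/ ? 1 : 0;

# The case modifiers behave the same way in double-quoted strings.
my $name  = "perl";
my $title = "\u$name";
my $shout = "say \U$name\E now";
```
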
=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences (except C<\N>) work inside or outside a character class.
The first six are locale aware; all are Unicode aware. See L<perllocale>
and L<perlunicode> for details.

   \d      A digit
   \D      A nondigit
   \w      A word character
   \W      A non-word character
   \s      A whitespace character
   \S      A non-whitespace character
   \h      A horizontal whitespace
   \H      A non horizontal whitespace
   \N      A non newline (when not followed by '{NAME}'; experimental; not
           valid in a character class; equivalent to [^\n]; it's like '.'
           without /s modifier)
   \v      A vertical whitespace
   \V      A non vertical whitespace
   \R      A generic newline           (?>\v|\x0D\x0A)

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with name longer than 1 character
   \PP     Match non-P
   \P{...} Match lack of Unicode property with name longer than 1 char
   \X      Match Unicode extended grapheme cluster

POSIX character classes and their Unicode and Perl equivalents:

   POSIX   Unicode  Perl        Description
   alnum   IsAlnum              Alphanumeric
   alpha   IsAlpha              Alphabetic
   ascii   IsASCII              Any ASCII char
   blank   IsSpace  [ \t]       Horizontal whitespace (GNU extension)
   cntrl   IsCntrl              Control characters
   digit   IsDigit  \d          Digits
   graph   IsGraph              Alphanumeric and punctuation
   lower   IsLower              Lowercase chars (locale and Unicode aware)
   print   IsPrint              Alphanumeric, punct, and space
   punct   IsPunct              Punctuation
   space   IsSpace  [\s\ck]     Whitespace
           IsSpacePerl   \s     Perl's whitespace definition
   upper   IsUpper              Uppercase chars (locale and Unicode aware)
   word    IsWord   \w          Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit [0-9A-Fa-f] Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

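The three notations in the last table can be checked against each other with a sketch like this one. The sample string is an assumption, and only ASCII input is exercised here.

```perl
use strict;
use warnings;

my $s = "room 42b";

# POSIX classes are only valid inside a bracketed class, hence [[:digit:]].
my ($posix)   = $s =~ /([[:digit:]]+)/;
my ($classic) = $s =~ /(\d+)/;
my ($unicode) = $s =~ /(\p{IsDigit}+)/;

# The negated POSIX form corresponds to \D.
my ($nondigit) = $s =~ /([[:^digit:]]+)/;
```
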
=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

   \K Keep the stuff left of the \K, don't include it in $&

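The end-of-string anchors, C<\K> and C<\G> are easiest to tell apart by running them. The following is a minimal sketch with made-up inputs; the tiny tokenizer loop is only one possible use of C<\G>.

```perl
use strict;
use warnings;

my $line = "end\n";
my $dollar  = $line =~ /end$/  ? 1 : 0;   # $ may match before a final newline
my $big_z   = $line =~ /end\Z/ ? 1 : 0;   # so may \Z
my $small_z = $line =~ /end\z/ ? 1 : 0;   # \z insists on the very end

# \K: "dir/" must match, but only "file" is replaced.
(my $path = "dir/file.txt") =~ s{dir/\Kfile}{FILE};

# \G with /gc: each pattern continues where the previous match stopped.
my $src = "ab12";
my @tokens;
while (1) {
    if    ($src =~ /\G([a-z]+)/gc) { push @tokens, "word:$1" }
    elsif ($src =~ /\G(\d+)/gc)    { push @tokens, "num:$1"  }
    else                           { last }
}
```
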
=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

   Maximal Minimal Possessive Allowed range
   ------- ------- ---------- -------------
   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                              but no more than m times
   {n,}    {n,}?   {n,}+      Must occur at least n times
   {n}     {n}?    {n}+       Must occur exactly n times
   *       *?      *+         0 or more times (same as {0,})
   +       +?      ++         1 or more times (same as {1,})
   ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be backtracked
into, even if that causes the whole match to fail.

There is no quantifier C<{,n}>. That's interpreted as a literal string.

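The three quantifier flavours behave as sketched below. The markup-like sample string is an assumption used only to make the difference visible.

```perl
use strict;
use warnings;

my $html = "<b>one</b><b>two</b>";

my ($greedy)  = $html =~ /<b>(.*)<\/b>/;    # runs to the last </b>
my ($minimal) = $html =~ /<b>(.*?)<\/b>/;   # stops at the first </b>

# A possessive quantifier never gives characters back.
my $possessive = '"abc"' =~ /"[^"]*+"/ ? 1 : 0;   # [^"] cannot eat the quote
my $stuck      = '"abc"' =~ /".*+"/    ? 1 : 0;   # .*+ eats the closing quote
```
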
=head2 EXTENDED CONSTRUCTS

   (?#text)          A comment
   (?:...)           Groups subexpressions without capturing (cluster)
   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)           Zero-width positive lookahead assertion
   (?!...)           Zero-width negative lookahead assertion
   (?<=...)          Zero-width positive lookbehind assertion
   (?<!...)          Zero-width negative lookbehind assertion
   (?>...)           Grab what we can, prohibit backtracking
   (?|...)           Branch reset
   (?<name>...)      Named capture
   (?'name'...)      Named capture
   (?P<name>...)     Named capture (Python syntax)
   (?{ code })       Embedded code, return value becomes $^R
   (??{ code })      Dynamic regex, return value used as regex
   (?N)              Recurse into subpattern number N
   (?-N), (?+N)      Recurse into Nth previous/next subpattern
   (?R), (?0)        Recurse at the beginning of the whole pattern
   (?&name)          Recurse into a named subpattern
   (?P>name)         Recurse into a named subpattern (Python syntax)
   (?(cond)yes|no)
   (?(cond)yes)      Conditional expression, where "cond" can be:
                     (N)       subpattern N has matched something
                     (<name>)  named subpattern has matched something
                     ('name')  named subpattern has matched something
                     (?{code}) code condition
                     (R)       true if recursing
                     (RN)      true if recursing into Nth subpattern
                     (R&name)  true if recursing into named subpattern
                     (DEFINE)  always false; a "no" pattern is not allowed

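A few of these constructs are sketched below: lookaround, named capture and recursion. The inputs and names (C<$balanced>, C<%date>) are illustrative assumptions, and the balanced-parentheses pattern is one common formulation rather than the only one.

```perl
use strict;
use warnings;

# Lookbehind plus lookahead: insert commas without consuming any digits.
(my $n = "1234567") =~ s/(?<=\d)(?=(?:\d{3})+\z)/,/g;

# Named captures are available through %+ after a successful match.
my %date;
if ("2024-01-31" =~ /(?<y>\d{4})-(?<m>\d\d)-(?<d>\d\d)/) {
    %date = %+;
}

# (?1) recurses into group 1, which lets the pattern match nested parens.
my $balanced = qr/\A(\((?:[^()]++|(?1))*\))\z/;
my $ok  = "(a(b)c)" =~ $balanced ? 1 : 0;
my $bad = "(a(b c)" =~ $balanced ? 1 : 0;
```
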
=head2 VARIABLES

   $_    Default variable for operators to use

   $`    Everything prior to matched string
   $&    Entire matched string
   $'    Everything after matched string

   ${^PREMATCH}   Everything prior to matched string
   ${^MATCH}      Entire matched string
   ${^POSTMATCH}  Everything after matched string

The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@->
to see equivalent expressions that won't cause a slowdown.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.

   $1, $2 ...  hold the Nth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match
   %+    Named capture buffers
   %-    Named capture buffers, as array refs

Captured groups are numbered according to their I<opening> paren.

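The offset arrays and the C</p> variables can be inspected with a sketch such as the following. The input string and variable names are assumptions for the example.

```perl
use strict;
use warnings;

my $s = "x: key=value;";
my ($start, $end, $g2_start, $pre, $match, $post, $named);

if ($s =~ /(?<k>\w+)=(\w+)/p) {
    ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
    $g2_start = $-[2];                 # offset where group 2 begins
    $pre   = substr($s, 0, $-[0]);     # same text as $`, built from @-
    $match = ${^MATCH};                # defined because /p was given
    $post  = ${^POSTMATCH};
    $named = $+{k};                    # named capture via %+
}
```
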
=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use a regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>.  For Titlecase, see L</Titlecase>.

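Several of these functions are exercised in the short sketch below. The sample strings are assumptions chosen to keep the results easy to follow.

```perl
use strict;
use warnings;

# split with a regex separator, trimming optional whitespace around commas.
my @fields = split /\s*,\s*/, "a , b,c";

# quotemeta does at run time what \Q does inside a pattern.
my $literal = quotemeta("1+1=2");
my $matches = "1+1=2" =~ /\A$literal\z/ ? 1 : 0;

# pos reports where the last scalar-context //g match ended.
my $str   = "aXbXc";
my $found = $str =~ /X/g;
my $where = pos($str);

# The case functions mirror \l and \u.
my $lower_first = lcfirst("ABC");
my $upper_first = ucfirst("abc");
```
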
=head2 TERMINOLOGY

=head3 Titlecase

A Unicode concept that is most often equal to uppercase, but for
certain characters, such as the German "sharp s", there is a difference.

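One way to see the distinction is with a character that has separate uppercase and titlecase forms. The sketch below uses U+01C6 (LATIN SMALL LETTER DZ WITH CARON) as an assumed example rather than the sharp s mentioned above.

```perl
use strict;
use warnings;

my $dz = "\x{01C6}";           # lowercase digraph

my $upper = uc($dz);           # full uppercase form, U+01C4
my $title = ucfirst($dz);      # titlecase form, U+01C5

my $differs = $upper ne $title ? 1 : 0;
```
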
=head1 AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

L<perlrebackslash> for a reference on backslash sequences.

=item *

L<perlrecharclass> for a reference on character classes.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.

=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut
