=head1 NAME

perlreref - Perl Regular Expressions Reference

=head1 DESCRIPTION

This is a quick reference to Perl's regular expressions.
For full information see L<perlre> and L<perlop>, as well
as the L</"SEE ALSO"> section in this document.

=head2 OPERATORS

C<=~> determines to which variable the regex is applied.
In its absence, $_ is used.

    $var =~ /foo/;

C<!~> determines to which variable the regex is applied,
and negates the result of the match; it returns
false if the match succeeds, and true if it fails.

    $var !~ /foo/;

C<m/pattern/msixpogc> searches a string for a pattern match,
applying the given options.

    m  Multiline mode - ^ and $ match internal lines
    s  match as a Single line - . matches \n
    i  case-Insensitive
    x  eXtended legibility - free whitespace and comments
    p  Preserve a copy of the matched string -
       ${^PREMATCH}, ${^MATCH}, ${^POSTMATCH} will be defined.
    o  compile pattern Once
    g  Global - all occurrences
    c  don't reset pos on failed matches when using /g

If 'pattern' is an empty string, the last I<successfully> matched
regex is used. Delimiters other than '/' may be used for both this
operator and the following ones. The leading C<m> can be omitted
if the delimiter is '/'.

C<qr/pattern/msixpo> lets you store a regex in a variable,
or pass one around. Modifiers as for C<m//>, and are stored
within the regex.

C<s/pattern/replacement/msixpogce> substitutes matches of
'pattern' with 'replacement'. Modifiers as for C<m//>,
with one addition:

    e  Evaluate 'replacement' as an expression

'e' may be specified multiple times. 'replacement' is interpreted
as a double quoted string unless a single-quote (C<'>) is the delimiter.

C<?pattern?> is like C<m/pattern/> but matches only once. No alternate
delimiters can be used. Must be reset with reset().

=head2 SYNTAX

   \        Escapes the character immediately following it
   .        Matches any single character except a newline (unless /s is used)
   ^        Matches at the beginning of the string (or line, if /m is used)
   $        Matches at the end of the string (or line, if /m is used)
   *        Matches the preceding element 0 or more times
   +        Matches the preceding element 1 or more times
   ?        Matches the preceding element 0 or 1 times
   {...}    Specifies a range of occurrences for the element preceding it
   [...]    Matches any one of the characters contained within the brackets
   (...)    Groups subexpressions for capturing to $1, $2...
   (?:...)  Groups subexpressions without capturing (cluster)
   |        Matches either the subexpression preceding or following it
   \1, \2, \3 ...            Matches the text from the Nth group
   \g1 or \g{1}, \g2 ...     Matches the text from the Nth group
   \g-1 or \g{-1}, \g-2 ...  Matches the text from the Nth previous group
   \g{name}   Named backreference
   \k<name>   Named backreference
   \k'name'   Named backreference
   (?P=name)  Named backreference (python syntax)

=head2 ESCAPE SEQUENCES

These work as in normal strings.

   \a          Alarm (beep)
   \e          Escape
   \f          Formfeed
   \n          Newline
   \r          Carriage return
   \t          Tab
   \037        Any octal ASCII value
   \x7f        Any hexadecimal ASCII value
   \x{263a}    A wide hexadecimal value
   \cx         Control-x
   \N{name}    A named character
   \N{U+263D}  A Unicode character by hex ordinal

   \l  Lowercase next character
   \u  Titlecase next character
   \L  Lowercase until \E
   \U  Uppercase until \E
   \Q  Disable pattern metacharacters until \E
   \E  End modification

For Titlecase, see L</Titlecase>.

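
As a minimal sketch of how the case-modifying and quoting escapes in the
table above behave inside double-quoted strings and patterns, consider the
following. The variable names and sample strings are illustrative
assumptions, not part of this reference.

```perl
use strict;
use warnings;

# \Q...\E quotes regex metacharacters in interpolated text.
my $literal = 'a.b*c';                 # hypothetical sample text
my $quoted  = "\Q$literal\E";          # same result as quotemeta($literal)

# With \Q, '.' and '*' only match themselves.
my $hit  = 'xxa.b*cxx' =~ /\Q$literal\E/ ? 1 : 0;
my $miss = 'xxaXbbcxx' =~ /\Q$literal\E/ ? 1 : 0;

# \u titlecases the next character; \L lowercases until \E.
my $name  = 'pERL';                    # hypothetical sample text
my $fixed = "\u\L$name\E";

# \U uppercases until \E; \l lowercases the next character only.
my $shout = "\Uquiet\E please";
my $lower = "\lABC";

print "$quoted\n$hit $miss\n$fixed\n$shout\n$lower\n";
```

The same escapes are available as the functions C<quotemeta>, C<ucfirst>,
C<lc>, C<uc> and C<lcfirst>, which can be easier to read when the text is
not being interpolated.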

This one works differently from normal strings:

   \b  An assertion, not backspace, except in a character class

=head2 CHARACTER CLASSES

   [amy]    Match 'a', 'm' or 'y'
   [f-j]    Dash specifies "range"
   [f-j-]   Dash escaped or at start or end means 'dash'
   [^f-j]   Caret indicates "match any character _except_ these"

The following sequences (except C<\N>) work inside or outside a character
class. The first six are locale aware, all are Unicode aware. See
L<perllocale> and L<perlunicode> for details.

   \d      A digit
   \D      A nondigit
   \w      A word character
   \W      A non-word character
   \s      A whitespace character
   \S      A non-whitespace character
   \h      A horizontal whitespace
   \H      A non horizontal whitespace
   \N      A non newline (when not followed by '{NAME}'; experimental; not
           valid in a character class; equivalent to [^\n]; it's like '.'
           without /s modifier)
   \v      A vertical whitespace
   \V      A non vertical whitespace
   \R      A generic newline           (?>\v|\x0D\x0A)

   \C      Match a byte (with Unicode, '.' matches a character)
   \pP     Match P-named (Unicode) property
   \p{...} Match Unicode property with name longer than 1 character
   \PP     Match non-P
   \P{...} Match lack of Unicode property with name longer than 1 char
   \X      Match Unicode extended grapheme cluster

POSIX character classes and their Unicode and Perl equivalents:

   alnum   IsAlnum                   Alphanumeric
   alpha   IsAlpha                   Alphabetic
   ascii   IsASCII                   Any ASCII char
   blank   IsSpace      [ \t]        Horizontal whitespace (GNU extension)
   cntrl   IsCntrl                   Control characters
   digit   IsDigit      \d           Digits
   graph   IsGraph                   Alphanumeric and punctuation
   lower   IsLower                   Lowercase chars (locale and Unicode aware)
   print   IsPrint                   Alphanumeric, punct, and space
   punct   IsPunct                   Punctuation
   space   IsSpace      [\s\ck]      Whitespace
           IsSpacePerl  \s           Perl's whitespace definition
   upper   IsUpper                   Uppercase chars (locale and Unicode aware)
   word    IsWord       \w           Alphanumeric plus _ (Perl extension)
   xdigit  IsXDigit     [0-9A-Fa-f]  Hexadecimal digit

Within a character class:

    POSIX       traditional   Unicode
    [:digit:]       \d        \p{IsDigit}
    [:^digit:]      \D        \P{IsDigit}

=head2 ANCHORS

All are zero-width assertions.

   ^  Match string start (or line, if /m is used)
   $  Match string end (or line, if /m is used) or before newline
   \b Match word boundary (between \w and \W)
   \B Match except at word boundary (between \w and \w or \W and \W)
   \A Match string start (regardless of /m)
   \Z Match string end (before optional newline)
   \z Match absolute string end
   \G Match where previous m//g left off

   \K Keep the stuff left of the \K, don't include it in $&

=head2 QUANTIFIERS

Quantifiers are greedy by default and match the B<longest> leftmost.

   Maximal Minimal Possessive Allowed range
   ------- ------- ---------- -------------
   {n,m}   {n,m}?  {n,m}+     Must occur at least n times
                              but no more than m times
   {n,}    {n,}?   {n,}+      Must occur at least n times
   {n}     {n}?    {n}+       Must occur exactly n times
   *       *?      *+         0 or more times (same as {0,})
   +       +?      ++         1 or more times (same as {1,})
   ?       ??      ?+         0 or 1 time (same as {0,1})

The possessive forms (new in Perl 5.10) prevent backtracking: what gets
matched by a pattern with a possessive quantifier will not be backtracked
into, even if that causes the whole match to fail.

There is no quantifier C<{,n}>. That's interpreted as a literal string.

=head2 EXTENDED CONSTRUCTS

   (?#text)          A comment
   (?:...)           Groups subexpressions without capturing (cluster)
   (?pimsx-imsx:...) Enable/disable option (as per m// modifiers)
   (?=...)           Zero-width positive lookahead assertion
   (?!...)           Zero-width negative lookahead assertion
   (?<=...)          Zero-width positive lookbehind assertion
   (?<!...)          Zero-width negative lookbehind assertion
   (?>...)           Grab what we can, prohibit backtracking
   (?|...)           Branch reset
   (?<name>...)      Named capture
   (?'name'...)      Named capture
   (?P<name>...)     Named capture (python syntax)
   (?{ code })       Embedded code, return value becomes $^R
   (??{ code })      Dynamic regex, return value used as regex
   (?N)              Recurse into subpattern number N
   (?-N), (?+N)      Recurse into Nth previous/next subpattern
   (?R), (?0)        Recurse at the beginning of the whole pattern
   (?&name)          Recurse into a named subpattern
   (?P>name)         Recurse into a named subpattern (python syntax)
   (?(cond)yes|no)
   (?(cond)yes)      Conditional expression, where "cond" can be:
                     (N)        subpattern N has matched something
                     (<name>)   named subpattern has matched something
                     ('name')   named subpattern has matched something
                     (?{code})  code condition
                     (R)        true if recursing
                     (RN)       true if recursing into Nth subpattern
                     (R&name)   true if recursing into named subpattern
                     (DEFINE)   always false, no no-pattern allowed

=head2 VARIABLES

   $_    Default variable for operators to use

   $`    Everything prior to matched string
   $&    Entire matched string
   $'    Everything after matched string

   ${^PREMATCH}   Everything prior to matched string
   ${^MATCH}      Entire matched string
   ${^POSTMATCH}  Everything after matched string

The use of C<$`>, C<$&> or C<$'> will slow down B<all> regex use
within your program. Consult L<perlvar> for C<@->
to see equivalent expressions that won't cause slow down.
See also L<Devel::SawAmpersand>. Starting with Perl 5.10, you
can also use the equivalent variables C<${^PREMATCH}>, C<${^MATCH}>
and C<${^POSTMATCH}>, but for them to be defined, you have to
specify the C</p> (preserve) modifier on your regular expression.

   $1, $2 ...  hold the Xth captured expr
   $+    Last parenthesized pattern match
   $^N   Holds the most recently closed capture
   $^R   Holds the result of the last (?{...}) expr
   @-    Offsets of starts of groups. $-[0] holds start of whole match
   @+    Offsets of ends of groups. $+[0] holds end of whole match
   %+    Named capture buffers
   %-    Named capture buffers, as array refs

Captured groups are numbered according to their I<opening> paren.

=head2 FUNCTIONS

   lc          Lowercase a string
   lcfirst     Lowercase first char of a string
   uc          Uppercase a string
   ucfirst     Titlecase first char of a string

   pos         Return or set current match position
   quotemeta   Quote metacharacters
   reset       Reset ?pattern? status
   study       Analyze string for optimizing matching

   split       Use a regex to split a string into parts

The first four of these are like the escape sequences C<\L>, C<\l>,
C<\U>, and C<\u>. For Titlecase, see L</Titlecase>.

=head2 TERMINOLOGY

=head3 Titlecase

Unicode concept which most often is equal to uppercase, but for
certain characters like the German "sharp s" there is a difference.

=head1 AUTHOR

Iain Truskett. Updated by the Perl 5 Porters.

This document may be distributed under the same terms as Perl itself.

=head1 SEE ALSO

=over 4

=item *

L<perlretut> for a tutorial on regular expressions.

=item *

L<perlrequick> for a rapid tutorial.

=item *

L<perlre> for more details.

=item *

L<perlvar> for details on the variables.

=item *

L<perlop> for details on the operators.

=item *

L<perlfunc> for details on the functions.

=item *

L<perlfaq6> for FAQs on regular expressions.

=item *

L<perlrebackslash> for a reference on backslash sequences.

=item *

L<perlrecharclass> for a reference on character classes.

=item *

The L<re> module to alter behaviour and aid
debugging.

=item *

L<perldebug/"Debugging regular expressions">

=item *

L<perluniintro>, L<perlunicode>, L<charnames> and L<perllocale>
for details on regexes and internationalisation.


=item *

I<Mastering Regular Expressions> by Jeffrey Friedl
(F<http://oreilly.com/catalog/9780596528126/>) for a thorough grounding and
reference on the topic.

=back

=head1 THANKS

David P.C. Wollmann,
Richard Soderberg,
Sean M. Burke,
Tom Christiansen,
Jim Cromie,
and
Jeffrey Goff
for useful advice.

=cut