=head1 NAME

perlreapi - Perl regular expression plugin interface

=head1 DESCRIPTION

As of Perl 5.9.5 there is a new interface for plugging and using
regular expression engines other than the default one.

Each engine is supposed to provide access to a constant structure of the
following format:

    typedef struct regexp_engine {
        REGEXP* (*comp) (pTHX_
                         const SV * const pattern, const U32 flags);
        I32     (*exec) (pTHX_
                         REGEXP * const rx,
                         char* stringarg,
                         char* strend, char* strbeg,
                         SSize_t minend, SV* sv,
                         void* data, U32 flags);
        char*   (*intuit) (pTHX_
                           REGEXP * const rx, SV *sv,
                           const char * const strbeg,
                           char *strpos, char *strend, U32 flags,
                           struct re_scream_pos_data_s *data);
        SV*     (*checkstr) (pTHX_ REGEXP * const rx);
        void    (*free) (pTHX_ REGEXP * const rx);
        void    (*numbered_buff_FETCH) (pTHX_
                                        REGEXP * const rx,
                                        const I32 paren,
                                        SV * const sv);
        void    (*numbered_buff_STORE) (pTHX_
                                        REGEXP * const rx,
                                        const I32 paren,
                                        SV const * const value);
        I32     (*numbered_buff_LENGTH) (pTHX_
                                         REGEXP * const rx,
                                         const SV * const sv,
                                         const I32 paren);
        SV*     (*named_buff) (pTHX_
                               REGEXP * const rx,
                               SV * const key,
                               SV * const value,
                               U32 flags);
        SV*     (*named_buff_iter) (pTHX_
                                    REGEXP * const rx,
                                    const SV * const lastkey,
                                    const U32 flags);
        SV*     (*qr_package)(pTHX_ REGEXP * const rx);
    #ifdef USE_ITHREADS
        void*   (*dupe) (pTHX_ REGEXP * const rx, CLONE_PARAMS *param);
    #endif
        REGEXP* (*op_comp) (...);
    } regexp_engine;

When a regexp is compiled, its C<engine> field is then set to point at
the appropriate structure, so that when it needs to be used Perl can find
the right routines to do so.

In order to install a new regexp handler, C<$^H{regcomp}> is set
to an integer which (when cast appropriately) resolves to one of these
structures.
When compiling, the C<comp> method is executed, and the
resulting C<regexp> structure's engine field is expected to point back at
the same structure.

The pTHX_ symbol in the definition is a macro used by Perl under threading
to provide an extra argument to the routine holding a pointer back to
the interpreter that is executing the regexp. So under threading all
routines get an extra argument.

=head1 Callbacks

=head2 comp

    REGEXP* comp(pTHX_ const SV * const pattern, const U32 flags);

Compile the pattern stored in C<pattern> using the given C<flags> and
return a pointer to a prepared C<REGEXP> structure that can perform
the match. See L</The REGEXP structure> below for an explanation of
the individual fields in the REGEXP struct.

The C<pattern> parameter is the scalar that was used as the
pattern. Previous versions of Perl would pass two C<char*> indicating
the start and end of the stringified pattern; the following snippet can
be used to get the old parameters:

    STRLEN plen;
    char*  exp = SvPV(pattern, plen);
    char* xend = exp + plen;

Since any scalar can be passed as a pattern, it's possible to implement
an engine that does something with an array (C<< "ook" =~ [ qw/ eek
hlagh / ] >>) or with the non-stringified form of a compiled regular
expression (C<< "ook" =~ qr/eek/ >>). Perl's own engine will always
stringify everything using the snippet above, but that doesn't mean
other engines have to.

The C<flags> parameter is a bitfield which indicates which of the
C<msixpn> flags the regex was compiled with. It also contains
additional info, such as whether C<use locale> is in effect.

The C<eogc> flags are stripped out before being passed to the comp
routine. The regex engine does not need to know if any of these
are set, as those flags should only affect what Perl does with the
pattern and its match variables, not how it gets compiled and
executed.
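
As a rough illustration of how a C<comp> callback might inspect the
C<flags> bitfield described above, the following stand-alone sketch tests
individual modifier bits. The C<EX_PMf_*> constants and the
C<ex_describe_flags> helper are hypothetical names invented for this
example, and the bit positions are made up; a real engine would test the
C<RXf_PMf_*> constants provided by Perl's headers instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for RXf_PMf_MULTILINE and friends. The bit
 * positions are invented for this sketch; a real engine must use the
 * constants from Perl's own headers. */
#define EX_PMf_MULTILINE  (1u << 0)   /* /m */
#define EX_PMf_SINGLELINE (1u << 1)   /* /s */
#define EX_PMf_FOLD       (1u << 2)   /* /i */
#define EX_PMf_EXTENDED   (1u << 3)   /* /x */

/* Write the modifier letters that are set in flags into buf, in the
 * fixed order m, s, i, x, and NUL-terminate it. buf must have room for
 * at least 5 bytes. Returns the number of letters written. */
static size_t
ex_describe_flags(uint32_t flags, char *buf)
{
    size_t n = 0;

    assert(buf != NULL);

    if (flags & EX_PMf_MULTILINE)  buf[n++] = 'm';
    if (flags & EX_PMf_SINGLELINE) buf[n++] = 's';
    if (flags & EX_PMf_FOLD)       buf[n++] = 'i';
    if (flags & EX_PMf_EXTENDED)   buf[n++] = 'x';

    buf[n] = '\0';
    return n;
}
```

Testing each bit with C<&> rather than comparing the whole word keeps the
check valid when unrelated bits are also set in the same bitfield.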

By the time the comp callback is called, some of these flags have
already had effect (noted below where applicable). However most of
their effect occurs after the comp callback has run, in routines that
read the C<< rx->extflags >> field which it populates.

In general the flags should be preserved in C<< rx->extflags >> after
compilation, although the regex engine might want to add or delete
some of them to invoke or disable some special behavior in Perl. The
flags along with any special behavior they cause are documented below:

The pattern modifiers:

=over 4

=item C</m> - RXf_PMf_MULTILINE

If this is in C<< rx->extflags >> it will be passed to
C<Perl_fbm_instr> by C<pp_split> which will treat the subject string
as a multi-line string.

=item C</s> - RXf_PMf_SINGLELINE

=item C</i> - RXf_PMf_FOLD

=item C</x> - RXf_PMf_EXTENDED

If present on a regex, C<"#"> comments will be handled differently by the
tokenizer in some cases.

TODO: Document those cases.

=item C</p> - RXf_PMf_KEEPCOPY

TODO: Document this

=item Character set

The character set rules are determined by an enum that is contained
in this field. This is still experimental and subject to change, but
the current interface returns the rules by use of the in-line function
C<get_regex_charset(const U32 flags)>. The only currently documented
value returned from it is REGEX_LOCALE_CHARSET, which is set if
C<use locale> is in effect. If present in C<< rx->extflags >>,
C<split> will use the locale dependent definition of whitespace
when RXf_SKIPWHITE or RXf_WHITE is in effect. ASCII whitespace
is defined as per L<isSPACE|perlapi/isSPACE>, and by the internal
macros C<is_utf8_space> under UTF-8, and C<isSPACE_LC> under C<use
locale>.

=back

Additional flags:

=over 4

=item RXf_SPLIT

This flag was removed in perl 5.18.0.
C<split ' '> is now special-cased
solely in the parser. RXf_SPLIT is still #defined, so you can test for it.
This is how it used to work:

If C<split> is invoked as C<split ' '> or with no arguments (which
really means C<split(' ', $_)>, see L<split|perlfunc/split>), Perl will
set this flag. The regex engine can then check for it and set the
SKIPWHITE and WHITE extflags. To do this, the Perl engine does:

    if (flags & RXf_SPLIT && r->prelen == 1 && r->precomp[0] == ' ')
        r->extflags |= (RXf_SKIPWHITE|RXf_WHITE);

=back

These flags can be set during compilation to enable optimizations in
the C<split> operator.

=over 4

=item RXf_SKIPWHITE

This flag was removed in perl 5.18.0. It is still #defined, so you can
set it, but doing so will have no effect. This is how it used to work:

If the flag is present in C<< rx->extflags >> C<split> will delete
whitespace from the start of the subject string before it's operated
on. What is considered whitespace depends on whether the subject is a
UTF-8 string and whether the C<RXf_PMf_LOCALE> flag is set.

If RXf_WHITE is set in addition to this flag, C<split> will behave like
C<split " "> under the Perl engine.

=item RXf_START_ONLY

Tells the split operator to split the target string on newlines
(C<\n>) without invoking the regex engine.

Perl's engine sets this if the pattern is C</^/> (C<plen == 1 && *exp
== '^'>), even under C</^/s>; see L<split|perlfunc/split>. Of course a
different regex engine might want to use the same optimizations
with a different syntax.

=item RXf_WHITE

Tells the split operator to split the target string on whitespace
without invoking the regex engine. The definition of whitespace varies
depending on whether the target string is a UTF-8 string and on
whether RXf_PMf_LOCALE is set.

Perl's engine sets this flag if the pattern is C<\s+>.

=item RXf_NULL

Tells the split operator to split the target string on
characters. The definition of character varies depending on whether
the target string is a UTF-8 string.

Perl's engine sets this flag on empty patterns; this optimization
makes C<split //> much faster than it would otherwise be. It's even
faster than C<unpack>.

=item RXf_NO_INPLACE_SUBST

Added in perl 5.18.0, this flag indicates that a regular expression might
perform an operation that would interfere with inplace substitution. For
instance it might contain lookbehind, or assign to non-magical variables
(such as $REGMARK and $REGERROR) during matching. C<s///> will skip
certain optimisations when this is set.

=back

=head2 exec

    I32 exec(pTHX_ REGEXP * const rx,
             char *stringarg, char* strend, char* strbeg,
             SSize_t minend, SV* sv,
             void* data, U32 flags);

Execute a regexp. The arguments are:

=over 4

=item rx

The regular expression to execute.

=item sv

This is the SV to be matched against. Note that the
actual char array to be matched against is supplied by the arguments
described below; the SV is just used to determine UTF8ness, C<pos()> etc.

=item strbeg

Pointer to the physical start of the string.

=item strend

Pointer to the character following the physical end of the string (i.e.
the C<\0>, if any).

=item stringarg

Pointer to the position in the string where matching should start; it might
not be equal to C<strbeg> (for example in a later iteration of C</.../g>).

=item minend

Minimum length of string (measured in bytes from C<stringarg>) that must
match; if the engine reaches the end of the match but hasn't reached this
position in the string, it should fail.

=item data

Optimisation data; subject to change.

=item flags

Optimisation flags; subject to change.

=back

=head2 intuit

    char* intuit(pTHX_
                 REGEXP * const rx,
                 SV *sv,
                 const char * const strbeg,
                 char *strpos,
                 char *strend,
                 const U32 flags,
                 struct re_scream_pos_data_s *data);

Find the start position where a regex match should be attempted,
or possibly if the regex engine should not be run because the
pattern can't match. This is called, as appropriate, by the core,
depending on the values of the C<extflags> member of the C<regexp>
structure.

Arguments:

    rx:     the regex to match against
    sv:     the SV being matched: only used for utf8 flag; the string
            itself is accessed via the pointers below. Note that on
            something like an overloaded SV, SvPOK(sv) may be false
            and the string pointers may point to something unrelated to
            the SV itself.
    strbeg: real beginning of string
    strpos: the point in the string at which to begin matching
    strend: pointer to the byte following the last char of the string
    flags:  currently unused; set to 0
    data:   currently unused; set to NULL


=head2 checkstr

    SV* checkstr(pTHX_ REGEXP * const rx);

Return a SV containing a string that must appear in the pattern. Used
by C<split> for optimising matches.

=head2 free

    void free(pTHX_ REGEXP * const rx);

Called by Perl when it is freeing a regexp pattern so that the engine
can release any resources pointed to by the C<pprivate> member of the
C<regexp> structure. This is only responsible for freeing private data;
Perl will handle releasing anything else contained in the C<regexp> structure.

=head2 Numbered capture callbacks

Called to get/set the value of C<$`>, C<$'>, C<$&> and their named
equivalents, ${^PREMATCH}, ${^POSTMATCH} and ${^MATCH}, as well as the
numbered capture groups (C<$1>, C<$2>, ...).

The C<paren> parameter will be C<1> for C<$1>, C<2> for C<$2> and so
forth, and have these symbolic values for the special variables:

    ${^PREMATCH}  RX_BUFF_IDX_CARET_PREMATCH
    ${^POSTMATCH} RX_BUFF_IDX_CARET_POSTMATCH
    ${^MATCH}     RX_BUFF_IDX_CARET_FULLMATCH
    $`            RX_BUFF_IDX_PREMATCH
    $'            RX_BUFF_IDX_POSTMATCH
    $&            RX_BUFF_IDX_FULLMATCH

Note that in Perl 5.17.3 and earlier, the last three constants were also
used for the caret variants of the variables.


The names have been chosen by analogy with L<Tie::Scalar> method
names with an additional B<LENGTH> callback for efficiency. However
named capture variables are currently not tied internally but
implemented via magic.

=head3 numbered_buff_FETCH

    void numbered_buff_FETCH(pTHX_ REGEXP * const rx, const I32 paren,
                             SV * const sv);

Fetch a specified numbered capture. C<sv> should be set to the scalar
to return. The scalar is passed as an argument rather than being
returned from the function because when it's called Perl already has a
scalar to store the value; creating another one would be
redundant. The scalar can be set with C<sv_setsv>, C<sv_setpvn> and
friends, see L<perlapi>.

This callback is where Perl untaints its own capture variables under
taint mode (see L<perlsec>). See the C<Perl_reg_numbered_buff_fetch>
function in F<regcomp.c> for how to untaint capture variables if
that's something you'd like your engine to do as well.

=head3 numbered_buff_STORE

    void    (*numbered_buff_STORE) (pTHX_
                                    REGEXP * const rx,
                                    const I32 paren,
                                    SV const * const value);

Set the value of a numbered capture variable. C<value> is the scalar
that is to be used as the new value. It's up to the engine to make
sure this is used as the new value (or reject it).

Example:

    if ("ook" =~ /(o*)/) {
        # 'paren' will be '1' and 'value' will be 'ee'
        $1 =~ tr/o/e/;
    }

Perl's own engine will croak on any attempt to modify the capture
variables. To do this in another engine use the following callback
(copied from C<Perl_reg_numbered_buff_store>):

    void
    Example_reg_numbered_buff_store(pTHX_
                                    REGEXP * const rx,
                                    const I32 paren,
                                    SV const * const value)
    {
        PERL_UNUSED_ARG(rx);
        PERL_UNUSED_ARG(paren);
        PERL_UNUSED_ARG(value);

        if (!PL_localizing)
            Perl_croak(aTHX_ PL_no_modify);
    }

Actually Perl will not I<always> croak in a statement that looks
like it would modify a numbered capture variable. This is because the
STORE callback will not be called if Perl can determine that it
doesn't have to modify the value. This is exactly how tied variables
behave in the same situation:

    package CaptureVar;
    use parent 'Tie::Scalar';

    sub TIESCALAR { bless [] }
    sub FETCH { undef }
    sub STORE { die "This doesn't get called" }

    package main;

    tie my $sv => "CaptureVar";
    $sv =~ y/a/b/;

Because C<$sv> is C<undef> when the C<y///> operator is applied to it,
the transliteration won't actually execute and the program won't
C<die>. This is different to how 5.8 and earlier versions behaved
since the capture variables were READONLY variables then; now they'll
just die when assigned to in the default engine.

=head3 numbered_buff_LENGTH

    I32 numbered_buff_LENGTH (pTHX_
                              REGEXP * const rx,
                              const SV * const sv,
                              const I32 paren);

Get the C<length> of a capture variable.
There's a special callback
for this so that Perl doesn't have to do a FETCH and run C<length> on
the result. Since the length is (in Perl's case) known from the offsets
stored in C<< rx->offs >>, this is much more efficient:

    I32 s1  = rx->offs[paren].start;
    I32 s2  = rx->offs[paren].end;
    I32 len = s2 - s1;

This is a little bit more complex in the case of UTF-8, see what
C<Perl_reg_numbered_buff_length> does with
L<is_utf8_string_loclen|perlapi/is_utf8_string_loclen>.

=head2 Named capture callbacks

Called to get/set the value of C<%+> and C<%->, as well as by some
utility functions in L<re>.

There are two callbacks: C<named_buff> is called in all the cases the
FETCH, STORE, DELETE, CLEAR, EXISTS and SCALAR L<Tie::Hash> callbacks
would be on changes to C<%+> and C<%->, and C<named_buff_iter> in the
same cases as FIRSTKEY and NEXTKEY.

The C<flags> parameter can be used to determine which of these
operations the callbacks should respond to. The following flags are
currently defined:

Which L<Tie::Hash> operation is being performed from the Perl level on
C<%+> or C<%->, if any:

    RXapif_FETCH
    RXapif_STORE
    RXapif_DELETE
    RXapif_CLEAR
    RXapif_EXISTS
    RXapif_SCALAR
    RXapif_FIRSTKEY
    RXapif_NEXTKEY

Whether C<%+> or C<%-> is being operated on, if any:

    RXapif_ONE /* %+ */
    RXapif_ALL /* %- */

Whether this is being called as C<re::regname>, C<re::regnames> or
C<re::regnames_count>, if any. The first two will be combined with
C<RXapif_ONE> or C<RXapif_ALL>.

    RXapif_REGNAME
    RXapif_REGNAMES
    RXapif_REGNAMES_COUNT

Internally C<%+> and C<%-> are implemented with a real tied interface
via L<Tie::Hash::NamedCapture>. The methods in that package will call
back into these functions. However the usage of
L<Tie::Hash::NamedCapture> for this purpose might change in future
releases.
For instance this might be implemented by magic instead
(would need an extension to mgvtbl).

=head3 named_buff

    SV*     (*named_buff) (pTHX_ REGEXP * const rx, SV * const key,
                           SV * const value, U32 flags);

=head3 named_buff_iter

    SV*     (*named_buff_iter) (pTHX_
                                REGEXP * const rx,
                                const SV * const lastkey,
                                const U32 flags);

=head2 qr_package

    SV* qr_package(pTHX_ REGEXP * const rx);

The package the qr// magic object is blessed into (as seen by C<ref
qr//>). It is recommended that engines change this to their package
name for identification regardless of whether they implement methods
on the object.

The package this method returns should also have the internal
C<Regexp> package in its C<@ISA>. C<< qr//->isa("Regexp") >> should always
be true regardless of what engine is being used.

Example implementation might be:

    SV*
    Example_qr_package(pTHX_ REGEXP * const rx)
    {
        PERL_UNUSED_ARG(rx);
        return newSVpvs("re::engine::Example");
    }

Any method calls on an object created with C<qr//> will be dispatched to the
package as a normal object.

    use re::engine::Example;
    my $re = qr//;
    $re->meth; # dispatched to re::engine::Example::meth()

To retrieve the C<REGEXP> object from the scalar in an XS function use
the C<SvRX> macro, see L<"REGEXP Functions" in perlapi|perlapi/REGEXP
Functions>.

    void meth(SV * rv)
    PPCODE:
        REGEXP * re = SvRX(rv);

=head2 dupe

    void* dupe(pTHX_ REGEXP * const rx, CLONE_PARAMS *param);

On threaded builds a regexp may need to be duplicated so that the pattern
can be used by multiple threads. This routine is expected to handle the
duplication of any private data pointed to by the C<pprivate> member of
the C<regexp> structure.
It will be called with the preconstructed new
C<regexp> structure as an argument, the C<pprivate> member will point at
the B<old> private structure, and it is this routine's responsibility to
construct a copy and return a pointer to it (which Perl will then use to
overwrite the field as passed to this routine.)

This allows the engine to dupe its private data but also if necessary
modify the final structure if it really must.

On unthreaded builds this field doesn't exist.

=head2 op_comp

This is private to the Perl core and subject to change. Should be left
null.

=head1 The REGEXP structure

The REGEXP struct is defined in F<regexp.h>.
All regex engines must be able to
correctly build such a structure in their L</comp> routine.

The REGEXP structure contains all the data that Perl needs to be aware of
to properly work with the regular expression. It includes data about
optimisations that Perl can use to determine if the regex engine should
really be used, and various other control info that is needed to properly
execute patterns in various contexts, such as whether the pattern is
anchored in some way, or what flags were used during the compile, or if the
program contains special constructs that Perl needs to be aware of.

In addition it contains two fields that are intended for the private
use of the regex engine that compiled the pattern. These are the
C<intflags> and C<pprivate> members. C<pprivate> is a void pointer to
an arbitrary structure, whose use and management is the responsibility
of the compiling engine. Perl will never modify either of these
values.

    typedef struct regexp {
        /* what engine created this regexp? */
        const struct regexp_engine* engine;

        /* what re is this a lightweight copy of?
         */
        struct regexp* mother_re;

        /* Information about the match that the Perl core uses to manage
         * things */
        U32 extflags;   /* Flags used both externally and internally */
        I32 minlen;     /* minimum possible number of chars in
                           string to match */
        I32 minlenret;  /* minimum possible number of chars in $& */
        U32 gofs;       /* chars left of pos that we search from */

        /* substring data about strings that must appear
           in the final match, used for optimisations */
        struct reg_substr_data *substrs;

        U32 nparens;  /* number of capture groups */

        /* private engine specific data */
        U32 intflags;   /* Engine Specific Internal flags */
        void *pprivate; /* Data private to the regex engine which
                           created this object. */

        /* Data about the last/current match. These are modified during
         * matching*/
        U32 lastparen;            /* highest close paren matched ($+) */
        U32 lastcloseparen;       /* last close paren matched ($^N) */
        regexp_paren_pair *offs;  /* Array of offsets for (@-) and
                                     (@+) */

        char *subbeg;  /* saved or original string so \digit works
                          forever.
                        */
        SV_SAVED_COPY   /* If non-NULL, SV which is COW from original */
        I32 sublen;     /* Length of string pointed by subbeg */
        I32 suboffset;  /* byte offset of subbeg from logical start of
                           str */
        I32 subcoffset; /* suboffset equiv, but in chars (for @-/@+) */

        /* Information about the match that isn't often used */
        I32 prelen;           /* length of precomp */
        const char *precomp;  /* pre-compilation regular expression */

        char *wrapped;  /* wrapped version of the pattern */
        I32 wraplen;    /* length of wrapped */

        I32 seen_evals;   /* number of eval groups in the pattern - for
                             security checks */
        HV *paren_names;  /* Optional hash of paren names */

        /* Refcount of this regexp */
        I32 refcnt;             /* Refcount of this regexp */
    } regexp;

The fields are discussed in more detail below:

=head2 C<engine>

This field points at a C<regexp_engine> structure which contains pointers
to the subroutines that are to be used for performing a match. It
is the compiling routine's responsibility to populate this field before
returning the regexp object.

Internally this is set to C<NULL> unless a custom engine is specified in
C<$^H{regcomp}>; Perl's own set of callbacks can be accessed in the struct
pointed to by C<RE_ENGINE_PTR>.

=head2 C<mother_re>

TODO, see L<http://www.mail-archive.com/perl5-changes@perl.org/msg17328.html>

=head2 C<extflags>

This will be used by Perl to see what flags the regexp was compiled
with; this will normally be set to the value of the flags parameter by
the L<comp|/comp> callback. See the L<comp|/comp> documentation for
valid flags.

=head2 C<minlen> C<minlenret>

The minimum string length (in characters) required for the pattern to match.
This is used to
prune the search space by not bothering to match any closer to the end of a
string than would allow a match.
For instance there is no point in even
starting the regex engine if the minlen is 10 but the string is only 5
characters long. There is no way that the pattern can match.

C<minlenret> is the minimum length (in characters) of the string that would
be found in $& after a match.

The difference between C<minlen> and C<minlenret> can be seen in the
following pattern:

    /ns(?=\d)/

where the C<minlen> would be 3 but C<minlenret> would only be 2 as the \d is
required to match but is not actually
included in the matched content. This
distinction is particularly important as the substitution logic uses the
C<minlenret> to tell if it can do in-place substitutions (these can
result in considerable speed-up).

=head2 C<gofs>

Left offset from pos() to start match at.

=head2 C<substrs>

Substring data about strings that must appear in the final match. This
is currently only used internally by Perl's engine, but might be
used in the future for all engines for optimisations.

=head2 C<nparens>, C<lastparen>, and C<lastcloseparen>

These fields are used to keep track of: how many paren capture groups
there are in the pattern; which was the highest paren to be closed (see
L<perlvar/$+>); and which was the most recent paren to be closed (see
L<perlvar/$^N>).

=head2 C<intflags>

The engine's private copy of the flags the pattern was compiled with. Usually
this is the same as C<extflags> unless the engine chose to modify one of them.

=head2 C<pprivate>

A void* pointing to an engine-defined
data structure. The Perl engine uses the
C<regexp_internal> structure (see L<perlreguts/Base Structures>) but a custom
engine should use something else.

=head2 C<offs>

A C<regexp_paren_pair> structure which defines offsets into the string being
matched which correspond to the C<$&> and C<$1>, C<$2> etc.
captures; the
C<regexp_paren_pair> struct is defined as follows:

    typedef struct regexp_paren_pair {
        I32 start;
        I32 end;
    } regexp_paren_pair;

If C<< ->offs[num].start >> or C<< ->offs[num].end >> is C<-1> then that
capture group did not match.
C<< ->offs[0].start/end >> represents C<$&> (or
C<${^MATCH}> under C</p>) and C<< ->offs[paren].end >> matches C<$$paren> where
C<$paren >= 1>.

=head2 C<precomp> C<prelen>

Used for optimisations. C<precomp> holds a copy of the pattern that
was compiled and C<prelen> its length. When a new pattern is to be
compiled (such as inside a loop) the internal C<regcomp> operator
checks if the last compiled C<REGEXP>'s C<precomp> and C<prelen>
are equivalent to the new one, and if so uses the old pattern instead
of compiling a new one.

The relevant snippet from C<Perl_pp_regcomp>:

    if (!re || !re->precomp || re->prelen != (I32)len ||
        memNE(re->precomp, t, len))
    /* Compile a new pattern */

=head2 C<paren_names>

This is a hash used internally to track named capture groups and their
offsets. The keys are the names of the buffers; the values are dualvars,
with the IV slot holding the number of buffers with the given name and the
pv being an embedded array of I32. The values may also be contained
independently in the data array in cases where named backreferences are
used.

=head2 C<substrs>

Holds information on the longest string that must occur at a fixed
offset from the start of the pattern, and the longest string that must
occur at a floating offset from the start of the pattern. Used to do
Fast-Boyer-Moore searches on the string to find out if it's worth using
the regex engine at all, and if so where in the string to search.
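
The idea of checking for a required substring before running the engine
can be sketched in plain C as follows. This is only an illustration under
simplifying assumptions: C<ex_find> and C<ex_worth_running> are
hypothetical helper names, a naive byte-by-byte search stands in for
Perl's Boyer-Moore routine, and the minimum length is counted in bytes,
which matches a length in characters only for single-byte data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Naive substring search standing in for a Boyer-Moore search.
 * Returns a pointer to the first occurrence of needle in hay,
 * or NULL if there is none. Both buffers are length-delimited. */
static const char *
ex_find(const char *hay, size_t hay_len,
        const char *needle, size_t needle_len)
{
    size_t i;

    if (needle_len == 0)
        return hay;              /* the empty string occurs everywhere */
    if (needle_len > hay_len)
        return NULL;

    for (i = 0; i + needle_len <= hay_len; i++) {
        if (memcmp(hay + i, needle, needle_len) == 0)
            return hay + i;
    }
    return NULL;
}

/* Hypothetical pre-filter: returns 1 if it is worth running the regex
 * engine on subject, 0 if the pattern cannot possibly match because the
 * subject is shorter than minlen or lacks the required literal. */
static int
ex_worth_running(const char *subject, size_t len,
                 const char *required, size_t required_len,
                 size_t minlen)
{
    assert(subject != NULL && required != NULL);

    if (len < minlen)
        return 0;
    return ex_find(subject, len, required, required_len) != NULL;
}
```

The position returned by C<ex_find> also hints at where in the string a
match attempt could begin, which is the second use of this data that the
paragraph above mentions.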

=head2 C<subbeg> C<sublen> C<saved_copy> C<suboffset> C<subcoffset>

Used during the execution phase for managing search and replace patterns,
and for providing the text for C<$&>, C<$1> etc. C<subbeg> points to a
buffer (either the original string, or a copy in the case of
C<RX_MATCH_COPIED(rx)>), and C<sublen> is the length of the buffer. The
C<RX_OFFS> start and end indices index into this buffer.

In the presence of the C<REXEC_COPY_STR> flag, but with the addition of
the C<REXEC_COPY_SKIP_PRE> or C<REXEC_COPY_SKIP_POST> flags, an engine
can choose not to copy the full buffer (although it must still do so in
the presence of C<RXf_PMf_KEEPCOPY> or the relevant bits being set in
C<PL_sawampersand>). In this case, it may set C<suboffset> to indicate the
number of bytes from the logical start of the buffer to the physical start
(i.e. C<subbeg>). It should also set C<subcoffset>, the number of
characters in the offset. The latter is needed to support C<@-> and C<@+>
which work in characters, not bytes.

=head2 C<wrapped> C<wraplen>

Stores the string C<qr//> stringifies to. The Perl engine for example
stores C<(?^:eek)> in the case of C<qr/eek/>.

When using a custom engine that doesn't support the C<(?:)> construct
for inline modifiers, it's probably best to have C<qr//> stringify to
the supplied pattern. Note that this will create undesired patterns in
cases such as:

    my $x = qr/a|b/;  # "a|b"
    my $y = qr/c/i;   # "c"
    my $z = qr/$x$y/; # "a|bc"

There's no solution for this problem other than making the custom
engine understand a construct like C<(?:)>.

=head2 C<seen_evals>

This stores the number of eval groups in
the pattern. This is used for security
purposes when embedding compiled regexes into larger patterns with C<qr//>.

=head2 C<refcnt>

The number of times the structure is referenced.
When
this falls to 0, the regexp is automatically freed
by a call to pregfree. This should be set to 1 in
each engine's L</comp> routine.

=head1 HISTORY

Originally part of L<perlreguts>.

=head1 AUTHORS

Originally written by Yves Orton, expanded by E<AElig>var ArnfjE<ouml>rE<eth>
Bjarmason.

=head1 LICENSE

Copyright 2006 Yves Orton and 2007 E<AElig>var ArnfjE<ouml>rE<eth> Bjarmason.

This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.

=cut