=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name. These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number. For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>. But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.

U8, U16, U32, and U64 are to declare the corresponding unsigned integer
types.

If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined. Use IV and UV to declare the largest practicable integer
types, and C<L<perlapi/WIDEST_UTYPE>> for the widest unsigned type
available, which may not be usable in all circumstances.

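
As a rough analogy only, the C99 least-width types in F<stdint.h> follow
the same "at least this many bits" rule. The sketch below uses
hypothetical C<toy_*> names, not Perl's own typedefs (the real C<I16>,
C<U32>, C<IV> and C<UV> come from F<perl.h> and are chosen when perl is
configured), and it assumes the platform provides C<intptr_t>.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT Perl's definitions: the real I16, U32,
 * IV and UV are defined by perl.h for each platform. */
typedef int_least16_t  toy_I16;   /* signed, at least 16 bits    */
typedef uint_least32_t toy_U32;   /* unsigned, at least 32 bits  */
typedef intptr_t       toy_IV;    /* signed, can hold a pointer  */
typedef uintptr_t      toy_UV;    /* unsigned counterpart        */

/* Width of a type in bits. */
#define TOY_BITS(type) (sizeof(type) * CHAR_BIT)

/* Round-trip a pointer through the pointer-sized signed integer. */
static void *toy_roundtrip(void *p)
{
    toy_IV as_int;

    assert(sizeof(toy_IV) >= sizeof(void *));
    as_int = (toy_IV)p;
    return (void *)as_int;
}
```

The width checks are written with C<E<gt>=> rather than C<==> because
the types only promise a minimum number of bits.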

A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar.

=for apidoc_section $integer
=for apidoc Ayh||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64
=for apidoc_item ||IV

=for apidoc Ayh||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64
=for apidoc_item ||UV

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But,
using it for non-strings requires care, as the underlying assumption of
much of the internals is that PVs are just for strings. Often, for
example, a trailing C<NUL> is tacked on automatically. The non-string use
is documented only in this paragraph.)

=for apidoc Ayh||NV

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

=for apidoc Ayh||STRLEN

In the unlikely case of an SV requiring more complex initialization, you
can create an empty SV with newSV(len). If C<len> is 0, an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic". See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If it is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:

=over

=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)

=item * C<SvNV(SV*)> (C<double>)

=item * Strings are a bit complicated:

=over

=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.

This is suitable for Perl strings that represent bytes.

=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.

This is suitable for Perl strings that represent characters.

B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.

=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
string
is C<"\xff\xff">, then depending on the SV's internal encoding you might get
back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets.
To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).

B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.

When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.

=back

(See L</How do I pass a Perl string to a C library?> for more details.)

In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
is placed into the
variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use C<SvPVbyte_nolen>,
C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPVbyte(s, len);
    foo(ptr, len);

=back

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>.
Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that may write up to needlen bytes at s+len, but
       actually writes newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().

If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that may write up to needlen bytes at s, but actually
       writes newlen bytes
         eg. newlen = read(fd, s, needlen);
     */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags().
That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, STRLEN val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an
C<SV*>, you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.
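
The grow, write, set-length and terminate steps behind an append can be
modelled in plain C. What follows is a toy sketch, not Perl's
implementation: C<toy_sv>, C<toy_grow>, C<toy_catpvn> and C<toy_catpv>
are hypothetical names whose fields only echo C<SvPVX>, C<SvCUR> and
C<SvLEN>, and error handling is reduced to an C<assert>.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a string SV; this is NOT Perl's layout. */
typedef struct {
    char  *pv;   /* start of the buffer                   */
    size_t cur;  /* bytes in use, excluding the final NUL */
    size_t len;  /* bytes allocated                       */
} toy_sv;

/* In the spirit of SvGROW: only ever enlarges the allocation and
 * returns the (possibly moved) buffer. */
static char *toy_grow(toy_sv *sv, size_t newlen)
{
    if (newlen > sv->len) {
        char *p = realloc(sv->pv, newlen);
        assert(p != NULL);           /* sketch only: no real recovery */
        sv->pv  = p;
        sv->len = newlen;
    }
    return sv->pv;
}

/* In the spirit of sv_catpvn: append n bytes (embedded NULs allowed),
 * then restore the trailing NUL. */
static void toy_catpvn(toy_sv *sv, const char *src, size_t n)
{
    char *s = toy_grow(sv, sv->cur + n + 1);   /* +1 for the NUL */
    memcpy(s + sv->cur, src, n);
    sv->cur += n;
    s[sv->cur] = '\0';
}

/* In the spirit of sv_catpv: the length comes from strlen, so src
 * must be NUL-terminated and must not contain embedded NULs. */
static void toy_catpv(toy_sv *sv, const char *src)
{
    toy_catpvn(sv, src, strlen(src));
}
```

Starting from an all-zero C<toy_sv>, calling C<toy_catpv> with "foo" and
then C<toy_catpvn> with the three bytes C<b>, C<NUL>, C<r> leaves C<cur>
at 6, which shows why an explicit length is needed as soon as the data
may contain NULs.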

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed. Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example,
when interfacing with Perl code, the comparison will work correctly for:

  foo(undef);

But it won't work when called as:

  $x = undef;
  foo($x);

So, to repeat: always use SvOK() to check whether an sv is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly. (A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
  SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0x7ffb7bc05b50 ""\0
    CUR = 0
    LEN = 10

Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:

 % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                            Dump($a)'
 SV = PV(0x7ffa04008a70) at 0x7ffa04030390
   REFCNT = 1
   FLAGS = (POK,OOK,pPOK)
   OFFSET = 1
   PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
   CUR = 8
   LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
these macros will usually return TRUE, and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two ways to create and load an AV. The first method creates an
empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    SSize_t av_top_index(AV*);
    SV**    av_fetch(AV*, SSize_t key, I32 lval);
    SV**    av_store(AV*, SSize_t key, SV* val);

The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).
The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. C<hv_undef> deletes
both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

=for apidoc Ayh||HE

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is the SV* return value */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
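
To see why a hash value computed in one process should not be stored and
reused in another, consider a hash function that mixes in a seed. The
sketch below is a hypothetical C<toy_hash> built on FNV-1a; it is B<not>
the algorithm behind C<PERL_HASH>, and the idea that the seed differs
from one process to the next is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy seeded hash (an FNV-1a variant). NOT the PERL_HASH algorithm;
 * the seed stands in for whatever per-process state a real
 * implementation might mix in. */
static uint32_t toy_hash(uint32_t seed, const char *key, size_t klen)
{
    uint32_t h = 2166136261u ^ seed;   /* FNV offset basis, perturbed */
    size_t i;

    assert(key != NULL || klen == 0);
    for (i = 0; i < klen; i++) {
        h ^= (unsigned char)key[i];
        h *= 16777619u;                /* FNV prime */
    }
    return h;
}
```

With the same seed, a key always hashes to the same value, so caching it
within one process is safe. With a different seed the same key yields a
different value, so a number saved by one process would be wrong in
another.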

=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that both C<hv_store> and C<hv_store_ent> do not increment the
reference count of the stored C<val>, which is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs.
Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code. In perl 5.20, storing
C<&PL_sv_undef> will create a read-only element, because the scalar
C<&PL_sv_undef> itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

Any numerical value returned which is less than SVt_PVAV will be a scalar
of some form.

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the SV is blessed into the specified class. The SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

The following three functions copy integer, unsigned integer or double
into an SV whose reference is C<rv>. SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is C<rv>. SV is blessed if C<classname>
is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                                                         STRLEN length);

The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.

    int  sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

    int  sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class. SV can be either a reference to a blessed object or a string
containing a class name.
This is the function implementing the
C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV* get_sv("package::varname", GV_ADD);
    AV* get_av("package::varname", GV_ADD);
    HV* get_hv("package::varname", GV_ADD);

Notice the use of GV_ADD as the second parameter. The new variable can now
be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

    Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
At the most basic internal level, reference counts can be manipulated
with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

(There are also suffixed versions of the increment and decrement macros,
for situations where the full generality of these basic macros can be
exchanged for some performance.)

However, the way a programmer should think about references is not so
much in terms of the bare reference count, but in terms of I<ownership>
of references. A reference to an xV can be owned by any of a variety
of entities: another xV, the Perl interpreter, an XS data structure,
a piece of running code, or a dynamic scope. An xV generally does not
know what entities own the references to it; it only knows how many
references there are, which is the reference count.

To correctly maintain reference counts, it is essential to keep track
of what references the XS code is manipulating. The programmer should
always know where a reference has come from and who owns it, and be
aware of any creation or destruction of references, and any transfers
of ownership. Because ownership isn't represented explicitly in the xV
data structures, only the reference count need be actually maintained
by the code, and that means that this understanding of ownership is not
actually evident in the code. For example, transferring ownership of a
reference from one owner to another doesn't change the reference count
at all, so may be achieved with no actual code. (The transferring code
doesn't touch the referenced object, but does need to ensure that the
former owner knows that it no longer owns the reference, and that the
new owner knows that it now does.)

An xV that is visible at the Perl level should not become unreferenced
and thus be destroyed.
Normally, an object will only become unreferenced
when it is no longer visible, often by the same means that makes it
invisible. For example, a Perl reference value (RV) owns a reference to
its referent, so if the RV is overwritten that reference gets destroyed,
and the no-longer-reachable referent may be destroyed as a result.

Many functions have some kind of reference manipulation as
part of their purpose. Sometimes this is documented in terms
of ownership of references, and sometimes it is (less helpfully)
documented in terms of changes to reference counts. For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
(with reference count 1) and increment the reference count of the referent
that was supplied by the caller. This is best understood as creating
a new reference to the referent, which is owned by the created RV,
and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
increment the reference count of the referent, but the RV nevertheless
ends up owning a reference to the referent. It is therefore implied
that the caller of C<newRV_noinc()> is relinquishing a reference to the
referent, making this conceptually a more complicated operation even
though it does less to the data structures.

For example, imagine you want to return a reference from an XSUB
function. Inside the XSUB routine, you create an SV which initially
has just a single reference, owned by the XSUB routine. This reference
needs to be disposed of before the routine is complete, otherwise it
will leak, preventing the SV from ever being destroyed. So to create
an RV referencing the SV, it is most convenient to pass the SV to
C<newRV_noinc()>, which consumes that reference.
Now the XSUB routine
no longer owns a reference to the SV, but does own a reference to the RV,
which in turn owns a reference to the SV. The ownership of the reference
to the RV is then transferred by the process of returning the RV from
the XSUB.

There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
Much documentation speaks of an xV itself being mortal, but this is
misleading. It is really I<a reference to> an xV that is mortal, and it
is possible for there to be more than one mortal reference to a single xV.
For a reference to be mortal means that it is owned by the temps stack,
one of perl's many internal stacks, which will destroy that reference
"a short time later". Usually the "short time later" is the end of
the current Perl statement. However, it gets more complicated around
dynamic scopes: there can be multiple sets of mortal references hanging
around at the same time, with different death dates. Internally, the
actual determinant for when mortal xV references are destroyed depends
on two macros, SAVETMPS and FREETMPS. See L<perlcall> and L<perlxs>
and L</Temporaries Stack> below for more details on these macros.

Mortal references are mainly used for xVs that are placed on perl's
main stack. The stack is problematic for reference tracking, because it
contains a lot of xV references, but doesn't own those references: they
are not counted. Currently, there are many bugs resulting from xVs being
destroyed while referenced by the stack, because the stack's uncounted
references aren't enough to keep the xVs alive. So when putting an
(uncounted) reference on the stack, it is vitally important to ensure that
there will be a counted reference to the same xV that will last at least
as long as the uncounted reference.
But it's also important that that
counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life. For there to be a mortal reference is often the
best way to satisfy this requirement, especially if the xV was created
especially to be put on the stack and would otherwise be unreferenced.

To create a mortal reference, use the functions:

    SV* sv_newmortal()
    SV* sv_mortalcopy(SV*)
    SV* sv_2mortal(SV*)

C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a
copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()>
mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack. Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (type-casted to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package.
To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in PL_defstash. The items in the C<Bar::Baz> package are
in the stash C<Baz::> in C<Bar::>'s stash.

To get the stash pointer for a particular package, use the function:

    HV* gv_stashpv(const char* name, I32 flags)
    HV* gv_stashsv(SV*, I32 flags)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<flags> flag will create a new package if it is set to GV_ADD.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternately, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV* SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char* HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV* sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 I/O Handles

Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
from opendir().

You can create a new IO object:

    IO* newIO();

Unlike other SVs, a new IO object is automatically blessed into the
L<IO::File> class.

The IO object contains an input and output PerlIO handle:

    PerlIO *IoIFP(IO *io);
    PerlIO *IoOFP(IO *io);

Typically if the IO object has been opened on a file, the input handle
is always present, but the output handle is only present if the file
is open for output. For a file, if both are present they will be the
same PerlIO object.

Distinct input and output PerlIO objects are created for sockets and
character devices.

The IO object also contains other data associated with Perl I/O
handles:

    IV IoLINES(io);                /* $. */
    IV IoPAGE(io);                 /* $% */
    IV IoPAGE_LEN(io);             /* $= */
    IV IoLINES_LEFT(io);           /* $- */
    char *IoTOP_NAME(io);          /* $^ */
    GV *IoTOP_GV(io);              /* $^ */
    char *IoFMT_NAME(io);          /* $~ */
    GV *IoFMT_GV(io);              /* $~ */
    char *IoBOTTOM_NAME(io);
    GV *IoBOTTOM_GV(io);
    char IoTYPE(io);
    U8 IoFLAGS(io);

Most of these are involved with L<formats|perlform>.

IoFLAGS() may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.

The IO object may also contain a directory handle:

    DIR *IoDIRP(io);

suitable for use with PerlDir_read() etc.

All of these accessor macros are lvalues; there are no distinct
C<_set()> macros to modify the members of the IO object.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.
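
The idea of one scalar carrying both a numeric and a string value can be
sketched in plain C. The model below does not use the Perl API: the
C<toy_dual> type, the C<TOY_IOK> and C<TOY_POK> flags and the C<toy_*>
functions are hypothetical names chosen for this illustration, loosely
mirroring how C<$!> pairs C<errno> with its C<strerror> text.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Flags saying which representations are currently valid. */
enum { TOY_IOK = 1, TOY_POK = 2 };

/* Toy double-typed scalar: an integer slot, a string slot, and flags. */
typedef struct {
    unsigned    flags;
    long        iv;
    const char *pv;
} toy_dual;

/* Build a value that is valid both as a number and as a string.
 * The string comes from strerror(), whose buffer may be overwritten by
 * later strerror() calls; real code would copy it. */
toy_dual toy_from_errno(int err)
{
    toy_dual d;
    d.iv    = err;
    d.pv    = strerror(err);
    d.flags = TOY_IOK | TOY_POK;
    return d;
}

/* Fetch the numeric view; only legal while the integer flag is set. */
long toy_as_number(const toy_dual *d)
{
    assert(d->flags & TOY_IOK);
    return d->iv;
}

/* Fetch the string view; only legal while the string flag is set. */
const char *toy_as_string(const toy_dual *d)
{
    assert(d->flags & TOY_POK);
    return d->pv;
}
```

A caller asking for the number gets the error code and a caller asking
for the string gets the message, with no conversion between the two.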

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Read-Only Values

In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars. So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:

    SvREADONLY(sv) && !SvIsCOW(sv)

Under Perl 5.18 and later, SvREADONLY only applies to read-only variables,
and, under 5.20, copy-on-write scalars can also be read-only, so the above
check is incorrect.
You just want:

    SvREADONLY(sv)

If you need to do this check often, define your own macro like this:

    #if PERL_VERSION >= 18
    # define SvTRULYREADONLY(sv) SvREADONLY(sv)
    #else
    # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
    #endif

=head2 Copy on Write

Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing. This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.

You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.

You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.

If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.

All of these functions will croak on read-only scalars (see the previous
section for more on those).

To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes. You will find it to be marvellously slow, so you may want to
skip perl's own tests.

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        I32         mg_len;
        SV*         mg_obj;
        char*       mg_ptr;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the sv_magic function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>.
Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively. As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The sv_magic function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of
C<how> should be chosen from the set of macros
C<PERL_MAGIC_foo> found in F<perl.h>.
Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>,
C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely
stored, without the reference count being incremented.

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function sv_unmagic:

    int sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

However, note that C<sv_unmagic> removes all magic of a certain C<type> from the
C<SV>. If you want to remove only certain
magic of a C<type> based on the magic
virtual table, use C<sv_unmagicext> instead:

    int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL>, which is a structure of function pointers and stands for
"Magic Virtual Table" to handle the various operations that might be
applied to that variable.

=for apidoc Ayh||MGVTBL

The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:

    int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
    U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);

    int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
                      const char *name, I32 namlen);
    int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
    int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);

This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 32 types. These different structures contain pointers to various
routines that perform additional actions depending on which function is
being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is
                        retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

    svt_copy            copy tied variable magic to a tied element
    svt_dup             duplicate a magic structure during thread cloning
    svt_local           copy magic to local value during 'local'

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.
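
The dispatch described above, where a table of function pointers hangs
off each magic entry and a slot is called only if it is non-null, can be
sketched as a stand-alone model. This is plain C with no Perl headers:
C<toy_vtbl>, C<toy_magic>, C<toy_sv> and the hook functions are
hypothetical names, and only "get" and "set" slots are modelled. It is a
simplified picture, not how the interpreter is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sv;
struct toy_magic;

/* Hypothetical miniature of MGVTBL: only "get" and "set" slots. */
typedef struct {
    int (*get)(struct toy_sv *sv, struct toy_magic *mg);
    int (*set)(struct toy_sv *sv, struct toy_magic *mg);
} toy_vtbl;

/* One entry in a value's chain of magic. */
typedef struct toy_magic {
    struct toy_magic *moremagic;    /* next entry in the chain */
    const toy_vtbl   *virtual_tbl;  /* may be NULL, like "(none)" */
    char              type;
} toy_magic;

typedef struct toy_sv {
    long       value;
    toy_magic *magic;               /* head of the magic chain, or NULL */
} toy_sv;

/* Call every non-null "get" hook, then return the value. */
long toy_read(toy_sv *sv)
{
    toy_magic *mg;
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virtual_tbl != NULL && mg->virtual_tbl->get != NULL)
            mg->virtual_tbl->get(sv, mg);
    return sv->value;
}

/* Store the value, then call every non-null "set" hook. */
void toy_write(toy_sv *sv, long v)
{
    toy_magic *mg;
    sv->value = v;
    for (mg = sv->magic; mg != NULL; mg = mg->moremagic)
        if (mg->virtual_tbl != NULL && mg->virtual_tbl->set != NULL)
            mg->virtual_tbl->set(sv, mg);
}

/* Example "get" hook: bumps the value before each read. */
int toy_get_hook(toy_sv *sv, toy_magic *mg)
{
    (void)mg;
    sv->value += 1;
    return 0;
}

/* Example "set" hook: clamps negative values to zero after a write. */
int toy_set_hook(toy_sv *sv, toy_magic *mg)
{
    (void)mg;
    if (sv->value < 0)
        sv->value = 0;
    return 0;
}
```

A value with no magic chain is read and written directly, which matches
the point that a vtable with all slots null adds no behaviour of its own.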

The last three slots are a recent addition, and for source code
compatibility they are only checked for if one of the three flags
MGf_COPY, MGf_DUP or MGf_LOCAL is set in mg_flags.
This means that most code can continue declaring
a vtable as a 5-element value. These three are
currently used exclusively by the threading code, and are highly subject
to change.

The current kinds of Magic Virtual Tables are:

=for comment
This table is generated by regen/mg_vtable.pl. Any changes made here
will be lost.

=for mg_vtable.pl begin

 mg_type
 (old-style char and macro)   MGVTBL         Type of magic
 --------------------------   ------         -------------
 \0 PERL_MAGIC_sv             vtbl_sv        Special scalar variable
 #  PERL_MAGIC_arylen         vtbl_arylen    Array length ($#ary)
 %  PERL_MAGIC_rhash          (none)         Extra data for restricted
                                             hashes
 *  PERL_MAGIC_debugvar       vtbl_debugvar  $DB::single, signal, trace
                                             vars
 .  PERL_MAGIC_pos            vtbl_pos       pos() lvalue
 :  PERL_MAGIC_symtab         (none)         Extra data for symbol
                                             tables
 <  PERL_MAGIC_backref        vtbl_backref   For weak ref data
 @  PERL_MAGIC_arylen_p       (none)         To move arylen out of XPVAV
 B  PERL_MAGIC_bm             vtbl_regexp    Boyer-Moore
                                             (fast string search)
 c  PERL_MAGIC_overload_table vtbl_ovrld     Holds overload table
                                             (AMT) on stash
 D  PERL_MAGIC_regdata        vtbl_regdata   Regex match position data
                                             (@+ and @- vars)
 d  PERL_MAGIC_regdatum       vtbl_regdatum  Regex match position data
                                             element
 E  PERL_MAGIC_env            vtbl_env       %ENV hash
 e  PERL_MAGIC_envelem        vtbl_envelem   %ENV hash element
 f  PERL_MAGIC_fm             vtbl_regexp    Formline
                                             ('compiled' format)
 g  PERL_MAGIC_regex_global   vtbl_mglob     m//g target
 H  PERL_MAGIC_hints          vtbl_hints     %^H hash
 h  PERL_MAGIC_hintselem      vtbl_hintselem %^H hash element
 I  PERL_MAGIC_isa            vtbl_isa       @ISA array
 i  PERL_MAGIC_isaelem        vtbl_isaelem   @ISA array element
 k  PERL_MAGIC_nkeys          vtbl_nkeys     scalar(keys()) lvalue
 L  PERL_MAGIC_dbfile         (none)         Debugger %_<filename
 l  PERL_MAGIC_dbline         vtbl_dbline    Debugger %_<filename
                                             element
 N  PERL_MAGIC_shared         (none)         Shared between threads
 n  PERL_MAGIC_shared_scalar  (none)         Shared between threads
 o  PERL_MAGIC_collxfrm       vtbl_collxfrm  Locale transformation
 P  PERL_MAGIC_tied           vtbl_pack      Tied array or hash
 p  PERL_MAGIC_tiedelem       vtbl_packelem  Tied array or hash element
 q  PERL_MAGIC_tiedscalar     vtbl_packelem  Tied scalar or handle
 r  PERL_MAGIC_qr             vtbl_regexp    Precompiled qr// regex
 S  PERL_MAGIC_sig            vtbl_sig       %SIG hash
 s  PERL_MAGIC_sigelem        vtbl_sigelem   %SIG hash element
 t  PERL_MAGIC_taint          vtbl_taint     Taintedness
 U  PERL_MAGIC_uvar           vtbl_uvar      Available for use by
                                             extensions
 u  PERL_MAGIC_uvar_elem      (none)         Reserved for use by
                                             extensions
 V  PERL_MAGIC_vstring        (none)         SV was vstring literal
 v  PERL_MAGIC_vec            vtbl_vec       vec() lvalue
 w  PERL_MAGIC_utf8           vtbl_utf8      Cached UTF-8 information
 x  PERL_MAGIC_substr         vtbl_substr    substr() lvalue
 Y  PERL_MAGIC_nonelem        vtbl_nonelem   Array element that does not
                                             exist
 y  PERL_MAGIC_defelem        vtbl_defelem   Shadow "foreach" iterator
                                             variable / smart parameter
                                             vivification
 \  PERL_MAGIC_lvref          vtbl_lvref     Lvalue reference
                                             constructor
 ]  PERL_MAGIC_checkcall      vtbl_checkcall Inlining/mutation of call
                                             to this CV
 ~  PERL_MAGIC_ext            (none)         Available for use by
                                             extensions


=for apidoc AmnhU||PERL_MAGIC_arylen
=for apidoc_item ||PERL_MAGIC_arylen_p
=for apidoc_item ||PERL_MAGIC_backref
=for apidoc_item ||PERL_MAGIC_bm
=for apidoc_item ||PERL_MAGIC_checkcall
=for apidoc_item ||PERL_MAGIC_collxfrm
=for apidoc_item ||PERL_MAGIC_dbfile
=for apidoc_item ||PERL_MAGIC_dbline
=for apidoc_item ||PERL_MAGIC_debugvar
=for apidoc_item ||PERL_MAGIC_defelem
=for apidoc_item ||PERL_MAGIC_env
=for apidoc_item ||PERL_MAGIC_envelem
=for apidoc_item ||PERL_MAGIC_ext
=for apidoc_item ||PERL_MAGIC_fm
=for apidoc_item ||PERL_MAGIC_hints
=for apidoc_item ||PERL_MAGIC_hintselem
=for apidoc_item ||PERL_MAGIC_isa
=for apidoc_item ||PERL_MAGIC_isaelem
=for apidoc_item ||PERL_MAGIC_lvref
=for apidoc_item ||PERL_MAGIC_nkeys
=for apidoc_item ||PERL_MAGIC_nonelem
=for apidoc_item ||PERL_MAGIC_overload_table
=for apidoc_item ||PERL_MAGIC_pos
=for apidoc_item ||PERL_MAGIC_qr
=for apidoc_item ||PERL_MAGIC_regdata
=for apidoc_item ||PERL_MAGIC_regdatum
=for apidoc_item ||PERL_MAGIC_regex_global
=for apidoc_item ||PERL_MAGIC_rhash
=for apidoc_item ||PERL_MAGIC_shared
=for apidoc_item ||PERL_MAGIC_shared_scalar
=for apidoc_item ||PERL_MAGIC_sig
=for apidoc_item ||PERL_MAGIC_sigelem
=for apidoc_item ||PERL_MAGIC_substr
=for apidoc_item ||PERL_MAGIC_sv
=for apidoc_item ||PERL_MAGIC_symtab
=for apidoc_item ||PERL_MAGIC_taint
=for apidoc_item ||PERL_MAGIC_tied
=for apidoc_item ||PERL_MAGIC_tiedelem
=for apidoc_item ||PERL_MAGIC_tiedscalar
=for apidoc_item ||PERL_MAGIC_utf8
=for apidoc_item ||PERL_MAGIC_uvar
=for apidoc_item ||PERL_MAGIC_uvar_elem
=for apidoc_item ||PERL_MAGIC_vec
=for apidoc_item ||PERL_MAGIC_vstring

=for mg_vtable.pl end

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext> and C<PERL_MAGIC_uvar> magic types are defined
specifically for use by extensions and will not be used by perl itself.
Extensions can use C<PERL_MAGIC_ext> magic to 'attach' private information
to variables (typically objects). This is especially useful because
there is no way for normal perl code to corrupt this private information
(unlike using extra elements of a hash object).

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed. The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second. A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below. Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.

For hashes there is a specialized hook that gives control over hash
keys (but not values). This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL. The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>. Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook. See L<Hash::Util::FieldHash/GUTS> for a detailed description.
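
The calling pattern of C<uf_val> and C<uf_set>, and the reason a
stack-allocated C<ufuncs> structure is safe to hand over, can be pictured
with a small model. The code below is plain C and makes several
simplifying assumptions: C<toy_ufuncs>, C<toy_scalar> and the C<toy_*>
functions are hypothetical names, the callbacks take a C<long *> in place
of C<pTHX_> and an C<SV*>, and the copy is made with C<malloc> and
C<memcpy> purely for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy analogue of struct ufuncs. */
typedef struct {
    int (*uf_val)(long index, long *value);   /* called on read  */
    int (*uf_set)(long index, long *value);   /* called on write */
    long uf_index;
} toy_ufuncs;

typedef struct {
    long        value;
    toy_ufuncs *hooks;     /* private heap copy, or NULL */
} toy_scalar;

/* Attach hooks by copying the caller's structure, so the caller's
 * copy may live on the stack and go out of scope afterwards. */
void toy_attach(toy_scalar *s, const toy_ufuncs *uf)
{
    s->hooks = malloc(sizeof *uf);
    assert(s->hooks != NULL);
    memcpy(s->hooks, uf, sizeof *uf);
}

/* Remove the hooks and release the private copy. */
void toy_detach(toy_scalar *s)
{
    free(s->hooks);
    s->hooks = NULL;
}

/* Read: run uf_val (if any) with uf_index as the first argument. */
long toy_fetch(toy_scalar *s)
{
    if (s->hooks != NULL && s->hooks->uf_val != NULL)
        s->hooks->uf_val(s->hooks->uf_index, &s->value);
    return s->value;
}

/* Write: store, then run uf_set (if any) with uf_index first. */
void toy_store(toy_scalar *s, long v)
{
    s->value = v;
    if (s->hooks != NULL && s->hooks->uf_set != NULL)
        s->hooks->uf_set(s->hooks->uf_index, &s->value);
}

/* Example read callback: the value always reads back as the index. */
int toy_on_read(long index, long *value)
{
    *value = index;
    return 0;
}

/* Set up hooks from a structure that lives only in this stack frame. */
void toy_make_magical(toy_scalar *s)
{
    toy_ufuncs uf;
    uf.uf_val   = toy_on_read;
    uf.uf_set   = NULL;
    uf.uf_index = 42;
    toy_attach(s, &uf);
}   /* uf is gone here; the attached copy survives */
```

After C<toy_make_magical> returns, reads still reach the callback because
the scalar holds its own copy of the structure.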

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict. Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table. C<mg_findext> provides an easy way
to do that:

    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

    MAGIC *mg;
    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
        /* this is really ours, not another module's PERL_MAGIC_ext */
        my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
        ...
    }

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets. This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions. Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if they use an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
                                       * type */

This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical
feature, C<NULL> is returned.
If the 1579SV has multiple instances of that magical feature, the first one will be 1580returned. C<mg_findext> can be used 1581to find a C<MAGIC> structure of an SV 1582based on both its magic type and its magic virtual table: 1583 1584 MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl); 1585 1586Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type 1587SVt_PVMG, Perl may core dump. 1588 1589 int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen); 1590 1591This routine checks to see what types of magic C<sv> has. If the mg_type 1592field is an uppercase letter, then the mg_obj is copied to C<nsv>, but 1593the mg_type field is changed to be the lowercase letter. 1594 1595=head2 Understanding the Magic of Tied Hashes and Arrays 1596 1597Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied> 1598magic type. 1599 1600WARNING: As of the 5.004 release, proper usage of the array and hash 1601access functions requires understanding a few caveats. Some 1602of these caveats are actually considered bugs in the API, to be fixed 1603in later releases, and are bracketed with [MAYCHANGE] below. If 1604you find yourself actually applying such information in this section, be 1605aware that the behavior may change in the future, umm, without warning. 1606 1607The perl tie function associates a variable with an object that implements 1608the various GET, SET, etc methods. To perform the equivalent of the perl 1609tie function from an XSUB, you must mimic this behaviour. The code below 1610carries out the necessary steps -- firstly it creates a new hash, and then 1611creates a second hash which it blesses into the class which will implement 1612the tie methods. Lastly it ties the two hashes together, and returns a 1613reference to the new tied hash. Note that the code below does NOT call the 1614TIEHASH method in the MyTie class - 1615see L</Calling Perl Routines from within C Programs> for details on how 1616to do this. 
1617 1618 SV* 1619 mytie() 1620 PREINIT: 1621 HV *hash; 1622 HV *stash; 1623 SV *tie; 1624 CODE: 1625 hash = newHV(); 1626 tie = newRV_noinc((SV*)newHV()); 1627 stash = gv_stashpv("MyTie", GV_ADD); 1628 sv_bless(tie, stash); 1629 hv_magic(hash, (GV*)tie, PERL_MAGIC_tied); 1630 RETVAL = newRV_noinc(hash); 1631 OUTPUT: 1632 RETVAL 1633 1634The C<av_store> function, when given a tied array argument, merely 1635copies the magic of the array onto the value to be "stored", using 1636C<mg_copy>. It may also return NULL, indicating that the value did not 1637actually need to be stored in the array. [MAYCHANGE] After a call to 1638C<av_store> on a tied array, the caller will usually need to call 1639C<mg_set(val)> to actually invoke the perl level "STORE" method on the 1640TIEARRAY object. If C<av_store> did return NULL, a call to 1641C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory 1642leak. [/MAYCHANGE] 1643 1644The previous paragraph is applicable verbatim to tied hash access using the 1645C<hv_store> and C<hv_store_ent> functions as well. 1646 1647C<av_fetch> and the corresponding hash functions C<hv_fetch> and 1648C<hv_fetch_ent> actually return an undefined mortal value whose magic 1649has been initialized using C<mg_copy>. Note the value so returned does not 1650need to be deallocated, as it is already mortal. [MAYCHANGE] But you will 1651need to call C<mg_get()> on the returned value in order to actually invoke 1652the perl level "FETCH" method on the underlying TIE object. Similarly, 1653you may also call C<mg_set()> on the return value after possibly assigning 1654a suitable value to it using C<sv_setsv>, which will invoke the "STORE" 1655method on the TIE object. [/MAYCHANGE] 1656 1657[MAYCHANGE] 1658In other words, the array or hash fetch/store functions don't really 1659fetch and store actual values in the case of tied arrays and hashes. 
They 1660merely call C<mg_copy> to attach magic to the values that were meant to be 1661"stored" or "fetched". Later calls to C<mg_get> and C<mg_set> actually 1662do the job of invoking the TIE methods on the underlying objects. Thus 1663the magic mechanism currently implements a kind of lazy access to arrays 1664and hashes. 1665 1666Currently (as of perl version 5.004), use of the hash and array access 1667functions requires the user to be aware of whether they are operating on 1668"normal" hashes and arrays, or on their tied variants. The API may be 1669changed to provide more transparent access to both tied and normal data 1670types in future versions. 1671[/MAYCHANGE] 1672 1673You would do well to understand that the TIEARRAY and TIEHASH interfaces 1674are mere sugar to invoke some perl method calls while using the uniform hash 1675and array syntax. The use of this sugar imposes some overhead (typically 1676about two to four extra opcodes per FETCH/STORE operation, in addition to 1677the creation of all the mortal variables required to invoke the methods). 1678This overhead will be comparatively small if the TIE methods are themselves 1679substantial, but if they are only a few statements long, the overhead 1680will not be insignificant. 1681 1682=head2 Localizing changes 1683 1684Perl has a very handy construction 1685 1686 { 1687 local $var = 2; 1688 ... 1689 } 1690 1691This construction is I<approximately> equivalent to 1692 1693 { 1694 my $oldvar = $var; 1695 $var = 2; 1696 ... 1697 $var = $oldvar; 1698 } 1699 1700The biggest difference is that the first construction would 1701reinstate the initial value of $var, irrespective of how control exits 1702the block: C<goto>, C<return>, C<die>/C<eval>, etc. It is a little bit 1703more efficient as well. 
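
The "restore on exit" behaviour can be pictured as a stack of saved
values that is unwound when the block is left.  The following is a
standalone toy model in plain C, not Perl's actual implementation:
C<scope_enter>, C<scope_leave>, C<save_int> and C<with_localized_var> are
hypothetical names, the stacks have fixed sizes with no overflow checks,
and non-local exits such as C<die> are not modelled.

```c
#include <stddef.h>

#define MODEL_MAX_SAVES  64   /* no overflow checks, for brevity */
#define MODEL_MAX_SCOPES 16

/* One remembered (address, old value) pair. */
struct saved_int {
    int *addr;
    int  old;
};

static struct saved_int save_stack[MODEL_MAX_SAVES];
static size_t save_ix = 0;

static size_t scope_stack[MODEL_MAX_SCOPES];
static size_t scope_ix = 0;

/* Open a block: remember how tall the save stack is right now. */
static void scope_enter(void)
{
    scope_stack[scope_ix++] = save_ix;
}

/* Arrange for *p to get its current value back when the block closes. */
static void save_int(int *p)
{
    save_stack[save_ix].addr = p;
    save_stack[save_ix].old  = *p;
    save_ix++;
}

/* Close the innermost block, undoing its saves newest-first. */
static void scope_leave(void)
{
    size_t base = scope_stack[--scope_ix];

    while (save_ix > base) {
        save_ix--;
        *save_stack[save_ix].addr = save_stack[save_ix].old;
    }
}

static int var = 1;

/* Roughly "{ local $var = 2; ... }", returning $var * 10 from inside. */
static int with_localized_var(void)
{
    int result;

    scope_enter();
    save_int(&var);
    var = 2;
    result = var * 10;
    scope_leave();      /* var is restored here */
    return result;
}
```

Because the old value is recorded at the moment of saving, nested blocks
restore correctly in reverse order: an inner block puts back the outer
block's temporary value, and the outer block then puts back the original.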
1704 1705There is a way to achieve a similar task from C via Perl API: create a 1706I<pseudo-block>, and arrange for some changes to be automatically 1707undone at the end of it, either explicit, or via a non-local exit (via 1708die()). A I<block>-like construct is created by a pair of 1709C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">). 1710Such a construct may be created specially for some important localized 1711task, or an existing one (like boundaries of enclosing Perl 1712subroutine/block, or an existing pair for freeing TMPs) may be 1713used. (In the second case the overhead of additional localization must 1714be almost negligible.) Note that any XSUB is automatically enclosed in 1715an C<ENTER>/C<LEAVE> pair. 1716 1717Inside such a I<pseudo-block> the following service is available: 1718 1719=over 4 1720 1721=item C<SAVEINT(int i)> 1722 1723=item C<SAVEIV(IV i)> 1724 1725=item C<SAVEI32(I32 i)> 1726 1727=item C<SAVELONG(long i)> 1728 1729=item C<SAVEI8(I8 i)> 1730 1731=item C<SAVEI16(I16 i)> 1732 1733=item C<SAVEBOOL(int i)> 1734 1735=item C<SAVESTRLEN(STRLEN i)> 1736 1737These macros arrange things to restore the value of integer variable 1738C<i> at the end of the enclosing I<pseudo-block>. 1739 1740=for apidoc_section $stack 1741=for apidoc Amh||SAVEINT|int i 1742=for apidoc Amh||SAVEIV|IV i 1743=for apidoc Amh||SAVEI32|I32 i 1744=for apidoc Amh||SAVELONG|long i 1745=for apidoc Amh||SAVEI8|I8 i 1746=for apidoc Amh||SAVEI16|I16 i 1747=for apidoc Amh||SAVEBOOL|bool i 1748=for apidoc Amh||SAVESTRLEN|STRLEN i 1749 1750=item C<SAVESPTR(s)> 1751 1752=item C<SAVEPPTR(p)> 1753 1754These macros arrange things to restore the value of pointers C<s> and 1755C<p>. C<s> must be a pointer of a type which survives conversion to 1756C<SV*> and back, C<p> should be able to survive conversion to C<char*> 1757and back. 
1758 1759=for apidoc Amh||SAVESPTR|SV * s 1760=for apidoc Amh||SAVEPPTR|char * p 1761 1762=item C<SAVEFREESV(SV *sv)> 1763 1764The refcount of C<sv> will be decremented at the end of 1765I<pseudo-block>. This is similar to C<sv_2mortal> in that it is also a 1766mechanism for doing a delayed C<SvREFCNT_dec>. However, while C<sv_2mortal> 1767extends the lifetime of C<sv> until the beginning of the next statement, 1768C<SAVEFREESV> extends it until the end of the enclosing scope. These 1769lifetimes can be wildly different. 1770 1771Also compare C<SAVEMORTALIZESV>. 1772 1773=for apidoc Amh||SAVEFREESV|SV* sv 1774 1775=item C<SAVEMORTALIZESV(SV *sv)> 1776 1777Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current 1778scope instead of decrementing its reference count. This usually has the 1779effect of keeping C<sv> alive until the statement that called the currently 1780live scope has finished executing. 1781 1782=for apidoc Amh||SAVEMORTALIZESV|SV* sv 1783 1784=item C<SAVEFREEOP(OP *op)> 1785 1786The C<OP *> is op_free()ed at the end of I<pseudo-block>. 1787 1788=for apidoc Amh||SAVEFREEOP|OP *op 1789 1790=item C<SAVEFREEPV(p)> 1791 1792The chunk of memory which is pointed to by C<p> is Safefree()ed at the 1793end of I<pseudo-block>. 1794 1795=for apidoc Amh||SAVEFREEPV|void * p 1796 1797=item C<SAVECLEARSV(SV *sv)> 1798 1799Clears a slot in the current scratchpad which corresponds to C<sv> at 1800the end of I<pseudo-block>. 1801 1802=item C<SAVEDELETE(HV *hv, char *key, I32 length)> 1803 1804The key C<key> of C<hv> is deleted at the end of I<pseudo-block>. The 1805string pointed to by C<key> is Safefree()ed. 
If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

    SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p>.

=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
=for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p>.

=for apidoc Ayh||DESTRUCTORFUNC_t
=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=for apidoc Amh||SAVESTACK_POS

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s).  Where the above macros take C<int>, a similar
function takes C<int *>.

Other macros above have functions implementing them, but it's probably
best to just use the macro, and not those or the ones below.

=over 4

=item C<SV* save_scalar(GV *gv)>

=for apidoc save_scalar

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=for apidoc save_ary

=item C<HV* save_hash(GV *gv)>

=for apidoc save_hash

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

=for apidoc save_item

Duplicates the current value of C<SV>.
On the exit from the current 1869C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored 1870using the stored value. It doesn't handle magic. Use C<save_scalar> if 1871magic is affected. 1872 1873=item C<void save_list(SV **sarg, I32 maxsarg)> 1874 1875=for apidoc save_list 1876 1877A variant of C<save_item> which takes multiple arguments via an array 1878C<sarg> of C<SV*> of length C<maxsarg>. 1879 1880=item C<SV* save_svref(SV **sptr)> 1881 1882=for apidoc save_svref 1883 1884Similar to C<save_scalar>, but will reinstate an C<SV *>. 1885 1886=item C<void save_aptr(AV **aptr)> 1887 1888=item C<void save_hptr(HV **hptr)> 1889 1890=for apidoc save_aptr 1891=for apidoc save_hptr 1892 1893Similar to C<save_svref>, but localize C<AV *> and C<HV *>. 1894 1895=back 1896 1897The C<Alias> module implements localization of the basic types within the 1898I<caller's scope>. People who are interested in how to localize things in 1899the containing scope should take a look there too. 1900 1901=head1 Subroutines 1902 1903=head2 XSUBs and the Argument Stack 1904 1905The XSUB mechanism is a simple way for Perl programs to access C subroutines. 1906An XSUB routine will have a stack that contains the arguments from the Perl 1907program, and a way to map from the Perl data structures to a C equivalent. 1908 1909The stack arguments are accessible through the C<ST(n)> macro, which returns 1910the C<n>'th stack argument. Argument 0 is the first argument passed in the 1911Perl subroutine call. These arguments are C<SV*>, and can be used anywhere 1912an C<SV*> is used. 1913 1914Most of the time, output from the C routine can be handled through use of 1915the RETVAL and OUTPUT directives. However, there are some cases where the 1916argument stack is not already long enough to handle all the return values. 1917An example is the POSIX tzname() call, which takes no arguments, but returns 1918two, the local time zone's standard and summer time abbreviations. 

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro.  The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
    PUSHs(sv_2mortal(newSVnv(a_double)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method of pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed.  Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite the suggestions in earlier versions of this document, the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.

=head2 Autoloading with XSUBs

If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.
1965 1966But it also puts the same information in certain fields of the XSUB itself: 1967 1968 HV *stash = CvSTASH(cv); 1969 const char *subname = SvPVX(cv); 1970 STRLEN name_length = SvCUR(cv); /* in bytes */ 1971 U32 is_utf8 = SvUTF8(cv); 1972 1973C<SvPVX(cv)> contains just the sub name itself, not including the package. 1974For an AUTOLOAD routine in UNIVERSAL or one of its superclasses, 1975C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package. 1976 1977B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support 1978XS AUTOLOAD subs at all. Perl 5.8.0 introduced the use of fields in the 1979XSUB itself. Perl 5.16.0 restored the setting of $AUTOLOAD. If you need 1980to support 5.8-5.14, use the XSUB's fields. 1981 1982=head2 Calling Perl Routines from within C Programs 1983 1984There are four routines that can be used to call a Perl subroutine from 1985within a C program. These four are: 1986 1987 I32 call_sv(SV*, I32); 1988 I32 call_pv(const char*, I32); 1989 I32 call_method(const char*, I32); 1990 I32 call_argv(const char*, I32, char**); 1991 1992The routine most often used is C<call_sv>. The C<SV*> argument 1993contains either the name of the Perl subroutine to be called, or a 1994reference to the subroutine. The second argument consists of flags 1995that control the context in which the subroutine is called, whether 1996or not the subroutine is being passed arguments, how errors should be 1997trapped, and how to treat return values. 1998 1999All four routines return the number of arguments that the subroutine returned 2000on the Perl stack. 2001 2002These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0, 2003but those names are now deprecated; macros of the same name are provided for 2004compatibility. 2005 2006When using any of these routines (except C<call_argv>), the programmer 2007must manipulate the Perl stack. 
These include the following macros and 2008functions: 2009 2010 dSP 2011 SP 2012 PUSHMARK() 2013 PUTBACK 2014 SPAGAIN 2015 ENTER 2016 SAVETMPS 2017 FREETMPS 2018 LEAVE 2019 XPUSH*() 2020 POP*() 2021 2022For a detailed description of calling conventions from C to Perl, 2023consult L<perlcall>. 2024 2025=head2 Putting a C value on Perl stack 2026 2027A lot of opcodes (this is an elementary operation in the internal perl 2028stack machine) put an SV* on the stack. However, as an optimization 2029the corresponding SV is (usually) not recreated each time. The opcodes 2030reuse specially assigned SVs (I<target>s) which are (as a corollary) 2031not constantly freed/created. 2032 2033Each of the targets is created only once (but see 2034L</Scratchpads and recursion> below), and when an opcode needs to put 2035an integer, a double, or a string on stack, it just sets the 2036corresponding parts of its I<target> and puts the I<target> on stack. 2037 2038The macro to put this target on stack is C<PUSHTARG>, and it is 2039directly used in some opcodes, as well as indirectly in zillions of 2040others, which use it via C<(X)PUSH[iunp]>. 2041 2042Because the target is reused, you must be careful when pushing multiple 2043values on the stack. The following code will not do what you think: 2044 2045 XPUSHi(10); 2046 XPUSHi(20); 2047 2048This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto 2049the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack". 2050At the end of the operation, the stack does not contain the values 10 2051and 20, but actually contains two pointers to C<TARG>, which we have set 2052to 20. 2053 2054If you need to push multiple different values then you should either use 2055the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros, 2056none of which make use of C<TARG>. The C<(X)PUSHs> macros simply push an 2057SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>, 2058will often need to be "mortal". 
The new C<m(X)PUSH[iunp]> macros make 2059this a little easier to achieve by creating a new mortal for you (via 2060C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary 2061in the case of the C<mXPUSH[iunp]> macros), and then setting its value. 2062Thus, instead of writing this to "fix" the example above: 2063 2064 XPUSHs(sv_2mortal(newSViv(10))) 2065 XPUSHs(sv_2mortal(newSViv(20))) 2066 2067you can simply write: 2068 2069 mXPUSHi(10) 2070 mXPUSHi(20) 2071 2072On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to 2073need a C<dTARG> in your variable declarations so that the C<*PUSH*> 2074macros can make use of the local variable C<TARG>. See also C<dTARGET> 2075and C<dXSTARG>. 2076 2077=head2 Scratchpads 2078 2079The question remains on when the SVs which are I<target>s for opcodes 2080are created. The answer is that they are created when the current 2081unit--a subroutine or a file (for opcodes for statements outside of 2082subroutines)--is compiled. During this time a special anonymous Perl 2083array is created, which is called a scratchpad for the current unit. 2084 2085A scratchpad keeps SVs which are lexicals for the current unit and are 2086targets for opcodes. A previous version of this document 2087stated that one can deduce that an SV lives on a scratchpad 2088by looking on its flags: lexicals have C<SVs_PADMY> set, and 2089I<target>s have C<SVs_PADTMP> set. But this has never been fully true. 2090C<SVs_PADMY> could be set on a variable that no longer resides in any pad. 2091While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables 2092that have never resided in a pad, but nonetheless act like I<target>s. As 2093of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as 20940. C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>. 2095 2096The correspondence between OPs and I<target>s is not 1-to-1. 
Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

In fact it is not 100% true that a compiled unit contains a pointer to
the scratchpad AV.  Rather, it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV.  Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>.  Both
of these can create several execution pointers going into the same
subroutine.  For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads.  (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Memory Allocation

=head2 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section.  The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

The following three macros are used to initially allocate memory:

    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);

The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated.  The argument
C<type> is passed to C<sizeof>.  The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head2 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed.  The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>.  (The older C<New> and C<Newc> macros
additionally took a "magic cookie" argument, which C<Renew> and C<Renewc>
never needed.)

=head2 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory.  The C<source> and C<dest> arguments point to the source and
destination starting points.  Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
operator).

=head1 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used.  This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with.  All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl.  Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated).  This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes.  In our simplified
example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>.  As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.  The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), only 3 of them are
not optimized away (one per number in the left column).
The immediate 2248children of the given node correspond to C<{}> pairs on the same level 2249of indentation, thus this listing corresponds to the tree: 2250 2251 add 2252 / \ 2253 null null 2254 | | 2255 gvsv gvsv 2256 2257The execution order is indicated by C<===E<gt>> marks, thus it is C<3 22584 5 6> (node C<6> is not included into above listing), i.e., 2259C<gvsv gvsv add whatever>. 2260 2261Each of these nodes represents an op, a fundamental operation inside the 2262Perl core. The code which implements each operation can be found in the 2263F<pp*.c> files; the function which implements the op with type C<gvsv> 2264is C<pp_gvsv>, and so on. As the tree above shows, different ops have 2265different numbers of children: C<add> is a binary operator, as one would 2266expect, and so has two children. To accommodate the various different 2267numbers of children, there are various types of op data structure, and 2268they link together in different ways. 2269 2270The simplest type of op structure is C<OP>: this has no children. Unary 2271operators, C<UNOP>s, have one child, and this is pointed to by the 2272C<op_first> field. Binary operators (C<BINOP>s) have not only an 2273C<op_first> field but also an C<op_last> field. The most complex type of 2274op is a C<LISTOP>, which has any number of children. In this case, the 2275first child is pointed to by C<op_first> and the last child by 2276C<op_last>. The children in between can be found by iteratively 2277following the C<OpSIBLING> pointer from the first child to the last (but 2278see below). 2279 2280=for apidoc Ayh||OP 2281=for apidoc Ayh||BINOP 2282=for apidoc Ayh||LISTOP 2283=for apidoc Ayh||UNOP 2284 2285There are also some other op types: a C<PMOP> holds a regular expression, 2286and has no children, and a C<LOOP> may or may not have children. If the 2287C<op_children> field is non-zero, it behaves like a C<LISTOP>. 
To 2288complicate matters, if a C<UNOP> is actually a C<null> op after 2289optimization (see L</Compile pass 2: context propagation>) it will still 2290have children in accordance with its former type. 2291 2292=for apidoc Ayh||LOOP 2293=for apidoc Ayh||PMOP 2294 2295Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one 2296or more children, but it doesn't have an C<op_last> field: so you have to 2297follow C<op_first> and then the C<OpSIBLING> chain itself to find the 2298last child. Instead it has an C<op_other> field, which is comparable to 2299the C<op_next> field described below, and represents an alternate 2300execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note 2301that in general, C<op_other> may not point to any of the direct children 2302of the C<LOGOP>. 2303 2304=for apidoc Ayh||LOGOP 2305 2306Starting in version 5.21.2, perls built with the experimental 2307define C<-DPERL_OP_PARENT> add an extra boolean flag for each op, 2308C<op_moresib>. When not set, this indicates that this is the last op in an 2309C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last 2310sibling to point back to the parent op. Under this build, that field is 2311also renamed C<op_sibparent> to reflect its joint role. The macro 2312C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on 2313the last sibling. With this build the C<op_parent(o)> function can be 2314used to find the parent of any op. Thus for forward compatibility, you 2315should always use the C<OpSIBLING(o)> macro rather than accessing 2316C<op_sibling> directly. 2317 2318Another way to examine the tree is to use a compiler back-end module, such 2319as L<B::Concise>. 2320 2321=head2 Compile pass 1: check routines 2322 2323The tree is created by the compiler while I<yacc> code feeds it 2324the constructions it recognizes. Since I<yacc> works bottom-up, so does 2325the first pass of perl compilation. 
2326 2327What makes this pass interesting for perl developers is that some 2328optimization may be performed on this pass. This is optimization by 2329so-called "check routines". The correspondence between node names 2330and corresponding check routines is described in F<opcode.pl> (do not 2331forget to run C<make regen_headers> if you modify this file). 2332 2333A check routine is called when the node is fully constructed except 2334for the execution-order thread. Since at this time there are no 2335back-links to the currently constructed node, one can do most any 2336operation to the top-level node, including freeing it and/or creating 2337new nodes above/below it. 2338 2339The check routine returns the node which should be inserted into the 2340tree (if the top-level node was not modified, check routine returns 2341its argument). 2342 2343By convention, check routines have names C<ck_*>. They are usually 2344called from C<new*OP> subroutines (or C<convert>) (which in turn are 2345called from F<perly.y>). 2346 2347=head2 Compile pass 1a: constant folding 2348 2349Immediately after the check routine is called the returned node is 2350checked for being compile-time executable. If it is (the value is 2351judged to be constant) it is immediately executed, and a I<constant> 2352node with the "return value" of the corresponding subtree is 2353substituted instead. The subtree is deleted. 2354 2355If constant folding was not performed, the execution-order thread is 2356created. 2357 2358=head2 Compile pass 2: context propagation 2359 2360When a context for a part of compile tree is known, it is propagated 2361down through the tree. At this time the context can have 5 values 2362(instead of 2 for runtime context): void, boolean, scalar, list, and 2363lvalue. In contrast with the pass 1 this pass is processed from top 2364to bottom: a node's context determines the context for its children. 2365 2366Additional context-dependent optimizations are performed at this time. 
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now.  To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of being free()d (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed.  This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals).  Optimizations performed
at this stage are subject to the same restrictions as in pass 2.

Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>.  By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals).
Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:

    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *first)
    {
        OP *o = first, *t = first;
        /* o advances two ops per iteration and t only one, so a
           loop in the op_next chain is noticed when they meet */
        for (; o; o = o->op_next, t = t->op_next) {
            /* custom per-op optimisation goes here */
            o = o->op_next;
            if (!o || o == t) break;
            /* custom per-op optimisation goes AND here */
        }
        prev_rpeepp(aTHX_ first);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;

=for apidoc Ayh||peep_t

=head2 Pluggable runops

The compile tree is executed in a runops function.  There are two runops
functions, in F<run.c> and in F<dump.c>.  C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise.  For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs.  Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

=for apidoc Amnh|runops_proc_t|PL_runops

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head2 Compile-time scope hooks

As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>.
This is used like 2441this: 2442 2443 STATIC void my_start_hook(pTHX_ int full); 2444 STATIC BHK my_hooks; 2445 2446 BOOT: 2447 BhkENTRY_set(&my_hooks, bhk_start, my_start_hook); 2448 Perl_blockhook_register(aTHX_ &my_hooks); 2449 2450This will arrange to have C<my_start_hook> called at the start of 2451compiling every lexical scope. The available hooks are: 2452 2453=for apidoc Ayh||BHK 2454 2455=over 4 2456 2457=item C<void bhk_start(pTHX_ int full)> 2458 2459This is called just after starting a new lexical scope. Note that Perl 2460code like 2461 2462 if ($x) { ... } 2463 2464creates two scopes: the first starts at the C<(> and has C<full == 1>, 2465the second starts at the C<{> and has C<full == 0>. Both end at the 2466C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything 2467pushed onto the save stack by this hook will be popped just before the 2468scope ends (between the C<pre_> and C<post_end> hooks, in fact). 2469 2470=item C<void bhk_pre_end(pTHX_ OP **o)> 2471 2472This is called at the end of a lexical scope, just before unwinding the 2473stack. I<o> is the root of the optree representing the scope; it is a 2474double pointer so you can replace the OP if you need to. 2475 2476=item C<void bhk_post_end(pTHX_ OP **o)> 2477 2478This is called at the end of a lexical scope, just after unwinding the 2479stack. I<o> is as above. Note that it is possible for calls to C<pre_> 2480and C<post_end> to nest, if there is something on the save stack that 2481calls string eval. 2482 2483=item C<void bhk_eval(pTHX_ OP *const o)> 2484 2485This is called just before starting to compile an C<eval STRING>, C<do 2486FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the 2487OP that requested the eval, and will normally be an C<OP_ENTEREVAL>, 2488C<OP_DOFILE> or C<OP_REQUIRE>. 2489 2490=back 2491 2492Once you have your hook functions, you need a C<BHK> structure to put 2493them in. 
It's best to allocate it statically, since there is no way to 2494free it once it's registered. The function pointers should be inserted 2495into this structure using the C<BhkENTRY_set> macro, which will also set 2496flags indicating which entries are valid. If you do need to allocate 2497your C<BHK> dynamically for some reason, be sure to zero it before you 2498start. 2499 2500Once registered, there is no mechanism to switch these hooks off, so if 2501that is necessary you will need to do this yourself. An entry in C<%^H> 2502is probably the best way, so the effect is lexically scoped; however it 2503is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to 2504temporarily switch entries on and off. You should also be aware that 2505generally speaking at least one scope will have opened before your 2506extension is loaded, so you will see some C<pre>/C<post_end> pairs that 2507didn't have a matching C<start>. 2508 2509=head1 Examining internal data structures with the C<dump> functions 2510 2511To aid debugging, the source file F<dump.c> contains a number of 2512functions which produce formatted output of internal data structures. 2513 2514The most commonly used of these functions is C<Perl_sv_dump>; it's used 2515for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls 2516C<sv_dump> to produce debugging output from Perl-space, so users of that 2517module should already be familiar with its format. 2518 2519C<Perl_op_dump> can be used to dump an C<OP> structure or any of its 2520derivatives, and produces output similar to C<perl -Dx>; in fact, 2521C<Perl_dump_eval> will dump the main root of the code being evaluated, 2522exactly like C<-Dx>. 
2523 2524Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an 2525op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the 2526subroutines in a package like so: (Thankfully, these are all xsubs, so 2527there is no op tree) 2528 2529 (gdb) print Perl_dump_packsubs(PL_defstash) 2530 2531 SUB attributes::bootstrap = (xsub 0x811fedc 0) 2532 2533 SUB UNIVERSAL::can = (xsub 0x811f50c 0) 2534 2535 SUB UNIVERSAL::isa = (xsub 0x811f304 0) 2536 2537 SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0) 2538 2539 SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0) 2540 2541and C<Perl_dump_all>, which dumps all the subroutines in the stash and 2542the op tree of the main root. 2543 2544=head1 How multiple interpreters and concurrency are supported 2545 2546=head2 Background and MULTIPLICITY 2547 2548The Perl interpreter can be regarded as a closed box: it has an API 2549for feeding it code or otherwise making it do things, but it also has 2550functions for its own use. This smells a lot like an object, and 2551there is a way for you to build Perl so that you can have multiple 2552interpreters, with one interpreter represented either as a C structure, 2553or inside a thread-specific structure. These structures contain all 2554the context, the state of that interpreter. 2555 2556The macro that controls the major Perl build flavor is MULTIPLICITY. The 2557MULTIPLICITY build has a C structure that packages all the interpreter 2558state, which is being passed to various perl functions as a "hidden" 2559first argument. MULTIPLICITY makes multi-threaded perls possible (with the 2560ithreads threading model, related to the macro USE_ITHREADS.) 2561 2562PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY. 2563 2564To see whether you have non-const data you can use a BSD (or GNU) 2565compatible C<nm>: 2566 2567 nm libperl.a | grep -v ' [TURtr] ' 2568 2569If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>), 2570you have non-const data. 
The symbols the C<grep> removed are as follows:
C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
and the C<U> is I<undefined>, external symbols referred to.

The test F<t/porting/libperl.t> does this kind of symbol sanity
checking on C<libperl.a>.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument.  To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private.  All functions whose names begin C<S_> are private
(think "S" for "secret" or "static").  All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API.  (See L</Internal Functions>.)  The easiest way to be
B<sure> a function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API.  If it doesn't, and you
think it should be (i.e., you need it for your extension), submit an issue at
L<https://github.com/Perl/perl5/issues> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing.  To solve this, the subroutines are named and
declared in a particular way.  Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.
2606 2607=for apidoc_section $directives 2608=for apidoc Ayh||STATIC 2609 2610A public function (i.e. part of the internal API, but not necessarily 2611sanctioned for use in extensions) begins like this: 2612 2613 void 2614 Perl_sv_setiv(pTHX_ SV* dsv, IV num) 2615 2616C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the 2617details of the interpreter's context. THX stands for "thread", "this", 2618or "thingy", as the case may be. (And no, George Lucas is not involved. :-) 2619The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument, 2620or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and 2621their variants. 2622 2623=for apidoc_section $concurrency 2624=for apidoc Amnh||aTHX 2625=for apidoc Amnh||aTHX_ 2626=for apidoc Amnh||dTHX 2627=for apidoc Amnh||pTHX 2628=for apidoc Amnh||pTHX_ 2629 2630When Perl is built without options that set MULTIPLICITY, there is no 2631first argument containing the interpreter's context. The trailing underscore 2632in the pTHX_ macro indicates that the macro expansion needs a comma 2633after the context argument because other arguments follow it. If 2634MULTIPLICITY is not defined, pTHX_ will be ignored, and the 2635subroutine is not prototyped to take the extra argument. The form of the 2636macro without the trailing underscore is used when there are no additional 2637explicit arguments. 2638 2639When a core function calls another, it must pass the context. This 2640is normally hidden via macros. Consider C<sv_setiv>. It expands into 2641something like this: 2642 2643 #ifdef MULTIPLICITY 2644 #define sv_setiv(a,b) Perl_sv_setiv(aTHX_ a, b) 2645 /* can't do this for vararg functions, see below */ 2646 #else 2647 #define sv_setiv Perl_sv_setiv 2648 #endif 2649 2650This works well, and means that XS authors can gleefully write: 2651 2652 sv_setiv(foo, bar); 2653 2654and still have it work under all the modes Perl could have been 2655compiled with. 
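To make the macro mechanics concrete, here is a minimal sketch in plain C
that imitates the pattern outside of Perl.  Every name in it
(C<MY_MULTIPLICITY>, C<my_interp>, C<pMY_>, C<aMY_>, C<My_counter_add>,
C<counter_add>) is hypothetical and merely stands in for C<MULTIPLICITY>,
C<pTHX_>, C<aTHX_>, C<Perl_sv_setiv> and C<sv_setiv>; these are not the
core's actual definitions.

```c
#define MY_MULTIPLICITY 1   /* hypothetical stand-in for MULTIPLICITY */

/* Hypothetical interpreter state; the real structure is far larger. */
typedef struct my_interp { long counter; } my_interp;

#ifdef MY_MULTIPLICITY
#  define pMY    my_interp *my_perl        /* plays the role of pTHX  */
#  define pMY_   pMY,                      /* plays the role of pTHX_ */
#  define aMY    my_perl                   /* plays the role of aTHX  */
#  define aMY_   aMY,                      /* plays the role of aTHX_ */
#  define MY_STATE(field) (my_perl->field)
   /* short name: callers write counter_add(n), context is added here */
#  define counter_add(n) My_counter_add(aMY_ n)
#else
static my_interp my_global;                /* single global interpreter */
#  define pMY    void
#  define pMY_
#  define aMY
#  define aMY_
#  define MY_STATE(field) (my_global.field)
#  define counter_add My_counter_add
#endif

/* "Public" function: takes the hidden context only when it exists. */
static long My_counter_add(pMY_ long n)
{
    MY_STATE(counter) += n;
    return MY_STATE(counter);
}

/* A caller that already has the context in scope just uses the short
   name; the same source compiles in both build flavours. */
static long add_twice(pMY_ long n)
{
    counter_add(n);
    return counter_add(n);
}
```

Removing the C<#define MY_MULTIPLICITY> line switches the sketch to the
single-interpreter flavour without touching either function body, which is
the property the real macros are designed to provide.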
2656 2657This doesn't work so cleanly for varargs functions, though, as macros 2658imply that the number of arguments is known in advance. Instead we 2659either need to spell them out fully, passing C<aTHX_> as the first 2660argument (the Perl core tends to do this with functions like 2661Perl_warner), or use a context-free version. 2662 2663The context-free version of Perl_warner is called 2664Perl_warner_nocontext, and does not take the extra argument. Instead 2665it does C<dTHX;> to get the context from thread-local storage. We 2666C<#define warner Perl_warner_nocontext> so that extensions get source 2667compatibility at the expense of performance. (Passing an arg is 2668cheaper than grabbing it from thread-local storage.) 2669 2670You can ignore [pad]THXx when browsing the Perl headers/sources. 2671Those are strictly for use within the core. Extensions and embedders 2672need only be aware of [pad]THX. 2673 2674=head2 So what happened to dTHR? 2675 2676=for apidoc Amnh||dTHR 2677 2678C<dTHR> was introduced in perl 5.005 to support the older thread model. 2679The older thread model now uses the C<THX> mechanism to pass context 2680pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and 2681later still have it for backward source compatibility, but it is defined 2682to be a no-op. 2683 2684=head2 How do I use all this in extensions? 2685 2686When Perl is built with MULTIPLICITY, extensions that call 2687any functions in the Perl API will need to pass the initial context 2688argument somehow. The kicker is that you will need to write it in 2689such a way that the extension still compiles when Perl hasn't been 2690built with MULTIPLICITY enabled. 2691 2692There are three ways to do this. First, the easy but inefficient way, 2693which is also the default, in order to maintain source compatibility 2694with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX 2695and aTHX_ macros to call a function that will return the context. 
2696Thus, something like: 2697 2698 sv_setiv(sv, num); 2699 2700in your extension will translate to this when MULTIPLICITY is 2701in effect: 2702 2703 Perl_sv_setiv(Perl_get_context(), sv, num); 2704 2705or to this otherwise: 2706 2707 Perl_sv_setiv(sv, num); 2708 2709You don't have to do anything new in your extension to get this; since 2710the Perl library provides Perl_get_context(), it will all just 2711work. 2712 2713The second, more efficient way is to use the following template for 2714your Foo.xs: 2715 2716 #define PERL_NO_GET_CONTEXT /* we want efficiency */ 2717 #include "EXTERN.h" 2718 #include "perl.h" 2719 #include "XSUB.h" 2720 2721 STATIC void my_private_function(int arg1, int arg2); 2722 2723 STATIC void 2724 my_private_function(int arg1, int arg2) 2725 { 2726 dTHX; /* fetch context */ 2727 ... call many Perl API functions ... 2728 } 2729 2730 [... etc ...] 2731 2732 MODULE = Foo PACKAGE = Foo 2733 2734 /* typical XSUB */ 2735 2736 void 2737 my_xsub(arg) 2738 int arg 2739 CODE: 2740 my_private_function(arg, 10); 2741 2742Note that the only two changes from the normal way of writing an 2743extension is the addition of a C<#define PERL_NO_GET_CONTEXT> before 2744including the Perl headers, followed by a C<dTHX;> declaration at 2745the start of every function that will call the Perl API. (You'll 2746know which functions need this, because the C compiler will complain 2747that there's an undeclared identifier in those functions.) No changes 2748are needed for the XSUBs themselves, because the XS() macro is 2749correctly defined to pass in the implicit context if needed. 

The third, even more efficient way is to ape how it is done within
the Perl guts:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    STATIC void my_private_function(pTHX_ int arg1, int arg2);

    STATIC void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument.  Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit
arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards.
If that is not the case, you have to set the TLS slot of the 2802thread before calling any functions in the Perl API on that particular 2803interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that 2804thread as the first thing you do: 2805 2806 /* do this before doing anything else with some_perl */ 2807 PERL_SET_CONTEXT(some_perl); 2808 2809 ... other Perl API calls on some_perl go here ... 2810 2811=head2 Future Plans and PERL_IMPLICIT_SYS 2812 2813Just as MULTIPLICITY provides a way to bundle up everything 2814that the interpreter knows about itself and pass it around, so too are 2815there plans to allow the interpreter to bundle up everything it knows 2816about the environment it's running on. This is enabled with the 2817PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on 2818Windows. 2819 2820This allows the ability to provide an extra pointer (called the "host" 2821environment) for all the system calls. This makes it possible for 2822all the system stuff to maintain their own state, broken down into 2823seven C structures. These are thin wrappers around the usual system 2824calls (see F<win32/perllib.c>) for the default perl executable, but for a 2825more ambitious host (like the one that would do fork() emulation) all 2826the extra work needed to pretend that different interpreters are 2827actually different "processes", would be done here. 2828 2829The Perl engine/interpreter and the host are orthogonal entities. 2830There could be one or more interpreters in a process, and one or 2831more "hosts", with free association between them. 2832 2833=head1 Internal Functions 2834 2835All of Perl's internal functions which will be exposed to the outside 2836world are prefixed by C<Perl_> so that they will not conflict with XS 2837functions or functions used in a program in which Perl is embedded. 2838Similarly, all global variables begin with C<PL_>. (By convention, 2839static functions start with C<S_>.) 

Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>.  Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.

The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>.  F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces.  It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well.  Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The first column is a set of flags, the second column the return type,
the third column the name.  Columns after that are the arguments.
The flags are documented at the top of F<embed.fnc>.

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

    IVdf            IV in decimal
    UVuf            UV in decimal
    UVof            UV in octal
    UVxf            UV in hexadecimal
    NVef            NV %e-like
    NVff            NV %f-like
    NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %" IVdf "\n", iv);

The C<IVdf> will expand to whatever is the correct format for the IVs.
Note that the spaces are required around the format in case the code is
compiled with C++, to maintain compliance with its standard.
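The C<IVdf> family relies on the same idea as the C99 C<PRI*> macros from
F<inttypes.h>: the macro expands to a string literal, which the compiler
concatenates with its neighbours.  The sketch below uses only standard C
(no Perl headers), so it illustrates the pattern rather than Perl's own
macros; C<format_i64> and C<format_u64_hex> are names made up for this
example.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Format a signed 64-bit value portably, whatever C type backs int64_t.
   The spaces around PRId64 matter for the same C++ reason as with IVdf. */
static int format_i64(char *buf, size_t size, int64_t value)
{
    return snprintf(buf, size, "value is %" PRId64, value);
}

/* Same pattern for an unsigned value in hexadecimal (compare UVxf). */
static int format_u64_hex(char *buf, size_t size, uint64_t value)
{
    return snprintf(buf, size, "0x%" PRIx64, value);
}
```

In XS code you would use C<IVdf>, C<UVxf> and friends instead, since they
track the types Perl was actually configured with.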

Note that there are different "long doubles": Perl will use
whatever the compiler has.

If you are printing addresses of pointers, use %p or UVxf combined
with PTR2UV().

=head2 Formatted Printing of SVs

The contents of SVs may be printed using the C<SVf> format, like so:

    Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg));

where C<err_msg> is an SV.

=for apidoc Amnh||SVf
=for apidoc Amh||SVfARG|SV *sv

Not all scalar types are printable.  Simple values certainly are: one of
IV, UV, NV, or PV.  Also, if the SV is a reference to some value,
either it will be dereferenced and the value printed, or information
about the type of that value and its address are displayed.  The results
of printing any other type of SV are undefined and likely to lead to an
interpreter crash.  NVs are printed using a C<%g>-ish format.

Note that the spaces are required around the C<SVf> in case the code is
compiled with C++, to maintain compliance with its standard.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter.  (See L<perlrun/-C [numberE<sol>list]>.)

You can use this to concatenate two scalars:

    SV *var1 = get_sv("var1", GV_ADD);
    SV *var2 = get_sv("var2", GV_ADD);
    SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
                        SVfARG(var1), SVfARG(var2));

=head2 Formatted Printing of Strings

If you just want the bytes printed in a 7-bit NUL-terminated string, you can
just use C<%s> (assuming they are all really only 7-bit).  But if there is a
possibility the value will be encoded as UTF-8 or contains bytes above
C<0x7F> (and therefore 8-bit), you should instead use the C<UTF8f> format.
And as its parameter, use the C<UTF8fARG()> macro:

    const char * msg;

    /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
       U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
    if (can_utf8)
        msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
    else
        msg = "'Uses simple quotes'";

    Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
                     UTF8fARG(can_utf8, strlen(msg), msg));

The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in
UTF-8; 0 if the string is in native byte encoding (Latin1).
The second parameter is the number of bytes in the string to print.
And the third and final parameter is a pointer to the first byte in the
string.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter.  (See L<perlrun/-C [numberE<sol>list]>.)

=for apidoc_section $formats
=for apidoc Amnh||UTF8f
=for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str

=cut

=head2 Formatted Printing of C<Size_t> and C<SSize_t>

The most general way to do this is to cast them to a UV or IV, and
print as in the
L<previous section|/Formatted Printing of IVs, UVs, and NVs>.

But if you're using C<PerlIO_printf()>, it's less typing and visual
clutter to use the C<%z> length modifier (for I<siZe>):

    PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);

This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.

=head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes

There are modifiers for these special situations if you are using
C<PerlIO_printf()>.  See L<perlfunc/size>.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

=for apidoc_section $casting
=for apidoc Amh|type|INT2PTR|type|int value
=for apidoc Amh|UV|PTR2UV|void * ptr
=for apidoc Amh|IV|PTR2IV|void * ptr
=for apidoc Amh|NV|PTR2NV|void * ptr

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

There are also

    PTR2nat(pointer)   /* pointer to integer of PTRSIZE */
    PTR2ul(pointer)    /* pointer to unsigned long */

=for apidoc Amh|IV|PTR2nat|void *
=for apidoc Amh|unsigned long|PTR2ul|void *

And C<PTRV> which gives the native type for an integer the same size as
pointers, such as C<unsigned> or C<unsigned long>.

=for apidoc Ayh|type|PTRV

=head2 Exception Handling

There are a couple of macros to do very basic exception handling in XS
modules.  You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:

    #define NO_XSLOCKS
    #include "XSUB.h"

You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl.  For example:

    dXCPT;    /* set up necessary variables */

    XCPT_TRY_START {
        code_that_may_croak();
    } XCPT_TRY_END

    XCPT_CATCH
    {
        /* do cleanup here */
        XCPT_RETHROW;
    }

Note that you always have to rethrow an exception that has been
caught.  Using these macros, it is not possible to just catch the
exception and ignore it.  If you have to ignore the exception, you
have to use one of the C<call_*> functions.
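The catch, clean up, rethrow control flow can be pictured with a small
sketch in standard C built on C<setjmp>/C<longjmp>.  It does not reproduce
the C<XCPT_*> macros or Perl's internals; C<toy_croak>, C<top_env>,
C<guarded_call> and C<run_guarded> are all invented for this illustration.

```c
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical stand-in for "the innermost active handler". */
static jmp_buf *top_env = NULL;

/* Counts how often the cleanup code ran, so the flow can be observed. */
static int cleanups_run = 0;

/* Hypothetical stand-in for croak(): jump to the innermost handler. */
static void toy_croak(void)
{
    if (top_env == NULL)
        abort();                    /* nobody left to catch it */
    longjmp(*top_env, 1);
}

static void code_that_may_croak(int fail)
{
    if (fail)
        toy_croak();
}

/* Try the call, always release the resource, and pass any exception
   on to the enclosing handler instead of swallowing it. */
static void guarded_call(int fail)
{
    jmp_buf  env;
    jmp_buf *outer   = top_env;     /* set before setjmp, never changed */
    char    *scratch = malloc(16);  /* resource that must not leak */

    top_env = &env;
    if (setjmp(env) == 0) {         /* "try" */
        code_that_may_croak(fail);
        top_env = outer;
        free(scratch);
        cleanups_run++;
        return;
    }
    /* "catch": restore the outer handler, clean up, then rethrow */
    top_env = outer;
    free(scratch);
    cleanups_run++;
    toy_croak();
}

/* Outermost level: returns 1 if an exception arrived here, else 0. */
static int run_guarded(int fail)
{
    jmp_buf env;

    top_env = &env;
    if (setjmp(env) == 0) {
        guarded_call(fail);
        top_env = NULL;
        return 0;
    }
    top_env = NULL;
    return 1;
}
```

In both the normal and the failing path the cleanup runs exactly once, and
in the failing path the outer level still sees the exception, which is the
behaviour the rethrow requirement above is meant to guarantee.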
3050 3051The advantage of using the above macros is that you don't have 3052to setup an extra function for C<call_*>, and that using these 3053macros is faster than using C<call_*>. 3054 3055=head2 Source Documentation 3056 3057There's an effort going on to document the internal functions and 3058automatically produce reference manuals from them -- L<perlapi> is one 3059such manual which details all the functions which are available to XS 3060writers. L<perlintern> is the autogenerated manual for the functions 3061which are not part of the API and are supposedly for internal use only. 3062 3063Source documentation is created by putting POD comments into the C 3064source, like this: 3065 3066 /* 3067 =for apidoc sv_setiv 3068 3069 Copies an integer into the given SV. Does not handle 'set' magic. See 3070 L<perlapi/sv_setiv_mg>. 3071 3072 =cut 3073 */ 3074 3075Please try and supply some documentation if you add functions to the 3076Perl core. 3077 3078=head2 Backwards compatibility 3079 3080The Perl API changes over time. New functions are 3081added or the interfaces of existing functions are 3082changed. The C<Devel::PPPort> module tries to 3083provide compatibility code for some of these changes, so XS writers don't 3084have to code it themselves when supporting multiple versions of Perl. 3085 3086C<Devel::PPPort> generates a C header file F<ppport.h> that can also 3087be run as a Perl script. To generate F<ppport.h>, run: 3088 3089 perl -MDevel::PPPort -eDevel::PPPort::WriteFile 3090 3091Besides checking existing XS code, the script can also be used to retrieve 3092compatibility information for various API calls using the C<--api-info> 3093command line switch. For example: 3094 3095 % perl ppport.h --api-info=sv_magicext 3096 3097For details, see C<perldoc ppport.h>. 3098 3099=head1 Unicode Support 3100 3101Perl 5.6.0 introduced Unicode support. 
It's important for porters and XS 3102writers to understand this support and make sure that the code they 3103write does not corrupt Unicode data. 3104 3105=head2 What B<is> Unicode, anyway? 3106 3107In the olden, less enlightened times, we all used to use ASCII. Most of 3108us did, anyway. The big problem with ASCII is that it's American. Well, 3109no, that's not actually the problem; the problem is that it's not 3110particularly useful for people who don't use the Roman alphabet. What 3111used to happen was that particular languages would stick their own 3112alphabet in the upper range of the sequence, between 128 and 255. Of 3113course, we then ended up with plenty of variants that weren't quite 3114ASCII, and the whole point of it being a standard was lost. 3115 3116Worse still, if you've got a language like Chinese or 3117Japanese that has hundreds or thousands of characters, then you really 3118can't fit them into a mere 256, so they had to forget about ASCII 3119altogether, and build their own systems using pairs of numbers to refer 3120to one character. 3121 3122To fix this, some people formed Unicode, Inc. and 3123produced a new character set containing all the characters you can 3124possibly think of and more. There are several ways of representing these 3125characters, and the one Perl uses is called UTF-8. UTF-8 uses 3126a variable number of bytes to represent a character. You can learn more 3127about Unicode and Perl's Unicode model in L<perlunicode>. 3128 3129(On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of 3130UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8. 3131UTF-EBCDIC is like UTF-8, but the details are different. The macros 3132hide the differences from you, just remember that the particular numbers 3133and bit patterns presented below will differ in UTF-EBCDIC.) 3134 3135=head2 How can I recognise a UTF-8 string? 3136 3137You can't. 
This is because UTF-8 data is stored in bytes just like
non-UTF-8 data.  The Unicode character 200, (C<0xC8> for you hex types)
capital E with a grave accent, is represented by the two bytes
C<v195.136>.  Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well.  So you can't tell just by looking -- this
is what makes Unicode input an interesting problem.

In general, you either have to know what you're dealing with, or you
have to guess.  The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters, and the chances
of a non-UTF-8 string looking like valid UTF-8 become very small very
quickly with increasing string length.  On a character-by-character
basis, C<isUTF8_CHAR>
will tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character.  Characters with values 0...127 are stored in one
byte, just like good ol' ASCII.  Character 128 is stored as
C<v194.128>; this continues up to character 191, which is
C<v194.191>.  Now we've run out of bits (191 is binary
C<10111111>) so we move on; character 192 is C<v195.128>.  And
so it goes on, moving to three bytes at character 2048.
L<perlunicode/Unicode Encodings> has pictures of how this works.

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over.
You're on your own about bounds checking, though, so don't use it
lightly.

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):

    U8 *utf;     /* Initialize this to point to the beginning of the
                    sequence to convert */
    U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
                    pointed to by 'utf' */
    UV uv;       /* Returned code point; note: a UV, not a U8, not a
                    char */
    STRLEN len;  /* Returned length of character in bytes */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

    if (!UVCHR_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

(Note that we don't have to test for invariant characters in the
examples above. The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)
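
To make the byte layout concrete, here is a minimal sketch in plain C,
independent of the Perl headers, of what skipping over and decoding one
character involve. The names C<model_utf8_skip> and C<model_utf8_decode>
are hypothetical and are not part of the Perl API; real code should use
C<UTF8SKIP> and C<utf8_to_uvchr_buf>, which also cope with malformations
and with UTF-EBCDIC. The sketch assumes an ASCII platform and sequences
of at most four bytes, and it does not reject overlong encodings.

```c
#include <stddef.h>

/* Hypothetical model, not Perl's implementation: number of bytes in the
 * character whose first byte is 'lead', or 0 if 'lead' cannot start one. */
static size_t model_utf8_skip(unsigned char lead)
{
    if (lead < 0x80) return 1;             /* invariant: one byte   */
    if ((lead & 0xE0) == 0xC0) return 2;   /* 110xxxxx              */
    if ((lead & 0xF0) == 0xE0) return 3;   /* 1110xxxx              */
    if ((lead & 0xF8) == 0xF0) return 4;   /* 11110xxx              */
    return 0;                              /* continuation or other */
}

/* Decode one character from [s, end).  Stores its length in bytes in
 * *len and returns the code point.  On truncated or malformed input,
 * sets *len to 0 and returns 0. */
static unsigned long model_utf8_decode(const unsigned char *s,
                                       const unsigned char *end,
                                       size_t *len)
{
    /* Payload bits of the first byte, indexed by sequence length. */
    static const unsigned char lead_mask[] = { 0, 0x7F, 0x1F, 0x0F, 0x07 };
    size_t n = (s < end) ? model_utf8_skip(*s) : 0;
    unsigned long cp;
    size_t i;

    *len = 0;
    if (n == 0 || (size_t)(end - s) < n)
        return 0;
    cp = (unsigned long)(*s & lead_mask[n]);
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)     /* later bytes must be 10xxxxxx */
            return 0;
        cp = (cp << 6) | (unsigned long)(s[i] & 0x3F);
    }
    *len = n;
    return cp;
}
```

Applied to the five bytes C<"\305\233\340\240\201"> used earlier, the
first call consumes two bytes and the second consumes three, matching
what C<UTF8SKIP> reports.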

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa. This flag is only meaningful if the SV
is C<SvPOK> or immediately after stringification via C<SvPV> or a
similar macro. You can check and manipulate this flag with the
following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
UTF-8 data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable (wrong) results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
accordingly:

    p = SvPV(sv, len);
    is_utf8 = SvUTF8(sv);
    frobnicate(p, is_utf8);
    nsv = newSVpvn(p, len);
    if (is_utf8)
        SvUTF8_on(nsv);

In the above, your C<frobnicate> function has been changed to be made
aware of whether or not it's dealing with UTF-8 data, so that it can
handle the string appropriately.
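
The effect of the flag can be modelled outside Perl. The following
sketch, in plain C with a hypothetical C<ModelStr> struct standing in for
the relevant parts of an SV, shows that the same two bytes count as two
characters or as one depending only on the flag, which is why a copy has
to carry the flag along. It assumes well-formed UTF-8 and makes no claim
about how Perl itself computes lengths.

```c
#include <stddef.h>

/* Hypothetical stand-in for the parts of an SV that matter here. */
typedef struct {
    const unsigned char *pv;   /* string buffer            */
    size_t cur;                /* length in bytes          */
    int utf8;                  /* models the SVf_UTF8 flag */
} ModelStr;

/* Length in characters.  Without the flag every byte is a character;
 * with it, only bytes that are not continuation bytes (10xxxxxx) start
 * a new character. */
static size_t model_char_length(const ModelStr *s)
{
    size_t i, chars = 0;

    if (!s->utf8)
        return s->cur;
    for (i = 0; i < s->cur; i++)
        if ((s->pv[i] & 0xC0) != 0x80)
            chars++;
    return chars;
}

/* A faithful copy carries the flag as well as the bytes, in the same
 * spirit as calling SvUTF8_on() on a newly built SV. */
static ModelStr model_copy(const ModelStr *s)
{
    ModelStr copy;

    copy.pv = s->pv;       /* sharing the buffer is enough for a model */
    copy.cur = s->cur;
    copy.utf8 = s->utf8;
    return copy;
}
```

With the bytes 0xC3 0x88, the model reports a length of 2 when the flag
is off and 1 when it is on.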

Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF8 flags, even less right is just
passing a S<C<char *>> to an XS function.

For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
string in an SV is to be I<treated> as UTF-8. This takes into account
if the call to the XS function is being made from within the scope of
L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
UTF-8 string are to be exposed, rather than the character they
represent. But this pragma should only really be used for debugging and
perhaps low-level testing at the byte level. Hence most XS code need
not concern itself with this, but various areas of the perl core do need
to support it.

And this isn't the whole story. Starting in Perl v5.12, strings that
aren't encoded in UTF-8 may also be treated as Unicode under various
conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
This is only really a problem for characters whose ordinals are between
128 and 255, and their behavior varies under ASCII versus Unicode rules
in ways that your code cares about (see L<perlunicode/The "Unicode Bug">).
There is no published API for dealing with this, as it is subject to
change, but you can look at the code for C<pp_lc> in F<pp.c> for an
example as to how it's currently done.

=head2 How do I pass a Perl string to a C library?

A Perl string, conceptually, is an opaque sequence of code points.
Many C libraries expect their inputs to be "classical" C strings, which are
arrays of octets 1-255, terminated with a NUL byte. Your job when writing
an interface between Perl and a C library is to define the mapping between
Perl and that library.

Generally speaking, C<SvPVbyte> and related macros suit this task well.
These assume that your Perl string is a "byte string", i.e., is either
raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.

Alternatively, if your C library expects UTF-8 text, you can use
C<SvPVutf8> and related macros. This has the same effect as encoding
to UTF-8 then calling the corresponding C<SvPVbyte>-related macro.

Some C libraries may expect other encodings (e.g., UTF-16LE). To give
Perl strings to such libraries
you must either do that encoding in Perl then use C<SvPVbyte>, or
use an intermediary C library to convert from however Perl stores the
string to the desired encoding.

Take care also that NULs in your Perl string don't confuse the C
library. If possible, give the string's length to the C library; if that's
not possible, consider rejecting strings that contain NUL bytes.

=head3 What about C<SvPV>, C<SvPV_nolen>, etc.?

Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">.
Perl can store these 3 characters either of two ways:

=over

=item * bytes: 0x64 0x78 0x8c

=item * UTF-8: 0x64 0x78 0xc2 0x8c

=back

Now let's say you convert C<$foo> to a C string thus:

    STRLEN strlen;
    char *str = SvPV(foo_sv, strlen);

At this point C<str> could point to a 3-byte C string or a 4-byte one.

Generally speaking, we want C<str> to be the same regardless of how
Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte>
and C<SvPVutf8> solve that by giving predictable output: use
C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8>
if it expects UTF-8.

If your C library happens to support both encodings, then C<SvPV>--always
in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more
efficient.
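
The two storage forms listed above are related by a simple
transformation, which the following plain C sketch models: each byte
below 0x80 is kept as it is, and each byte from 0x80 to 0xFF becomes two
bytes. The function C<model_upgrade> is a hypothetical name and the
sketch assumes an ASCII platform; inside Perl this job is done by
C<sv_utf8_upgrade> or C<bytes_to_utf8>.

```c
#include <stddef.h>

/* Hypothetical model: encode 'len' bytes of native (Latin-1) data from
 * 'src' as UTF-8 into 'dst', which must have room for 2 * len bytes.
 * Returns the number of bytes written. */
static size_t model_upgrade(const unsigned char *src, size_t len,
                            unsigned char *dst)
{
    size_t i, out = 0;

    for (i = 0; i < len; i++) {
        if (src[i] < 0x80) {
            dst[out++] = src[i];                              /* invariant */
        } else {
            dst[out++] = (unsigned char)(0xC0 | (src[i] >> 6));   /* 110000xx */
            dst[out++] = (unsigned char)(0x80 | (src[i] & 0x3F)); /* 10xxxxxx */
        }
    }
    return out;
}
```

Feeding it the three bytes 0x64 0x78 0x8c produces the four bytes
0x64 0x78 0xc2 0x8c, which is the difference a C library sees when
C<SvPV> is used without consulting C<SvUTF8>.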

B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions
in your tests to ensure consistent handling regardless of Perl's
internal encoding.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do
this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 How do I compare strings?

L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SVs, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.

To just compare two strings for equality/non-equality, you can just use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
except the strings must be both UTF-8 or not UTF-8 encoded.

To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).

=head2 Is there anything else I need to know?

Not really.
Just remember these things:

=over 3

=item *

There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro. And, you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not -- if you pass on
the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is
tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.

=back

=head1 Custom Operators

Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure -- unary, binary, list and
so on -- you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:

=for apidoc Ayh||XOP

    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
      XopENTRY_set(&my_xop, xop_name, "myxop");
      XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
      Perl_custom_op_register(aTHX_ my_pp, &my_xop);

The available fields in the structure are:

=over 4

=item xop_name

A short name for your op.
This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.

=item xop_desc

A short description of the function of the op.

=item xop_class

Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely

=over 4

=item OA_BASEOP

=item OA_UNOP

=item OA_BINOP

=item OA_LOGOP

=item OA_LISTOP

=item OA_PMOP

=item OA_SVOP

=item OA_PADOP

=item OA_PVOP_OR_SVOP

This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.

=item OA_LOOP

=item OA_COP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=for apidoc Ayh||Perl_cpeep_t

=back

C<B::Generate> directly supports the creation of custom ops by name.

=head1 Stacks

Descriptions above occasionally refer to "the stack", but there are in fact
many stack-like data structures within the perl interpreter. When otherwise
unqualified, "the stack" usually refers to the value stack.

The various stacks have different purposes, and operate in slightly different
ways. Their differences are noted below.

=head2 Value Stack

This stack stores the values that regular perl code is operating on, usually
intermediate values of expressions within a statement. The stack itself is
formed of an array of SV pointers.

The base of this stack is pointed to by the interpreter variable
C<PL_stack_base>, of type C<SV **>.

The head of the stack is C<PL_stack_sp>, and points to the most
recently-pushed item.

Items are pushed to the stack by using the C<PUSHs()> macro or its variants
described above; C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed
versions. Note carefully that the non-C<X> versions of these macros do not
check the size of the stack and assume it to be big enough. These must be
paired with a suitable check of the stack's size, such as the C<EXTEND> macro
to ensure it is large enough. For example

    EXTEND(SP, 4);
    mPUSHi(10);
    mPUSHi(20);
    mPUSHi(30);
    mPUSHi(40);

This is slightly more performant than making four separate checks in four
separate C<mXPUSHi()> calls.

As a further performance optimisation, the various C<PUSH> macros all operate
using a local variable C<SP>, rather than the interpreter-global variable
C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is
normally implied by XSUBs and similar so it is rare you have to consider it
directly. Once declared, the C<PUSH> macros will operate only on this local
variable, so before invoking any other perl core functions you must use the
C<PUTBACK> macro to return the value from the local C<SP> variable back to
the interpreter variable. Similarly, after calling a perl core function which
may have had reason to move the stack or push/pop values to it, you must use
the C<SPAGAIN> macro which refreshes the local C<SP> value back from the
interpreter one.
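
The relationship between the local C<SP> and C<PL_stack_sp> can be
illustrated with a small model in plain C. All names here
(C<model_stack>, C<model_stack_sp>, C<model_callee>, C<model_caller>) are
hypothetical stand-ins, integers take the place of SV pointers, and the
fixed-size array sidesteps the reallocation that the real C<EXTEND> may
perform. The sketch only shows why the local copy has to be written back
before a call and refreshed after it.

```c
/* Hypothetical model of the value stack; ints stand in for SV pointers.
 * As with PL_stack_sp, the pointer addresses the most recently pushed
 * slot, so slot 0 is never used for a value. */
static int model_stack[16];
static int *model_stack_sp = model_stack;

/* Models a core function that pops two values and pushes their sum,
 * working only through the global pointer. */
static void model_callee(void)
{
    int b = *model_stack_sp--;
    int a = *model_stack_sp--;

    *++model_stack_sp = a + b;
}

static int model_caller(void)
{
    int *sp = model_stack_sp;   /* like dSP: take a local copy        */
    int result;

    *++sp = 10;                 /* like PUSHs: only the local moves   */
    *++sp = 20;
    model_stack_sp = sp;        /* like PUTBACK: publish the pushes   */
    model_callee();
    sp = model_stack_sp;        /* like SPAGAIN: pick up the changes  */
    result = *sp--;             /* like POPs, on the local copy       */
    model_stack_sp = sp;        /* PUTBACK again before returning     */
    return result;
}
```

If the first write-back were omitted, C<model_callee> would read below
the pushed values; if the refresh were omitted, the caller would read the
slot holding 20 rather than the sum.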

Items are popped from the stack by using the C<POPs> macro or its typed
versions. There is also a macro C<TOPs> that inspects the topmost item without
removing it.

Note specifically that SV pointers on the value stack do not contribute to the
overall reference count of the xVs being referred to. If newly-created xVs are
being pushed to the stack you must arrange for them to be destroyed at a
suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()>
to mortalise the xV.

=head2 Mark Stack

The value stack stores individual perl scalar values as temporaries between
expressions. Some perl expressions operate on entire lists; for that purpose
we need to know where on the stack each list begins. This is the purpose of the
mark stack.

The mark stack stores integers as I32 values, which are the height of the
value stack at the time before the list began; thus the mark itself actually
points to the value stack entry one before the list. The list itself starts at
C<mark + 1>.

The base of this stack is pointed to by the interpreter variable
C<PL_markstack>, of type C<I32 *>.

The head of the stack is C<PL_markstack_ptr>, and points to the most
recently-pushed item.

Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though
the stack itself stores (value) stack indices as integers, the C<PUSHMARK>
macro should be given a stack pointer directly; it will calculate the index
offset by comparing to the C<PL_stack_sp> variable. Thus almost always the
code to perform this is

    PUSHMARK(SP);

Items are popped from the stack by the C<POPMARK> macro. There is also a macro
C<TOPMARK> that inspects the topmost item without removing it. These macros
return I32 index values directly.
There is also the C<dMARK> macro which
declares a new SV double-pointer variable, called C<mark>, which points at the
marked stack slot; this is the usual macro that C code will use when operating
on lists given on the stack.

As noted above, the C<mark> variable itself will point at the most recently
pushed value on the value stack before the list begins, and so the list itself
starts at C<mark + 1>. The values of the list may be iterated by code such as

    for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
      SV *item = *svp;
      ...
    }

Note specifically in the case that the list is already empty, C<mark> will
equal C<PL_stack_sp>.

Because the C<mark> variable is converted to a pointer on the value stack,
extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are
invoked within the function, because the stack may need to be moved to
extend it and so the existing pointer will now be invalid. If this may be a
problem, a possible solution is to track the mark offset as an integer and
recalculate the mark pointer later on, after the stack may have been moved.

    I32 markoff = POPMARK;

    ...

    SV **mark = PL_stack_base + markoff;

=head2 Temporaries Stack

As noted above, xV references on the main value stack do not contribute to the
reference count of an xV, and so another mechanism is used to track when
temporary values which live on the stack must be released. This is the job of
the temporaries stack.

The temporaries stack stores pointers to xVs whose reference counts will be
decremented soon.

The base of this stack is pointed to by the interpreter variable
C<PL_tmps_stack>, of type C<SV **>.

The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the
index in the array of the most recently-pushed item.

There is no public API to directly push items to the temporaries stack. Instead,
the API function C<sv_2mortal()> is used to mortalize an xV, adding its
address to the temporaries stack.

Likewise, there is no public API to read values from the temporaries stack.
Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS>
macro establishes the base levels of the temporaries stack, by capturing the
current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of
the temporaries that have been pushed since that level are reclaimed.

While it is common to see these two macros in pairs within an C<ENTER>/
C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke
C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a
loop iterating over elements of a list. While you can invoke C<SAVETMPS>
multiple times within a scope pair, it is unlikely to be useful. Subsequent
invocations will move the temporaries floor further up, thus effectively
trapping the existing temporaries to only be released at the end of the scope.

=head2 Save Stack

The save stack is used by perl to implement the C<local> keyword and other
similar behaviours; any cleanup operations that need to be performed when
leaving the current scope. Items pushed to this stack generally capture the
current value of some internal variable or state, which will be restored when
the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other
reasons.

Whereas other perl internal stacks store individual items all of the same type
(usually SV pointers or integers), the items pushed to the save stack are
formed of many different types, having multiple fields to them.
For example,
the C<SAVEt_INT> type needs to store both the address of the C<int> variable
to restore, and the value to restore it to. This information could have been
stored using fields of a C<struct>, but would have to be large enough to store
three pointers in the largest case, which would waste a lot of space in most
of the smaller cases.

Instead, the stack stores information in a variable-length encoding of C<ANY>
structures. The final value pushed is stored in the C<UV> field which encodes
the kind of item held by the preceding items; the count and types of which
will depend on what kind of item is being stored. The kind field is pushed
last because that will be the first field to be popped when unwinding items
from the stack.

The base of this stack is pointed to by the interpreter variable
C<PL_savestack>, of type C<ANY *>.

The head of the stack is indexed by C<PL_savestack_ix>, an integer which
stores the index in the array at which the next item should be pushed. (Note
that this is different to most other stacks, which reference the most
recently-pushed item).

Items are pushed to the save stack by using the various C<SAVE...()> macros.
Many of these macros take a variable and store both its address and current
value on the save stack, ensuring that value gets restored on scope exit.

    SAVEI8(i8)
    SAVEI16(i16)
    SAVEI32(i32)
    SAVEINT(i)
    ...

There are also a variety of other special-purpose macros which save particular
types or values of interest. C<SAVETMPS> has already been mentioned above.
Others include C<SAVEFREEPV> which arranges for a PV (i.e. a string buffer) to
be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to
be invoked on scope exit. A full list of such macros can be found in
F<scope.h>.
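
The variable-length layout described above can be modelled in a few lines
of plain C. In this sketch C<ModelAny>, C<model_save_int> and
C<model_leave_to> are hypothetical names, only one kind of entry is
supported, and the stack has a fixed size; the real save stack grows on
demand and has many more entry kinds. The point illustrated is that the
kind tag is pushed last so that it is the first thing seen when unwinding.

```c
#include <stddef.h>

/* Hypothetical model of the ANY union used for save stack slots. */
typedef union {
    void *any_ptr;
    int any_int;
    unsigned long any_uv;
} ModelAny;

enum { MODEL_SAVEt_INT = 1 };

static ModelAny model_savestack[32];
static size_t model_savestack_ix = 0;   /* index of the next free slot */

/* In the spirit of SAVEINT(i): record the address and the current
 * value, then the tag that says what the preceding slots mean. */
static void model_save_int(int *var)
{
    model_savestack[model_savestack_ix++].any_ptr = var;
    model_savestack[model_savestack_ix++].any_int = *var;
    model_savestack[model_savestack_ix++].any_uv = MODEL_SAVEt_INT;
}

/* Unwind back to 'base', restoring each saved variable.  The tag is
 * popped first and determines how many further slots to pop. */
static void model_leave_to(size_t base)
{
    while (model_savestack_ix > base) {
        unsigned long kind = model_savestack[--model_savestack_ix].any_uv;

        if (kind == MODEL_SAVEt_INT) {
            int value = model_savestack[--model_savestack_ix].any_int;
            int *var = model_savestack[--model_savestack_ix].any_ptr;

            *var = value;
        }
    }
}
```

A caller records C<model_savestack_ix> on entry, saves and modifies its
variables, and passes the recorded index to C<model_leave_to> on exit; the
variables then hold their original values again.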

There is no public API for popping individual values or items from the save
stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way
to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will
restore all of the saved values that had been pushed since the most recent
C<ENTER>.

=head2 Scope Stack

As with the mark stack to the value stack, the scope stack forms a pair with
the save stack. The scope stack stores the height of the save stack at which
nested scopes begin, and allows the save stack to be unwound back to that
point when the scope is left.

When perl is built with debugging enabled, there is a second part to this
stack storing human-readable string names describing the type of stack
context. Each push operation saves the name as well as the height of the save
stack, and each pop operation checks the topmost name with what is expected,
causing an assertion failure if the name does not match.

The base of this stack is pointed to by the interpreter variable
C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are
stored in a separate array pointed to by C<PL_scopestack_name>, of type
C<const char **>.

The head of the stack is indexed by C<PL_scopestack_ix>, an integer which
stores the index of the array or arrays at which the next item should be
pushed. (Note that this is different to most other stacks, which reference the
most recently-pushed item).

Values are pushed to the scope stack using the C<ENTER> macro, which begins a
new nested scope. Any items pushed to the save stack are then restored at the
next nested invocation of the C<LEAVE> macro.

=head1 Dynamic Scope and the Context Stack

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

=head2 Introduction to the context stack

In Perl, dynamic scoping refers to the runtime nesting of things like
subroutine calls, evals etc, as well as the entering and exiting of block
scopes. For example, the restoring of a C<local>ised variable is
determined by the dynamic scope.

Perl tracks the dynamic scope by a data structure called the context
stack, which is an array of C<PERL_CONTEXT> structures, and which is
itself a big union for all the types of context. Whenever a new scope is
entered (such as a block, a C<for> loop, or a subroutine call), a new
context entry is pushed onto the stack. Similarly when leaving a block or
returning from a subroutine call etc. a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, C<next LABEL> searches back through the stack looking for a
loop context that matches the label; C<return> pops contexts until it
finds a sub or eval context or similar; C<caller> examines sub contexts on
the stack.

Each context entry is labelled with a context type, C<cx_type>. Typical
context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK>
and C<CXt_NULL> which represent a basic scope (as pushed by C<pp_enter>)
and a sort block. The type determines which parts of the context union are
valid.

The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
just used while executing C<s///e>, and won't be discussed further
here.

All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.

Particular block scope types store extra per-type information. For
example, C<CXt_SUB> stores the currently executing CV, while the various
for loop types might hold the original loop variable SV. On scope exit,
the per-type data is processed; for example the CV has its reference count
decremented, and the original loop variable is restored.

The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.

In fact, the context stack is actually part of a stack-of-stacks system;
whenever something unusual is done such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.

Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular they didn't handle
saving the savestack and temps stack positions, and required additional
C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The
old-style macros will not be described further.


=head2 Pushing contexts

For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)> which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.

The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.

A typical context stack pushing can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.

    dMARK;
    U8 gimme = GIMME_V;
    bool hasargs = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv = ....;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

    /* this does (approximately):
        CXINC;           // cxstack_ix++ (grow if necessary)
        cx = CX_CUR();   // and get the address of new frame
        cx->cx_type = CXt_SUB;
        cx->blk_gimme = gimme;
        cx->blk_oldsp = MARK - PL_stack_base;
        cx->blk_oldsaveix = old_ss_ix;
        cx->blk_oldcop = PL_curcop;
        cx->blk_oldmarksp = PL_markstack_ptr - PL_markstack;
        cx->blk_oldscopesp = PL_scopestack_ix;
        cx->blk_oldpm = PL_curpm;
        cx->blk_old_tmpsfloor = PL_tmps_floor;

        PL_tmps_floor = PL_tmps_ix;
    */


    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

    /* this does (approximately):
        cx->blk_sub.cv = cv;
        cx->blk_sub.olddepth = CvDEPTH(cv);
        cx->blk_sub.prevcomppad = PL_comppad;
        cx->cx_type |= (hasargs) ?
CXp_HASARGS : 0; 3906 cx->blk_sub.retop = retop; 3907 SvREFCNT_inc_simple_void_NN(cv); 3908 */ 3909 3910Note that C<cx_pushblock()> sets two new floors: for the args stack (to 3911C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this 3912scope level, every C<nextstate> (amongst others) will reset the args and 3913tmps stack levels to these floors. Note that since C<cx_pushblock> uses 3914the current value of C<PL_tmps_ix> rather than it being passed as an arg, 3915this dictates at what point C<cx_pushblock> should be called. In 3916particular, any new mortals which should be freed only on scope exit 3917(rather than at the next C<nextstate>) should be created first. 3918 3919Most callers of C<cx_pushblock> simply set the new args stack floor to the 3920top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the 3921items being iterated over on the stack, and so sets C<blk_oldsp> to the 3922top of these items instead. Note that, contrary to its name, C<blk_oldsp> 3923doesn't always represent the value to restore C<PL_stack_sp> to on scope 3924exit. 3925 3926Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is 3927later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>, 3928this is because, although most values needing saving are stored in fields 3929of the context struct, an extra value needs saving only when the debugger 3930is running, and it doesn't make sense to bloat the struct for this rare 3931case. So instead it is saved on the savestack. Since this value gets 3932calculated and saved before the context is pushed, it is necessary to pass 3933the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the 3934saved value gets freed during scope exit. For most users of 3935C<cx_pushblock>, where nothing needs pushing on the save stack, 3936C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>. 
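
The effect of capturing the savestack index early can be seen in a small
stand-alone model. This is only a sketch and not Perl's implementation:
C<toy_save_int>, C<toy_leave_scope>, C<toy_run_scope> and the fixed-size
array are hypothetical stand-ins for the real savestack machinery. It
shows that an entry saved before the frame is pushed is only undone on
scope exit if the frame records the index taken I<before> that save.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names exist in the Perl core. */
#define TOY_SAVE_MAX 16

struct toy_save { int *addr; int old; };

static struct toy_save toy_savestack[TOY_SAVE_MAX];
static size_t toy_savestack_ix = 0;

/* Remember the current value of *addr so it can be restored later. */
static void toy_save_int(int *addr)
{
    assert(toy_savestack_ix < TOY_SAVE_MAX);
    toy_savestack[toy_savestack_ix].addr = addr;
    toy_savestack[toy_savestack_ix].old  = *addr;
    toy_savestack_ix++;
}

/* Restore every saved value above old_ix, newest first. */
static void toy_leave_scope(size_t old_ix)
{
    while (toy_savestack_ix > old_ix) {
        toy_savestack_ix--;
        *toy_savestack[toy_savestack_ix].addr =
            toy_savestack[toy_savestack_ix].old;
    }
}

/* Returns the value of the saved variable after scope exit.
 * capture_early != 0 mimics the pattern above: the index recorded in
 * the frame is the one taken before the extra save. */
static int toy_run_scope(int capture_early)
{
    int dbg_state = 1;
    size_t early_ix = toy_savestack_ix;   /* plays the role of old_ss_ix */
    size_t frame_ix;

    toy_save_int(&dbg_state);             /* extra save before the push */
    dbg_state = 2;

    /* what the frame would record as its savestack floor */
    frame_ix = capture_early ? early_ix : toy_savestack_ix;

    toy_leave_scope(frame_ix);            /* scope exit */
    toy_savestack_ix = early_ix;          /* drop any leftover entry */
    return dbg_state;
}
```

With the early index the variable is restored to 1; with the late index
the scope exit never reaches the saved entry and the value stays at 2.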

Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.

Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them; this is because if code
in-between could die (e.g. a warning upgraded to fatal), then the context
stack unwinding code in C<dounwind> would see (in the example above) a
C<CXt_SUB> context frame, but without all the subroutine-specific fields
set, and crashes would soon ensue.

Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.

=head2 Popping contexts

Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.

There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.
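
The multi-frame case can be pictured with a stand-alone toy model. The
sketch below is not C<dounwind()> itself: the C<toy_*> names, the frame
types and the fixed-size stack are all hypothetical. It models a
C<return> from three scopes deep inside a sub: the nearest sub frame is
located, then every frame above it is processed and popped, innermost
first.

```c
#include <assert.h>

/* Toy model only: these names are illustrative, not Perl's. */
enum toy_cxtype { TOY_BLOCK, TOY_LOOP, TOY_SUB, TOY_EVAL };

struct toy_cxstack {
    enum toy_cxtype frames[8];
    int ix;                      /* index of current frame, -1 if empty */
    int processed;               /* frames cleaned up so far */
};

/* Find the nearest enclosing frame of the wanted type, or -1. */
static int toy_find(const struct toy_cxstack *s, enum toy_cxtype want)
{
    int i;
    for (i = s->ix; i >= 0; i--)
        if (s->frames[i] == want)
            return i;
    return -1;
}

/* Process and pop every frame above target_ix, innermost first. */
static void toy_unwind(struct toy_cxstack *s, int target_ix)
{
    while (s->ix > target_ix) {
        s->processed++;          /* per-type cleanup would go here */
        s->ix--;                 /* only now is the frame given up */
    }
}

static int toy_unwind_demo(void)
{
    /* models: sub { for (...) { { { return } } } } */
    struct toy_cxstack s = {
        { TOY_SUB, TOY_LOOP, TOY_BLOCK, TOY_BLOCK }, 3, 0
    };
    int target = toy_find(&s, TOY_SUB);

    toy_unwind(&s, target);
    return s.ix * 10 + s.processed;
}
```

In this model the index is decremented only after a frame's cleanup has
run, mirroring the ordering that the rest of this section relies on.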

Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):

    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;

The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:

    sub f { my $x =...; $x }  # $x freed before we get to copy it
    sub f { /(...)/;    $1 }  # PL_curpm restored before $1 copied

Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive; nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to do so.

In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be imminently called which will do a C<FREETMPS>, so there's
no need to do that either.

The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way as to support reentrancy. The
other alternative, of decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or overwriting of the current frame.

C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.

The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:

    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);

where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.
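
The "clear the field, then release" ordering can be demonstrated with a
stand-alone toy model. This is a sketch under stated assumptions, not the
code of C<cx_popsub>: the C<toy_*> types and functions are hypothetical,
the re-entry is simulated by a destructor that processes the same frame a
second time, and the early return on a null field is a simplification of
this model.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these types and functions are not part of Perl. */
struct toy_cv {
    int refcnt;
    int times_freed;         /* should never exceed 1 */
};

struct toy_frame {
    struct toy_cv *cv;       /* plays the role of blk_sub.cv */
    int reentered;           /* guards the simulated re-entry */
};

static void toy_popsub(struct toy_frame *f);

/* Drop one reference. "Freeing" runs a destructor which re-processes
 * the same frame, as an exception thrown during cleanup might. */
static void toy_refcnt_dec(struct toy_cv *cv, struct toy_frame *f)
{
    if (--cv->refcnt == 0) {
        cv->times_freed++;
        if (!f->reentered) {
            f->reentered = 1;
            toy_popsub(f);   /* frame is processed a second time */
        }
    }
}

static void toy_popsub(struct toy_frame *f)
{
    struct toy_cv *cv = f->cv;

    if (cv == NULL)
        return;              /* model assumption: nothing left to do */
    f->cv = NULL;            /* clear the field first ... */
    toy_refcnt_dec(cv, f);   /* ... then release the reference */
}

static int toy_pop_demo(void)
{
    struct toy_cv cv = { 1, 0 };
    struct toy_frame f = { &cv, 0 };

    toy_popsub(&f);
    return cv.times_freed;
}
```

Had the field been cleared I<after> the decrement, the second pass in this
model would have found the stale pointer and decremented the count again.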

Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:

    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm         = cx->blk_oldpm;
    PL_curcop        = cx->blk_oldcop;
    PL_tmps_floor    = cx->blk_old_tmpsfloor;

Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.

Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note in the C<pp_leavesub()> example above,
we grab C<blk_sub.retop> I<before> calling C<CX_POP>.

=head2 Redoing contexts

Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.


=head1 Slab-based operator allocation

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

Perl's internal error-handling mechanisms implement C<die> (and its internal
equivalents) using longjmp. If this occurs during lexing, parsing or
compilation, we must ensure that any ops allocated as part of the compilation
process are freed. (Older Perl versions did not adequately handle this
situation: when failing a parse, they would leak ops that were stored in
C C<auto> variables and not linked anywhere else.)

To handle this situation, Perl uses I<op slabs> that are attached to the
currently-compiling CV. A slab is a chunk of allocated memory. New ops are
allocated as regions of the slab. If the slab fills up, a new one is created
(and linked from the previous one). When an error occurs and the CV is freed,
any ops remaining are freed.

Each op is preceded by two pointers: one points to the next op in the slab, and
the other points to the slab that owns it. The next-op pointer is needed so
that Perl can iterate over a slab and free all its ops. (Op structures are of
different sizes, so the slab's ops can't merely be treated as a dense array.)
The slab pointer is needed for accessing a reference count on the slab: when
the last op on a slab is freed, the slab itself is freed.

The slab allocator puts the ops at the end of the slab first. This will tend to
allocate the leaves of the op tree first, and the layout will therefore
hopefully be cache-friendly. In addition, this means that there's no need to
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.

It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster. But that doesn't work in those cases where ops need to
survive beyond their CVs, such as re-evals.

The CV also has to have a reference count on the slab. Sometimes the first op
created is immediately freed. If the reference count of the slab reaches 0,
then it will be freed with the CV still pointing to it.

CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count
on the slab. When this flag is set, the slab is accessible via C<CvSTART> when
C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from
C<CvROOT> when it is set. The alternative to this approach of sneaking the slab
into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by
another pointer. But that would make all CVs larger, even though slab-based op
freeing is typically of benefit only for programs that make significant use of
string eval.

When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing
the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is
assumed that a compilation error has occurred, so the op slab is traversed and
all the ops are freed.

Under normal circumstances, the CV forgets about its slab (decrementing the
reference count) when the root is attached. So the slab reference counting that
happens when ops are freed takes care of freeing the slab. In some cases, the
CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the
ops can survive after the CV is done away with.

Forgetting the slab when the root is attached is not strictly necessary, but
avoids potential problems with C<CvROOT> being written over. There is code all
over the place, both in core and on CPAN, that does things with C<CvROOT>, so
forgetting the slab makes things more robust and avoids potential problems.
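
The header layout described earlier in this section (two pointers in front
of each op, one of them leading back to the owning slab) can be sketched
as a stand-alone toy model. The structures below are hypothetical and much
simpler than the real ones in the core; in particular the fixed-size
C<payload> array merely stands in for an op. The sketch shows how stepping
back from an op pointer reaches the header, and how the slab's reference
count tells when the last op has gone.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: the real slab structures in the Perl core differ. */
struct toy_slab {
    int refcnt;                  /* one count per live op */
};

struct toy_slot {
    struct toy_slot *next;       /* next op slot in the slab */
    struct toy_slab *slab;       /* owning slab */
    long payload[4];             /* stands in for the op itself */
};

/* Hand out the "op" part of a slot, recording its owner in the header. */
static void *toy_new_op(struct toy_slab *slab, struct toy_slot *slot)
{
    slot->next = NULL;
    slot->slab = slab;
    slab->refcnt++;
    return slot->payload;
}

/* Step back from an op pointer to the header that precedes it. */
static struct toy_slot *toy_slot_of(void *op)
{
    char *p = (char *)op - offsetof(struct toy_slot, payload);
    return (struct toy_slot *)p;
}

/* Free an op: drop the slab's count; returns 1 if the slab is unused. */
static int toy_free_op(void *op)
{
    struct toy_slab *slab = toy_slot_of(op)->slab;
    return --slab->refcnt == 0;
}

static int toy_slot_demo(void)
{
    struct toy_slab slab = { 0 };
    struct toy_slot a, b;
    void *op_a = toy_new_op(&slab, &a);
    void *op_b = toy_new_op(&slab, &b);
    int first, second;

    assert(toy_slot_of(op_a) == &a);
    first  = toy_free_op(op_a);  /* op_b is still alive */
    second = toy_free_op(op_b);  /* last op gone */
    return first * 10 + second;
}
```

Only the second free reports the slab as unused, which is the point at
which a real allocator would release the slab's memory.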

Since the CV takes ownership of its slab when flagged, that flag is never
copied when a CV is cloned, as one CV could free a slab that another CV still
points to, since forced freeing of ops ignores the reference count (but asserts
that it looks right).

To avoid slab fragmentation, freed ops are marked as freed and attached to the
slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
when possible. Not reusing freed ops would be simpler, but it would result in
significantly higher memory usage for programs with large C<if (DEBUG) {...}>
blocks.

C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause
an op to be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab. Making
C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when
there is no compilation error, so the op would never be freed. It holds
a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP>
now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of
ops after a compilation error won't free any ops thus marked.

Since many pieces of code create tiny subroutines consisting of only a few ops,
and since a huge slab would be quite a bit of baggage for those to carry
around, the first slab is always very small. To avoid allocating too many
slabs for a single CV, each subsequent slab is twice the size of the previous.

Smartmatch expects to be able to allocate an op at run time, run it, and then
throw it away. For that to work the op is simply malloced when C<PL_compcv>
hasn't been set up. So all slab-allocated ops are marked as such
(C<< ->op_slabbed >>), to distinguish them from malloced ops.
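
The effect of the size-doubling policy described above can be checked with
a little arithmetic. The function below is a hypothetical stand-alone
sketch, not code from the core: it counts how many slabs a CV would need
if the first slab held C<first> ops and each later slab held twice as many
as the one before. The starting size used in the example is an arbitrary
assumption, and sizes are counted in ops rather than bytes for simplicity.

```c
#include <assert.h>

/* Toy model only: counts slabs under a pure doubling policy.
 * 'first' is the capacity of the first slab, measured in ops. */
static unsigned toy_slabs_needed(unsigned long nops, unsigned long first)
{
    unsigned count = 0;
    unsigned long size = first;      /* capacity of the next slab */
    unsigned long capacity = 0;      /* total capacity so far */

    while (capacity < nops) {
        capacity += size;
        size *= 2;                   /* each slab doubles the last */
        count++;
    }
    return count;
}
```

With an assumed first slab of 8 ops, a 5-op subroutine fits in a single
small slab, while a 100-op one needs only four slabs (8 + 16 + 32 + 64 =
120), compared with thirteen slabs if every slab held 8 ops.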


=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>