=for comment
The part of this file between =for mg_vtable.pl markers is auto
generated by mg_vtable.pl; any changes there need to be made instead to
mg_vtable.pl

=head1 NAME

perlguts - Introduction to the Perl API

=head1 DESCRIPTION

This document attempts to describe how to use the Perl API, as well as
to provide some info on the basic workings of the Perl core. It is far
from complete and probably contains many errors. Please refer any
questions or comments to the author below.

=head1 Variables

=head2 Datatypes

Perl has three typedefs that handle Perl's three main data types:

    SV  Scalar Value
    AV  Array Value
    HV  Hash Value

Each typedef has specific routines that manipulate the various data types.

=for apidoc_section $AV
=for apidoc Ayh||AV
=for apidoc_section $HV
=for apidoc Ayh||HV
=for apidoc_section $SV
=for apidoc Ayh||SV

=head2 What is an "IV"?

Perl uses a special typedef IV which is a simple signed integer type that is
guaranteed to be large enough to hold a pointer (as well as an integer).
Additionally, there is the UV, which is simply an unsigned IV.

Perl also uses several special typedefs to declare variables to hold
integers of (at least) a given size.
Use I8, I16, I32, and I64 to declare a signed integer variable which has
at least as many bits as the number in its name. These all evaluate to
the native C type that is closest to the given number of bits, but no
smaller than that number. For example, on many platforms, a C<short> is
16 bits long, and if so, I16 will evaluate to a C<short>. But on
platforms where a C<short> isn't exactly 16 bits, Perl will use the
smallest type that contains 16 bits or more.

U8, U16, U32, and U64 are to declare the corresponding unsigned integer
types.

If the platform doesn't support 64-bit integers, both I64 and U64 will
be undefined.
Use IV and UV to declare the largest practicable, and
C<L<perlapi/WIDEST_UTYPE>> for the absolute maximum unsigned, but which
may not be usable in all circumstances.

A numeric constant can be specified with L<perlapi/C<INT16_C>>,
L<perlapi/C<UINTMAX_C>>, and similar.

=for apidoc_section $integer
=for apidoc Ayh ||IV
=for apidoc_item ||I8
=for apidoc_item ||I16
=for apidoc_item ||I32
=for apidoc_item ||I64

=for apidoc Ayh ||UV
=for apidoc_item ||U8
=for apidoc_item ||U16
=for apidoc_item ||U32
=for apidoc_item ||U64

=head2 Working with SVs

An SV can be created and loaded with one command. There are five types of
values that can be loaded: an integer value (IV), an unsigned integer
value (UV), a double (NV), a string (PV), and another scalar (SV).
("PV" stands for "Pointer Value". You might think that it is misnamed
because it is described as pointing only to strings. However, it is
possible to have it point to other things. For example, it could point
to an array of UVs. But, using it for non-strings requires care, as the
underlying assumption of much of the internals is that PVs are just for
strings. Often, for example, a trailing C<NUL> is tacked on automatically.
The non-string use is documented only in this paragraph.)

=for apidoc_section $floating
=for apidoc Ayh||NV

The seven routines are:

    SV*  newSViv(IV);
    SV*  newSVuv(UV);
    SV*  newSVnv(double);
    SV*  newSVpv(const char*, STRLEN);
    SV*  newSVpvn(const char*, STRLEN);
    SV*  newSVpvf(const char*, ...);
    SV*  newSVsv(SV*);

C<STRLEN> is an integer type (C<Size_t>, usually defined as C<size_t> in
F<config.h>) guaranteed to be large enough to represent the size of
any string that perl can handle.

=for apidoc_section $string
=for apidoc Ayh||STRLEN

In the unlikely case of a SV requiring more complex initialization, you
can create an empty SV with newSV(len).
If C<len> is 0 an empty SV of
type NULL is returned, else an SV of type PV is returned with len + 1 (for
the C<NUL>) bytes of storage allocated, accessible via SvPVX. In both cases
the SV has the undef value.

    SV *sv = newSV(0);   /* no storage allocated  */
    SV *sv = newSV(10);  /* 10 (+1) bytes of uninitialised storage
                          * allocated */

To change the value of an I<already-existing> SV, there are eight routines:

    void  sv_setiv(SV*, IV);
    void  sv_setuv(SV*, UV);
    void  sv_setnv(SV*, double);
    void  sv_setpv(SV*, const char*);
    void  sv_setpvn(SV*, const char*, STRLEN);
    void  sv_setpvf(SV*, const char*, ...);
    void  sv_vsetpvfn(SV*, const char*, STRLEN, va_list *,
                      SV **, Size_t, bool *);
    void  sv_setsv(SV*, SV*);

Notice that you can choose to specify the length of the string to be
assigned by using C<sv_setpvn>, C<newSVpvn>, or C<newSVpv>, or you may
allow Perl to calculate the length by using C<sv_setpv> or by specifying
0 as the second argument to C<newSVpv>. Be warned, though, that Perl will
determine the string's length by using C<strlen>, which depends on the
string terminating with a C<NUL> character, and not otherwise containing
NULs.

The arguments of C<sv_setpvf> are processed like C<sprintf>, and the
formatted output becomes the value.

C<sv_vsetpvfn> is an analogue of C<vsprintf>, but it allows you to specify
either a pointer to a variable argument list or the address and length of
an array of SVs. The last argument points to a boolean; on return, if that
boolean is true, then locale-specific information has been used to format
the string, and the string's contents are therefore untrustworthy (see
L<perlsec>). This pointer may be NULL if that information is not
important. Note that this function requires you to specify the length of
the format.

The C<sv_set*()> functions are not generic enough to operate on values
that have "magic".
See L</Magic Virtual Tables> later in this document.

All SVs that contain strings should be terminated with a C<NUL> character.
If it is not C<NUL>-terminated there is a risk of
core dumps and corruptions from code which passes the string to C
functions or system calls which expect a C<NUL>-terminated string.
Perl's own functions typically add a trailing C<NUL> for this reason.
Nevertheless, you should be very careful when you pass a string stored
in an SV to a C function or system call.

To access the actual value that an SV points to, Perl's API exposes
several macros that coerce the actual scalar type into an IV, UV, double,
or string:

=over

=item * C<SvIV(SV*)> (C<IV>) and C<SvUV(SV*)> (C<UV>)

=item * C<SvNV(SV*)> (C<double>)

=item * Strings are a bit complicated:

=over

=item * Byte string: C<SvPVbyte(SV*, STRLEN len)> or C<SvPVbyte_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 2-byte C<char*>.

This is suitable for Perl strings that represent bytes.

=item * UTF-8 string: C<SvPVutf8(SV*, STRLEN len)> or C<SvPVutf8_nolen(SV*)>

If the Perl string is C<"\xff\xff">, then this returns a 4-byte C<char*>.

This is suitable for Perl strings that represent characters.

B<CAVEAT>: That C<char*> will be encoded via Perl's internal UTF-8 variant,
which means that if the SV contains non-Unicode code points (e.g.,
0x110000), then the result may contain extensions over valid UTF-8.
See L<perlapi/is_strict_utf8_string> for some methods Perl gives
you to check the UTF-8 validity of these macros' returns.

=item * You can also use C<SvPV(SV*, STRLEN len)> or C<SvPV_nolen(SV*)>
to fetch the SV's raw internal buffer. This is tricky, though; if your Perl
string
is C<"\xff\xff">, then depending on the SV's internal encoding you might get
back a 2-byte B<OR> a 4-byte C<char*>.
Moreover, if it's the 4-byte string, that could come from either Perl
C<"\xff\xff"> stored UTF-8 encoded, or Perl C<"\xc3\xbf\xc3\xbf"> stored
as raw octets. To differentiate between these you B<MUST> look up the
SV's UTF8 bit (cf. C<SvUTF8>) to know whether the source Perl string
is 2 characters (C<SvUTF8> would be on) or 4 characters (C<SvUTF8> would be
off).

B<IMPORTANT:> Use of C<SvPV>, C<SvPV_nolen>, or
similarly-named macros I<without> looking up the SV's UTF8 bit is
almost certainly a bug if non-ASCII input is allowed.

When the UTF8 bit is on, the same B<CAVEAT> about UTF-8 validity applies
here as for C<SvPVutf8>.

=back

(See L</How do I pass a Perl string to a C library?> for more details.)

In C<SvPVbyte>, C<SvPVutf8>, and C<SvPV>, the length of the C<char*> returned
is placed into the
variable C<len> (these are macros, so you do I<not> use C<&len>). If you do
not care what the length of the data is, use C<SvPVbyte_nolen>,
C<SvPVutf8_nolen>, or C<SvPV_nolen> instead.
The global variable C<PL_na> can also be given to
C<SvPVbyte>/C<SvPVutf8>/C<SvPV>
in this case. But that can be quite inefficient because C<PL_na> must
be accessed in thread-local storage in threaded Perl. In any case, remember
that Perl allows arbitrary strings of data that may both contain NULs and
might not be terminated by a C<NUL>.

Also remember that C doesn't allow you to safely say C<foo(SvPVbyte(s, len),
len);>. It might work with your
compiler, but it won't work for everyone.
Break this sort of statement up into separate assignments:

    SV *s;
    STRLEN len;
    char *ptr;
    ptr = SvPVbyte(s, len);
    foo(ptr, len);

=back

If you want to know if the scalar value is TRUE, you can use:

    SvTRUE(SV*)

Although Perl will automatically grow strings for you, if you need to force
Perl to allocate more memory for your SV, you can use the macro

    SvGROW(SV*, STRLEN newlen)

which will determine if more memory needs to be allocated. If so, it will
call the function C<sv_grow>. Note that C<SvGROW> can only increase, not
decrease, the allocated memory of an SV and that it does not automatically
add space for the trailing C<NUL> byte (perl's own string functions typically do
C<SvGROW(sv, len + 1)>).

If you want to write to an existing SV's buffer and set its value to a
string, use SvPVbyte_force() or one of its variants to force the SV to be
a PV. This will remove any of various types of non-stringness from
the SV while preserving the content of the SV in the PV. This can be
used, for example, to append data from an API function to a buffer
without extra copying:

    (void)SvPVbyte_force(sv, len);
    s = SvGROW(sv, len + needlen + 1);
    /* something that may write up to needlen bytes at s+len, but
       actually writes newlen bytes
         eg. newlen = read(fd, s + len, needlen);
       ignoring errors for these examples
     */
    s[len + newlen] = '\0';
    SvCUR_set(sv, len + newlen);
    SvUTF8_off(sv);
    SvSETMAGIC(sv);

If you already have the data in memory or if you want to keep your
code simple, you can use one of the sv_cat*() variants, such as
sv_catpvn(). If you want to insert anywhere in the string you can use
sv_insert() or sv_insert_flags().
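
As a rough illustration of that simpler route, the following sketch builds
a string with C<sv_catpvn> and C<sv_insert>. The helper name
C<build_greeting> is hypothetical, and the fragment assumes it is compiled
as part of an XS module rather than as a standalone program:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* Hypothetical helper, not part of the Perl API. */
    static SV *
    build_greeting(pTHX)
    {
        SV *sv = newSVpvn("hello", 5);

        /* Append 5 bytes already in memory; perl grows the buffer. */
        sv_catpvn(sv, "world", 5);        /* "helloworld" */

        /* Insert 2 bytes at byte offset 5, replacing 0 bytes. */
        sv_insert(sv, 5, 0, ", ", 2);     /* "hello, world" */

        return sv;  /* the caller owns the single reference */
    }

Both calls take byte counts and leave the trailing C<NUL> and the C<SvCUR>
bookkeeping to perl, at the cost of copying the data once.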

If you don't need the existing content of the SV, you can avoid some
copying with:

    SvPVCLEAR(sv);
    s = SvGROW(sv, needlen + 1);
    /* something that may write up to needlen bytes at s, but actually
       writes newlen bytes
         eg. newlen = read(fd, s, needlen);
    */
    s[newlen] = '\0';
    SvCUR_set(sv, newlen);
    SvPOK_only(sv); /* also clears SVf_UTF8 */
    SvSETMAGIC(sv);

Again, if you already have the data in memory or want to avoid the
complexity of the above, you can use sv_setpvn().

If you have a buffer allocated with Newx() and want to set that as the
SV's value, you can use sv_usepvn_flags(). That has some requirements
if you want to avoid perl re-allocating the buffer to fit the trailing
NUL:

    Newx(buf, somesize+1, char);
    /* ... fill in buf ... */
    buf[somesize] = '\0';
    sv_usepvn_flags(sv, buf, somesize, SV_SMAGIC | SV_HAS_TRAILING_NUL);
    /* buf now belongs to perl, don't release it */

If you have an SV and want to know what kind of data Perl thinks is stored
in it, you can use the following macros to check the type of SV you have.

    SvIOK(SV*)
    SvNOK(SV*)
    SvPOK(SV*)

Be aware that retrieving the numeric value of an SV can set IOK or NOK
on that SV, even when the SV started as a string. Prior to Perl
5.36.0 retrieving the string value of an integer could set POK, but
this can no longer occur. From 5.36.0 this can be used to distinguish
the original representation of an SV and is intended to make life
simpler for serializers:

    /* references handled elsewhere */
    if (SvIsBOOL(sv)) {
        /* originally boolean */
        ...
    }
    else if (SvPOK(sv)) {
        /* originally a string */
        ...
    }
    else if (SvNIOK(sv)) {
        /* originally numeric */
        ...
    }
    else {
        /* something special or undef */
    }

You can get and set the current length of the string stored in an SV with
the following macros:

    SvCUR(SV*)
    SvCUR_set(SV*, I32 val)

You can also get a pointer to the end of the string stored in the SV
with the macro:

    SvEND(SV*)

But note that these last three macros are valid only if C<SvPOK()> is true.

If you want to append something to the end of the string stored in an C<SV*>,
you can use the following functions:

    void  sv_catpv(SV*, const char*);
    void  sv_catpvn(SV*, const char*, STRLEN);
    void  sv_catpvf(SV*, const char*, ...);
    void  sv_vcatpvfn(SV*, const char*, STRLEN, va_list *, SV **,
                      Size_t, bool *);
    void  sv_catsv(SV*, SV*);

The first function calculates the length of the string to be appended by
using C<strlen>. In the second, you specify the length of the string
yourself. The third function processes its arguments like C<sprintf> and
appends the formatted output. The fourth function works like C<vsprintf>.
You can specify the address and length of an array of SVs instead of the
va_list argument. The fifth function
extends the string stored in the first
SV with the string stored in the second SV. It also forces the second SV
to be interpreted as a string.

The C<sv_cat*()> functions are not generic enough to operate on values that
have "magic". See L</Magic Virtual Tables> later in this document.

If you know the name of a scalar variable, you can get a pointer to its SV
by using the following:

    SV*  get_sv("package::varname", 0);

This returns NULL if the variable does not exist.

If you want to know if this variable (or any other SV) is actually C<defined>,
you can call:

    SvOK(SV*)

The scalar C<undef> value is stored in an SV instance called C<PL_sv_undef>.

Its address can be used whenever an C<SV*> is needed.
Make sure that
you don't try to compare a random sv with C<&PL_sv_undef>. For example
when interfacing Perl code, it'll work correctly for:

    foo(undef);

But won't work when called as:

    $x = undef;
    foo($x);

So, to repeat: always use SvOK() to check whether an sv is defined.

Also you have to be careful when using C<&PL_sv_undef> as a value in
AVs or HVs (see L</AVs, HVs and undefined values>).

There are also the two values C<PL_sv_yes> and C<PL_sv_no>, which contain
boolean TRUE and FALSE values, respectively. Like C<PL_sv_undef>, their
addresses can be used whenever an C<SV*> is needed.

Do not be fooled into thinking that C<(SV *) 0> is the same as C<&PL_sv_undef>.
Take this code:

    SV* sv = (SV*) 0;
    if (I-am-to-return-a-real-value) {
            sv = sv_2mortal(newSViv(42));
    }
    sv_setsv(ST(0), sv);

This code tries to return a new SV (which contains the value 42) if it should
return a real value, or undef otherwise. Instead it has returned a NULL
pointer which, somewhere down the line, will cause a segmentation violation,
bus error, or just weird results. Change the zero to C<&PL_sv_undef> in the
first line and all will be well.

To free an SV that you've created, call C<SvREFCNT_dec(SV*)>. Normally this
call is not necessary (see L</Reference Counts and Mortality>).

=head2 Offsets

Perl provides the function C<sv_chop> to efficiently remove characters
from the beginning of a string; you give it an SV and a pointer to
somewhere inside the PV, and it discards everything before the
pointer. The efficiency comes by means of a little hack: instead of
actually removing the characters, C<sv_chop> sets the flag C<OOK>
(offset OK) to signal to other functions that the offset hack is in
effect, and it moves the PV pointer (called C<SvPVX>) forward
by the number of bytes chopped off, and adjusts C<SvCUR> and C<SvLEN>
accordingly.
(A portion of the space between the old and new PV
pointers is used to store the count of chopped bytes.)

Hence, at this point, the start of the buffer that we allocated lives
at C<SvPVX(sv) - SvIV(sv)> in memory and the PV pointer is pointing
into the middle of this allocated storage.

This is best demonstrated by example. Normally copy-on-write will prevent
the substitution operator from using this hack, but if you can craft a
string for which copy-on-write is not possible, you can see it in play. In
the current implementation, the final byte of a string buffer is used as a
copy-on-write reference count. If the buffer is not big enough, then
copy-on-write is skipped. First have a look at an empty string:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a .= ""; Dump $a'
  SV = PV(0x7ffb7c008a70) at 0x7ffb7c030390
    REFCNT = 1
    FLAGS = (POK,pPOK)
    PV = 0x7ffb7bc05b50 ""\0
    CUR = 0
    LEN = 10

Notice here the LEN is 10. (It may differ on your platform.) Extend the
length of the string to one less than 10, and do a substitution:

  % ./perl -Ilib -MDevel::Peek -le '$a=""; $a.="123456789"; $a=~s/.//; \
                                                              Dump($a)'
  SV = PV(0x7ffa04008a70) at 0x7ffa04030390
    REFCNT = 1
    FLAGS = (POK,OOK,pPOK)
    OFFSET = 1
    PV = 0x7ffa03c05b61 ( "\1" . ) "23456789"\0
    CUR = 8
    LEN = 9

Here the number of bytes chopped off (1) is shown next as the OFFSET. The
portion of the string between the "real" and the "fake" beginnings is
shown in parentheses, and the values of C<SvCUR> and C<SvLEN> reflect
the fake beginning, not the real one. (The first character of the string
buffer happens to have changed to "\1" here, not "1", because the current
implementation stores the offset count in the string buffer. This is
subject to change.)
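
From C, the offset hack is requested by calling C<sv_chop> directly. The
sketch below uses a hypothetical helper, C<drop_prefix>, that discards the
first C<n> bytes of a string SV. It assumes a plain, non-magical scalar and
is meant to be compiled inside an XS module; because C<n> counts bytes, it
must fall on a character boundary if the string is UTF-8 encoded:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* Hypothetical helper, not part of the Perl API. */
    static void
    drop_prefix(pTHX_ SV *sv, STRLEN n)
    {
        STRLEN len;
        /* Force the SV to hold a string it owns, and get its buffer. */
        char *pv = SvPV_force(sv, len);

        if (n > len)
            n = len;        /* never point past the end of the string */

        /* Everything before pv + n is discarded via the offset hack. */
        sv_chop(sv, pv + n);
    }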

Something similar to the offset hack is performed on AVs to enable
efficient shifting and splicing off the beginning of the array; while
C<AvARRAY> points to the first element in the array that is visible from
Perl, C<AvALLOC> points to the real start of the C array. These are
usually the same, but a C<shift> operation can be carried out by
increasing C<AvARRAY> by one and decreasing C<AvFILL> and C<AvMAX>.
Again, the location of the real start of the C array only comes into
play when freeing the array. See C<av_shift> in F<av.c>.

=for apidoc_section $AV
=for apidoc Amh||AvALLOC|AV* av

=head2 What's Really Stored in an SV?

Recall that the usual method of determining the type of scalar you have is
to use C<Sv*OK> macros. Because a scalar can be both a number and a string,
usually these macros will always return TRUE and calling the C<Sv*V>
macros will do the appropriate conversion of string to integer/double or
integer/double to string.

If you I<really> need to know if you have an integer, double, or string
pointer in an SV, you can use the following three macros instead:

    SvIOKp(SV*)
    SvNOKp(SV*)
    SvPOKp(SV*)

These will tell you if you truly have an integer, double, or string pointer
stored in your SV. The "p" stands for private.

There are various ways in which the private and public flags may differ.
For example, in perl 5.16 and earlier a tied SV may have a valid
underlying value in the IV slot (so SvIOKp is true), but the data
should be accessed via the FETCH routine rather than directly,
so SvIOK is false. (In perl 5.18 onwards, tied scalars use
the flags the same way as untied scalars.) Another is when
numeric conversion has occurred and precision has been lost: only the
private flag is set on 'lossy' values. So when an NV is converted to an
IV with loss, SvIOKp, SvNOKp and SvNOK will be set, while SvIOK won't be.

In general, though, it's best to use the C<Sv*V> macros.

=head2 Working with AVs

There are two main, longstanding ways to create and load an AV. The first
method creates an empty AV:

    AV*  newAV();

The second method both creates the AV and initially populates it with SVs:

    AV*  av_make(SSize_t num, SV **ptr);

The second argument points to an array containing C<num> C<SV*>'s. Once the
AV has been created, the SVs can be destroyed, if so desired.

Perl v5.36 added two new ways to create an AV and allocate a SV** array
without populating it. These are more efficient than a newAV() followed by an
av_extend().

    /* Creates but does not initialize (Zero) the SV** array */
    AV *av = newAV_alloc_x(1);
    /* Creates and does initialize (Zero) the SV** array */
    AV *av = newAV_alloc_xz(1);

The numerical argument refers to the number of array elements to allocate, not
an array index, and must be >0. The first form must only ever be used when all
elements will be initialized before any read occurs. Reading a non-initialized
SV* - i.e. treating a random memory address as a SV* - is a serious bug.

Once the AV has been created, the following operations are possible on it:

    void  av_push(AV*, SV*);
    SV*   av_pop(AV*);
    SV*   av_shift(AV*);
    void  av_unshift(AV*, SSize_t num);

These should be familiar operations, with the exception of C<av_unshift>.
This routine adds C<num> elements at the front of the array with the C<undef>
value. You must then use C<av_store> (described below) to assign values
to these new elements.

Here are some other functions:

    Size_t   av_count(AV*);
    SSize_t  av_top_index(AV*);
    SV**     av_fetch(AV*, SSize_t key, I32 lval);
    SV**     av_store(AV*, SSize_t key, SV* val);

C<av_count> returns the number of elements in the array (including
any empty slots (undefined ones) that are intermixed with filled-in ones).
The C<av_top_index> function returns the highest index value in an array (just
like $#array in Perl). If the array is empty, -1 is returned. It is
always equal to S<C<av_count() - 1>>. The
C<av_fetch> function returns the value at index C<key>, but if C<lval>
is non-zero, then C<av_fetch> will store an undef value at that index.
The C<av_store> function stores the value C<val> at index C<key>, and does
not increment the reference count of C<val>. Thus the caller is responsible
for taking care of that, and if C<av_store> returns NULL, the caller will
have to decrement the reference count to avoid a memory leak. Note that
C<av_fetch> and C<av_store> both return C<SV**>'s, not C<SV*>'s as their
return value.

A few more:

    void  av_clear(AV*);
    void  av_undef(AV*);
    void  av_extend(AV*, SSize_t key);

The C<av_clear> function deletes all the elements in the AV* array, but
does not actually delete the array itself. The C<av_undef> function will
delete all the elements in the array plus the array itself. The
C<av_extend> function extends the array so that it contains at least C<key+1>
elements. If C<key+1> is less than the currently allocated length of the array,
then nothing is done.

If you know the name of an array variable, you can get a pointer to its AV
by using the following:

    AV*  get_av("package::varname", 0);

This returns NULL if the variable does not exist.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the array access functions on tied arrays.
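
The array functions described in this section might be combined as in the
following sketch. The helper names C<make_squares> and C<sum_av> are
hypothetical, the code assumes a plain (untied, non-magical) array, and it
is meant to be compiled inside an XS module:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* Hypothetical helper: new AV holding 0, 1, 4, ... for n elements. */
    static AV *
    make_squares(pTHX_ SSize_t n)
    {
        AV *av = newAV();
        SSize_t i;

        for (i = 0; i < n; i++) {
            SV *val = newSViv((IV)i * (IV)i);
            /* av_store takes over our reference only on success. */
            if (!av_store(av, i, val))
                SvREFCNT_dec(val);
        }
        return av;
    }

    /* Hypothetical helper: sum the defined elements of an AV. */
    static IV
    sum_av(pTHX_ AV *av)
    {
        IV total = 0;
        SSize_t i;
        SSize_t top = av_top_index(av);     /* -1 for an empty array */

        for (i = 0; i <= top; i++) {
            SV **svp = av_fetch(av, i, 0);  /* lval 0: don't create */
            if (svp && SvOK(*svp))
                total += SvIV(*svp);
        }
        return total;
    }

Checking the C<SV**> returned by C<av_fetch> before dereferencing it
matters here, because an empty slot is reported as a NULL pointer.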

=head3 More efficient working with new or vanilla AVs

Perl v5.36 and v5.38 introduced streamlined, inlined versions of some
functions:

=over

=item * C<av_store_simple>

=item * C<av_fetch_simple>

=item * C<av_push_simple>

=back

These are drop-in replacements, but can only be used on straightforward
AVs that meet the following criteria:

=over

=item * are not magical

=item * are not readonly

=item * are "real" (refcounted) AVs

=item * have an av_top_index value > -2

=back

AVs created using C<newAV()>, C<av_make>, C<newAV_alloc_x>, and
C<newAV_alloc_xz> are all compatible at the time of creation. It is
only if they are declared readonly or unreal, have magic attached, or
are otherwise configured unusually that they will stop being compatible.

Note that some interpreter functions may attach magic to an AV as part
of normal operations. It is therefore safest, unless you are sure of the
lifecycle of an AV, to only use these new functions close to the point
of AV creation.

=head2 Working with HVs

To create an HV, you use the following routine:

    HV*  newHV();

Once the HV has been created, the following operations are possible on it:

    SV**  hv_store(HV*, const char* key, U32 klen, SV* val, U32 hash);
    SV**  hv_fetch(HV*, const char* key, U32 klen, I32 lval);

The C<klen> parameter is the length of the key being passed in (Note that
you cannot pass 0 in as a value of C<klen> to tell Perl to measure the
length of the key). The C<val> argument contains the SV pointer to the
scalar being stored, and C<hash> is the precomputed hash value (zero if
you want C<hv_store> to calculate it for you).
The C<lval> parameter
indicates whether this fetch is actually a part of a store operation, in
which case a new undefined value will be added to the HV with the supplied
key and C<hv_fetch> will return as if the value had already existed.

Remember that C<hv_store> and C<hv_fetch> return C<SV**>'s and not just
C<SV*>. To access the scalar value, you must first dereference the return
value. However, you should check to make sure that the return value is
not NULL before dereferencing it.

The first of these two functions checks if a hash table entry exists, and the
second deletes it.

    bool  hv_exists(HV*, const char* key, U32 klen);
    SV*   hv_delete(HV*, const char* key, U32 klen, I32 flags);

If C<flags> does not include the C<G_DISCARD> flag then C<hv_delete> will
create and return a mortal copy of the deleted value.

And more miscellaneous functions:

    void   hv_clear(HV*);
    void   hv_undef(HV*);

Like their AV counterparts, C<hv_clear> deletes all the entries in the hash
table but does not actually delete the hash table. The C<hv_undef> function
deletes both the entries and the hash table itself.

Perl keeps the actual data in a linked list of structures with a typedef of HE.
These contain the actual key and value pointers (plus extra administrative
overhead). The key is a string pointer; the value is an C<SV*>. However,
once you have an C<HE*>, to get the actual key and value, use the routines
specified below.

=for apidoc_section $HV
=for apidoc Ayh||HE

    I32    hv_iterinit(HV*);
            /* Prepares starting point to traverse hash table */
    HE*    hv_iternext(HV*);
            /* Get the next entry, and return a pointer to a
               structure that has both the key and value */
    char*  hv_iterkey(HE* entry, I32* retlen);
            /* Get the key from an HE structure and also return
               the length of the key string */
    SV*    hv_iterval(HV*, HE* entry);
            /* Return an SV pointer to the value of the HE
               structure */
    SV*    hv_iternextsv(HV*, char** key, I32* retlen);
            /* This convenience routine combines hv_iternext,
               hv_iterkey, and hv_iterval. The key and retlen
               arguments are return values for the key and its
               length. The value is the SV* that the function
               returns */

If you know the name of a hash variable, you can get a pointer to its HV
by using the following:

    HV*  get_hv("package::varname", 0);

This returns NULL if the variable does not exist.

The hash algorithm is defined in the C<PERL_HASH> macro:

    PERL_HASH(hash, key, klen)

The exact implementation of this macro varies by architecture and version
of perl, and the return value may change per invocation, so the value
is only valid for the duration of a single perl process.

See L</Understanding the Magic of Tied Hashes and Arrays> for more
information on how to use the hash access functions on tied hashes.
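
A short sketch of how the fetch and iteration functions in this section fit
together follows. The helper names C<hv_bump> and C<hv_total> are
hypothetical, the code assumes a plain (untied) hash whose values are
integers or undef, and it is meant to be compiled inside an XS module:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* Hypothetical helper: add delta to the integer stored under a
       NUL-terminated key, creating the entry if it is missing. */
    static void
    hv_bump(pTHX_ HV *hv, const char *key, IV delta)
    {
        /* lval = 1: a new undefined value is added if needed */
        SV **svp = hv_fetch(hv, key, (I32)strlen(key), 1);

        if (svp) {                      /* always check for NULL */
            IV old = SvOK(*svp) ? SvIV(*svp) : 0;
            sv_setiv(*svp, old + delta);
        }
    }

    /* Hypothetical helper: sum every defined value in the hash. */
    static IV
    hv_total(pTHX_ HV *hv)
    {
        IV total = 0;
        HE *he;

        (void)hv_iterinit(hv);          /* reset the iterator */
        while ((he = hv_iternext(hv)) != NULL) {
            SV *val = hv_iterval(hv, he);
            if (SvOK(val))
                total += SvIV(val);
        }
        return total;
    }

Fetching with a non-zero C<lval> and then modifying the returned SV in
place avoids having to manage a reference count for a separate
C<hv_store> call.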

=for apidoc_section $HV
=for apidoc Amh|void|PERL_HASH|U32 hash|char *key|STRLEN klen

=head2 Hash API Extensions

Beginning with version 5.004, the following functions are also supported:

    HE*     hv_fetch_ent  (HV* tb, SV* key, I32 lval, U32 hash);
    HE*     hv_store_ent  (HV* tb, SV* key, SV* val, U32 hash);

    bool    hv_exists_ent (HV* tb, SV* key, U32 hash);
    SV*     hv_delete_ent (HV* tb, SV* key, I32 flags, U32 hash);

    SV*     hv_iterkeysv  (HE* entry);

Note that these functions take C<SV*> keys, which simplifies writing
of extension code that deals with hash structures. These functions
also allow passing of C<SV*> keys to C<tie> functions without forcing
you to stringify the keys (unlike the previous set of functions).

They also return and accept whole hash entries (C<HE*>), making their
use more efficient (since the hash number for a particular string
doesn't have to be recomputed every time). See L<perlapi> for detailed
descriptions.

The following macros must always be used to access the contents of hash
entries. Note that the arguments to these macros must be simple
variables, since they may get evaluated more than once. See
L<perlapi> for detailed descriptions of these macros.

    HePV(HE* he, STRLEN len)
    HeVAL(HE* he)
    HeHASH(HE* he)
    HeSVKEY(HE* he)
    HeSVKEY_force(HE* he)
    HeSVKEY_set(HE* he, SV* sv)

These two lower level macros are defined, but must only be used when
dealing with keys that are not C<SV*>s:

    HeKEY(HE* he)
    HeKLEN(HE* he)

Note that neither C<hv_store> nor C<hv_store_ent> increments the
reference count of the stored C<val>; that is the caller's responsibility.
If these functions return a NULL value, the caller will usually have to
decrement the reference count of C<val> to avoid a memory leak.

=head2 AVs, HVs and undefined values

Sometimes you have to store undefined values in AVs or HVs.
Although
this may be a rare case, it can be tricky. That's because you're
used to using C<&PL_sv_undef> if you need an undefined SV.

For example, intuition tells you that this XS code:

    AV *av = newAV();
    av_store( av, 0, &PL_sv_undef );

is equivalent to this Perl code:

    my @av;
    $av[0] = undef;

Unfortunately, this isn't true. In perl 5.18 and earlier, AVs use
C<&PL_sv_undef> as a marker
for indicating that an array element has not yet been initialized.
Thus, C<exists $av[0]> would be true for the above Perl code, but
false for the array generated by the XS code. In perl 5.20, storing
C<&PL_sv_undef> will create a read-only element, because the scalar
C<&PL_sv_undef> itself is stored, not a copy.

Similar problems can occur when storing C<&PL_sv_undef> in HVs:

    hv_store( hv, "key", 3, &PL_sv_undef, 0 );

This will indeed make the value C<undef>, but if you try to modify
the value of C<key>, you'll get the following error:

    Modification of non-creatable hash value attempted

In perl 5.8.0, C<&PL_sv_undef> was also used to mark placeholders
in restricted hashes. This caused such hash entries not to appear
when iterating over the hash or when checking for the keys
with the C<hv_exists> function.

You can run into similar problems when you store C<&PL_sv_yes> or
C<&PL_sv_no> into AVs or HVs. Trying to modify such elements
will give you the following error:

    Modification of a read-only value attempted

To make a long story short, you can use the special variables
C<&PL_sv_undef>, C<&PL_sv_yes> and C<&PL_sv_no> with AVs and
HVs, but you have to make sure you know what you're doing.

Generally, if you want to store an undefined value in an AV
or HV, you should not use C<&PL_sv_undef>, but rather create a
new undefined value using the C<newSV> function, for example:

    av_store( av, 42, newSV(0) );
    hv_store( hv, "foo", 3, newSV(0), 0 );

=head2 References

References are a special type of scalar that point to other data types
(including other references).

To create a reference, use either of the following functions:

    SV* newRV_inc((SV*) thing);
    SV* newRV_noinc((SV*) thing);

The C<thing> argument can be any of an C<SV*>, C<AV*>, or C<HV*>. The
functions are identical except that C<newRV_inc> increments the reference
count of the C<thing>, while C<newRV_noinc> does not. For historical
reasons, C<newRV> is a synonym for C<newRV_inc>.

Once you have a reference, you can use the following macro to dereference
the reference:

    SvRV(SV*)

then call the appropriate routines, casting the returned C<SV*> to either an
C<AV*> or C<HV*>, if required.

To determine if an SV is a reference, you can use the following macro:

    SvROK(SV*)

To discover what type of value the reference refers to, use the following
macro and then check the return value.

    SvTYPE(SvRV(SV*))

The most useful types that will be returned are:

    SVt_PVAV    Array
    SVt_PVHV    Hash
    SVt_PVCV    Code
    SVt_PVGV    Glob (possibly a file handle)

Any numerical value returned which is less than C<SVt_PVAV> will be a
scalar of some form.

See L<perlapi/svtype> for more details.

=head2 Blessed References and Class Objects

References are also used to support object-oriented programming. In perl's
OO lexicon, an object is simply a reference that has been blessed into a
package (or class). Once blessed, the programmer may now use the reference
to access the various methods in the class.

A reference can be blessed into a package with the following function:

    SV* sv_bless(SV* sv, HV* stash);

The C<sv> argument must be a reference value. The C<stash> argument
specifies which class the reference will belong to. See
L</Stashes and Globs> for information on converting class names into stashes.

/* Still under construction */

The following function upgrades C<rv> to a reference if it is not already
one, and creates a new SV for C<rv> to point to. If C<classname> is
non-null, the SV is blessed into the specified class. The SV is returned.

    SV* newSVrv(SV* rv, const char* classname);

The following three functions copy an integer, unsigned integer or double
into an SV whose reference is C<rv>. The SV is blessed if C<classname> is
non-null.

    SV* sv_setref_iv(SV* rv, const char* classname, IV iv);
    SV* sv_setref_uv(SV* rv, const char* classname, UV uv);
    SV* sv_setref_nv(SV* rv, const char* classname, NV nv);

The following function copies the pointer value (I<the address, not the
string!>) into an SV whose reference is C<rv>. The SV is blessed if
C<classname> is non-null.

    SV* sv_setref_pv(SV* rv, const char* classname, void* pv);

The following function copies a string into an SV whose reference is C<rv>.
Set length to 0 to let Perl calculate the string length. The SV is blessed
if C<classname> is non-null.

    SV* sv_setref_pvn(SV* rv, const char* classname, char* pv,
                      STRLEN length);

The following function tests whether the SV is blessed into the specified
class. It does not check inheritance relationships.

    int sv_isa(SV* sv, const char* name);

The following function tests whether the SV is a reference to a blessed object.

    int sv_isobject(SV* sv);

The following function tests whether the SV is derived from the specified
class. The SV can be either a reference to a blessed object or a string
containing a class name.
This is the function implementing the C<UNIVERSAL::isa> functionality.

    bool sv_derived_from(SV* sv, const char* name);

To check if you've got an object derived from a specific class you have
to write:

    if (sv_isobject(sv) && sv_derived_from(sv, class)) { ... }

=head2 Creating New Variables

To create a new Perl variable with an undef value which can be accessed from
your Perl script, use the following routines, depending on the variable type.

    SV* get_sv("package::varname", GV_ADD);
    AV* get_av("package::varname", GV_ADD);
    HV* get_hv("package::varname", GV_ADD);

Notice the use of C<GV_ADD> as the second parameter. The new variable can
now be set, using the routines appropriate to the data type.

There are additional macros whose values may be bitwise OR'ed with the
C<GV_ADD> argument to enable certain extra features. Those bits are:

=over

=item GV_ADDMULTI

Marks the variable as multiply defined, thus preventing the:

    Name <varname> used only once: possible typo

warning.

=item GV_ADDWARN

Issues the warning:

    Had to create <varname> unexpectedly

if the variable did not exist before the function was called.

=back

If you do not specify a package name, the variable is created in the current
package.

=head2 Reference Counts and Mortality

Perl uses a reference count-driven garbage collection mechanism. SVs,
AVs, or HVs (xV for short in the following) start their life with a
reference count of 1. If the reference count of an xV ever drops to 0,
then it will be destroyed and its memory made available for reuse.
At the most basic internal level, reference counts can be manipulated
with the following macros:

    int SvREFCNT(SV* sv);
    SV* SvREFCNT_inc(SV* sv);
    void SvREFCNT_dec(SV* sv);

(There are also suffixed versions of the increment and decrement macros,
for situations where the full generality of these basic macros can be
exchanged for some performance.)

However, the way a programmer should think about references is not so
much in terms of the bare reference count, but in terms of I<ownership>
of references. A reference to an xV can be owned by any of a variety
of entities: another xV, the Perl interpreter, an XS data structure,
a piece of running code, or a dynamic scope. An xV generally does not
know what entities own the references to it; it only knows how many
references there are, which is the reference count.

To correctly maintain reference counts, it is essential to keep track
of what references the XS code is manipulating. The programmer should
always know where a reference has come from and who owns it, and be
aware of any creation or destruction of references, and any transfers
of ownership. Because ownership isn't represented explicitly in the xV
data structures, only the reference count need be actually maintained
by the code, and that means that this understanding of ownership is not
actually evident in the code. For example, transferring ownership of a
reference from one owner to another doesn't change the reference count
at all, so may be achieved with no actual code. (The transferring code
doesn't touch the referenced object, but does need to ensure that the
former owner knows that it no longer owns the reference, and that the
new owner knows that it now does.)

An xV that is visible at the Perl level should not become unreferenced
and thus be destroyed.
Normally, an object will only become unreferenced
when it is no longer visible, often by the same means that makes it
invisible. For example, a Perl reference value (RV) owns a reference to
its referent, so if the RV is overwritten that reference gets destroyed,
and the no-longer-reachable referent may be destroyed as a result.

Many functions have some kind of reference manipulation as
part of their purpose. Sometimes this is documented in terms
of ownership of references, and sometimes it is (less helpfully)
documented in terms of changes to reference counts. For example, the
L<newRV_inc()|perlapi/newRV_inc> function is documented to create a new RV
(with reference count 1) and increment the reference count of the referent
that was supplied by the caller. This is best understood as creating
a new reference to the referent, which is owned by the created RV,
and returning to the caller ownership of the sole reference to the RV.
The L<newRV_noinc()|perlapi/newRV_noinc> function instead does not
increment the reference count of the referent, but the RV nevertheless
ends up owning a reference to the referent. It is therefore implied
that the caller of C<newRV_noinc()> is relinquishing a reference to the
referent, making this conceptually a more complicated operation even
though it does less to the data structures.

For example, imagine you want to return a reference from an XSUB
function. Inside the XSUB routine, you create an SV which initially
has just a single reference, owned by the XSUB routine. This reference
needs to be disposed of before the routine is complete, otherwise it
will leak, preventing the SV from ever being destroyed. So to create
an RV referencing the SV, it is most convenient to pass the SV to
C<newRV_noinc()>, which consumes that reference.
Now the XSUB routine
no longer owns a reference to the SV, but does own a reference to the RV,
which in turn owns a reference to the SV. The ownership of the reference
to the RV is then transferred by the process of returning the RV from
the XSUB.

There are some convenience functions available that can help with the
destruction of xVs. These functions introduce the concept of "mortality".
Much documentation speaks of an xV itself being mortal, but this is
misleading. It is really I<a reference to> an xV that is mortal, and it
is possible for there to be more than one mortal reference to a single xV.
For a reference to be mortal means that it is owned by the temps stack,
one of perl's many internal stacks, which will destroy that reference
"a short time later". Usually the "short time later" is the end of
the current Perl statement. However, it gets more complicated around
dynamic scopes: there can be multiple sets of mortal references hanging
around at the same time, with different death dates. Internally, the
actual determinant for when mortal xV references are destroyed depends
on two macros, C<SAVETMPS> and C<FREETMPS>. See L<perlcall> and L<perlxs>
and L</Temporaries Stack> below for more details on these macros.

Mortal references are mainly used for xVs that are placed on perl's
main stack. The stack is problematic for reference tracking, because it
contains a lot of xV references, but doesn't own those references: they
are not counted. Currently, there are many bugs resulting from xVs being
destroyed while referenced by the stack, because the stack's uncounted
references aren't enough to keep the xVs alive. So when putting an
(uncounted) reference on the stack, it is vitally important to ensure that
there will be a counted reference to the same xV that will last at least
as long as the uncounted reference.
But it's also important that that
counted reference be cleaned up at an appropriate time, and not unduly
prolong the xV's life. For there to be a mortal reference is often the
best way to satisfy this requirement, especially if the xV was created
especially to be put on the stack and would otherwise be unreferenced.

To create a mortal reference, use the functions:

    SV* sv_newmortal()
    SV* sv_mortalcopy(SV*)
    SV* sv_2mortal(SV*)

C<sv_newmortal()> creates an SV (with the undefined value) whose sole
reference is mortal. C<sv_mortalcopy()> creates an xV whose value is a
copy of a supplied xV and whose sole reference is mortal. C<sv_2mortal()>
mortalises an existing xV reference: it transfers ownership of a reference
from the caller to the temps stack. Because C<sv_newmortal> gives the new
SV no value, it must normally be given one via C<sv_setpv>, C<sv_setiv>,
etc.:

    SV *tmp = sv_newmortal();
    sv_setiv(tmp, an_integer);

As that is multiple C statements, it is quite common to see this idiom
instead:

    SV *tmp = sv_2mortal(newSViv(an_integer));

The mortal routines are not just for SVs; AVs and HVs can be
made mortal by passing their address (cast to C<SV*>) to the
C<sv_2mortal> or C<sv_mortalcopy> routines.

=head2 Stashes and Globs

A B<stash> is a hash that contains all variables that are defined
within a package. Each key of the stash is a symbol
name (shared by all the different types of objects that have the same
name), and each value in the hash table is a GV (Glob Value). This GV
in turn contains references to the various objects of that name,
including (but not limited to) the following:

    Scalar Value
    Array Value
    Hash Value
    I/O Handle
    Format
    Subroutine

There is a single stash called C<PL_defstash> that holds the items that exist
in the C<main> package.
To get at the items in other packages, append the
string "::" to the package name. The items in the C<Foo> package are in
the stash C<Foo::> in C<PL_defstash>. The items in the C<Bar::Baz> package
are in the stash C<Baz::> in C<Bar::>'s stash.

=for apidoc_section $GV
=for apidoc Amnh||PL_defstash

To get the stash pointer for a particular package, use one of the
functions:

    HV*  gv_stashpv(const char* name, I32 flags)
    HV*  gv_stashsv(SV*, I32 flags)

The first function takes a literal string, the second uses the string stored
in the SV. Remember that a stash is just a hash table, so you get back an
C<HV*>. The C<flags> argument will create a new package if it is set to
C<GV_ADD>.

The name that C<gv_stash*v> wants is the name of the package whose symbol table
you want. The default package is called C<main>. If you have multiply nested
packages, pass their names to C<gv_stash*v>, separated by C<::> as in the Perl
language itself.

Alternatively, if you have an SV that is a blessed reference, you can find
out the stash pointer by using:

    HV*  SvSTASH(SvRV(SV*));

then use the following to get the package name itself:

    char*  HvNAME(HV* stash);

If you need to bless or re-bless an object you can use the following
function:

    SV*  sv_bless(SV*, HV* stash)

where the first argument, an C<SV*>, must be a reference, and the second
argument is a stash. The returned C<SV*> can now be used in the same way
as any other SV.

For more information on references and blessings, consult L<perlref>.

=head2 I/O Handles

Like AVs and HVs, IO objects are another type of non-scalar SV which
may contain input and output L<PerlIO|perlapio> objects or a C<DIR *>
from opendir().

You can create a new IO object:

    IO*  newIO();

Unlike other SVs, a new IO object is automatically blessed into the
L<IO::File> class.

The IO object contains an input and output PerlIO handle:

    PerlIO *IoIFP(IO *io);
    PerlIO *IoOFP(IO *io);

=for apidoc_section $io
=for apidoc Amh|PerlIO *|IoIFP|IO *io
=for apidoc Amh|PerlIO *|IoOFP|IO *io

Typically if the IO object has been opened on a file, the input handle
is always present, but the output handle is only present if the file
is open for output. For a file, if both are present they will be the
same PerlIO object.

Distinct input and output PerlIO objects are created for sockets and
character devices.

The IO object also contains other data associated with Perl I/O
handles:

    IV IoLINES(io);                /* $. */
    IV IoPAGE(io);                 /* $% */
    IV IoPAGE_LEN(io);             /* $= */
    IV IoLINES_LEFT(io);           /* $- */
    char *IoTOP_NAME(io);          /* $^ */
    GV *IoTOP_GV(io);              /* $^ */
    char *IoFMT_NAME(io);          /* $~ */
    GV *IoFMT_GV(io);              /* $~ */
    char *IoBOTTOM_NAME(io);
    GV *IoBOTTOM_GV(io);
    char IoTYPE(io);
    U8 IoFLAGS(io);

 =for apidoc_sections $io_scn, $formats_section
=for apidoc_section $reports
=for apidoc Amh|IV|IoLINES|IO *io
=for apidoc Amh|IV|IoPAGE|IO *io
=for apidoc Amh|IV|IoPAGE_LEN|IO *io
=for apidoc Amh|IV|IoLINES_LEFT|IO *io
=for apidoc Amh|char *|IoTOP_NAME|IO *io
=for apidoc Amh|GV *|IoTOP_GV|IO *io
=for apidoc Amh|char *|IoFMT_NAME|IO *io
=for apidoc Amh|GV *|IoFMT_GV|IO *io
=for apidoc Amh|char *|IoBOTTOM_NAME|IO *io
=for apidoc Amh|GV *|IoBOTTOM_GV|IO *io
=for apidoc_section $io
=for apidoc Amh|char|IoTYPE|IO *io
=for apidoc Amh|U8|IoFLAGS|IO *io

Most of these are involved with L<formats|perlform>.

C<IoFLAGS()> may contain a combination of flags, the most interesting of
which are C<IOf_FLUSH> (C<$|>) for autoflush and C<IOf_UNTAINT>,
settable with L<< IO::Handle's untaint() method|IO::Handle/"$io->untaint" >>.

=for apidoc Amnh||IOf_FLUSH
=for apidoc Amnh||IOf_UNTAINT

The IO object may also contain a directory handle:

    DIR *IoDIRP(io);

=for apidoc Amh|DIR *|IoDIRP|IO *io

suitable for use with PerlDir_read() etc.

All of these accessor macros are lvalues; there are no distinct
C<_set()> macros to modify the members of the IO object.

=head2 Double-Typed SVs

Scalar variables normally contain only one type of value, an integer,
double, pointer, or reference. Perl will automatically convert the
actual scalar data from the stored type into the requested type.

Some scalar variables contain more than one type of scalar data. For
example, the variable C<$!> contains either the numeric value of C<errno>
or its string equivalent from either C<strerror> or C<sys_errlist[]>.

To force multiple data values into an SV, you must do two things: use the
C<sv_set*v> routines to add the additional scalar type, then set a flag
so that Perl will believe it contains more than one type of data. The
four macros to set the flags are:

    SvIOK_on
    SvNOK_on
    SvPOK_on
    SvROK_on

The particular macro you must use depends on which C<sv_set*v> routine
you called first. This is because every C<sv_set*v> routine turns on
only the bit for the particular type of data being set, and turns off
all the rest.
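
The flag behaviour described above can be modelled in a few lines of
plain C. This is only a toy sketch: C<toy_sv>, C<toy_setiv>, C<toy_setpv>,
C<toy_set_dual> and the C<TOY_*> flag names are hypothetical stand-ins, not
part of the Perl API, and the real C<sv_set*v> routines do considerably more.

```c
#include <stddef.h>

/* Hypothetical flag bits standing in for the real IOK/POK flags. */
#define TOY_IOK 0x1u
#define TOY_POK 0x2u

/* A toy scalar: one integer slot, one string slot, and a flag word. */
typedef struct {
    long        iv;
    const char *pv;
    unsigned    flags;
} toy_sv;

/* Like the sv_set*v family as described in the text: store the value
 * and leave only the matching flag set, turning off all the others. */
static void toy_setiv(toy_sv *sv, long iv)
{
    sv->iv    = iv;
    sv->flags = TOY_IOK;
}

static void toy_setpv(toy_sv *sv, const char *pv)
{
    sv->pv    = pv;
    sv->flags = TOY_POK;
}

/* Build a dual-valued scalar. After the second store only POK is set,
 * so the integer flag has to be switched back on by hand. */
static void toy_set_dual(toy_sv *sv, long iv, const char *pv)
{
    toy_setiv(sv, iv);
    toy_setpv(sv, pv);
    sv->flags |= TOY_IOK;   /* plays the role of SvIOK_on(sv) */
}
```

In this model, calling C<toy_setiv> and then C<toy_setpv> leaves only
C<TOY_POK> set, which is why the flag for the value stored first must be
restored afterwards.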

For example, to create a new Perl variable called "dberror" that contains
both the numeric and descriptive string error values, you could use the
following code:

    extern int  dberror;
    extern char *dberror_list[];

    SV* sv = get_sv("dberror", GV_ADD);
    sv_setiv(sv, (IV) dberror);
    sv_setpv(sv, dberror_list[dberror]);
    SvIOK_on(sv);

If the order of C<sv_setiv> and C<sv_setpv> had been reversed, then the
macro C<SvPOK_on> would need to be called instead of C<SvIOK_on>.

=head2 Read-Only Values

In Perl 5.16 and earlier, copy-on-write (see the next section) shared a
flag bit with read-only scalars. So the only way to test whether
C<sv_setsv>, etc., will raise a "Modification of a read-only value" error
in those versions is:

    SvREADONLY(sv) && !SvIsCOW(sv)

Under Perl 5.18 and later, C<SvREADONLY> only applies to read-only
variables, and, under 5.20, copy-on-write scalars can also be read-only,
so the above check is incorrect. You just want:

    SvREADONLY(sv)

If you need to do this check often, define your own macro like this:

    #if PERL_VERSION >= 18
    # define SvTRULYREADONLY(sv) SvREADONLY(sv)
    #else
    # define SvTRULYREADONLY(sv) (SvREADONLY(sv) && !SvIsCOW(sv))
    #endif

=head2 Copy on Write

Perl implements a copy-on-write (COW) mechanism for scalars, in which
string copies are not immediately made when requested, but are deferred
until made necessary by one or the other scalar changing. This is mostly
transparent, but one must take care not to modify string buffers that are
shared by multiple SVs.

You can test whether an SV is using copy-on-write with C<SvIsCOW(sv)>.

You can force an SV to make its own copy of its string buffer by calling
C<sv_force_normal(sv)> or C<SvPV_force_nolen(sv)>.
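
The idea of deferring a string copy until one of the scalars changes can
be sketched with a small stand-alone model. Everything here is
hypothetical (C<toy_buf>, C<toy_sv>, the sharer count, and all C<toy_*>
functions); it says nothing about how perl actually lays out or tracks
shared buffers, and only illustrates why a shared buffer must be un-shared
before it is written to.

```c
#include <stdlib.h>
#include <string.h>

/* A string buffer that several toy scalars may share. */
typedef struct {
    char  *data;
    size_t len;
    int    sharers;     /* how many toy scalars point at this buffer */
} toy_buf;

typedef struct {
    toy_buf *buf;
} toy_sv;

/* Create a scalar owning a fresh private copy of s; NULL on failure. */
static toy_sv *toy_new(const char *s)
{
    size_t   len  = strlen(s);
    toy_sv  *sv   = malloc(sizeof *sv);
    toy_buf *buf  = malloc(sizeof *buf);
    char    *data = malloc(len + 1);
    if (!sv || !buf || !data) { free(sv); free(buf); free(data); return NULL; }
    memcpy(data, s, len + 1);
    buf->data = data; buf->len = len; buf->sharers = 1;
    sv->buf = buf;
    return sv;
}

/* "Copy" src: no bytes are duplicated, the buffer is merely shared. */
static toy_sv *toy_copy(toy_sv *src)
{
    toy_sv *sv = malloc(sizeof *sv);
    if (!sv) return NULL;
    sv->buf = src->buf;
    src->buf->sharers++;
    return sv;
}

/* True while the scalar's buffer is shared with another scalar. */
static int toy_is_cow(const toy_sv *sv) { return sv->buf->sharers > 1; }

/* Give sv a private buffer if it is sharing one. Returns 0 on success. */
static int toy_force_normal(toy_sv *sv)
{
    toy_buf *old = sv->buf, *buf;
    if (old->sharers == 1) return 0;
    buf = malloc(sizeof *buf);
    if (!buf) return -1;
    buf->data = malloc(old->len + 1);
    if (!buf->data) { free(buf); return -1; }
    memcpy(buf->data, old->data, old->len + 1);
    buf->len = old->len; buf->sharers = 1;
    old->sharers--;
    sv->buf = buf;
    return 0;
}

/* Overwrite one byte, un-sharing first so other scalars are unaffected. */
static int toy_set_char(toy_sv *sv, size_t i, char c)
{
    if (i >= sv->buf->len || toy_force_normal(sv) != 0) return -1;
    sv->buf->data[i] = c;
    return 0;
}

/* Release a scalar, freeing the buffer once nobody shares it. */
static void toy_free(toy_sv *sv)
{
    if (--sv->buf->sharers == 0) { free(sv->buf->data); free(sv->buf); }
    free(sv);
}
```

Writing through C<toy_set_char> without the C<toy_force_normal> step would
silently change every scalar sharing the buffer, which is the kind of
mistake this section warns against.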

If you want to make the SV drop its string buffer, use
C<sv_force_normal_flags(sv, SV_COW_DROP_PV)> or simply
C<sv_setsv(sv, NULL)>.

All of these functions will croak on read-only scalars (see the previous
section for more on those).

To test that your code is behaving correctly and not modifying COW buffers,
on systems that support L<mmap(2)> (i.e., Unix) you can configure perl with
C<-Accflags=-DPERL_DEBUG_READONLY_COW> and it will turn buffer violations
into crashes. You will find it to be marvellously slow, so you may want to
skip perl's own tests.

=head2 Magic Variables

[This section still under construction. Ignore everything here. Post no
bills. Everything not permitted is forbidden.]

Any SV may be magical, that is, it has special features that a normal
SV does not have. These features are stored in the SV structure in a
linked list of C<struct magic>'s, typedef'ed to C<MAGIC>.

    struct magic {
        MAGIC*      mg_moremagic;
        MGVTBL*     mg_virtual;
        U16         mg_private;
        char        mg_type;
        U8          mg_flags;
        I32         mg_len;
        SV*         mg_obj;
        char*       mg_ptr;
    };

Note this is current as of patchlevel 0, and could change at any time.

=head2 Assigning Magic

Perl adds magic to an SV using the C<sv_magic> function:

    void sv_magic(SV* sv, SV* obj, int how, const char* name, I32 namlen);

The C<sv> argument is a pointer to the SV that is to acquire a new magical
feature.

If C<sv> is not already magical, Perl uses the C<SvUPGRADE> macro to
convert C<sv> to type C<SVt_PVMG>.
Perl then continues by adding new magic
to the beginning of the linked list of magical features. Any prior entry
of the same type of magic is deleted. Note that this can be overridden,
and multiple instances of the same type of magic can be associated with an
SV.
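
The list handling just described (drop any prior entry of the same type,
then add the new entry at the head) can be sketched as a plain singly
linked list. The names C<toy_magic>, C<toy_sv>, C<toy_magic_add> and
C<toy_unmagic> are hypothetical and model only the list bookkeeping, none
of the other work that C<sv_magic> performs.

```c
#include <stdlib.h>

/* Toy counterpart of a MAGIC entry: a type tag and a link to the next. */
typedef struct toy_magic {
    struct toy_magic *next;
    char              type;
} toy_magic;

/* Toy scalar holding the head of its magic list. */
typedef struct {
    toy_magic *magic;
} toy_sv;

/* Unlink and free every entry of the given type; returns how many went. */
static int toy_unmagic(toy_sv *sv, char type)
{
    toy_magic **link = &sv->magic;
    int removed = 0;
    while (*link) {
        toy_magic *mg = *link;
        if (mg->type == type) {
            *link = mg->next;       /* splice the entry out of the list */
            free(mg);
            removed++;
        } else {
            link = &mg->next;
        }
    }
    return removed;
}

/* Drop any prior entry of this type, then add the new entry at the
 * head of the list. Returns NULL if allocation fails. */
static toy_magic *toy_magic_add(toy_sv *sv, char type)
{
    toy_magic *mg = malloc(sizeof *mg);
    if (!mg) return NULL;
    toy_unmagic(sv, type);
    mg->type  = type;
    mg->next  = sv->magic;
    sv->magic = mg;
    return mg;
}
```

Adding types C<'U'>, C<'~'> and C<'U'> again in that order leaves a
two-entry list headed by the newer C<'U'> entry, followed by C<'~'>.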

The C<name> and C<namlen> arguments are used to associate a string with
the magic, typically the name of a variable. C<namlen> is stored in the
C<mg_len> field and if C<name> is non-null then either a C<savepvn> copy of
C<name> or C<name> itself is stored in the C<mg_ptr> field, depending on
whether C<namlen> is greater than zero or equal to zero respectively. As a
special case, if C<(name && namlen == HEf_SVKEY)> then C<name> is assumed
to contain an C<SV*> and is stored as-is with its REFCNT incremented.

The C<sv_magic> function uses C<how> to determine which, if any, predefined
"Magic Virtual Table" should be assigned to the C<mg_virtual> field.
See the L</Magic Virtual Tables> section below. The C<how> argument is also
stored in the C<mg_type> field. The value of
C<how> should be chosen from the set of macros
C<PERL_MAGIC_foo> found in F<perl.h>. Note that before
these macros were added, Perl internals used to directly use character
literals, so you may occasionally come across old code or documentation
referring to 'U' magic rather than C<PERL_MAGIC_uvar> for example.

The C<obj> argument is stored in the C<mg_obj> field of the C<MAGIC>
structure. If it is not the same as the C<sv> argument, the reference
count of the C<obj> object is incremented. If it is the same, or if
the C<how> argument is C<PERL_MAGIC_arylen>, C<PERL_MAGIC_regdatum>,
C<PERL_MAGIC_regdata>, or if it is a NULL pointer, then C<obj> is merely
stored, without the reference count being incremented.

See also C<sv_magicext> in L<perlapi> for a more flexible way to add magic
to an SV.

There is also a function to add magic to an C<HV>:

    void hv_magic(HV *hv, GV *gv, int how);

This simply calls C<sv_magic> and coerces the C<gv> argument into an C<SV>.

To remove the magic from an SV, call the function C<sv_unmagic>:

    int sv_unmagic(SV *sv, int type);

The C<type> argument should be equal to the C<how> value when the C<SV>
was initially made magical.

However, note that C<sv_unmagic> removes all magic of a certain C<type> from the
C<SV>. If you want to remove only certain
magic of a C<type> based on the magic
virtual table, use C<sv_unmagicext> instead:

    int sv_unmagicext(SV *sv, int type, MGVTBL *vtbl);

=head2 Magic Virtual Tables

The C<mg_virtual> field in the C<MAGIC> structure is a pointer to an
C<MGVTBL> (short for "Magic Virtual Table"), a structure of function
pointers that handle the various operations that might be applied to
that variable.

=for apidoc_section $magic
=for apidoc Ayh||MGVTBL

The C<MGVTBL> has five (or sometimes eight) pointers to the following
routine types:

    int  (*svt_get)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_set)  (pTHX_ SV* sv, MAGIC* mg);
    U32  (*svt_len)  (pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_clear)(pTHX_ SV* sv, MAGIC* mg);
    int  (*svt_free) (pTHX_ SV* sv, MAGIC* mg);

    int  (*svt_copy) (pTHX_ SV *sv, MAGIC* mg, SV *nsv,
                                          const char *name, I32 namlen);
    int  (*svt_dup)  (pTHX_ MAGIC *mg, CLONE_PARAMS *param);
    int  (*svt_local)(pTHX_ SV *nsv, MAGIC *mg);

This MGVTBL structure is set at compile-time in F<perl.h> and there are
currently 32 types. These different structures contain pointers to various
routines that perform additional actions depending on which function is
being called.

    Function pointer    Action taken
    ----------------    ------------
    svt_get             Do something before the value of the SV is
                        retrieved.
    svt_set             Do something after the SV is assigned a value.
    svt_len             Report on the SV's length.
    svt_clear           Clear something the SV represents.
    svt_free            Free any extra storage associated with the SV.

    svt_copy            copy tied variable magic to a tied element
    svt_dup             duplicate a magic structure during thread cloning
    svt_local           copy magic to local value during 'local'

For instance, the MGVTBL structure called C<vtbl_sv> (which corresponds
to an C<mg_type> of C<PERL_MAGIC_sv>) contains:

    { magic_get, magic_set, magic_len, 0, 0 }

Thus, when an SV is determined to be magical and of type C<PERL_MAGIC_sv>,
if a get operation is being performed, the routine C<magic_get> is
called. All the various routines for the various magical types begin
with C<magic_>. NOTE: the magic routines are not considered part of
the Perl API, and may not be exported by the Perl library.

The last three slots are a recent addition, and for source code
compatibility they are only checked for if one of the three flags
C<MGf_COPY>, C<MGf_DUP>, or C<MGf_LOCAL> is set in C<mg_flags>.
This means that most code can continue declaring
a vtable as a 5-element value. These three are
currently used exclusively by the threading code, and are highly subject
to change.

=for apidoc_section $magic
=for apidoc Amnh||MGf_COPY
=for apidoc_item ||MGf_DUP
=for apidoc_item ||MGf_LOCAL

The current kinds of Magic Virtual Tables are:

=for comment
This table is generated by regen/mg_vtable.pl. Any changes made here
will be lost.

=for mg_vtable.pl begin

 mg_type
 (old-style char and macro)   MGVTBL          Type of magic
 --------------------------   ------          -------------
 \0 PERL_MAGIC_sv             vtbl_sv         Special scalar variable
 #  PERL_MAGIC_arylen         vtbl_arylen     Array length ($#ary)
 %  PERL_MAGIC_rhash          (none)          Extra data for restricted
                                              hashes
 *  PERL_MAGIC_debugvar       vtbl_debugvar   $DB::single, signal, trace
                                              vars
 .  PERL_MAGIC_pos            vtbl_pos        pos() lvalue
 :  PERL_MAGIC_symtab         (none)          Extra data for symbol
                                              tables
 <  PERL_MAGIC_backref        vtbl_backref    For weak ref data
 @  PERL_MAGIC_arylen_p       (none)          To move arylen out of XPVAV
 B  PERL_MAGIC_bm             vtbl_regexp     Boyer-Moore
                                              (fast string search)
 c  PERL_MAGIC_overload_table vtbl_ovrld      Holds overload table
                                              (AMT) on stash
 D  PERL_MAGIC_regdata        vtbl_regdata    Regex match position data
                                              (@+ and @- vars)
 d  PERL_MAGIC_regdatum       vtbl_regdatum   Regex match position data
                                              element
 E  PERL_MAGIC_env            vtbl_env        %ENV hash
 e  PERL_MAGIC_envelem        vtbl_envelem    %ENV hash element
 f  PERL_MAGIC_fm             vtbl_regexp     Formline
                                              ('compiled' format)
 g  PERL_MAGIC_regex_global   vtbl_mglob      m//g target
 H  PERL_MAGIC_hints          vtbl_hints      %^H hash
 h  PERL_MAGIC_hintselem      vtbl_hintselem  %^H hash element
 I  PERL_MAGIC_isa            vtbl_isa        @ISA array
 i  PERL_MAGIC_isaelem        vtbl_isaelem    @ISA array element
 k  PERL_MAGIC_nkeys          vtbl_nkeys      scalar(keys()) lvalue
 L  PERL_MAGIC_dbfile         (none)          Debugger %_<filename
 l  PERL_MAGIC_dbline         vtbl_dbline     Debugger %_<filename
                                              element
 N  PERL_MAGIC_shared         (none)          Shared between threads
 n  PERL_MAGIC_shared_scalar  (none)          Shared between threads
 o  PERL_MAGIC_collxfrm       vtbl_collxfrm   Locale transformation
 P  PERL_MAGIC_tied           vtbl_pack       Tied array or hash
 p  PERL_MAGIC_tiedelem       vtbl_packelem   Tied array or hash element
 q  PERL_MAGIC_tiedscalar     vtbl_packelem   Tied scalar or handle
 r  PERL_MAGIC_qr             vtbl_regexp     Precompiled qr// regex
 S  PERL_MAGIC_sig            vtbl_sig        %SIG hash
 s  PERL_MAGIC_sigelem        vtbl_sigelem    %SIG hash element
 t  PERL_MAGIC_taint          vtbl_taint      Taintedness
 U  PERL_MAGIC_uvar           vtbl_uvar       Available for use by
                                              extensions
 u  PERL_MAGIC_uvar_elem      (none)          Reserved for use by
                                              extensions
 V  PERL_MAGIC_vstring        (none)          SV was vstring literal
 v  PERL_MAGIC_vec            vtbl_vec        vec() lvalue
 w  PERL_MAGIC_utf8           vtbl_utf8       Cached UTF-8 information
 X  PERL_MAGIC_destruct       vtbl_destruct   destruct callback
 x  PERL_MAGIC_substr         vtbl_substr     substr() lvalue
 Y  PERL_MAGIC_nonelem        vtbl_nonelem    Array element that does not
                                              exist
 y  PERL_MAGIC_defelem        vtbl_defelem    Shadow "foreach" iterator
                                              variable / smart parameter
                                              vivification
 Z  PERL_MAGIC_hook           vtbl_hook       %{^HOOK} hash
 z  PERL_MAGIC_hookelem       vtbl_hookelem   %{^HOOK} hash element
 \  PERL_MAGIC_lvref          vtbl_lvref      Lvalue reference
                                              constructor
 ]  PERL_MAGIC_checkcall      vtbl_checkcall  Inlining/mutation of call
                                              to this CV
 ^  PERL_MAGIC_extvalue       (none)          Value magic available for
                                              use by extensions
 ~  PERL_MAGIC_ext            (none)          Variable magic available
                                              for use by extensions


=for apidoc_section $magic
=for apidoc AmnhU||PERL_MAGIC_arylen
=for apidoc_item ||PERL_MAGIC_arylen_p
=for apidoc_item ||PERL_MAGIC_backref
=for apidoc_item ||PERL_MAGIC_bm
=for apidoc_item ||PERL_MAGIC_checkcall
=for apidoc_item ||PERL_MAGIC_collxfrm
=for apidoc_item ||PERL_MAGIC_dbfile
=for apidoc_item ||PERL_MAGIC_dbline
=for apidoc_item ||PERL_MAGIC_debugvar
=for apidoc_item ||PERL_MAGIC_defelem
=for apidoc_item ||PERL_MAGIC_destruct
=for apidoc_item ||PERL_MAGIC_env
=for apidoc_item ||PERL_MAGIC_envelem
=for apidoc_item ||PERL_MAGIC_ext
=for apidoc_item ||PERL_MAGIC_extvalue
=for apidoc_item ||PERL_MAGIC_fm
=for apidoc_item ||PERL_MAGIC_hints
=for apidoc_item ||PERL_MAGIC_hintselem
=for apidoc_item ||PERL_MAGIC_hook
=for apidoc_item ||PERL_MAGIC_hookelem
=for apidoc_item ||PERL_MAGIC_isa
=for apidoc_item ||PERL_MAGIC_isaelem
=for apidoc_item ||PERL_MAGIC_lvref
=for apidoc_item ||PERL_MAGIC_nkeys
=for apidoc_item ||PERL_MAGIC_nonelem
=for apidoc_item ||PERL_MAGIC_overload_table
=for apidoc_item ||PERL_MAGIC_pos
=for apidoc_item ||PERL_MAGIC_qr
=for apidoc_item ||PERL_MAGIC_regdata
=for apidoc_item ||PERL_MAGIC_regdatum
=for apidoc_item ||PERL_MAGIC_regex_global
=for apidoc_item ||PERL_MAGIC_rhash
=for apidoc_item ||PERL_MAGIC_shared
=for apidoc_item ||PERL_MAGIC_shared_scalar
=for apidoc_item ||PERL_MAGIC_sig
=for apidoc_item ||PERL_MAGIC_sigelem
=for apidoc_item ||PERL_MAGIC_substr
=for apidoc_item ||PERL_MAGIC_sv
=for apidoc_item ||PERL_MAGIC_symtab
=for apidoc_item ||PERL_MAGIC_taint
=for apidoc_item ||PERL_MAGIC_tied
=for apidoc_item ||PERL_MAGIC_tiedelem
=for apidoc_item ||PERL_MAGIC_tiedscalar
=for apidoc_item ||PERL_MAGIC_utf8
=for apidoc_item ||PERL_MAGIC_uvar
=for apidoc_item ||PERL_MAGIC_uvar_elem
=for apidoc_item ||PERL_MAGIC_vec
=for apidoc_item ||PERL_MAGIC_vstring

=for mg_vtable.pl end

When an uppercase and lowercase letter both exist in the table, then the
uppercase letter is typically used to represent some kind of composite type
(a list or a hash), and the lowercase letter is used to represent an element
of that composite type. Some internals code makes use of this case
relationship. However, 'v' and 'V' (vec and v-string) are in no way related.

The C<PERL_MAGIC_ext>, C<PERL_MAGIC_extvalue> and C<PERL_MAGIC_uvar> magic types
are defined specifically for use by extensions and will not be used by perl
itself. Extensions can use C<PERL_MAGIC_ext> or C<PERL_MAGIC_extvalue> magic to
'attach' private information to variables (typically objects). This is
especially useful because there is no way for normal perl code to corrupt this
private information (unlike using extra elements of a hash object).
C<PERL_MAGIC_extvalue> is value magic (unlike C<PERL_MAGIC_ext> and
C<PERL_MAGIC_uvar>) meaning that on localization the new value will not be
magical.

Similarly, C<PERL_MAGIC_uvar> magic can be used much like tie() to call a
C function any time a scalar's value is used or changed.
The C<MAGIC>'s
C<mg_ptr> field points to a C<ufuncs> structure:

    struct ufuncs {
        I32 (*uf_val)(pTHX_ IV, SV*);
        I32 (*uf_set)(pTHX_ IV, SV*);
        IV uf_index;
    };

When the SV is read from or written to, the C<uf_val> or C<uf_set>
function will be called with C<uf_index> as the first arg and a pointer to
the SV as the second.  A simple example of how to add C<PERL_MAGIC_uvar>
magic is shown below.  Note that the ufuncs structure is copied by
sv_magic, so you can safely allocate it on the stack.

    void
    Umagic(sv)
        SV *sv;
    PREINIT:
        struct ufuncs uf;
    CODE:
        uf.uf_val   = &my_get_fn;
        uf.uf_set   = &my_set_fn;
        uf.uf_index = 0;
        sv_magic(sv, 0, PERL_MAGIC_uvar, (char*)&uf, sizeof(uf));

Attaching C<PERL_MAGIC_uvar> to arrays is permissible but has no effect.

For hashes there is a specialized hook that gives control over hash
keys (but not values).  This hook calls C<PERL_MAGIC_uvar> 'get' magic
if the "set" function in the C<ufuncs> structure is NULL.  The hook
is activated whenever the hash is accessed with a key specified as
an C<SV> through the functions C<hv_store_ent>, C<hv_fetch_ent>,
C<hv_delete_ent>, and C<hv_exists_ent>.  Accessing the key as a string
through the functions without the C<..._ent> suffix circumvents the
hook.  See L<Hash::Util::FieldHash/GUTS> for a detailed description.

Note that because multiple extensions may be using C<PERL_MAGIC_ext>
or C<PERL_MAGIC_uvar> magic, it is important for extensions to take
extra care to avoid conflict.  Typically only using the magic on
objects blessed into the same class as the extension is sufficient.
For C<PERL_MAGIC_ext> magic, it is usually a good idea to define an
C<MGVTBL>, even if all its fields will be C<0>, so that individual
C<MAGIC> pointers can be identified as a particular kind of magic
using their magic virtual table.
C<mg_findext> provides an easy way
to do that:

    STATIC MGVTBL my_vtbl = { 0, 0, 0, 0, 0, 0, 0, 0 };

    MAGIC *mg;
    if ((mg = mg_findext(sv, PERL_MAGIC_ext, &my_vtbl))) {
        /* this is really ours, not another module's PERL_MAGIC_ext */
        my_priv_data_t *priv = (my_priv_data_t *)mg->mg_ptr;
        ...
    }

Also note that the C<sv_set*()> and C<sv_cat*()> functions described
earlier do B<not> invoke 'set' magic on their targets.  This must
be done by the user either by calling the C<SvSETMAGIC()> macro after
calling these functions, or by using one of the C<sv_set*_mg()> or
C<sv_cat*_mg()> functions.  Similarly, generic C code must call the
C<SvGETMAGIC()> macro to invoke any 'get' magic if it uses an SV
obtained from external sources in functions that don't handle magic.
See L<perlapi> for a description of these functions.
For example, calls to the C<sv_cat*()> functions typically need to be
followed by C<SvSETMAGIC()>, but they don't need a prior C<SvGETMAGIC()>
since their implementation handles 'get' magic.

=head2 Finding Magic

    MAGIC *mg_find(SV *sv, int type); /* Finds the magic pointer of that
                                       * type */

This routine returns a pointer to a C<MAGIC> structure stored in the SV.
If the SV does not have that magical feature, C<NULL> is returned.  If the
SV has multiple instances of that magical feature, the first one will be
returned.  C<mg_findext> can be used to find a C<MAGIC> structure of an SV
based on both its magic type and its magic virtual table:

    MAGIC *mg_findext(SV *sv, int type, MGVTBL *vtbl);

Also, if the SV passed to C<mg_find> or C<mg_findext> is not of type
SVt_PVMG, Perl may core dump.

    int mg_copy(SV* sv, SV* nsv, const char* key, STRLEN klen);

This routine checks to see what types of magic C<sv> has.
If the mg_type
field is an uppercase letter, then the mg_obj is copied to C<nsv>, but
the mg_type field is changed to be the lowercase letter.

=head2 Understanding the Magic of Tied Hashes and Arrays

Tied hashes and arrays are magical beasts of the C<PERL_MAGIC_tied>
magic type.

WARNING: As of the 5.004 release, proper usage of the array and hash
access functions requires understanding a few caveats.  Some
of these caveats are actually considered bugs in the API, to be fixed
in later releases, and are bracketed with [MAYCHANGE] below.  If
you find yourself actually applying such information in this section, be
aware that the behavior may change in the future, umm, without warning.

The perl tie function associates a variable with an object that implements
the various GET, SET, etc methods.  To perform the equivalent of the perl
tie function from an XSUB, you must mimic this behaviour.  The code below
carries out the necessary steps -- firstly it creates a new hash, and then
creates a second hash which it blesses into the class which will implement
the tie methods.  Lastly it ties the two hashes together, and returns a
reference to the new tied hash.  Note that the code below does NOT call the
TIEHASH method in the MyTie class -
see L</Calling Perl Routines from within C Programs> for details on how
to do this.

    SV*
    mytie()
    PREINIT:
        HV *hash;
        HV *stash;
        SV *tie;
    CODE:
        hash = newHV();
        tie = newRV_noinc((SV*)newHV());
        stash = gv_stashpv("MyTie", GV_ADD);
        sv_bless(tie, stash);
        hv_magic(hash, (GV*)tie, PERL_MAGIC_tied);
        SvREFCNT_dec(tie); /* hv_magic() increases tie ref count */
        RETVAL = newRV_noinc((SV*)hash);
    OUTPUT:
        RETVAL

The C<av_store> function, when given a tied array argument, merely
copies the magic of the array onto the value to be "stored", using
C<mg_copy>.
It may also return NULL, indicating that the value did not
actually need to be stored in the array.  [MAYCHANGE] After a call to
C<av_store> on a tied array, the caller will usually need to call
C<mg_set(val)> to actually invoke the perl level "STORE" method on the
TIEARRAY object.  If C<av_store> did return NULL, a call to
C<SvREFCNT_dec(val)> will also be usually necessary to avoid a memory
leak. [/MAYCHANGE]

The previous paragraph is applicable verbatim to tied hash access using the
C<hv_store> and C<hv_store_ent> functions as well.

C<av_fetch> and the corresponding hash functions C<hv_fetch> and
C<hv_fetch_ent> actually return an undefined mortal value whose magic
has been initialized using C<mg_copy>.  Note the value so returned does not
need to be deallocated, as it is already mortal.  [MAYCHANGE] But you will
need to call C<mg_get()> on the returned value in order to actually invoke
the perl level "FETCH" method on the underlying TIE object.  Similarly,
you may also call C<mg_set()> on the return value after possibly assigning
a suitable value to it using C<sv_setsv>, which will invoke the "STORE"
method on the TIE object. [/MAYCHANGE]

[MAYCHANGE]
In other words, the array or hash fetch/store functions don't really
fetch and store actual values in the case of tied arrays and hashes.  They
merely call C<mg_copy> to attach magic to the values that were meant to be
"stored" or "fetched".  Later calls to C<mg_get> and C<mg_set> actually
do the job of invoking the TIE methods on the underlying objects.  Thus
the magic mechanism currently implements a kind of lazy access to arrays
and hashes.

Currently (as of perl version 5.004), use of the hash and array access
functions requires the user to be aware of whether they are operating on
"normal" hashes and arrays, or on their tied variants.
The API may be
changed to provide more transparent access to both tied and normal data
types in future versions.
[/MAYCHANGE]

You would do well to understand that the TIEARRAY and TIEHASH interfaces
are mere sugar to invoke some perl method calls while using the uniform hash
and array syntax.  The use of this sugar imposes some overhead (typically
about two to four extra opcodes per FETCH/STORE operation, in addition to
the creation of all the mortal variables required to invoke the methods).
This overhead will be comparatively small if the TIE methods are themselves
substantial, but if they are only a few statements long, the overhead
will not be insignificant.

=head2 Localizing changes

Perl has a very handy construction

  {
    local $var = 2;
    ...
  }

This construction is I<approximately> equivalent to

  {
    my $oldvar = $var;
    $var = 2;
    ...
    $var = $oldvar;
  }

The biggest difference is that the first construction would
reinstate the initial value of $var, irrespective of how control exits
the block: C<goto>, C<return>, C<die>/C<eval>, etc.  It is a little bit
more efficient as well.

There is a way to achieve a similar task from C via the Perl API: create a
I<pseudo-block>, and arrange for some changes to be automatically
undone at the end of it, either explicitly, or via a non-local exit (via
die()).  A I<block>-like construct is created by a pair of
C<ENTER>/C<LEAVE> macros (see L<perlcall/"Returning a Scalar">).
Such a construct may be created specially for some important localized
task, or an existing one (like boundaries of enclosing Perl
subroutine/block, or an existing pair for freeing TMPs) may be
used.  (In the second case the overhead of additional localization must
be almost negligible.)  Note that any XSUB is automatically enclosed in
an C<ENTER>/C<LEAVE> pair.
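
To make the bookkeeping behind a I<pseudo-block> more concrete, here is a
minimal, self-contained sketch of a save stack in plain C.  It is a toy
model under stated assumptions, not Perl's implementation: every C<toy_*>
name is hypothetical, only C<int> variables can be saved, and non-local
exits via die() are not modelled at all.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names are part of the Perl API. */
#define TOY_STACK_MAX 64

typedef struct {
    int *addr;   /* variable to restore */
    int  old;    /* value it held when it was saved */
} toy_save_entry;

static toy_save_entry toy_savestack[TOY_STACK_MAX];
static size_t toy_savestack_ix = 0;    /* next free save slot */
static size_t toy_scopestack[TOY_STACK_MAX];
static size_t toy_scopestack_ix = 0;   /* current nesting depth */

/* Plays the role of ENTER: remember where the save stack stood. */
static void toy_enter(void) {
    assert(toy_scopestack_ix < TOY_STACK_MAX);
    toy_scopestack[toy_scopestack_ix++] = toy_savestack_ix;
}

/* Plays the role of SAVEINT(i): record address and current value. */
static void toy_saveint(int *p) {
    assert(toy_savestack_ix < TOY_STACK_MAX);
    toy_savestack[toy_savestack_ix].addr = p;
    toy_savestack[toy_savestack_ix].old  = *p;
    toy_savestack_ix++;
}

/* Plays the role of LEAVE: undo, newest first, everything saved
 * since the matching toy_enter(). */
static void toy_leave(void) {
    size_t floor;
    assert(toy_scopestack_ix > 0);
    floor = toy_scopestack[--toy_scopestack_ix];
    while (toy_savestack_ix > floor) {
        toy_save_entry *e = &toy_savestack[--toy_savestack_ix];
        *e->addr = e->old;
    }
}

/* Two nested pseudo-blocks; returns the variable's final value. */
static int toy_demo(void) {
    int depth = 1;
    toy_enter();
    toy_saveint(&depth);
    depth = 2;
    toy_enter();
    toy_saveint(&depth);
    depth = 3;
    toy_leave();             /* inner block ends: depth is 2 again */
    assert(depth == 2);
    toy_leave();             /* outer block ends: depth is 1 again */
    return depth;
}
```

In this model the caller never restores anything by hand; closing the
block is enough, which is the property the real save macros provide.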

Inside such a I<pseudo-block> the following service is available:

=over 4

=item C<SAVEINT(int i)>

=item C<SAVEIV(IV i)>

=item C<SAVEI32(I32 i)>

=item C<SAVELONG(long i)>

=item C<SAVEI8(I8 i)>

=item C<SAVEI16(I16 i)>

=item C<SAVEBOOL(bool i)>

=item C<SAVESTRLEN(STRLEN i)>

These macros arrange things to restore the value of integer variable
C<i> at the end of the enclosing I<pseudo-block>.

=for apidoc_section $callback
=for apidoc Amh||SAVEINT|int i
=for apidoc Amh||SAVEIV|IV i
=for apidoc Amh||SAVEI32|I32 i
=for apidoc Amh||SAVELONG|long i
=for apidoc Amh||SAVEI8|I8 i
=for apidoc Amh||SAVEI16|I16 i
=for apidoc Amh||SAVEBOOL|bool i
=for apidoc Amh||SAVESTRLEN|STRLEN i

=item C<SAVESPTR(s)>

=item C<SAVEPPTR(p)>

These macros arrange things to restore the value of pointers C<s> and
C<p>.  C<s> must be a pointer of a type which survives conversion to
C<SV*> and back, C<p> should be able to survive conversion to C<char*>
and back.

=for apidoc Amh||SAVESPTR|SV * s
=for apidoc Amh||SAVEPPTR|char * p

=item C<SAVERCPV(char **ppv)>

This macro arranges to restore the value of a C<char *> variable which
was allocated with a call to C<rcpv_new()> to its previous state when
the current I<pseudo-block> is completed.  The pointer stored in C<*ppv> at
the time of the call will be refcount incremented and stored on the save
stack.  Later when the current I<pseudo-block> is completed the value
stored in C<*ppv> will be refcount decremented, and the previous value
restored from the savestack which will also be refcount decremented.

This is the C<RCPV> equivalent of C<SAVEGENERICSV()>.

=for apidoc Amh||SAVERCPV|char *pv

=item C<SAVEGENERICSV(SV **psv)>

This macro arranges to restore the value of a C<SV *> variable to its
previous state when the current I<pseudo-block> is completed.  The pointer
stored in C<*psv> at the time of the call will be refcount incremented
and stored on the save stack.  Later when the current I<pseudo-block> is
completed the value stored in C<*psv> will be refcount decremented, and
the previous value restored from the savestack which will also be refcount
decremented.  This is the C equivalent of C<local $sv>.

=for apidoc Amh||SAVEGENERICSV|char **psv

=item C<SAVEFREESV(SV *sv)>

The refcount of C<sv> will be decremented at the end of
I<pseudo-block>.  This is similar to C<sv_2mortal> in that it is also a
mechanism for doing a delayed C<SvREFCNT_dec>.  However, while C<sv_2mortal>
extends the lifetime of C<sv> until the beginning of the next statement,
C<SAVEFREESV> extends it until the end of the enclosing scope.  These
lifetimes can be wildly different.

Also compare C<SAVEMORTALIZESV>.

=for apidoc Amh||SAVEFREESV|SV* sv

=item C<SAVEMORTALIZESV(SV *sv)>

Just like C<SAVEFREESV>, but mortalizes C<sv> at the end of the current
scope instead of decrementing its reference count.  This usually has the
effect of keeping C<sv> alive until the statement that called the currently
live scope has finished executing.

=for apidoc Amh||SAVEMORTALIZESV|SV* sv

=item C<SAVEFREEOP(OP *op)>

The C<OP *> is C<op_free()>ed at the end of I<pseudo-block>.

=for apidoc Amh||SAVEFREEOP|OP *op

=item C<SAVEFREEPV(p)>

The chunk of memory which is pointed to by C<p> is C<Safefree()>ed at the
end of the current I<pseudo-block>.

=for apidoc Amh||SAVEFREEPV|char *pv

=item C<SAVEFREERCPV(char *pv)>

Ensures that a C<char *> which was created by a call to C<rcpv_new()> is
C<rcpv_free()>ed at the end of the current I<pseudo-block>.

This is the RCPV equivalent of C<SAVEFREESV()>.

=for apidoc Amh||SAVEFREERCPV|char *pv

=item C<SAVECLEARSV(SV *sv)>

Clears a slot in the current scratchpad which corresponds to C<sv> at
the end of I<pseudo-block>.

=item C<SAVEDELETE(HV *hv, char *key, I32 length)>

The key C<key> of C<hv> is deleted at the end of I<pseudo-block>.  The
string pointed to by C<key> is Safefree()ed.  If one has a I<key> in
short-lived storage, the corresponding string may be reallocated like
this:

  SAVEDELETE(PL_defstash, savepv(tmpbuf), strlen(tmpbuf));

=for apidoc Amh||SAVEDELETE|HV * hv|char * key|I32 length

=item C<SAVEDESTRUCTOR(DESTRUCTORFUNC_NOCONTEXT_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
only argument C<p> which may be NULL.

=for apidoc Ayh||DESTRUCTORFUNC_NOCONTEXT_t
=for apidoc Amh||SAVEDESTRUCTOR|DESTRUCTORFUNC_NOCONTEXT_t f|void *p

=item C<SAVEDESTRUCTOR_X(DESTRUCTORFUNC_t f, void *p)>

At the end of I<pseudo-block> the function C<f> is called with the
implicit context argument (if any), and C<p> which may be NULL.

Note the I<end of the current pseudo-block> may occur much later than
the I<end of the current statement>.  You may wish to look at the
C<MORTALSVFUNC_X()> macro instead.

=for apidoc Ayh||DESTRUCTORFUNC_t
=for apidoc Amh||SAVEDESTRUCTOR_X|DESTRUCTORFUNC_t f|void *p

=item C<MORTALSVFUNC_X(SVFUNC_t f, SV *sv)>

At the end of I<the current statement> the function C<f> is called with
the implicit context argument (if any), and C<sv> which may be NULL.

Be aware that the parameter argument to the destructor function differs
from the related C<SAVEDESTRUCTOR_X()> in that it MUST be either NULL or
an C<SV*>.

Note the I<end of the current statement> may occur much before
the I<end of the current pseudo-block>.  You may wish to look at the
C<SAVEDESTRUCTOR_X()> macro instead.

=for apidoc Amh||MORTALSVFUNC_X|SVFUNC_t f|SV *sv

=item C<MORTALDESTRUCTOR_SV(SV *coderef, SV *args)>

At the end of I<the current statement> the Perl function contained in
C<coderef> is called with the arguments provided (if any) in C<args>.
See the documentation for C<mortal_destructor_sv()> for details on
how the C<args> parameter is handled.

Note the I<end of the current statement> may occur much before
the I<end of the current pseudo-block>.  If you wish to call a perl
function at the end of the current pseudo-block you should use the
C<SAVEDESTRUCTOR_X()> API instead, which will require you to create a
C wrapper to call the Perl function.

=for apidoc Amh||MORTALDESTRUCTOR_SV|SV *coderef|SV *args

=item C<SAVESTACK_POS()>

The current offset on the Perl internal stack (cf. C<SP>) is restored
at the end of I<pseudo-block>.

=for apidoc Amh||SAVESTACK_POS

=back

The following API list contains functions, thus one needs to
provide pointers to the modifiable data explicitly (either C pointers,
or Perlish C<GV *>s).  Where the above macros take C<int>, a similar
function takes C<int *>.

Other macros above have functions implementing them, but it's probably
best to just use the macro, and not those or the ones below.

=over 4

=item C<SV* save_scalar(GV *gv)>

=for apidoc save_scalar

Equivalent to Perl code C<local $gv>.

=item C<AV* save_ary(GV *gv)>

=for apidoc save_ary

=item C<HV* save_hash(GV *gv)>

=for apidoc save_hash

Similar to C<save_scalar>, but localize C<@gv> and C<%gv>.

=item C<void save_item(SV *item)>

=for apidoc save_item

Duplicates the current value of C<SV>.  On the exit from the current
C<ENTER>/C<LEAVE> I<pseudo-block> the value of C<SV> will be restored
using the stored value.  It doesn't handle magic.  Use C<save_scalar> if
magic is affected.

=item C<SV* save_svref(SV **sptr)>

=for apidoc save_svref

Similar to C<save_scalar>, but will reinstate an C<SV *>.

=item C<void save_aptr(AV **aptr)>

=item C<void save_hptr(HV **hptr)>

=for apidoc save_aptr
=for apidoc save_hptr

Similar to C<save_svref>, but localize C<AV *> and C<HV *>.

=back

The C<Alias> module implements localization of the basic types within the
I<caller's scope>.  People who are interested in how to localize things in
the containing scope should take a look there too.

=head1 Subroutines

=head2 XSUBs and the Argument Stack

The XSUB mechanism is a simple way for Perl programs to access C subroutines.
An XSUB routine will have a stack that contains the arguments from the Perl
program, and a way to map from the Perl data structures to a C equivalent.

The stack arguments are accessible through the C<ST(n)> macro, which returns
the C<n>'th stack argument.  Argument 0 is the first argument passed in the
Perl subroutine call.  These arguments are C<SV*>, and can be used anywhere
an C<SV*> is used.

Most of the time, output from the C routine can be handled through use of
the RETVAL and OUTPUT directives.  However, there are some cases where the
argument stack is not already long enough to handle all the return values.
An example is the POSIX tzname() call, which takes no arguments, but returns
two, the local time zone's standard and summer time abbreviations.

To handle this situation, the PPCODE directive is used and the stack is
extended using the macro:

    EXTEND(SP, num);

where C<SP> is the macro that represents the local copy of the stack pointer,
and C<num> is the number of elements the stack should be extended by.

Now that there is room on the stack, values can be pushed on it using the
C<PUSHs> macro.  The pushed values will often need to be "mortal" (see
L</Reference Counts and Mortality>):

    PUSHs(sv_2mortal(newSViv(an_integer)))
    PUSHs(sv_2mortal(newSVuv(an_unsigned_integer)))
    PUSHs(sv_2mortal(newSVnv(a_double)))
    PUSHs(sv_2mortal(newSVpv("Some String",0)))
    /* Although the last example is better written as the more
     * efficient: */
    PUSHs(newSVpvs_flags("Some String", SVs_TEMP))

Now, in the Perl program calling C<tzname>, the two values will be assigned
as in:

    ($standard_abbrev, $summer_abbrev) = POSIX::tzname;

An alternate (and possibly simpler) method to pushing values on the stack is
to use the macro:

    XPUSHs(SV*)

This macro automatically adjusts the stack for you, if needed.  Thus, you
do not need to call C<EXTEND> to extend the stack.

Despite their suggestions in earlier versions of this document the macros
C<(X)PUSH[iunp]> are I<not> suited to XSUBs which return multiple results.
For that, either stick to the C<(X)PUSHs> macros shown above, or use the new
C<m(X)PUSH[iunp]> macros instead; see L</Putting a C value on Perl stack>.

For more information, consult L<perlxs> and L<perlxstut>.
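
The division of labour between C<EXTEND>, C<PUSHs> and C<XPUSHs> described
above can be illustrated with a small stand-alone model in plain C.  This is
only a sketch under stated assumptions: the C<toy_*> names are hypothetical,
the "stack" holds C<int>s rather than C<SV*>s, and nothing here reflects how
Perl actually grows its argument stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: these names are illustrative, not Perl API. */
typedef struct {
    int   *base;   /* start of the allocated storage */
    size_t len;    /* items currently pushed */
    size_t cap;    /* items the storage can hold */
} toy_argstack;

/* Plays the role of EXTEND(SP, n): guarantee room for n more
 * items.  Returns 1 on success, 0 if memory could not be obtained. */
static int toy_extend(toy_argstack *s, size_t n) {
    size_t newcap;
    int *p;
    if (s->cap - s->len >= n)
        return 1;                       /* already enough room */
    newcap = s->len + n;
    p = realloc(s->base, newcap * sizeof *p);
    if (!p)
        return 0;
    s->base = p;                        /* storage may have moved */
    s->cap  = newcap;
    return 1;
}

/* Plays the role of PUSHs: no check, the caller extended first. */
static void toy_push(toy_argstack *s, int v) {
    assert(s->len < s->cap);
    s->base[s->len++] = v;
}

/* Plays the role of XPUSHs: make room for one item, then push. */
static int toy_xpush(toy_argstack *s, int v) {
    if (!toy_extend(s, 1))
        return 0;
    toy_push(s, v);
    return 1;
}
```

In this model, one C<toy_extend> call followed by several C<toy_push> calls
checks the capacity once, while C<toy_xpush> checks it on every push; that is
the trade-off between the two styles shown above.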

=head2 Autoloading with XSUBs

If an AUTOLOAD routine is an XSUB, as with Perl subroutines, Perl puts the
fully-qualified name of the autoloaded subroutine in the $AUTOLOAD variable
of the XSUB's package.

But it also puts the same information in certain fields of the XSUB itself:

    HV *stash           = CvSTASH(cv);
    const char *subname = SvPVX(cv);
    STRLEN name_length  = SvCUR(cv); /* in bytes */
    U32 is_utf8         = SvUTF8(cv);

C<SvPVX(cv)> contains just the sub name itself, not including the package.
For an AUTOLOAD routine in UNIVERSAL or one of its superclasses,
C<CvSTASH(cv)> returns NULL during a method call on a nonexistent package.

B<Note>: Setting $AUTOLOAD stopped working in 5.6.1, which did not support
XS AUTOLOAD subs at all.  Perl 5.8.0 introduced the use of fields in the
XSUB itself.  Perl 5.16.0 restored the setting of $AUTOLOAD.  If you need
to support 5.8-5.14, use the XSUB's fields.

=head2 Calling Perl Routines from within C Programs

There are four routines that can be used to call a Perl subroutine from
within a C program.  These four are:

    I32  call_sv(SV*, I32);
    I32  call_pv(const char*, I32);
    I32  call_method(const char*, I32);
    I32  call_argv(const char*, I32, char**);

The routine most often used is C<call_sv>.  The C<SV*> argument
contains either the name of the Perl subroutine to be called, or a
reference to the subroutine.  The second argument consists of flags
that control the context in which the subroutine is called, whether
or not the subroutine is being passed arguments, how errors should be
trapped, and how to treat return values.

All four routines return the number of values that the subroutine returned
on the Perl stack.

These routines used to be called C<perl_call_sv>, etc., before Perl v5.6.0,
but those names are now deprecated; macros of the same name are provided for
compatibility.

When using any of these routines (except C<call_argv>), the programmer
must manipulate the Perl stack.  These include the following macros and
functions:

    dSP
    SP
    PUSHMARK()
    PUTBACK
    SPAGAIN
    ENTER
    SAVETMPS
    FREETMPS
    LEAVE
    XPUSH*()
    POP*()

For a detailed description of calling conventions from C to Perl,
consult L<perlcall>.

=head2 Putting a C value on Perl stack

Many opcodes (an opcode is an elementary operation in the internal perl
stack machine) put an SV* on the stack.  However, as an optimization
the corresponding SV is (usually) not recreated each time.  The opcodes
reuse specially assigned SVs (I<target>s) which are (as a corollary)
not constantly freed/created.

Each of the targets is created only once (but see
L</Scratchpads and recursion> below), and when an opcode needs to put
an integer, a double, or a string on the stack, it just sets the
corresponding parts of its I<target> and puts the I<target> on the stack.

The macro to put this target on the stack is C<PUSHTARG>, and it is
directly used in some opcodes, as well as indirectly in zillions of
others, which use it via C<(X)PUSH[iunp]>.

Because the target is reused, you must be careful when pushing multiple
values on the stack.  The following code will not do what you think:

    XPUSHi(10);
    XPUSHi(20);

This translates as "set C<TARG> to 10, push a pointer to C<TARG> onto
the stack; set C<TARG> to 20, push a pointer to C<TARG> onto the stack".
At the end of the operation, the stack does not contain the values 10
and 20, but actually contains two pointers to C<TARG>, which we have set
to 20.
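
The aliasing problem just described can be reproduced outside Perl with a
few lines of plain C.  The following is a hedged sketch, not Perl source:
C<toy_sv>, C<toy_targ> and the other C<toy_*> names are hypothetical, and a
static pool stands in for freshly created SVs.

```c
#include <assert.h>

/* Toy model only: these names are illustrative, not Perl API. */
typedef struct { long iv; } toy_sv;

static toy_sv *toy_stack[8];       /* a stack of pointers, like SV* */
static int     toy_sp = 0;         /* number of items on toy_stack */
static toy_sv  toy_targ;           /* the single, reused target */
static toy_sv  toy_pool[8];        /* stands in for fresh SVs */
static int     toy_pool_ix = 0;

/* Models pushing via the shared target: set it, push its address. */
static void toy_push_targ(long v) {
    toy_targ.iv = v;
    toy_stack[toy_sp++] = &toy_targ;
}

/* Models pushing a separate SV per value: take a fresh one, set it,
 * push its address. */
static void toy_push_fresh(long v) {
    toy_sv *sv = &toy_pool[toy_pool_ix++];
    sv->iv = v;
    toy_stack[toy_sp++] = sv;
}

/* Returns 1 if the first two stack slots hold the same pointer. */
static int toy_top_two_alias(void) {
    return toy_stack[0] == toy_stack[1];
}
```

Calling C<toy_push_targ(10)> and then C<toy_push_targ(20)> leaves two
identical pointers on the toy stack, both seeing the value 20; using
C<toy_push_fresh> for each value keeps 10 and 20 distinct.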

If you need to push multiple different values then you should either use
the C<(X)PUSHs> macros, or else use the new C<m(X)PUSH[iunp]> macros,
none of which make use of C<TARG>.  The C<(X)PUSHs> macros simply push an
SV* on the stack, which, as noted under L</XSUBs and the Argument Stack>,
will often need to be "mortal".  The new C<m(X)PUSH[iunp]> macros make
this a little easier to achieve by creating a new mortal for you (via
C<(X)PUSHmortal>), pushing that onto the stack (extending it if necessary
in the case of the C<mXPUSH[iunp]> macros), and then setting its value.
Thus, instead of writing this to "fix" the example above:

    XPUSHs(sv_2mortal(newSViv(10)));
    XPUSHs(sv_2mortal(newSViv(20)));

you can simply write:

    mXPUSHi(10);
    mXPUSHi(20);

On a related note, if you do use C<(X)PUSH[iunp]>, then you're going to
need a C<dTARG> in your variable declarations so that the C<*PUSH*>
macros can make use of the local variable C<TARG>.  See also
C<dTARGET> and C<dXSTARG>.

=head2 Scratchpads

The question remains on when the SVs which are I<target>s for opcodes
are created.  The answer is that they are created when the current
unit--a subroutine or a file (for opcodes for statements outside of
subroutines)--is compiled.  During this time a special anonymous Perl
array is created, which is called a scratchpad for the current unit.

A scratchpad keeps SVs which are lexicals for the current unit and are
targets for opcodes.  A previous version of this document
stated that one can deduce that an SV lives on a scratchpad
by looking at its flags: lexicals have C<SVs_PADMY> set, and
I<target>s have C<SVs_PADTMP> set.  But this has never been fully true.
C<SVs_PADMY> could be set on a variable that no longer resides in any pad.
While I<target>s do have C<SVs_PADTMP> set, it can also be set on variables
that have never resided in a pad, but nonetheless act like I<target>s.  As
of perl 5.21.5, the C<SVs_PADMY> flag is no longer used and is defined as
0.  C<SvPADMY()> now returns true for anything without C<SVs_PADTMP>.

=for apidoc_section $pad
=for apidoc Amnh||SVs_PADTMP
=for apidoc AmnhD||SVs_PADMY

The correspondence between OPs and I<target>s is not 1-to-1.  Different
OPs in the compile tree of the unit can use the same target, if this
would not conflict with the expected life of the temporary.

=head2 Scratchpads and recursion

It is not 100% true that a compiled unit contains a pointer to
the scratchpad AV.  In fact it contains a pointer to an AV of
(initially) one element, and this element is the scratchpad AV.  Why do
we need an extra level of indirection?

The answer is B<recursion>, and maybe B<threads>.  Both
of these can create several execution pointers going into the same
subroutine.  For the subroutine-child not to write over the temporaries
for the subroutine-parent (lifespan of which covers the call to the
child), the parent and the child should have different
scratchpads.  (I<And> the lexicals should be separate anyway!)

So each subroutine is born with an array of scratchpads (of length 1).
On each entry to the subroutine it is checked that the current
depth of the recursion is not more than the length of this array, and
if it is, a new scratchpad is created and pushed into the array.

The I<target>s on this scratchpad are C<undef>s, but they are already
marked with correct flags.

=head1 Memory Allocation

=head2 Allocation

All memory meant to be used with the Perl API functions should be manipulated
using the macros described in this section.
The macros provide the necessary
transparency between differences in the actual malloc implementation that is
used within perl.

The following three macros are used to initially allocate memory:

    Newx(pointer, number, type);
    Newxc(pointer, number, type, cast);
    Newxz(pointer, number, type);

The first argument C<pointer> should be the name of a variable that will
point to the newly allocated memory.

The second and third arguments C<number> and C<type> specify how many of
the specified type of data structure should be allocated.  The argument
C<type> is passed to C<sizeof>.  The final argument to C<Newxc>, C<cast>,
should be used if the C<pointer> argument is different from the C<type>
argument.

Unlike the C<Newx> and C<Newxc> macros, the C<Newxz> macro calls C<memzero>
to zero out all the newly allocated memory.

=head2 Reallocation

    Renew(pointer, number, type);
    Renewc(pointer, number, type, cast);
    Safefree(pointer)

These three macros are used to change a memory buffer size or to free a
piece of memory no longer needed.  The arguments to C<Renew> and C<Renewc>
match those of C<Newx> and C<Newxc>.

=head2 Moving

    Move(source, dest, number, type);
    Copy(source, dest, number, type);
    Zero(dest, number, type);

These three macros are used to move, copy, or zero out previously allocated
memory.  The C<source> and C<dest> arguments point to the source and
destination starting points.  Perl will move, copy, or zero out C<number>
instances of the size of the C<type> data structure (using the C<sizeof>
function).

=head1 PerlIO

The most recent development releases of Perl have been experimenting with
removing Perl's dependency on the "normal" standard I/O suite and allowing
other stdio implementations to be used.
This involves creating a new
abstraction layer that then calls whichever implementation of stdio Perl
was compiled with. All XSUBs should now use the functions in the PerlIO
abstraction layer and not make any assumptions about what kind of stdio
is being used.

For a complete description of the PerlIO abstraction, consult L<perlapio>.

=head1 Compiled code

=head2 Code tree

Here we describe the internal form your code is converted to by
Perl. Start with a simple example:

    $a = $b + $c;

This is converted to a tree similar to this one:

             assign-to
           /           \
          +             $a
        /   \
      $b     $c

(but slightly more complicated). This tree reflects the way Perl
parsed your code, but has nothing to do with the execution order.
There is an additional "thread" going through the nodes of the tree
which shows the order of execution of the nodes. In our simplified
example above it looks like:

    $b ---> $c ---> + ---> $a ---> assign-to

But with the actual compile tree for C<$a = $b + $c> it is different:
some nodes are I<optimized away>. As a corollary, though the actual tree
contains more nodes than our simplified example, the execution order
is the same as in our example.

=head2 Examining the tree

If you have your perl compiled for debugging (usually done with
C<-DDEBUGGING> on the C<Configure> command line), you may examine the
compiled tree by specifying C<-Dx> on the Perl command line.
The
output takes several lines per node, and for C<$b+$c> it looks like
this:

    5           TYPE = add  ===> 6
                TARG = 1
                FLAGS = (SCALAR,KIDS)
                {
                    TYPE = null  ===> (4)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    3                   TYPE = gvsv  ===> 4
                        FLAGS = (SCALAR)
                        GV = main::b
                    }
                }
                {
                    TYPE = null  ===> (5)
                      (was rv2sv)
                    FLAGS = (SCALAR,KIDS)
                    {
    4                   TYPE = gvsv  ===> 5
                        FLAGS = (SCALAR)
                        GV = main::c
                    }
                }

This tree has 5 nodes (one per C<TYPE> specifier), of which only 3 are
not optimized away (one per number in the left column). The immediate
children of the given node correspond to C<{}> pairs on the same level
of indentation, thus this listing corresponds to the tree:

                   add
                 /     \
               null    null
                |       |
               gvsv    gvsv

The execution order is indicated by C<===E<gt>> marks, thus it is
C<3 4 5 6> (node C<6> is not included in the above listing), i.e.,
C<gvsv gvsv add whatever>.

Each of these nodes represents an op, a fundamental operation inside the
Perl core. The code which implements each operation can be found in the
F<pp*.c> files; the function which implements the op with type C<gvsv>
is C<pp_gvsv>, and so on. As the tree above shows, different ops have
different numbers of children: C<add> is a binary operator, as one would
expect, and so has two children. To accommodate the various different
numbers of children, there are various types of op data structure, and
they link together in different ways.

The simplest type of op structure is C<OP>: this has no children. Unary
operators, C<UNOP>s, have one child, and this is pointed to by the
C<op_first> field. Binary operators (C<BINOP>s) have not only an
C<op_first> field but also an C<op_last> field. The most complex type of
op is a C<LISTOP>, which has any number of children.
In this case, the
first child is pointed to by C<op_first> and the last child by
C<op_last>. The children in between can be found by iteratively
following the C<OpSIBLING> pointer from the first child to the last (but
see below).

=for apidoc_section $optree_construction
=for apidoc Ayh||OP
=for apidoc Ayh||BINOP
=for apidoc Ayh||LISTOP
=for apidoc Ayh||UNOP

There are also some other op types: a C<PMOP> holds a regular expression,
and has no children, and a C<LOOP> may or may not have children. If the
C<op_children> field is non-zero, it behaves like a C<LISTOP>. To
complicate matters, if a C<UNOP> is actually a C<null> op after
optimization (see L</Compile pass 2: context propagation>) it will still
have children in accordance with its former type.

=for apidoc Ayh||LOOP
=for apidoc Ayh||PMOP

Finally, there is a C<LOGOP>, or logic op. Like a C<LISTOP>, this has one
or more children, but it doesn't have an C<op_last> field: so you have to
follow C<op_first> and then the C<OpSIBLING> chain itself to find the
last child. Instead it has an C<op_other> field, which is comparable to
the C<op_next> field described below, and represents an alternate
execution path. Operators like C<and>, C<or> and C<?> are C<LOGOP>s. Note
that in general, C<op_other> may not point to any of the direct children
of the C<LOGOP>.

=for apidoc Ayh||LOGOP

Starting in version 5.21.2, perls built with the experimental
define C<-DPERL_OP_PARENT> add an extra boolean flag for each op,
C<op_moresib>. When not set, this indicates that this is the last op in an
C<OpSIBLING> chain. This frees up the C<op_sibling> field on the last
sibling to point back to the parent op. Under this build, that field is
also renamed C<op_sibparent> to reflect its joint role.
The macro
C<OpSIBLING(o)> wraps this special behaviour, and always returns NULL on
the last sibling. With this build the C<op_parent(o)> function can be
used to find the parent of any op. Thus for forward compatibility, you
should always use the C<OpSIBLING(o)> macro rather than accessing
C<op_sibling> directly.

Another way to examine the tree is to use a compiler back-end module, such
as L<B::Concise>.

=head2 Compile pass 1: check routines

The tree is created by the compiler while I<yacc> code feeds it
the constructions it recognizes. Since I<yacc> works bottom-up, so does
the first pass of perl compilation.

What makes this pass interesting for perl developers is that some
optimization may be performed on this pass. This is optimization by
so-called "check routines". The correspondence between node names
and corresponding check routines is described in F<opcode.pl> (do not
forget to run C<make regen_headers> if you modify this file).

A check routine is called when the node is fully constructed except
for the execution-order thread. Since at this time there are no
back-links to the currently constructed node, one can do most any
operation to the top-level node, including freeing it and/or creating
new nodes above/below it.

The check routine returns the node which should be inserted into the
tree (if the top-level node was not modified, check routine returns
its argument).

By convention, check routines have names C<ck_*>. They are usually
called from C<new*OP> subroutines (or C<convert>) (which in turn are
called from F<perly.y>).

=head2 Compile pass 1a: constant folding

Immediately after the check routine is called the returned node is
checked for being compile-time executable.
If it is (the value is
judged to be constant) it is immediately executed, and a I<constant>
node with the "return value" of the corresponding subtree is
substituted instead. The subtree is deleted.

If constant folding was not performed, the execution-order thread is
created.

=head2 Compile pass 2: context propagation

When a context for a part of compile tree is known, it is propagated
down through the tree. At this time the context can have 5 values
(instead of 2 for runtime context): void, boolean, scalar, list, and
lvalue. In contrast with pass 1, this pass is processed from top
to bottom: a node's context determines the context for its children.

Additional context-dependent optimizations are performed at this time.
Since at this moment the compile tree contains back-references (via
"thread" pointers), nodes cannot be free()d now. To allow
optimized-away nodes at this stage, such nodes are null()ified instead
of free()ing (i.e. their type is changed to OP_NULL).

=head2 Compile pass 3: peephole optimization

After the compile tree for a subroutine (or for an C<eval> or a file)
is created, an additional pass over the code is performed. This pass
is neither top-down nor bottom-up, but in the execution order (with
additional complications for conditionals). Optimizations performed
at this stage are subject to the same restrictions as in pass 2.

Peephole optimizations are done by calling the function pointed to
by the global variable C<PL_peepp>. By default, C<PL_peepp> just
calls the function pointed to by the global variable C<PL_rpeepp>.
By default, that performs some basic op fixups and optimisations along
the execution-order op chain, and recursively calls C<PL_rpeepp> for
each side chain of ops (resulting from conditionals).
Extensions may
provide additional optimisations or fixups, hooking into either the
per-subroutine or recursive stage, like this:

    static peep_t prev_peepp;
    static void my_peep(pTHX_ OP *o)
    {
        /* custom per-subroutine optimisation goes here */
        prev_peepp(aTHX_ o);
        /* custom per-subroutine optimisation may also go here */
    }
    BOOT:
        prev_peepp = PL_peepp;
        PL_peepp = my_peep;

    static peep_t prev_rpeepp;
    static void my_rpeep(pTHX_ OP *first)
    {
        OP *o = first, *t = first;
        /* o advances two ops per iteration and t only one, so the
         * loop stops if the op_next chain turns out to be cyclic */
        for (; o; o = o->op_next, t = t->op_next) {
            /* custom per-op optimisation goes here */
            o = o->op_next;
            if (!o || o == t) break;
            /* custom per-op optimisation goes AND here */
        }
        prev_rpeepp(aTHX_ first);
    }
    BOOT:
        prev_rpeepp = PL_rpeepp;
        PL_rpeepp = my_rpeep;

=for apidoc_section $optree_manipulation
=for apidoc Ayh||peep_t

=head2 Pluggable runops

The compile tree is executed in a runops function. There are two runops
functions, in F<run.c> and in F<dump.c>. C<Perl_runops_debug> is used
with DEBUGGING and C<Perl_runops_standard> is used otherwise. For fine
control over the execution of the compile tree it is possible to provide
your own runops function.

It's probably best to copy one of the existing runops functions and
change it to suit your needs. Then, in the BOOT section of your XS
file, add the line:

    PL_runops = my_runops;

=for apidoc_section $debugging
=for apidoc runops_debug
=for apidoc runops_standard
=for apidoc Amnh|runops_proc_t|PL_runops

This function should be as efficient as possible to keep your programs
running as fast as possible.

=head2 Compile-time scope hooks

As of perl 5.14 it is possible to hook into the compile-time lexical
scope mechanism using C<Perl_blockhook_register>.
This is used like
this:

    STATIC void my_start_hook(pTHX_ int full);
    STATIC BHK my_hooks;

    BOOT:
        BhkENTRY_set(&my_hooks, bhk_start, my_start_hook);
        Perl_blockhook_register(aTHX_ &my_hooks);

This will arrange to have C<my_start_hook> called at the start of
compiling every lexical scope. The available hooks are:

=for apidoc_section $lexer
=for apidoc Ayh||BHK

=over 4

=item C<void bhk_start(pTHX_ int full)>

This is called just after starting a new lexical scope. Note that Perl
code like

    if ($x) { ... }

creates two scopes: the first starts at the C<(> and has C<full == 1>,
the second starts at the C<{> and has C<full == 0>. Both end at the
C<}>, so calls to C<start> and C<pre>/C<post_end> will match. Anything
pushed onto the save stack by this hook will be popped just before the
scope ends (between the C<pre_> and C<post_end> hooks, in fact).

=item C<void bhk_pre_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just before unwinding the
stack. I<o> is the root of the optree representing the scope; it is a
double pointer so you can replace the OP if you need to.

=item C<void bhk_post_end(pTHX_ OP **o)>

This is called at the end of a lexical scope, just after unwinding the
stack. I<o> is as above. Note that it is possible for calls to C<pre_>
and C<post_end> to nest, if there is something on the save stack that
calls string eval.

=item C<void bhk_eval(pTHX_ OP *const o)>

This is called just before starting to compile an C<eval STRING>, C<do
FILE>, C<require> or C<use>, after the eval has been set up. I<o> is the
OP that requested the eval, and will normally be an C<OP_ENTEREVAL>,
C<OP_DOFILE> or C<OP_REQUIRE>.

=back

Once you have your hook functions, you need a C<BHK> structure to put
them in.
It's best to allocate it statically, since there is no way to
free it once it's registered. The function pointers should be inserted
into this structure using the C<BhkENTRY_set> macro, which will also set
flags indicating which entries are valid. If you do need to allocate
your C<BHK> dynamically for some reason, be sure to zero it before you
start.

Once registered, there is no mechanism to switch these hooks off, so if
that is necessary you will need to do this yourself. An entry in C<%^H>
is probably the best way, so the effect is lexically scoped; however it
is also possible to use the C<BhkDISABLE> and C<BhkENABLE> macros to
temporarily switch entries on and off. You should also be aware that
generally speaking at least one scope will have opened before your
extension is loaded, so you will see some C<pre>/C<post_end> pairs that
didn't have a matching C<start>.

=head1 Examining internal data structures with the C<dump> functions

To aid debugging, the source file F<dump.c> contains a number of
functions which produce formatted output of internal data structures.

The most commonly used of these functions is C<Perl_sv_dump>; it's used
for dumping SVs, AVs, HVs, and CVs. The C<Devel::Peek> module calls
C<sv_dump> to produce debugging output from Perl-space, so users of that
module should already be familiar with its format.

C<Perl_op_dump> can be used to dump an C<OP> structure or any of its
derivatives, and produces output similar to C<perl -Dx>; in fact,
C<Perl_dump_eval> will dump the main root of the code being evaluated,
exactly like C<-Dx>.

=for apidoc_section $debugging
=for apidoc dump_eval

Other useful functions are C<Perl_dump_sub>, which turns a C<GV> into an
op tree, C<Perl_dump_packsubs> which calls C<Perl_dump_sub> on all the
subroutines in a package like so: (Thankfully, these are all xsubs, so
there is no op tree)

=for apidoc_section $debugging
=for apidoc dump_sub

    (gdb) print Perl_dump_packsubs(PL_defstash)

    SUB attributes::bootstrap = (xsub 0x811fedc 0)

    SUB UNIVERSAL::can = (xsub 0x811f50c 0)

    SUB UNIVERSAL::isa = (xsub 0x811f304 0)

    SUB UNIVERSAL::VERSION = (xsub 0x811f7ac 0)

    SUB DynaLoader::boot_DynaLoader = (xsub 0x805b188 0)

and C<Perl_dump_all>, which dumps all the subroutines in the stash and
the op tree of the main root.

=head1 How multiple interpreters and concurrency are supported

=head2 Background and MULTIPLICITY

=for apidoc_section $concurrency
=for apidoc Amnh||PERL_IMPLICIT_CONTEXT

The Perl interpreter can be regarded as a closed box: it has an API
for feeding it code or otherwise making it do things, but it also has
functions for its own use. This smells a lot like an object, and
there is a way for you to build Perl so that you can have multiple
interpreters, with one interpreter represented either as a C structure,
or inside a thread-specific structure. These structures contain all
the context, the state of that interpreter.

The macro that controls the major Perl build flavor is MULTIPLICITY. The
MULTIPLICITY build has a C structure that packages all the interpreter
state, which is being passed to various perl functions as a "hidden"
first argument. MULTIPLICITY makes multi-threaded perls possible (with the
ithreads threading model, related to the macro USE_ITHREADS.)

PERL_IMPLICIT_CONTEXT is a legacy synonym for MULTIPLICITY.

=for apidoc_section $concurrency
=for apidoc Amnh||MULTIPLICITY

To see whether you have non-const data you can use a BSD (or GNU)
compatible C<nm>:

    nm libperl.a | grep -v ' [TURtr] '

If this displays any C<D> or C<d> symbols (or possibly C<C> or C<c>),
you have non-const data. The symbols the C<grep> removed are as follows:
C<Tt> are I<text>, or code, the C<Rr> are I<read-only> (const) data,
and the C<U> is I<undefined>, external symbols referred to.

The test F<t/porting/libperl.t> does this kind of symbol sanity
checking on C<libperl.a>.

All this obviously requires a way for the Perl internal functions to be
either subroutines taking some kind of structure as the first
argument, or subroutines taking nothing as the first argument. To
enable these two very different ways of building the interpreter,
the Perl source (as it does in so many other situations) makes heavy
use of macros and subroutine naming conventions.

First problem: deciding which functions will be public API functions and
which will be private. All functions whose names begin C<S_> are private
(think "S" for "secret" or "static"). All other functions begin with
"Perl_", but just because a function begins with "Perl_" does not mean it is
part of the API. (See L</Internal Functions>.) The easiest way to be
B<sure> a function is part of the API is to find its entry in L<perlapi>.
If it exists in L<perlapi>, it's part of the API. If it doesn't, and you
think it should be (i.e., you need it for your extension), submit an issue at
L<https://github.com/Perl/perl5/issues> explaining why you think it should be.

Second problem: there must be a syntax so that the same subroutine
declarations and calls can pass a structure as their first argument,
or pass nothing. To solve this, the subroutines are named and
declared in a particular way.
Here's a typical start of a static
function used within the Perl guts:

    STATIC void
    S_incline(pTHX_ char *s)

STATIC becomes "static" in C, and may be #define'd to nothing in some
configurations in the future.

=for apidoc_section $directives
=for apidoc Ayh||STATIC

A public function (i.e. part of the internal API, but not necessarily
sanctioned for use in extensions) begins like this:

    void
    Perl_sv_setiv(pTHX_ SV* dsv, IV num)

C<pTHX_> is one of a number of macros (in F<perl.h>) that hide the
details of the interpreter's context. THX stands for "thread", "this",
or "thingy", as the case may be. (And no, George Lucas is not involved. :-)
The first character could be 'p' for a B<p>rototype, 'a' for B<a>rgument,
or 'd' for B<d>eclaration, so we have C<pTHX>, C<aTHX> and C<dTHX>, and
their variants.

=for apidoc_section $concurrency
=for apidoc Amnh||aTHX
=for apidoc Amnh||aTHX_
=for apidoc Amnh||dTHX
=for apidoc Amnh||pTHX
=for apidoc Amnh||pTHX_

When Perl is built without options that set MULTIPLICITY, there is no
first argument containing the interpreter's context. The trailing underscore
in the pTHX_ macro indicates that the macro expansion needs a comma
after the context argument because other arguments follow it. If
MULTIPLICITY is not defined, pTHX_ will be ignored, and the
subroutine is not prototyped to take the extra argument. The form of the
macro without the trailing underscore is used when there are no additional
explicit arguments.

When a core function calls another, it must pass the context. This
is normally hidden via macros. Consider C<sv_setiv>.
It expands into
something like this:

    #ifdef MULTIPLICITY
      #define sv_setiv(a,b)      Perl_sv_setiv(aTHX_ a, b)
      /* can't do this for vararg functions, see below */
    #else
      #define sv_setiv           Perl_sv_setiv
    #endif

This works well, and means that XS authors can gleefully write:

    sv_setiv(foo, bar);

and still have it work under all the modes Perl could have been
compiled with.

This doesn't work so cleanly for varargs functions, though, as macros
imply that the number of arguments is known in advance. Instead we
either need to spell them out fully, passing C<aTHX_> as the first
argument (the Perl core tends to do this with functions like
Perl_warner), or use a context-free version.

The context-free version of Perl_warner is called
Perl_warner_nocontext, and does not take the extra argument. Instead
it does C<dTHX;> to get the context from thread-local storage. We
C<#define warner Perl_warner_nocontext> so that extensions get source
compatibility at the expense of performance. (Passing an arg is
cheaper than grabbing it from thread-local storage.)

You can ignore [pad]THXx when browsing the Perl headers/sources.
Those are strictly for use within the core. Extensions and embedders
need only be aware of [pad]THX.

=head2 So what happened to dTHR?

=for apidoc_section $concurrency
=for apidoc Amnh||dTHR

C<dTHR> was introduced in perl 5.005 to support the older thread model.
The older thread model now uses the C<THX> mechanism to pass context
pointers around, so C<dTHR> is not useful any more. Perl 5.6.0 and
later still have it for backward source compatibility, but it is defined
to be a no-op.

=head2 How do I use all this in extensions?

See also L<perlclib/Dealing with embedded perls and threads>.
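
As a rough illustration of the macro scheme described in the preceding
sections, the following self-contained sketch mimics the C<pTHX_> and
C<aTHX_> convention with a toy interpreter structure. Every name in it
(C<toy_interp>, C<TOY_MULTIPLICITY>, C<pTOY_>, C<aTOY_>, C<Toy_add>,
C<run_demo> and so on) is hypothetical and exists only for this example;
the real definitions in F<perl.h> and F<embed.h> are considerably more
involved.

```c
#include <stdio.h>

/* Hypothetical stand-ins for this sketch only; none of these names
 * exist in perl.  Comment out the next line to get the "no hidden
 * first argument" flavour of the same code. */
#define TOY_MULTIPLICITY 1

typedef struct toy_interp { long counter; } toy_interp;

#ifdef TOY_MULTIPLICITY
#  define pTOY   toy_interp *my_toy      /* p: prototype form      */
#  define pTOY_  pTOY,                   /* ...followed by a comma */
#  define aTOY   my_toy                  /* a: argument form       */
#  define aTOY_  aTOY,
#  define TOY_STATE(field) (my_toy->field)
#else
static toy_interp toy_global;            /* the single global state */
#  define pTOY   void
#  define pTOY_
#  define aTOY
#  define aTOY_
#  define TOY_STATE(field) (toy_global.field)
#endif

/* "Core" functions: declared once, correct in both build flavours. */
static void Toy_add(pTOY_ long n) { TOY_STATE(counter) += n; }
static long Toy_get(pTOY)         { return TOY_STATE(counter); }

/* Short names hide the context argument from callers. */
#define toy_add(n) Toy_add(aTOY_ n)
#define toy_get()  Toy_get(aTOY)

static long run_demo(void)
{
#ifdef TOY_MULTIPLICITY
    toy_interp interp = { 0 };
    toy_interp *my_toy = &interp;        /* plays the role of dTHX */
#else
    toy_global.counter = 0;
#endif
    toy_add(40);                         /* same call site either way */
    toy_add(2);
    return toy_get();
}
```

Calling C<run_demo()> returns the same total whether or not
C<TOY_MULTIPLICITY> is defined; only the expansion of the call sites
changes, which is the property the real macros are designed to provide.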

When Perl is built with MULTIPLICITY, extensions that call
any functions in the Perl API will need to pass the initial context
argument somehow. The kicker is that you will need to write it in
such a way that the extension still compiles when Perl hasn't been
built with MULTIPLICITY enabled.

There are three ways to do this. First, the easy but inefficient way,
which is also the default, in order to maintain source compatibility
with extensions: whenever F<XSUB.h> is #included, it redefines the aTHX
and aTHX_ macros to call a function that will return the context.
Thus, something like:

    sv_setiv(sv, num);

in your extension will translate to this when MULTIPLICITY is
in effect:

    Perl_sv_setiv(Perl_get_context(), sv, num);

or to this otherwise:

    Perl_sv_setiv(sv, num);

You don't have to do anything new in your extension to get this; since
the Perl library provides Perl_get_context(), it will all just
work.

The second, more efficient way is to use the following template for
your Foo.xs:

    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    STATIC void my_private_function(int arg1, int arg2);

    STATIC void
    my_private_function(int arg1, int arg2)
    {
        dTHX;       /* fetch context */
        ... call many Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(arg, 10);

Note that the only two changes from the normal way of writing an
extension are the addition of a C<#define PERL_NO_GET_CONTEXT> before
including the Perl headers, followed by a C<dTHX;> declaration at
the start of every function that will call the Perl API.
(You'll
know which functions need this, because the C compiler will complain
that there's an undeclared identifier in those functions.) No changes
are needed for the XSUBs themselves, because the XS() macro is
correctly defined to pass in the implicit context if needed.

=for apidoc_section $concurrency
=for apidoc AmnhU#||PERL_NO_GET_CONTEXT

The third, even more efficient way is to ape how it is done within
the Perl guts:


    #define PERL_NO_GET_CONTEXT     /* we want efficiency */
    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* pTHX_ only needed for functions that call Perl API */
    STATIC void my_private_function(pTHX_ int arg1, int arg2);

    STATIC void
    my_private_function(pTHX_ int arg1, int arg2)
    {
        /* dTHX; not needed here, because THX is an argument */
        ... call Perl API functions ...
    }

    [... etc ...]

    MODULE = Foo            PACKAGE = Foo

    /* typical XSUB */

    void
    my_xsub(arg)
            int arg
        CODE:
            my_private_function(aTHX_ arg, 10);

This implementation never has to fetch the context using a function
call, since it is always passed as an extra argument. Depending on
your needs for simplicity or efficiency, you may mix the previous
two approaches freely.

Never add a comma after C<pTHX> yourself--always use the form of the
macro with the underscore for functions that take explicit arguments,
or the form without the underscore for functions with no explicit
arguments.

=head2 Should I do anything special if I call perl from multiple threads?

If you create interpreters in one thread and then proceed to call them in
another, you need to make sure perl's own Thread Local Storage (TLS) slot is
initialized correctly in each of those threads.

The C<perl_alloc> and C<perl_clone> API functions will automatically set
the TLS slot to the interpreter they created, so that there is no need to do
anything special if the interpreter is always accessed in the same thread that
created it, and that thread did not create or call any other interpreters
afterwards. If that is not the case, you have to set the TLS slot of the
thread before calling any functions in the Perl API on that particular
interpreter. This is done by calling the C<PERL_SET_CONTEXT> macro in that
thread as the first thing you do:

    /* do this before doing anything else with some_perl */
    PERL_SET_CONTEXT(some_perl);

    ... other Perl API calls on some_perl go here ...

=for apidoc_section $embedding
=for apidoc Amh|void|PERL_SET_CONTEXT|PerlInterpreter* i

(You can always get the current context via C<PERL_GET_CONTEXT>.)

=for apidoc Amnh|PerlInterpreter*|PERL_GET_CONTEXT|

=head2 Future Plans and PERL_IMPLICIT_SYS

Just as MULTIPLICITY provides a way to bundle up everything
that the interpreter knows about itself and pass it around, so too are
there plans to allow the interpreter to bundle up everything it knows
about the environment it's running on. This is enabled with the
PERL_IMPLICIT_SYS macro. Currently it only works with USE_ITHREADS on
Windows.

This allows the ability to provide an extra pointer (called the "host"
environment) for all the system calls. This makes it possible for
all the system stuff to maintain their own state, broken down into
seven C structures. These are thin wrappers around the usual system
calls (see F<win32/perllib.c>) for the default perl executable, but for a
more ambitious host (like the one that would do fork() emulation) all
the extra work needed to pretend that different interpreters are
actually different "processes", would be done here.

The Perl engine/interpreter and the host are orthogonal entities.
There could be one or more interpreters in a process, and one or
more "hosts", with free association between them.

=head1 Internal Functions

All of Perl's internal functions which will be exposed to the outside
world are prefixed by C<Perl_> so that they will not conflict with XS
functions or functions used in a program in which Perl is embedded.
Similarly, all global variables begin with C<PL_>. (By convention,
static functions start with C<S_>.)

Inside the Perl core (C<PERL_CORE> defined), you can get at the functions
either with or without the C<Perl_> prefix, thanks to a bunch of defines
that live in F<embed.h>. Note that extension code should I<not> set
C<PERL_CORE>; this exposes the full perl internals, and is likely to cause
breakage of the XS in each new perl release.

The file F<embed.h> is generated automatically from
F<embed.pl> and F<embed.fnc>. F<embed.pl> also creates the prototyping
header files for the internal functions, generates the documentation
and a lot of other bits and pieces. It's important that when you add
a new function to the core or change an existing one, you change the
data in the table in F<embed.fnc> as well. Here's a sample entry from
that table:

    Apd |SV**   |av_fetch   |AV* ar|I32 key|I32 lval

The first column is a set of flags, the second column the return type,
the third column the name. Columns after that are the arguments.
The flags are documented at the top of F<embed.fnc>.

If you edit F<embed.pl> or F<embed.fnc>, you will need to run
C<make regen_headers> to force a rebuild of F<embed.h> and other
auto-generated files.

=head2 Formatted Printing of IVs, UVs, and NVs

If you are printing IVs, UVs, or NVs, then instead of the stdio(3) style
formatting codes like C<%d>, C<%ld>, C<%f>, you should use the
following macros for portability:

    IVdf            IV in decimal
    UVuf            UV in decimal
    UVof            UV in octal
    UVxf            UV in hexadecimal
    NVef            NV %e-like
    NVff            NV %f-like
    NVgf            NV %g-like

These will take care of 64-bit integers and long doubles.
For example:

    printf("IV is %" IVdf "\n", iv);

The C<IVdf> will expand to whatever is the correct format for the IVs.
Note that the spaces are required around the format in case the code is
compiled with C++, to maintain compliance with its standard.

Note that there are different "long doubles": Perl will use
whatever the compiler has.

If you are printing addresses of pointers, use %p or UVxf combined
with PTR2UV().

=head2 Formatted Printing of SVs

The contents of SVs may be printed using the C<SVf> format, like so:

    Perl_croak(aTHX_ "This croaked because: %" SVf "\n", SVfARG(err_msg))

where C<err_msg> is an SV.

=for apidoc_section $io_formats
=for apidoc Amnh||SVf
=for apidoc Amh||SVfARG|SV *sv

Not all scalar types are printable. Simple values certainly are: one of
IV, UV, NV, or PV. Also, if the SV is a reference to some value,
either it will be dereferenced and the value printed, or information
about the type of that value and its address are displayed. The results
of printing any other type of SV are undefined and likely to lead to an
interpreter crash. NVs are printed using a C<%g>-ish format.

Note that the spaces are required around the C<SVf> in case the code is
compiled with C++, to maintain compliance with its standard.
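
The format macros above work by string-literal concatenation, which can
be tried outside perl with the standard C<E<lt>inttypes.hE<gt>> macros.
The sketch below is only an analogy: C<toy_IV>, C<TOY_IVdf> and
C<format_iv> are hypothetical names standing in for C<IV> and C<IVdf>,
and it assumes for illustration that the integer type is exactly 64 bits
wide, which is not true of every perl build.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical stand-ins for perl's IV and IVdf; assumed 64-bit here. */
typedef int64_t toy_IV;
#define TOY_IVdf PRId64

/* Format an integer into buf.  The format string is assembled at
 * compile time from three adjacent string literals.  The spaces around
 * TOY_IVdf matter under C++11 and later, where "%"TOY_IVdf would be
 * read as a string literal with a user-defined literal suffix. */
static int format_iv(char *buf, size_t bufsize, toy_IV iv)
{
    assert(buf != NULL && bufsize > 0);
    return snprintf(buf, bufsize, "IV is %" TOY_IVdf "\n", iv);
}
```

Because the macro expands to a plain string literal, the same call works
unchanged whichever underlying integer type the macro was defined for.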

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)

You can use this to concatenate two scalars:

    SV *var1 = get_sv("var1", GV_ADD);
    SV *var2 = get_sv("var2", GV_ADD);
    SV *var3 = newSVpvf("var1=%" SVf " and var2=%" SVf,
                        SVfARG(var1), SVfARG(var2));

=for apidoc Amnh||SVf_QUOTEDPREFIX

C<SVf_QUOTEDPREFIX> is similar to C<SVf> except that it restricts the
number of the characters printed, showing at most the first
C<PERL_QUOTEDPREFIX_LEN> characters of the argument, and rendering it with
double quotes and with the contents escaped using double quoted string
escaping rules. If the string is longer than this then ellipses "..."
will be appended after the trailing quote. This is intended for error
messages where the string is assumed to be a class name.

=for apidoc Amnh||HvNAMEf
=for apidoc Amnh||HvNAMEf_QUOTEDPREFIX

C<HvNAMEf> and C<HvNAMEf_QUOTEDPREFIX> are similar to C<SVf> except they
extract the string, length and utf8 flags from the argument using the
C<HvNAME()>, C<HvNAMELEN()>, C<HvNAMEUTF8()> macros. This is intended
for stringifying a class name directly from a stash HV.

=head2 Formatted Printing of Strings

If you just want the bytes printed in a 7bit NUL-terminated string, you can
just use C<%s> (assuming they are all really only 7bit). But if there is a
possibility the value will be encoded as UTF-8 or contains bytes above
C<0x7F> (and therefore 8bit), you should instead use the C<UTF8f> format.
And as its parameter, use the C<UTF8fARG()> macro:

    char * msg;

    /* U+2018: \xE2\x80\x98 LEFT SINGLE QUOTATION MARK
       U+2019: \xE2\x80\x99 RIGHT SINGLE QUOTATION MARK */
    if (can_utf8)
      msg = "\xE2\x80\x98Uses fancy quotes\xE2\x80\x99";
    else
      msg = "'Uses simple quotes'";

    Perl_croak(aTHX_ "The message is: %" UTF8f "\n",
                     UTF8fARG(can_utf8, strlen(msg), msg));

The first parameter to C<UTF8fARG> is a boolean: 1 if the string is in
UTF-8; 0 if the string is in native byte encoding (Latin1).
The second parameter is the number of bytes in the string to print.
And the third and final parameter is a pointer to the first byte in the
string.

Note that any filehandle being printed to under UTF-8 must be expecting
UTF-8 in order to get good results and avoid Wide-character warnings.
One way to do this for typical filehandles is to invoke perl with the
C<-C> parameter. (See L<perlrun/-C [numberE<sol>list]>.)

=for apidoc_section $io_formats
=for apidoc Amnh||UTF8f
Output a possibly UTF8 value. Be sure to use UTF8fARG() to compose
the arguments for this format.
=for apidoc Amnh||UTF8f_QUOTEDPREFIX
Same as C<UTF8f> but the output is quoted, escaped and length limited.
See C<SVf_QUOTEDPREFIX> for more details on escaping.
=for apidoc Amh||UTF8fARG|bool is_utf8|Size_t byte_len|char *str

=cut

=head2 Formatted Printing of C<Size_t> and C<SSize_t>

The most general way to do this is to cast them to a UV or IV, and
print as in the
L<previous section|/Formatted Printing of IVs, UVs, and NVs>.

But if you're using C<PerlIO_printf()>, it's less typing and visual
clutter to use the C<%z> length modifier (for I<siZe>):

    PerlIO_printf(PerlIO_stdout(), "STRLEN is %zu\n", len);

This modifier is not portable, so its use should be restricted to
C<PerlIO_printf()>.
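
The two approaches side by side, as a short sketch (the variables are
illustrative only):

    STRLEN  len = 12345;
    SSize_t off = -1;

    /* Works with any printf-like function: cast, then use the
     * IV/UV format macros */
    printf("len=%" UVuf " off=%" IVdf "\n", (UV)len, (IV)off);

    /* PerlIO_printf() only: the %z modifier needs no cast */
    PerlIO_printf(PerlIO_stdout(), "len=%zu off=%zd\n", len, off);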

=head2 Formatted Printing of C<Ptrdiff_t>, C<intmax_t>, C<short> and other special sizes

There are modifiers for these special situations if you are using
C<PerlIO_printf()>. See L<perlfunc/size>.

=head2 Pointer-To-Integer and Integer-To-Pointer

Because pointer size does not necessarily equal integer size,
use the following macros to do it right.

    PTR2UV(pointer)
    PTR2IV(pointer)
    PTR2NV(pointer)
    INT2PTR(pointertotype, integer)

=for apidoc_section $casting
=for apidoc Amh|type|INT2PTR|type|int value
=for apidoc Amh|UV|PTR2UV|void * ptr
=for apidoc Amh|IV|PTR2IV|void * ptr
=for apidoc Amh|NV|PTR2NV|void * ptr

For example:

    IV  iv = ...;
    SV *sv = INT2PTR(SV*, iv);

and

    AV *av = ...;
    UV  uv = PTR2UV(av);

There are also

    PTR2nat(pointer)   /* pointer to integer of PTRSIZE */
    PTR2ul(pointer)    /* pointer to unsigned long */

=for apidoc Amh|IV|PTR2nat|void *
=for apidoc Amh|unsigned long|PTR2ul|void *

And C<PTRV> which gives the native type for an integer the same size as
pointers, such as C<unsigned> or C<unsigned long>.

=for apidoc Ayh|type|PTRV

=head2 Exception Handling

There are a couple of macros to do very basic exception handling in XS
modules. You have to define C<NO_XSLOCKS> before including F<XSUB.h> to
be able to use these macros:

    #define NO_XSLOCKS
    #include "XSUB.h"

You can use these macros if you call code that may croak, but you need
to do some cleanup before giving control back to Perl. For example:

    dXCPT;    /* set up necessary variables */

    XCPT_TRY_START {
      code_that_may_croak();
    } XCPT_TRY_END

    XCPT_CATCH
    {
      /* do cleanup here */
      XCPT_RETHROW;
    }

Note that you always have to rethrow an exception that has been
caught.
Using these macros, it is not possible to just catch the
exception and ignore it. If you have to ignore the exception, you
have to use one of the C<call_*> functions.

The advantage of using the above macros is that you don't have
to set up an extra function for C<call_*>, and that using these
macros is faster than using C<call_*>.

=head2 Source Documentation

There's an effort going on to document the internal functions and
automatically produce reference manuals from them -- L<perlapi> is one
such manual which details all the functions which are available to XS
writers. L<perlintern> is the autogenerated manual for the functions
which are not part of the API and are supposedly for internal use only.

Source documentation is created by putting POD comments into the C
source, like this:

    /*
    =for apidoc sv_setiv

    Copies an integer into the given SV. Does not handle 'set' magic. See
    L<perlapi/sv_setiv_mg>.

    =cut
    */

Please try and supply some documentation if you add functions to the
Perl core.

=head2 Backwards compatibility

The Perl API changes over time. New functions are
added or the interfaces of existing functions are
changed. The C<Devel::PPPort> module tries to
provide compatibility code for some of these changes, so XS writers don't
have to code it themselves when supporting multiple versions of Perl.

C<Devel::PPPort> generates a C header file F<ppport.h> that can also
be run as a Perl script. To generate F<ppport.h>, run:

    perl -MDevel::PPPort -eDevel::PPPort::WriteFile

Besides checking existing XS code, the script can also be used to retrieve
compatibility information for various API calls using the C<--api-info>
command line switch. For example:

    % perl ppport.h --api-info=sv_magicext

For details, see S<C<perldoc ppport.h>>.
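
A common arrangement, sketched here on the assumption that F<ppport.h>
has been generated into the same directory as the XS file, is to include
it after the core headers:

    #include "EXTERN.h"
    #include "perl.h"
    #include "XSUB.h"

    /* Comes after the perl headers, since it tests what they
     * already provide before filling in the gaps. */
    #include "ppport.h"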

=head1 Unicode Support

Perl 5.6.0 introduced Unicode support. It's important for porters and XS
writers to understand this support and make sure that the code they
write does not corrupt Unicode data.

=head2 What B<is> Unicode, anyway?

In the olden, less enlightened times, we all used to use ASCII. Most of
us did, anyway. The big problem with ASCII is that it's American. Well,
no, that's not actually the problem; the problem is that it's not
particularly useful for people who don't use the Roman alphabet. What
used to happen was that particular languages would stick their own
alphabet in the upper range of the sequence, between 128 and 255. Of
course, we then ended up with plenty of variants that weren't quite
ASCII, and the whole point of it being a standard was lost.

Worse still, if you've got a language like Chinese or
Japanese that has hundreds or thousands of characters, then you really
can't fit them into a mere 256, so they had to forget about ASCII
altogether, and build their own systems using pairs of numbers to refer
to one character.

To fix this, some people formed Unicode, Inc. and
produced a new character set containing all the characters you can
possibly think of and more. There are several ways of representing these
characters, and the one Perl uses is called UTF-8. UTF-8 uses
a variable number of bytes to represent a character. You can learn more
about Unicode and Perl's Unicode model in L<perlunicode>.

(On EBCDIC platforms, Perl uses instead UTF-EBCDIC, which is a form of
UTF-8 adapted for EBCDIC platforms. Below, we just talk about UTF-8.
UTF-EBCDIC is like UTF-8, but the details are different. The macros
hide the differences from you; just remember that the particular numbers
and bit patterns presented below will differ in UTF-EBCDIC.)

=head2 How can I recognise a UTF-8 string?

You can't. This is because UTF-8 data is stored in bytes just like
non-UTF-8 data. The Unicode character 200, (C<0xC8> for you hex types)
capital E with a grave accent, is represented by the two bytes
C<v195.136>. Unfortunately, the non-Unicode string C<chr(195).chr(136)>
has that byte sequence as well. So you can't tell just by looking -- this
is what makes Unicode input an interesting problem.

In general, you either have to know what you're dealing with, or you
have to guess. The API function C<is_utf8_string> can help; it'll tell
you if a string contains only valid UTF-8 characters, and the chances
of a non-UTF-8 string looking like valid UTF-8 become very small very
quickly with increasing string length. On a character-by-character
basis, C<isUTF8_CHAR>
will tell you whether the current character in a string is valid UTF-8.

=head2 How does UTF-8 represent Unicode characters?

As mentioned above, UTF-8 uses a variable number of bytes to store a
character. Characters with values 0...127 are stored in one
byte, just like good ol' ASCII. Character 128 is stored as
C<v194.128>; this continues up to character 191, which is
C<v194.191>. Now we've run out of bits (191 is binary
C<10111111>) so we move on; character 192 is C<v195.128>. And
so it goes on, moving to three bytes at character 2048.
L<perlunicode/Unicode Encodings> has pictures of how this works.

Assuming you know you're dealing with a UTF-8 string, you can find out
how long the first character in it is with the C<UTF8SKIP> macro:

    char *utf = "\305\233\340\240\201";
    I32 len;

    len = UTF8SKIP(utf); /* len is 2 here */
    utf += len;
    len = UTF8SKIP(utf); /* len is 3 here */

Another way to skip over characters in a UTF-8 string is to use
C<utf8_hop>, which takes a string and a number of characters to skip
over.
You're on your own about bounds checking, though, so don't use it
lightly.

All bytes in a multi-byte UTF-8 character will have the high bit set,
so you can test if you need to do something special with this
character like this (the C<UTF8_IS_INVARIANT()> is a macro that tests
whether the byte is encoded as a single byte even in UTF-8):

    U8 *utf;     /* Initialize this to point to the beginning of the
                    sequence to convert */
    U8 *utf_end; /* Initialize this to 1 beyond the end of the sequence
                    pointed to by 'utf' */
    UV uv;       /* Returned code point; note: a UV, not a U8, not a
                    char */
    STRLEN len;  /* Returned length of character in bytes */

    if (!UTF8_IS_INVARIANT(*utf))
        /* Must treat this as UTF-8 */
        uv = utf8_to_uvchr_buf(utf, utf_end, &len);
    else
        /* OK to treat this character as a byte */
        uv = *utf;

You can also see in that example that we use C<utf8_to_uvchr_buf> to get the
value of the character; the inverse function C<uvchr_to_utf8> is available
for putting a UV into UTF-8:

    if (!UVCHR_IS_INVARIANT(uv))
        /* Must treat this as UTF8 */
        utf8 = uvchr_to_utf8(utf8, uv);
    else
        /* OK to treat this character as a byte */
        *utf8++ = uv;

You B<must> convert characters to UVs using the above functions if
you're ever in a situation where you have to match UTF-8 and non-UTF-8
characters. You may not skip over UTF-8 characters in this case. If you
do this, you'll lose the ability to match hi-bit non-UTF-8 characters;
for instance, if your UTF-8 string contains C<v195.136>, and you skip
that character, you can never match a C<chr(200)> in a non-UTF-8 string.
So don't do that!

(Note that we don't have to test for invariant characters in the
examples above. The functions work on any well-formed UTF-8 input.
It's just that it's faster to avoid the function overhead when it's not
needed.)
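
Putting these pieces together, a loop over a whole buffer might look
like the following sketch. The function name C<count_code_points> is
hypothetical, and the buffer is assumed to hold well-formed UTF-8:

    static STRLEN
    count_code_points(pTHX_ const U8 *s, const U8 *send)
    {
        STRLEN count = 0;

        while (s < send) {
            STRLEN len = 1;
            UV     uv;

            if (UTF8_IS_INVARIANT(*s))
                uv = *s;            /* one byte, same value */
            else
                uv = utf8_to_uvchr_buf(s, send, &len);

            /* Stop rather than loop forever if the input turns out
             * to be malformed after all. */
            if (len == 0 || len == (STRLEN)-1)
                break;

            PERL_UNUSED_VAR(uv);    /* examine uv here */
            count++;
            s += len;
        }
        return count;
    }

It would be called as C<count_code_points(aTHX_ s, send)>.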

=head2 How does Perl store UTF-8 strings?

Currently, Perl deals with UTF-8 strings and non-UTF-8 strings
slightly differently. A flag in the SV, C<SVf_UTF8>, indicates that the
string is internally encoded as UTF-8. Without it, the byte value is the
codepoint number and vice versa. This flag is only meaningful if the SV
is C<SvPOK> or immediately after stringification via C<SvPV> or a
similar macro. You can check and manipulate this flag with the
following macros:

    SvUTF8(sv)
    SvUTF8_on(sv)
    SvUTF8_off(sv)

This flag has an important effect on Perl's treatment of the string: if
UTF-8 data is not properly distinguished, regular expressions,
C<length>, C<substr> and other string handling operations will have
undesirable (wrong) results.

The problem comes when you have, for instance, a string that isn't
flagged as UTF-8, and contains a byte sequence that could be UTF-8 --
especially when combining non-UTF-8 and UTF-8 strings.

Never forget that the C<SVf_UTF8> flag is separate from the PV value; you
need to be sure you don't accidentally knock it off while you're
manipulating SVs. More specifically, you cannot expect to do this:

    SV *sv;
    SV *nsv;
    STRLEN len;
    char *p;

    p = SvPV(sv, len);
    frobnicate(p);
    nsv = newSVpvn(p, len);

The C<char*> string does not tell you the whole story, and you can't
copy or reconstruct an SV just by copying the string value. Check if the
old SV has the UTF8 flag set (I<after> the C<SvPV> call), and act
accordingly:

    p = SvPV(sv, len);
    is_utf8 = SvUTF8(sv);
    frobnicate(p, is_utf8);
    nsv = newSVpvn(p, len);
    if (is_utf8)
        SvUTF8_on(nsv);

In the above, your C<frobnicate> function has been changed to be made
aware of whether or not it's dealing with UTF-8 data, so that it can
handle the string appropriately.
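
Where only the copy is needed, creating the SV and setting its flag can
be folded into one call to C<newSVpvn_flags>, which accepts C<SVf_UTF8>
in its flags argument. A sketch, reusing the variables above:

    p   = SvPV(sv, len);
    nsv = newSVpvn_flags(p, len, SvUTF8(sv) ? SVf_UTF8 : 0);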

Since just passing an SV to an XS function and copying the data of
the SV is not enough to copy the UTF8 flags, even less right is just
passing a S<C<char *>> to an XS function.

For full generality, use the L<C<DO_UTF8>|perlapi/DO_UTF8> macro to see if the
string in an SV is to be I<treated> as UTF-8. This takes into account
if the call to the XS function is being made from within the scope of
L<S<C<use bytes>>|bytes>. If so, the underlying bytes that comprise the
UTF-8 string are to be exposed, rather than the character they
represent. But this pragma should only really be used for debugging and
perhaps low-level testing at the byte level. Hence most XS code need
not concern itself with this, but various areas of the perl core do need
to support it.

And this isn't the whole story. Starting in Perl v5.12, strings that
aren't encoded in UTF-8 may also be treated as Unicode under various
conditions (see L<perlunicode/ASCII Rules versus Unicode Rules>).
This is only really a problem for characters whose ordinals are between
128 and 255, and their behavior varies under ASCII versus Unicode rules
in ways that your code cares about (see L<perlunicode/The "Unicode Bug">).
There is no published API for dealing with this, as it is subject to
change, but you can look at the code for C<pp_lc> in F<pp.c> for an
example as to how it's currently done.

=head2 How do I pass a Perl string to a C library?

A Perl string, conceptually, is an opaque sequence of code points.
Many C libraries expect their inputs to be "classical" C strings, which are
arrays of octets 1-255, terminated with a NUL byte. Your job when writing
an interface between Perl and a C library is to define the mapping between
Perl and that library.

Generally speaking, C<SvPVbyte> and related macros suit this task well.
These assume that your Perl string is a "byte string", i.e., is either
raw, undecoded input into Perl or is pre-encoded to, e.g., UTF-8.

Alternatively, if your C library expects UTF-8 text, you can use
C<SvPVutf8> and related macros. This has the same effect as encoding
to UTF-8 then calling the corresponding C<SvPVbyte>-related macro.

Some C libraries may expect other encodings (e.g., UTF-16LE). To give
Perl strings to such libraries
you must either do that encoding in Perl then use C<SvPVbyte>, or
use an intermediary C library to convert from however Perl stores the
string to the desired encoding.

Take care also that NULs in your Perl string don't confuse the C
library. If possible, give the string's length to the C library; if that's
not possible, consider rejecting strings that contain NUL bytes.

=head3 What about C<SvPV>, C<SvPV_nolen>, etc.?

Consider a 3-character Perl string C<$foo = "\x64\x78\x8c">.
Perl can store these 3 characters in either of two ways:

=over

=item * bytes: 0x64 0x78 0x8c

=item * UTF-8: 0x64 0x78 0xc2 0x8c

=back

Now let's say you convert C<$foo> to a C string thus:

    STRLEN len;
    char *str = SvPV(foo_sv, len);

At this point C<str> could point to a 3-byte C string or a 4-byte one.

Generally speaking, we want C<str> to be the same regardless of how
Perl stores C<$foo>, so the ambiguity here is undesirable. C<SvPVbyte>
and C<SvPVutf8> solve that by giving predictable output: use
C<SvPVbyte> if your C library expects byte strings, or C<SvPVutf8>
if it expects UTF-8.

If your C library happens to support both encodings, then C<SvPV>--always
in tandem with lookups to C<SvUTF8>!--may be safe and (slightly) more
efficient.
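
As a sketch of the advice in this section, suppose a hypothetical C
function C<lib_set_title(const char *)> that expects NUL-terminated
UTF-8 text, and an SV named C<title_sv> (also hypothetical) holding
the Perl string:

    STRLEN      len;
    const char *title = SvPVutf8(title_sv, len);

    /* The library cannot be told the length, so refuse embedded NULs */
    if (memchr(title, '\0', len))
        croak("title must not contain NUL characters");

    lib_set_title(title);

If the library wanted raw bytes instead, C<SvPVbyte> would take the
place of C<SvPVutf8>; it croaks if the string contains characters
above 255.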

B<TESTING> B<TIP:> Use L<utf8>'s C<upgrade> and C<downgrade> functions
in your tests to ensure consistent handling regardless of Perl's
internal encoding.

=head2 How do I convert a string to UTF-8?

If you're mixing UTF-8 and non-UTF-8 strings, it is necessary to upgrade
the non-UTF-8 strings to UTF-8. If you've got an SV, the easiest way to do
this is:

    sv_utf8_upgrade(sv);

However, you must not do this, for example:

    if (!SvUTF8(left))
        sv_utf8_upgrade(left);

If you do this in a binary operator, you will actually change one of the
strings that came into the operator, and, while it shouldn't be noticeable
by the end user, it can cause problems in deficient code.

Instead, C<bytes_to_utf8> will give you a UTF-8-encoded B<copy> of its
string argument. This is useful for having the data available for
comparisons and so on, without harming the original SV. There's also
C<utf8_to_bytes> to go the other way, but naturally, this will fail if
the string contains any characters above 255 that can't be represented
in a single byte.

=head2 How do I compare strings?

L<perlapi/sv_cmp> and L<perlapi/sv_cmp_flags> do a lexicographic
comparison of two SVs, and handle UTF-8ness properly. Note, however,
that Unicode specifies a much fancier mechanism for collation, available
via the L<Unicode::Collate> module.

To just compare two strings for equality/non-equality, you can just use
L<C<memEQ()>|perlapi/memEQ> and L<C<memNE()>|perlapi/memEQ> as usual,
except the strings must be both UTF-8 or not UTF-8 encoded.

To compare two strings case-insensitively, use
L<C<foldEQ_utf8()>|perlapi/foldEQ_utf8> (the strings don't have to have
the same UTF-8ness).

=head2 Is there anything else I need to know?

Not really.
Just remember these things:

=over 3

=item *

There's no way to tell if a S<C<char *>> or S<C<U8 *>> string is UTF-8
or not. But you can tell if an SV is to be treated as UTF-8 by calling
C<DO_UTF8> on it, after stringifying it with C<SvPV> or a similar
macro. And, you can tell if an SV is actually UTF-8 (even if it is not to
be treated as such) by looking at its C<SvUTF8> flag (again after
stringifying it). Don't forget to set the flag if something should be
UTF-8.
Treat the flag as part of the PV, even though it's not -- if you pass on
the PV to somewhere, pass on the flag too.

=item *

If a string is UTF-8, B<always> use C<utf8_to_uvchr_buf> to get at the value,
unless C<UTF8_IS_INVARIANT(*s)> in which case you can use C<*s>.

=item *

When writing a character UV to a UTF-8 string, B<always> use
C<uvchr_to_utf8>, unless C<UVCHR_IS_INVARIANT(uv)> in which case
you can use C<*s = uv>.

=item *

Mixing UTF-8 and non-UTF-8 strings is
tricky. Use C<bytes_to_utf8> to get
a new string which is UTF-8 encoded, and then combine them.

=back

=head1 Custom Operators

Custom operator support is an experimental feature that allows you to
define your own ops. This is primarily to allow the building of
interpreters for other languages in the Perl core, but it also allows
optimizations through the creation of "macro-ops" (ops which perform the
functions of multiple ops which are usually executed together, such as
C<gvsv, gvsv, add>.)

This feature is implemented as a new op type, C<OP_CUSTOM>. The Perl
core does not "know" anything special about this op type, and so it will
not be involved in any optimizations. This also means that you can
define your custom ops to be any op structure -- unary, binary, list and
so on -- you like.

It's important to know what custom operators won't do for you. They
won't let you add new syntax to Perl, directly. They won't even let you
add new keywords, directly. In fact, they won't change the way Perl
compiles a program at all. You have to do those changes yourself, after
Perl has compiled the program. You do this either by manipulating the op
tree using a C<CHECK> block and the C<B::Generate> module, or by adding
a custom peephole optimizer with the C<optimize> module.

When you do this, you replace ordinary Perl ops with custom ops by
creating ops with the type C<OP_CUSTOM> and the C<op_ppaddr> of your own
PP function. This should be defined in XS code, and should look like
the PP ops in C<pp_*.c>. You are responsible for ensuring that your op
takes the appropriate number of values from the stack, and you are
responsible for adding stack marks if necessary.

You should also "register" your op with the Perl interpreter so that it
can produce sensible error and warning messages. Since it is possible to
have multiple custom ops within the one "logical" op type C<OP_CUSTOM>,
Perl uses the value of C<< o->op_ppaddr >> to determine which custom op
it is dealing with. You should create an C<XOP> structure for each
ppaddr you use, set the properties of the custom op with
C<XopENTRY_set>, and register the structure against the ppaddr using
C<Perl_custom_op_register>. A trivial example might look like:

=for apidoc_section $optree_manipulation
=for apidoc Ayh||XOP

    static XOP my_xop;
    static OP *my_pp(pTHX);

    BOOT:
        XopENTRY_set(&my_xop, xop_name, "myxop");
        XopENTRY_set(&my_xop, xop_desc, "Useless custom op");
        Perl_custom_op_register(aTHX_ my_pp, &my_xop);

The available fields in the structure are:

=over 4

=item xop_name

A short name for your op.
This will be included in some error messages,
and will also be returned as C<< $op->name >> by the L<B|B> module, so
it will appear in the output of modules like L<B::Concise|B::Concise>.

=item xop_desc

A short description of the function of the op.

=item xop_class

Which of the various C<*OP> structures this op uses. This should be one of
the C<OA_*> constants from F<op.h>, namely

=over 4

=item OA_BASEOP

=item OA_UNOP

=item OA_BINOP

=item OA_LOGOP

=item OA_LISTOP

=item OA_PMOP

=item OA_SVOP

=item OA_PADOP

=item OA_PVOP_OR_SVOP

This should be interpreted as 'C<PVOP>' only. The C<_OR_SVOP> is because
the only core C<PVOP>, C<OP_TRANS>, can sometimes be a C<SVOP> instead.

=item OA_LOOP

=item OA_COP

=for apidoc_section $optree_manipulation
=for apidoc Amnh||OA_BASEOP
=for apidoc_item OA_BINOP
=for apidoc_item OA_COP
=for apidoc_item OA_LISTOP
=for apidoc_item OA_LOGOP
=for apidoc_item OA_LOOP
=for apidoc_item OA_PADOP
=for apidoc_item OA_PMOP
=for apidoc_item OA_PVOP_OR_SVOP
=for apidoc_item OA_SVOP
=for apidoc_item OA_UNOP

=back

The other C<OA_*> constants should not be used.

=item xop_peep

This member is of type C<Perl_cpeep_t>, which expands to C<void
(*Perl_cpeep_t)(aTHX_ OP *o, OP *oldop)>. If it is set, this function
will be called from C<Perl_rpeep> when ops of this type are encountered
by the peephole optimizer. I<o> is the OP that needs optimizing;
I<oldop> is the previous OP optimized, whose C<op_next> points to I<o>.

=for apidoc_section $optree_manipulation
=for apidoc Ayh||Perl_cpeep_t

=back

C<B::Generate> directly supports the creation of custom ops by name.
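
For completeness, here is a minimal sketch of a C<my_pp> function like
the one registered in the example earlier, together with one way of
building an op that uses it directly from C. The op does nothing; it
takes no values from the stack and leaves none behind. When and where
the new op gets spliced into the op tree is up to you, and is not shown:

    static OP *
    my_pp(pTHX)
    {
        /* the real work of the op would go here */
        return NORMAL;      /* carry on with PL_op->op_next */
    }

    ...

    /* A base op (no children) whose behaviour is supplied by my_pp */
    OP *o = newOP(OP_CUSTOM, 0);
    o->op_ppaddr = my_pp;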

=head1 Stacks

Descriptions above occasionally refer to "the stack", but there are in fact
many stack-like data structures within the perl interpreter. When otherwise
unqualified, "the stack" usually refers to the value stack.

The various stacks have different purposes, and operate in slightly different
ways. Their differences are noted below.

=head2 Value Stack

This stack stores the values that regular perl code is operating on, usually
intermediate values of expressions within a statement. The stack itself is
formed of an array of SV pointers.

The base of this stack is pointed to by the interpreter variable
C<PL_stack_base>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_stack_base

The head of the stack is C<PL_stack_sp>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_stack_sp

Items are pushed to the stack by using the C<PUSHs()> macro or its variants
described above; C<XPUSHs()>, C<mPUSHs()>, C<mXPUSHs()> and the typed
versions. Note carefully that the non-C<X> versions of these macros do not
check the size of the stack and assume it to be big enough. These must be
paired with a suitable check of the stack's size, such as the C<EXTEND> macro
to ensure it is large enough. For example

    EXTEND(SP, 4);
    mPUSHi(10);
    mPUSHi(20);
    mPUSHi(30);
    mPUSHi(40);

This is slightly more performant than making four separate checks in four
separate C<mXPUSHi()> calls.

As a further performance optimisation, the various C<PUSH> macros all operate
using a local variable C<SP>, rather than the interpreter-global variable
C<PL_stack_sp>. This variable is declared by the C<dSP> macro - though it is
normally implied by XSUBs and similar so it is rare you have to consider it
directly.
Once declared, the C<PUSH> macros will operate only on this local
variable, so before invoking any other perl core functions you must use the
C<PUTBACK> macro to return the value from the local C<SP> variable back to
the interpreter variable. Similarly, after calling a perl core function which
may have had reason to move the stack or push/pop values to it, you must use
the C<SPAGAIN> macro which refreshes the local C<SP> value back from the
interpreter one.

Items are popped from the stack by using the C<POPs> macro or its typed
versions. There is also a macro C<TOPs> that inspects the topmost item without
removing it.

=for apidoc_section $stack
=for apidoc Amnh||TOPs

Note specifically that SV pointers on the value stack do not contribute to the
overall reference count of the xVs being referred to. If newly-created xVs are
being pushed to the stack you must arrange for them to be destroyed at a
suitable time; usually by using one of the C<mPUSH*> macros or C<sv_2mortal()>
to mortalise the xV.

=head2 Mark Stack

The value stack stores individual perl scalar values as temporaries between
expressions. Some perl expressions operate on entire lists; for that purpose
we need to know where on the stack each list begins. This is the purpose of the
mark stack.

The mark stack stores integers as I32 values, which are the height of the
value stack at the time before the list began; thus the mark itself actually
points to the value stack entry one before the list. The list itself starts at
C<mark + 1>.

The base of this stack is pointed to by the interpreter variable
C<PL_markstack>, of type C<I32 *>.

=for apidoc_section $stack
=for apidoc Amnh||PL_markstack

The head of the stack is C<PL_markstack_ptr>, and points to the most
recently-pushed item.

=for apidoc Amnh||PL_markstack_ptr

Items are pushed to the stack by using the C<PUSHMARK()> macro. Even though
the stack itself stores (value) stack indices as integers, the C<PUSHMARK>
macro should be given a stack pointer directly; it will calculate the index
offset by comparing to the C<PL_stack_base> variable. Thus almost always the
code to perform this is

    PUSHMARK(SP);

Items are popped from the stack by the C<POPMARK> macro. There is also a macro
C<TOPMARK> that inspects the topmost item without removing it. These macros
return I32 index values directly. There is also the C<dMARK> macro which
declares a new SV double-pointer variable, called C<mark>, which points at the
marked stack slot; this is the usual macro that C code will use when operating
on lists given on the stack.

As noted above, the C<mark> variable itself will point at the most recently
pushed value on the value stack before the list begins, and so the list itself
starts at C<mark + 1>. The values of the list may be iterated by code such as

    for(SV **svp = mark + 1; svp <= PL_stack_sp; svp++) {
      SV *item = *svp;
      ...
    }

Note specifically in the case that the list is already empty, C<mark> will
equal C<PL_stack_sp>.

Because the C<mark> variable is converted to a pointer on the value stack,
extra care must be taken if C<EXTEND> or any of the C<XPUSH> macros are
invoked within the function, because the stack may need to be moved to
extend it and so the existing pointer will now be invalid. If this may be a
problem, a possible solution is to track the mark offset as an integer and
recompute the mark pointer itself later on, after the stack has been moved.

    I32 markoff = POPMARK;

    ...

    SV **mark = PL_stack_base + markoff;

=head2 Temporaries Stack

As noted above, xV references on the main value stack do not contribute to the
reference count of an xV, and so another mechanism is used to track when
temporary values which live on the stack must be released. This is the job of
the temporaries stack.

The temporaries stack stores pointers to xVs whose reference counts will be
decremented soon.

The base of this stack is pointed to by the interpreter variable
C<PL_tmps_stack>, of type C<SV **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_stack

The head of the stack is indexed by C<PL_tmps_ix>, an integer which stores the
index in the array of the most recently-pushed item.

=for apidoc Amnh||PL_tmps_ix

There is no public API to directly push items to the temporaries stack. Instead,
the API function C<sv_2mortal()> is used to mortalize an xV, adding its
address to the temporaries stack.

Likewise, there is no public API to read values from the temporaries stack.
Instead, the macros C<SAVETMPS> and C<FREETMPS> are used. The C<SAVETMPS>
macro establishes the base levels of the temporaries stack, by capturing the
current value of C<PL_tmps_ix> into C<PL_tmps_floor> and saving the previous
value to the save stack. Thereafter, whenever C<FREETMPS> is invoked all of
the temporaries that have been pushed since that level are reclaimed.

=for apidoc_section $stack
=for apidoc Amnh||PL_tmps_floor

While it is common to see these two macros in pairs within an C<ENTER>/
C<LEAVE> pair, it is not necessary to match them. It is permitted to invoke
C<FREETMPS> multiple times since the most recent C<SAVETMPS>; for example in a
loop iterating over elements of a list. While you can invoke C<SAVETMPS>
multiple times within a scope pair, it is unlikely to be useful.
Subsequent invocations will move the temporaries floor further up, thus
effectively trapping the existing temporaries so that they are only released
at the end of the scope.

=head2 Save Stack

The save stack is used by perl to implement the C<local> keyword and other
similar behaviours: any cleanup operations that need to be performed when
leaving the current scope. Items pushed to this stack generally capture the
current value of some internal variable or state, which will be restored when
the scope is unwound due to leaving, C<return>, C<die>, C<goto> or other
reasons.

Whereas other perl internal stacks store individual items all of the same type
(usually SV pointers or integers), the items pushed to the save stack are
formed of many different types, having multiple fields to them. For example,
the C<SAVEt_INT> type needs to store both the address of the C<int> variable
to restore, and the value to restore it to. This information could have been
stored using fields of a C<struct>, but the struct would have to be large
enough to store three pointers in the largest case, which would waste a lot of
space in most of the smaller cases.

=for apidoc_section $stack
=for apidoc Amnh||SAVEt_INT

Instead, the stack stores information in a variable-length encoding of C<ANY>
structures. The final value pushed is stored in the C<UV> field which encodes
the kind of item held by the preceding items; the count and types of which
will depend on what kind of item is being stored. The kind field is pushed
last because that will be the first field to be popped when unwinding items
from the stack.

The base of this stack is pointed to by the interpreter variable
C<PL_savestack>, of type C<ANY *>.
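The variable-length encoding described above can be illustrated with a small
standalone model. This is only a sketch and not perl's implementation: the
C<toy_any> union, the C<TOY_SAVEt_*> kinds and every C<toy_*> function are
hypothetical names invented for the illustration. It shows two kinds of entry
of different lengths, each with its kind tag pushed last so that the unwinder
can read the tag first.

```c
#include <assert.h>

/* Hypothetical stand-in for perl's ANY: one slot holds a pointer or integer. */
typedef union {
    void          *any_ptr;
    long           any_long;
    unsigned long  any_uv;
} toy_any;

/* Hypothetical entry kinds; the real ones are the SAVEt_* constants. */
enum { TOY_SAVEt_INT = 1, TOY_SAVEt_COUNTER = 2 };

#define TOY_SAVESTACK_MAX 64
static toy_any toy_savestack[TOY_SAVESTACK_MAX];
static int     toy_savestack_ix = 0;   /* index of the NEXT free slot */
static long    toy_counter = 0;        /* a global restored by the 2-slot kind */

/* Three slots: address, old value, then the kind tag last. */
void toy_save_int(int *p)
{
    assert(toy_savestack_ix + 3 <= TOY_SAVESTACK_MAX);
    toy_savestack[toy_savestack_ix++].any_ptr  = p;
    toy_savestack[toy_savestack_ix++].any_long = *p;
    toy_savestack[toy_savestack_ix++].any_uv   = TOY_SAVEt_INT;
}

/* Two slots: old value, then the kind tag last (the address is implicit). */
void toy_save_counter(void)
{
    assert(toy_savestack_ix + 2 <= TOY_SAVESTACK_MAX);
    toy_savestack[toy_savestack_ix++].any_long = toy_counter;
    toy_savestack[toy_savestack_ix++].any_uv   = TOY_SAVEt_COUNTER;
}

/* Unwind to old_ix: pop the tag first, which says how many slots follow. */
void toy_leave_scope(int old_ix)
{
    while (toy_savestack_ix > old_ix) {
        unsigned long kind = toy_savestack[--toy_savestack_ix].any_uv;
        switch (kind) {
        case TOY_SAVEt_INT: {
            long val = toy_savestack[--toy_savestack_ix].any_long;
            int *p   = toy_savestack[--toy_savestack_ix].any_ptr;
            *p = (int)val;
            break;
        }
        case TOY_SAVEt_COUNTER:
            toy_counter = toy_savestack[--toy_savestack_ix].any_long;
            break;
        default:
            assert(0 && "unknown save kind");
        }
    }
}
```

In this model, saving an C<int> consumes three slots and saving the counter
consumes two, yet a single loop can unwind both because each entry ends with
its tag.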

=for apidoc_section $stack
=for apidoc Amnh||PL_savestack

The head of the stack is indexed by C<PL_savestack_ix>, an integer which
stores the index in the array at which the next item should be pushed. (Note
that this is different to most other stacks, which reference the most
recently-pushed item).

=for apidoc_section $stack
=for apidoc Amnh||PL_savestack_ix

Items are pushed to the save stack by using the various C<SAVE...()> macros.
Many of these macros take a variable and store both its address and current
value on the save stack, ensuring that value gets restored on scope exit.

    SAVEI8(i8)
    SAVEI16(i16)
    SAVEI32(i32)
    SAVEINT(i)
    ...

There are also a variety of other special-purpose macros which save particular
types or values of interest. C<SAVETMPS> has already been mentioned above.
Others include C<SAVEFREEPV> which arranges for a PV (i.e. a string buffer) to
be freed, or C<SAVEDESTRUCTOR> which arranges for a given function pointer to
be invoked on scope exit. A full list of such macros can be found in
F<scope.h>.

There is no public API for popping individual values or items from the save
stack. Instead, via the scope stack, the C<ENTER> and C<LEAVE> pair form a way
to start and stop nested scopes. Leaving a nested scope via C<LEAVE> will
restore all of the saved values that had been pushed since the most recent
C<ENTER>.

=head2 Scope Stack

As with the mark stack to the value stack, the scope stack forms a pair with
the save stack. The scope stack stores the height of the save stack at which
nested scopes begin, and allows the save stack to be unwound back to that
point when the scope is left.

When perl is built with debugging enabled, there is a second part to this
stack storing human-readable string names describing the type of stack
context.
Each push operation saves the name as well as the height of the save
stack, and each pop operation checks the topmost name against what is
expected, causing an assertion failure if the name does not match.

The base of this stack is pointed to by the interpreter variable
C<PL_scopestack>, of type C<I32 *>. If enabled, the scope stack names are
stored in a separate array pointed to by C<PL_scopestack_name>, of type
C<const char **>.

=for apidoc_section $stack
=for apidoc Amnh||PL_scopestack
=for apidoc Amnh||PL_scopestack_name

The head of the stack is indexed by C<PL_scopestack_ix>, an integer which
stores the index of the array or arrays at which the next item should be
pushed. (Note that this is different to most other stacks, which reference the
most recently-pushed item).

=for apidoc_section $stack
=for apidoc Amnh||PL_scopestack_ix

Values are pushed to the scope stack using the C<ENTER> macro, which begins a
new nested scope. Any items pushed to the save stack are then restored at the
next nested invocation of the C<LEAVE> macro.

=head1 Dynamic Scope and the Context Stack

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

=head2 Introduction to the context stack

In Perl, dynamic scoping refers to the runtime nesting of things like
subroutine calls, evals etc, as well as the entering and exiting of block
scopes. For example, the restoring of a C<local>ised variable is
determined by the dynamic scope.

Perl tracks the dynamic scope by a data structure called the context
stack, which is an array of C<PERL_CONTEXT> structures, each of which is
a big union for all the types of context. Whenever a new scope is
entered (such as a block, a C<for> loop, or a subroutine call), a new
context entry is pushed onto the stack.
Similarly when leaving a block or
returning from a subroutine call etc. a context is popped. Since the
context stack represents the current dynamic scope, it can be searched.
For example, C<next LABEL> searches back through the stack looking for a
loop context that matches the label; C<return> pops contexts until it
finds a sub or eval context or similar; C<caller> examines sub contexts on
the stack.

=for apidoc_section $concurrency
=for apidoc Cyh||PERL_CONTEXT

Each context entry is labelled with a context type, C<cx_type>. Typical
context types are C<CXt_SUB>, C<CXt_EVAL> etc., as well as C<CXt_BLOCK>
and C<CXt_NULL> which represent a basic scope (as pushed by C<pp_enter>)
and a sort block. The type determines which parts of the context union are
valid.

=for apidoc Cyh ||cx_type

=for apidoc Cmnh||CXt_BLOCK
=for apidoc_item ||CXt_EVAL
=for apidoc_item ||CXt_FORMAT
=for apidoc_item ||CXt_GIVEN
=for apidoc_item ||CXt_LOOP_ARY
=for apidoc_item ||CXt_LOOP_LAZYIV
=for apidoc_item ||CXt_LOOP_LAZYSV
=for apidoc_item ||CXt_LOOP_LIST
=for apidoc_item ||CXt_LOOP_PLAIN
=for apidoc_item ||CXt_NULL
=for apidoc_item ||CXt_SUB
=for apidoc_item ||CXt_SUBST
=for apidoc_item ||CXt_WHEN

The main division in the context struct is between a substitution scope
(C<CXt_SUBST>) and block scopes, which are everything else. The former is
just used while executing C<s///e>, and won't be discussed further
here.

All the block scope types share a common base, which corresponds to
C<CXt_BLOCK>. This stores the old values of various scope-related
variables like C<PL_curpm>, as well as information about the current
scope, such as C<gimme>. On scope exit, the old variables are restored.

Particular block scope types store extra per-type information.
For
example, C<CXt_SUB> stores the currently executing CV, while the various
for loop types might hold the original loop variable SV. On scope exit,
the per-type data is processed; for example the CV has its reference count
decremented, and the original loop variable is restored.

The macro C<cxstack> returns the base of the current context stack, while
C<cxstack_ix> is the index of the current frame within that stack.

=for apidoc_section $concurrency
=for apidoc Cmnh|PERL_CONTEXT *|cxstack
=for apidoc Cmnh|I32|cxstack_ix

In fact, the context stack is part of a stack-of-stacks system;
whenever something unusual is done such as calling a C<DESTROY> or tie
handler, a new stack is pushed, then popped at the end.

Note that the API described here changed considerably in perl 5.24; prior
to that, big macros like C<PUSHBLOCK> and C<POPSUB> were used; in 5.24
they were replaced by the inline static functions described below. In
addition, the ordering and detail of how these macros/functions work
changed in many ways, often subtly. In particular they didn't handle
saving the savestack and temps stack positions, and required additional
C<ENTER>, C<SAVETMPS> and C<LEAVE> compared to the new functions. The
old-style macros will not be described further.


=head2 Pushing contexts

For pushing a new context, the two basic functions are
C<cx = cx_pushblock()>, which pushes a new basic context block and returns
its address, and a family of similar functions with names like
C<cx_pushsub(cx)> which populate the additional type-dependent fields in
the C<cx> struct. Note that C<CXt_NULL> and C<CXt_BLOCK> don't have their
own push functions, as they don't store any data beyond that pushed by
C<cx_pushblock>.

The fields of the context struct and the arguments to the C<cx_*>
functions are subject to change between perl releases, representing
whatever is convenient or efficient for that release.

A typical context stack pushing can be found in C<pp_entersub>; the
following shows a simplified and stripped-down example of a non-XS call,
along with comments showing roughly what each function does.

    dMARK;
    U8 gimme      = GIMME_V;
    bool hasargs  = cBOOL(PL_op->op_flags & OPf_STACKED);
    OP *retop     = PL_op->op_next;
    I32 old_ss_ix = PL_savestack_ix;
    CV *cv        = ....;

    /* ... make mortal copies of stack args which are PADTMPs here ... */

    /* ... do any additional savestack pushes here ... */

    /* Now push a new context entry of type 'CXt_SUB'; initially just
     * doing the actions common to all block types: */

    cx = cx_pushblock(CXt_SUB, gimme, MARK, old_ss_ix);

    /* this does (approximately):
        CXINC;              /* cxstack_ix++ (grow if necessary) */
        cx = CX_CUR();      /* and get the address of new frame */
        cx->cx_type        = CXt_SUB;
        cx->blk_gimme      = gimme;
        cx->blk_oldsp      = MARK - PL_stack_base;
        cx->blk_oldsaveix  = old_ss_ix;
        cx->blk_oldcop     = PL_curcop;
        cx->blk_oldmarksp  = PL_markstack_ptr - PL_markstack;
        cx->blk_oldscopesp = PL_scopestack_ix;
        cx->blk_oldpm      = PL_curpm;
        cx->blk_old_tmpsfloor = PL_tmps_floor;

        PL_tmps_floor        = PL_tmps_ix;
    */


    /* then update the new context frame with subroutine-specific info,
     * such as the CV about to be executed: */

    cx_pushsub(cx, cv, retop, hasargs);

    /* this does (approximately):
        cx->blk_sub.cv          = cv;
        cx->blk_sub.olddepth    = CvDEPTH(cv);
        cx->blk_sub.prevcomppad = PL_comppad;
        cx->cx_type            |= (hasargs) ?
                                      CXp_HASARGS : 0;
        cx->blk_sub.retop       = retop;
        SvREFCNT_inc_simple_void_NN(cv);
    */

=for apidoc_section $concurrency
=for apidoc Cmnh||CXINC

Note that C<cx_pushblock()> sets two new floors: for the args stack (to
C<MARK>) and the temps stack (to C<PL_tmps_ix>). While executing at this
scope level, every C<nextstate> (amongst others) will reset the args and
tmps stack levels to these floors. Note that since C<cx_pushblock> uses
the current value of C<PL_tmps_ix> rather than it being passed as an arg,
this dictates at what point C<cx_pushblock> should be called. In
particular, any new mortals which should be freed only on scope exit
(rather than at the next C<nextstate>) should be created first.

Most callers of C<cx_pushblock> simply set the new args stack floor to the
top of the previous stack frame, but for C<CXt_LOOP_LIST> it stores the
items being iterated over on the stack, and so sets C<blk_oldsp> to the
top of these items instead. Note that, contrary to its name, C<blk_oldsp>
doesn't always represent the value to restore C<PL_stack_sp> to on scope
exit.

Note the early capture of C<PL_savestack_ix> to C<old_ss_ix>, which is
later passed as an arg to C<cx_pushblock>. In the case of C<pp_entersub>,
this is because, although most values needing saving are stored in fields
of the context struct, an extra value needs saving only when the debugger
is running, and it doesn't make sense to bloat the struct for this rare
case. So instead it is saved on the savestack. Since this value gets
calculated and saved before the context is pushed, it is necessary to pass
the old value of C<PL_savestack_ix> to C<cx_pushblock>, to ensure that the
saved value gets freed during scope exit. For most users of
C<cx_pushblock>, where nothing needs pushing on the save stack,
C<PL_savestack_ix> is just passed directly as an arg to C<cx_pushblock>.
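The effect of capturing the save stack index early can be seen in a small
standalone model. This is a sketch under simplifying assumptions, not perl
code: the save stack here holds only C<int> restorations, and C<model_cx>,
C<model_save_int()>, C<model_pushblock()>, C<model_leave_scope()> and
C<model_call()> are hypothetical names that loosely mirror C<blk_oldsaveix>,
a C<SAVE...()> macro, C<cx_pushblock()> and C<CX_LEAVE_SCOPE()>.

```c
#include <assert.h>

#define MODEL_MAX 32

/* Simplified save stack: each entry remembers one int and its old value. */
struct model_save { int *addr; int old; };
static struct model_save model_savestack[MODEL_MAX];
static int model_savestack_ix = 0;        /* index of the next free slot */

/* Stands in for the blk_oldsaveix field of a context frame. */
struct model_cx { int oldsaveix; };

void model_save_int(int *p)
{
    assert(model_savestack_ix < MODEL_MAX);
    model_savestack[model_savestack_ix].addr = p;
    model_savestack[model_savestack_ix].old  = *p;
    model_savestack_ix++;
}

/* The frame records the save stack height it must unwind back to. */
void model_pushblock(struct model_cx *cx, int saveix)
{
    cx->oldsaveix = saveix;
}

void model_leave_scope(const struct model_cx *cx)
{
    while (model_savestack_ix > cx->oldsaveix) {
        model_savestack_ix--;
        *model_savestack[model_savestack_ix].addr =
            model_savestack[model_savestack_ix].old;
    }
}

/* Returns the value of the saved variable after the scope has been left. */
int model_call(int capture_early)
{
    int debugger_state = 1;
    struct model_cx cx;
    int old_ss_ix = model_savestack_ix;   /* captured BEFORE the extra save */

    model_save_int(&debugger_state);      /* the extra save stack push */
    debugger_state = 2;

    model_pushblock(&cx, capture_early ? old_ss_ix : model_savestack_ix);
    model_leave_scope(&cx);

    /* Model housekeeping only: drop any entry the late capture left behind. */
    model_savestack_ix = old_ss_ix;
    return debugger_state;
}
```

In this model C<model_call(1)> restores the variable on scope exit, whereas
C<model_call(0)> records a height above the extra entry and so never unwinds
it.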

Note that where possible, values should be saved in the context struct
rather than on the save stack; it's much faster that way.

Normally C<cx_pushblock> should be immediately followed by the appropriate
C<cx_pushfoo>, with nothing between them; this is because if code
in-between could die (e.g. a warning upgraded to fatal), then the context
stack unwinding code in C<dounwind> would see (in the example above) a
C<CXt_SUB> context frame, but without all the subroutine-specific fields
set, and crashes would soon ensue.

=for apidoc dounwind

Where the two must be separate, initially set the type to C<CXt_NULL> or
C<CXt_BLOCK>, and later change it to C<CXt_foo> when doing the
C<cx_pushfoo>. This is exactly what C<pp_enteriter> does, once it's
determined which type of loop it's pushing.

=head2 Popping contexts

Contexts are popped using C<cx_popsub()> etc. and C<cx_popblock()>. Note
however, that unlike C<cx_pushblock>, neither of these functions actually
decrements the current context stack index; this is done separately using
C<CX_POP()>.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_POP|PERL_CONTEXT* cx

There are two main ways that contexts are popped. During normal execution
as scopes are exited, functions like C<pp_leave>, C<pp_leaveloop> and
C<pp_leavesub> process and pop just one context using C<cx_popfoo> and
C<cx_popblock>. On the other hand, things like C<pp_return> and C<next>
may have to pop back several scopes until a sub or loop context is found,
and exceptions (such as C<die>) need to pop back contexts until an eval
context is found. Both of these are accomplished by C<dounwind()>, which
is capable of processing and popping all contexts above the target one.
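The second style of popping can be pictured with a toy context stack. The
following is a minimal sketch, not the real implementation: frames carry
nothing but a type, and C<demo_find_cx()> and C<demo_dounwind()> are
hypothetical helpers standing in for the search that C<next> or C<return>
performs and for C<dounwind()> respectively.

```c
#include <assert.h>

/* Hypothetical frame types, loosely modelled on the CXt_* constants. */
enum demo_cxt { DEMO_CXt_BLOCK, DEMO_CXt_LOOP, DEMO_CXt_SUB, DEMO_CXt_EVAL };

#define DEMO_CX_MAX 16
static enum demo_cxt demo_cxstack[DEMO_CX_MAX];
static int demo_cxstack_ix = -1;        /* index of current frame; -1 = none */
static int demo_frames_processed = 0;   /* counts frames cleaned up so far */

void demo_cx_push(enum demo_cxt type)
{
    assert(demo_cxstack_ix + 1 < DEMO_CX_MAX);
    demo_cxstack[++demo_cxstack_ix] = type;
}

/* Search from the current frame downwards; return its index or -1. */
int demo_find_cx(enum demo_cxt wanted)
{
    int i;
    for (i = demo_cxstack_ix; i >= 0; i--)
        if (demo_cxstack[i] == wanted)
            return i;
    return -1;
}

/* Process and pop every frame above target_ix, leaving target_ix current. */
void demo_dounwind(int target_ix)
{
    while (demo_cxstack_ix > target_ix) {
        /* per-type cleanup would happen here, before the index moves */
        demo_frames_processed++;
        demo_cxstack_ix--;
    }
}
```

With frames pushed in the order sub, block, loop, block, a C<next> would
unwind to the loop frame at index 2, and a subsequent C<return> would unwind
to the sub frame at index 0.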

Here is a typical example of context popping, as found in C<pp_leavesub>
(simplified slightly):

    U8 gimme;
    PERL_CONTEXT *cx;
    SV **oldsp;
    OP *retop;

    cx = CX_CUR();

    gimme = cx->blk_gimme;
    oldsp = PL_stack_base + cx->blk_oldsp; /* last arg of previous frame */

    if (gimme == G_VOID)
        PL_stack_sp = oldsp;
    else
        leave_adjust_stacks(oldsp, oldsp, gimme, 0);

    CX_LEAVE_SCOPE(cx);
    cx_popsub(cx);
    cx_popblock(cx);
    retop = cx->blk_sub.retop;
    CX_POP(cx);

    return retop;

=for apidoc_section $concurrency
=for apidoc Cmh||CX_CUR

The steps above are in a very specific order, designed to be the reverse
order of when the context was pushed. The first thing to do is to copy
and/or protect any return arguments and free any temps in the current
scope. Scope exits like an rvalue sub normally return a mortal copy of
their return args (as opposed to lvalue subs). It is important to make
this copy before the save stack is popped or variables are restored, or
bad things like the following can happen:

    sub f { my $x =...; $x } # $x freed before we get to copy it
    sub f { /(...)/;    $1 } # PL_curpm restored before $1 copied

Although we wish to free any temps at the same time, we have to be careful
not to free any temps which are keeping return args alive; nor to free the
temps we have just created while mortal copying return args. Fortunately,
C<leave_adjust_stacks()> is capable of making mortal copies of return args,
shifting args down the stack, and only processing those entries on the
temps stack that are safe to do so.

In void context no args are returned, so it's more efficient to skip
calling C<leave_adjust_stacks()>. Also in void context, a C<nextstate> op
is likely to be imminently called which will do a C<FREETMPS>, so there's
no need to do that either.

The next step is to pop savestack entries: C<CX_LEAVE_SCOPE(cx)> is just
defined as C<< LEAVE_SCOPE(cx->blk_oldsaveix) >>. Note that during the
popping, it's possible for perl to call destructors, call C<STORE> to undo
localisations of tied vars, and so on. Any of these can die or call
C<exit()>. In this case, C<dounwind()> will be called, and the current
context stack frame will be re-processed. Thus it is vital that all steps
in popping a context are done in such a way as to support reentrancy. The
other alternative, of decrementing C<cxstack_ix> I<before> processing the
frame, would lead to leaks and the like if something died halfway through,
or overwriting of the current frame.

=for apidoc_section $concurrency
=for apidoc Cmh|void|CX_LEAVE_SCOPE|PERL_CONTEXT* cx

C<CX_LEAVE_SCOPE> itself is safely re-entrant: if only half the savestack
items have been popped before dying and getting trapped by eval, then the
C<CX_LEAVE_SCOPE>s in C<dounwind> or C<pp_leaveeval> will continue where
the first one left off.

The next step is the type-specific context processing; in this case
C<cx_popsub>. In part, this looks like:

    cv = cx->blk_sub.cv;
    CvDEPTH(cv) = cx->blk_sub.olddepth;
    cx->blk_sub.cv = NULL;
    SvREFCNT_dec(cv);

where it's processing the just-executed CV. Note that before it decrements
the CV's reference count, it nulls the C<blk_sub.cv>. This means that if
it re-enters, the CV won't be freed twice. It also means that you can't
rely on such type-specific fields having useful values after the return
from C<cx_popfoo>.
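The null-before-decrement ordering can be demonstrated with a standalone
model. This is a sketch only: C<sk_cv>, C<sk_cx>, C<sk_refcnt_dec()> and
C<sk_popsub()> are hypothetical stand-ins, and a "dying destructor" is
simulated by a flag that makes the decrement re-enter the pop of the same
frame, roughly as C<dounwind()> would re-process it.

```c
#include <assert.h>
#include <stddef.h>

struct sk_cv { int refcnt; int freed; };   /* freed counts how often it died */
struct sk_cx { struct sk_cv *cv; };        /* stands in for blk_sub.cv */

static struct sk_cx *sk_current_cx = NULL; /* frame that unwinding revisits */
static int sk_die_in_destructor = 0;       /* simulate a die during freeing */

void sk_popsub(struct sk_cx *cx);

/* Like a refcount decrement that tolerates NULL and may trigger re-entry. */
void sk_refcnt_dec(struct sk_cv *cv)
{
    if (cv == NULL)
        return;
    if (--cv->refcnt == 0) {
        cv->freed++;
        if (sk_die_in_destructor) {
            sk_die_in_destructor = 0;
            sk_popsub(sk_current_cx);      /* the frame is processed again */
        }
    }
}

void sk_popsub(struct sk_cx *cx)
{
    struct sk_cv *cv = cx->cv;
    cx->cv = NULL;        /* clear the field first ... */
    sk_refcnt_dec(cv);    /* ... so that a re-entrant call finds NULL here */
}
```

If the two statements in C<sk_popsub()> were swapped, the re-entrant call
would still see the old pointer and decrement the count a second time.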

Next, C<cx_popblock> restores all the various interpreter vars to their
previous values or previous high water marks; it expands to:

    PL_markstack_ptr = PL_markstack + cx->blk_oldmarksp;
    PL_scopestack_ix = cx->blk_oldscopesp;
    PL_curpm         = cx->blk_oldpm;
    PL_curcop        = cx->blk_oldcop;
    PL_tmps_floor    = cx->blk_old_tmpsfloor;

Note that it I<doesn't> restore C<PL_stack_sp>; as mentioned earlier,
which value to restore it to depends on the context type (specifically
C<for (list) {}>), and what args (if any) it returns; and that will
already have been sorted out earlier by C<leave_adjust_stacks()>.

Finally, the context stack pointer is actually decremented by C<CX_POP(cx)>.
After this point, it's possible that the current context frame could
be overwritten by other contexts being pushed. Although things like ties
and C<DESTROY> are supposed to work within a new context stack, it's best
not to assume this. Indeed on debugging builds, C<CX_POP(cx)> deliberately
sets C<cx> to null to detect code that is still relying on the field
values in that context frame. Note in the C<pp_leavesub()> example above,
we grab C<blk_sub.retop> I<before> calling C<CX_POP>.

=head2 Redoing contexts

Finally, there is C<cx_topblock(cx)>, which acts like a super-C<nextstate>
as regards resetting various vars to their base values. It is used in
places like C<pp_next>, C<pp_redo> and C<pp_goto> where rather than
exiting a scope, we want to re-initialise the scope. As well as resetting
C<PL_stack_sp> like C<nextstate>, it also resets C<PL_markstack_ptr>,
C<PL_scopestack_ix> and C<PL_curpm>. Note that it doesn't do a
C<FREETMPS>.


=head1 Reference-counted argument stack

=head2 Introduction

As of perl 5.40, there is a build option, C<PERL_RC_STACK>, not enabled by
default, which requires that items pushed onto, or popped off, the argument
stack have their reference counts adjusted. It is intended that eventually
this will be the default way (and finally the only way) to configure perl.

The macros which manipulate the stack such as PUSHs() and POPs() don't
adjust the reference count of the SV. Most of the time this is fine, since
something else is keeping the SV alive while on the argument stack, such
as a pointer from the TEMPs stack, or from the pad (e.g. a lexical variable
or a C<PADTMP>). Occasionally this can go horribly wrong. For example,
this code:

    my @a = (1,2,3);
    sub f { @a = (); print "(@_)\n" };
    f(@a, 4);

may print undefined or random freed values, since some of the elements of
@_, which have been aliased to the elements of @a, have been freed.
C<PERL_RC_STACK> is intended to fix this by making each SV pointer on the
argument stack increment the reference count (RC) of the SV by one.

In this new environment, unmodified existing PP and XS functions, which
have been written assuming a non reference-counted stack (non-RC for
short), are called via special wrapper functions which adjust the stack
before and after. At the moment there is no API to write an RC XS
function, so all XS code will continue to be called via a wrapper (which
makes them slightly slower), but means that in general, CPAN distributions
containing XS code can continue to work without modification.

However, PP functions, either in perl core, or those in XS functions used
to implement custom ops or to override the PP functions for built-in ops,
need dealing with specially.
For the latter, they can just be wrapped;
this involves the least work, but has a performance impact. In the longer
term, and for core PP functions, they need unwrapping and rewriting using
a new API. With this, the old macros such as PUSHs() have been replaced
with a new set of (mostly inline) functions with a common prefix, such as
rpp_push_1(). "RPP" stands for "reference-counted push and pop functions".
The new functions modify the reference count on C<PERL_RC_STACK> builds,
while leaving them unadjusted otherwise. Thus in core they generally work
in both cases, while in XS code they are portable to older perl versions
via C<PPPort> (XXX assuming that they have been added to C<PPPort>).

The rest of this section is mainly concerned with how to convert existing
PP functions, and how to write new PP functions to use the new C<rpp_>
API.

A reference-counted perl can be built using the PERL_RC_STACK define.
For development and debugging purposes, it is best to enable leaking
scalar debugging too, as that displays extra information about scalars
that have leaked or been prematurely freed.

    Configure -DDEBUGGING \
        -Accflags='-DPERL_RC_STACK -DDEBUG_LEAKING_SCALARS'

=head2 Reference counted stack states

In the new regime, the current argument stack can be in one of three
states, which can be determined by the shown expression.

=over

=item * not reference-counted

    !AvREAL(PL_curstack)

In this case, perl will assume when emptying the stack (such as during a
croak()) that the items on it don't need freeing. This is the traditional
perl behaviour. On C<PERL_RC_STACK> builds, such stacks will be rarely
encountered.

=item * fully reference-counted

    AvREAL(PL_curstack) && !PL_curstackinfo->si_stack_nonrc_base

All the items on the stack are reference counted, and will be freed by
functions like rpp_popfree_1() or if perl croak()s. This is the normal
state of the stack in C<PERL_RC_STACK> builds.

=item * partially reference-counted (split)

    AvREAL(PL_curstack) && PL_curstackinfo->si_stack_nonrc_base > 0

In this case, items on the stack from the index C<si_stack_nonrc_base>
upwards are non-RC; those below are RC. This state occurs when a PP or XS
function has been wrapped. In this case, the wrapper function pushes a
non-RC copy of the arg pointers above the cut then calls the real
function. When that returns, the wrapper function bumps up the RC of any
returned args. See below for more details.

=back

Note that perl uses a stack-of-stacks, and the AvREAL() and
C<si_stack_nonrc_base> states are per stack. When perl starts up, the main
stack is RC, but by default, new stacks pushed in XS code via PUSHSTACKi()
are non-RC, so it is quite possible to get a mixture. The perl core itself
uses the new push_stackinfo() function which replaces PUSHSTACKi() and
allows you to specify that the new stack should be RC by default.
(XXX core mostly hasn't actually been updated yet to use push_stackinfo())

Most places in the core assume a particular RC environment. In particular,
it is assumed that within a runops loop, all the PP functions are
RC-aware, either because they have been (re)written to be aware, or
because they have been wrapped. Whenever a runops loop is entered via
CALLRUNOPS(), it will check the current state of the stack, and if it's
not fully RC, will temporarily update its contents to be fully RC before
entering the main runops loop. Then if necessary it will restore the stack
to its old state on return.
This means that functions like call_sv(),
which can be called from any environment (e.g. RC core or wrapped and
temporarily non-RC XS code) will always do the Right Thing when invoking
the runops loop, no matter what the current stack state is.

Similarly, croaks and the like (which can occur anywhere) have to be able
to handle both stack types. So there are a few places in core - call_sv(),
eval_sv() etc, Perl_die_unwind() and S_my_exit_jump() - which have been
specially crafted to handle both cases; everything else can assume a fixed
environment.

=head2 Wrapping

Normally a core PP function is declared like this:

    PP(pp_foo)
    {
        ...
    }

This expands to something like:

    OP* Perl_pp_foo(pTHX)
    {
        ...
    }

When such a function needs to be wrapped, it is instead declared as:

    PP_wrapped(pp_foo, nargs, nlists)
    {
        ...
    }

which on non-RC builds, expands to the same as PP() (the extra args are
ignored). On RC builds it expands to something like

    OP* Perl_pp_foo(pTHX)
    {
        return Perl_pp_wrap(aTHX_ S_Perl_pp_foo_norc, nargs, nlists);
    }

    STATIC OP* S_Perl_pp_foo_norc(pTHX)
    {
        ...
    }

Here the externally visible PP function calls pp_wrap(), which adjusts
the stack contents, then calls the hidden real body of the PP function,
then on return, adjusts the stack back.

There is an API macro, XSPP_wrapped(), intended for use on PP functions
declared in XS code. It is identical to PP_wrapped(), except that it
doesn't prepend a C<Perl_> prefix to the function name.

The C<nargs> and C<nlists> parameters to the macro are numeric constants
or simple expressions which specify how many arguments the PP function
expects, or how many lists it expects.
For example,

    PP_wrapped(pp_add, 2, 0);  /* consumes two args off the stack */

    PP_wrapped(pp_readline,    /* consumes one or two args */
                ((PL_op->op_flags & OPf_STACKED) ? 2 : 1), 0);

    PP_wrapped(pp_push, 0, 1); /* consumes one list */

    PP_wrapped(pp_aassign, 0, 2); /* consumes two lists */

To understand what pp_wrap() does, consider calling Perl_pp_foo() which
expects three arguments. On entry the stack may look like:

    ... A+ B+ C+

(where the C<+> indicates that the pointers to A, B and C are each
reference counted). The wrapper function pp_wrap() marks a cut at the
current stack position using C<si_stack_nonrc_base>, then, based on the
value of C<nargs>, pushes a copy of those three pointers above the cut:

    ... A+ B+ C+ | A0 B0 C0

(where the C<0> indicates that the pointers aren't RC), then calls the
real PP function, S_Perl_pp_foo_norc(). That function processes A, B and C,
pops them off the stack, and pushes some result SVs. None of this
manipulation adjusts any RCs. On return to pp_wrap(), the stack may look
something like:

    ... A+ B+ C+ | X0 Y0

The wrapper function bumps up the RCs of X and Y, decrements those of A, B
and C, shifts the results down and sets C<si_stack_nonrc_base> to zero,
leaving the stack as:

    ... X+ Y+

In places like pp_entersub(), a similar wrapping (via the functions
rpp_invoke_xs() and then xs_wrap()) is done when calling XS subs.

When C<nlists> is positive, a similar action takes place, except that the
mark stack is examined (and adjusted) in order to determine the number of
args that need copying.

A complex calling environment might have multiple nested stacks with
different RC states. Perl starts off with an RC stack. Then for example,
pp_entersub() is called, which (via xs_wrap()) splits the stack and
executes the XS function in a non-RC environment.
That function may call
PUSHSTACKi(), which creates a new non-RC stack, then calls call_sv(), which
does CALLRUNOPS(), which causes the new stack to temporarily become RC.
Then a tied method is called, which pushes a new RC stack, and so on. (XXX
currently tied methods actually push a non-RC stack. To be fixed soon).

=head2 (Re)writing a PP function using the rpp_() API

Wrapping a PP function has a performance overhead, and is there mainly as
a temporary crutch. Eventually, PP functions should be updated to use
rpp_() functions, and any new PP functions should be written this way from
scratch and thus not ever need wrapping.

A couple of examples of core PP functions being converted can be seen in the
commits C<v5.39.1-304-g205fcd8410> and C<v5.39.1-303-g2fe263a83a>, which
demonstrate a unary and a binary op being converted (pp_not() and
pp_and()).

The traditional PP stack API consisted of a C<dSP> declaration, plus a
number of macros to push, pop and extend the stack. A I<very simplified>
pp_add() function might look something like:

    PP(pp_add)
    {
        dSP;
        dTARGET;
        IV right = SvIV(POPs);
        IV left  = SvIV(POPs);
        TARGi(left + right, 1);
        PUSHs(TARG);
        PUTBACK;
        return NORMAL;
    }

which expands to something like:

    {
        SV **sp = PL_stack_sp;
        SV *targ = PAD_SV(PL_op->op_targ);
        IV right = SvIV(*sp--);
        IV left  = SvIV(*sp--);
        sv_setiv(targ, left + right);
        *++sp = targ;
        PL_stack_sp = sp;
        return PL_op->op_next;
    }

The whole C<dSP> thing harks back to the days before decent optimising
compilers. It was always error-prone, e.g. if you forgot a C<PUTBACK> or
C<SPAGAIN>. The new API always just accesses C<PL_stack_sp> directly. In
fact the first step of upgrading a PP function is always to remove the
C<dSP> declaration.
This has the happy side effect that any old-style
macros left in the pp function which implicitly use C<sp> will become
compile errors. The existence of a C<dSP> somewhere in core is a good sign
that that function still needs updating.

An obvious question is: why not just modify the definitions of the PUSHs()
etc macros to modify reference counts on RC builds? The basic problem is
that an SV may now be kept alive only by a single reference count from
the stack (formerly, they tended to be on the TEMPs stack too). So in code
like:

    SV *sv = POPs;
    IV i = SvIV(sv);

including an SvREFCNT_dec() in the C<POPs> macro definition would cause
C<sv> to be freed immediately, before its integer value can be read.

A potential issue with the new regime is that perl can croak at basically
any point in execution (e.g. the SvIV() above might call FETCH() on a tied
variable which then croaks). Thus at all times, the RC of each SV must be
properly accounted for. In the example above, a naive approach to avoiding
a premature free of C<sv> might be:

    SV *sv = *PL_stack_sp--;
    IV i = SvIV(sv);
    SvREFCNT_dec(sv); // got i, so ok to free sv now

but that means that C<sv> leaks if SvIV() triggers a croak.

To avoid that, the new regime has the general outline that arguments are
left on the stack I<until they are finished with>, then removed and their
reference count adjusted at that point.
With the new API, the pp_add()
function looks something like:

    {
        dTARGET;
        IV right = SvIV(PL_stack_sp[ 0]); // NB: arguments left on stack
        IV left  = SvIV(PL_stack_sp[-1]);
        TARGi(left + right, 1);
        rpp_replace_2_1(targ);
        return NORMAL;
    }

The rpp_replace_2_1() function pops two values off the stack and pushes
one new value on, while adjusting reference counts as appropriate
(depending on whether built with C<PERL_RC_STACK> or not).

The rpp_() functions in the new API will be described in detail below, but
in summary:

    new function                approximate old equivalent
    ------------                --------------------------

    rpp_extend(n)               EXTEND(SP, n)

    rpp_push_1(sv)              PUSHs(sv)
    rpp_push_2(sv1, sv2)        PUSHs(sv1); PUSHs(sv2)
    rpp_xpush_1(sv)             XPUSHs(sv)
    rpp_xpush_2(sv1, sv2)       EXTEND(SP,2); PUSHs(sv1); PUSHs(sv2);

    rpp_push_1_norc(sv)         mPUSHs(sv)  // on RC builds, skips RC++;
                                            // on non-RC builds, mortalises
    rpp_popfree_1()             (void)POPs;
    rpp_popfree_2()             (void)POPs; (void)POPs;
    rpp_popfree_to(svp)         PL_stack_sp = svp;
    rpp_obliterate_stack_to(ix) // see description below

    sv = rpp_pop_1_norc()       sv = SvREFCNT_inc(POPs)

    rpp_replace_1_1(sv)         (void)POPs; PUSHs(sv);
    rpp_replace_2_1(sv)         (void)POPs; (void)POPs; PUSHs(sv);
    rpp_replace_at(sp, sv)      *sp = sv;
    rpp_replace_at_norc(sp, sv) *sp = sv_2mortal(sv);

    rpp_context(mark, gimme,
                      extra)    SP -= extra;
                 // impose void/scalar/list context on return args
                                SP = (gimme == G_VOID) ? mark : ....

    rpp_try_AMAGIC_1()          tryAMAGICun_MG()
    rpp_try_AMAGIC_2()          tryAMAGICbin_MG()

    rpp_is_lone(sv)             SvTEMP(sv) && SvREFCNT(sv) == 1
    rpp_stack_is_rc()           no equivalent

    rpp_invoke_xs(cv)           CvXSUB(cv)(aTHX_ cv);

    (no replacement)            dATARGET // just write the macro body in full

There are also some C<_NN> variants which assume that any items being
removed from the stack are non-NULL, and so are slightly more efficient:

    rpp_popfree_1_NN()
    rpp_popfree_2_NN()
    rpp_popfree_to_NN(svp)

    rpp_replace_1_1_NN(sv)
    rpp_replace_2_1_NN(sv)
    rpp_replace_at_NN(sp, sv)
    rpp_replace_at_norc_NN(sp, sv)

There are also a few C<_IMM> variants, which expect the single pushed or
replacement value to be an immortal, such as C<&PL_sv_undef> - this skips
incrementing the ref count of the immortal SV. It doesn't matter if the
ref count of the SV prematurely reaches zero, as sv_free2() will just
resurrect it. Not every variant is provided; if a suitable one
doesn't exist, just using a standard C<_1> version is fine, albeit
slightly slower.

    rpp_push_IMM(&PL_sv_undef)
    rpp_xpush_IMM(&PL_sv_zero)
    rpp_replace_1_IMM_NN(&PL_sv_yes)
    rpp_replace_2_IMM_NN(&PL_sv_no)

Other new C and perl functions related to reference-counted stacks are:

    push_stackinfo(type,rc)     PUSHSTACKi(type)
    pop_stackinfo()             POPSTACK()
    switch_argstack(to)         SWITCHSTACK(from,to)

    (Internals::stack_refcounted() & 1)  # perl built with PERL_RC_STACK

Some of these new functions are trivial, but should be used in preference
to writing direct code because they will work on both RC and non-RC
builds, and may do extra checks and assertions on C<DEBUGGING> builds.

Note that rpp_popfree_1() etc aren't direct replacements for C<POPs>. The
rpp_() variants don't return a value and are intended to be called when
the SV is finished with.
So

    SV *sv = POPs;
    ... do stuff with sv ...

becomes

    SV *sv = *PL_stack_sp;
    ... do stuff with sv ...
    rpp_popfree_1(); /* does SvREFCNT_dec(*PL_stack_sp--) */

The rpp_replace_M_N() functions are shortcuts for popping and freeing C<M>
items then pushing and bumping up the RCs of C<N> items. Note that they
handle edge cases such as an old and new SV being the same.

rpp_replace_at(sp, sv) is similar to rpp_replace_1_1(), except that
it replaces an SV at an address in the stack rather than at the top.

rpp_replace_at_norc(sp, sv) is similar to rpp_replace_at(), except that
it assumes that C<sv> already has a bumped reference count. So, a bit
like rpp_push_1_norc() (see below), it doesn't bother increasing C<sv>'s
reference count, or on non-RC builds it mortalises it instead.

rpp_popfree_to(svp) is designed to replace code like

    PL_stack_sp = PL_stack_base + cx->blk_oldsp;

which typically appears in list ops or scope exits when the arguments are
finished with. Left unaltered, all the SVs above C<oldsp> would leak. The
new approach is

    rpp_popfree_to(PL_stack_base + cx->blk_oldsp);

There is a rarely-used variant of this, rpp_obliterate_stack_to(), which
pops the stack back to the specified index regardless of the current RC
state of the stack. So for example if the stack is split, it will only
adjust the RCs of any SVs which are below the split point, while
rpp_popfree_to() would mindlessly free I<all> SVs (on RC builds anyway).
For normal PP functions you should only ever use rpp_popfree_to(), which
is faster.

There are no new equivalents for all the convenience macros like POPi()
and (shudder) dPOPPOPiirl(). These should be replaced with the rpp_()
functions above and with the conversions and variable declarations being
made explicit, e.g.
dPOPPOPiirl() becomes:

    IV right = SvIV(PL_stack_sp[ 0]);
    IV left  = SvIV(PL_stack_sp[-1]);
    rpp_popfree_2();

A couple of the rpp_() functions with C<norc> in their names don't adjust
the reference count on RC builds (but, conversely, do on non-RC builds).

rpp_push_1_norc(sv) does a simple C<*++PL_stack_sp = sv> on RC builds. It
is typically used to "root" a newly-created SV, which already has an RC of
1. On non-RC builds it mortalises the SV instead. So for example, code
which used to look like

    mPUSHs(newSViv(i));

and which expanded to the equivalent of:

    PUSHs(sv_2mortal(newSViv(i)));

should be rewritten as:

    rpp_push_1_norc(newSViv(i));

This is because newSViv() and similar create a new SV with a reference
count one too high (1 rather than 0). This count is then "donated" to the
stack by pushing it. Conversely on non-RC builds, the count is donated to
the TEMPs stack.

Similarly, on RC builds, C<sv = rpp_pop_1_norc()> does a simple
C<sv = *PL_stack_sp--> without adjusting the reference count, while on
non-RC builds it actually increments the SV's reference count. It is
intended for cases where you immediately want to increment the reference
count again after popping, e.g. where the SV is to be immediately embedded
somewhere. For example this code:

    SV *sv = PL_stack_sp[0];
    SvREFCNT_inc(sv);
    av_store(av, i, sv); /* in real life should check return value */
    rpp_popfree_1();

can be more efficiently written as

    av_store(av, i, rpp_pop_1_norc());

By using this function, the code works correctly on both RC and non-RC
builds.

A common operation on list ops is to impose void, scalar or list context
on the return arguments, possibly discarding all, or all except one, of
them. rpp_context(mark, gimme, extra) does this.
As a first step (for
convenience and efficiency) it notionally pops C<extra> args off the
stack. Then, for list context, it leaves things as is. For void context, the
stack pointer is reset to mark, and everything above is popped. For
scalar, the top argument (or &PL_sv_undef) is moved from the top to
mark+1 and everything above is discarded.

The macros which appear at the start of many PP functions to check for
unary or binary op overloading (among other things) have been replaced
with rpp_try_AMAGIC_1() and _2() inline functions, which now rely on the
calling PP function to choose whether to return immediately rather than
the return being hidden away in the macro.

The rpp_invoke_xs() function calls the XS function associated with the CV,
but may do so via a wrapper function to adjust the stack as necessary.

In the spirit of hiding away less in macros, C<dATARGET> hasn't been given
a replacement; where its effect is needed, it is now written out in full;
see pp_add() for an example.

Finally, a couple of rpp_() functions provide information rather than
manipulate the stack.

rpp_is_lone(sv) indicates whether C<sv>, assumed to be still on the stack,
is kept alive only by a single reference-counted pointer from the argument
and/or temps stacks, and thus is a candidate for some optimisations (like
skipping the copying of return arguments from a subroutine call).

rpp_stack_is_rc() indicates whether the current stack is currently
reference-counted. It's used mainly in a few places like call_sv() which
can be called from anywhere, and thus have to deal with both cases.

So for example, rather than using rpp_xpush_1(), call_sv() has lines like:

    rpp_extend(1);
    *++PL_stack_sp = sv;
    #ifdef PERL_RC_STACK
    if (rpp_stack_is_rc())
        SvREFCNT_inc_simple_void_NN(sv);
    #endif

which works on both standard builds and RC builds, and works whether
call_sv() is called from a standard PP function (rpp_stack_is_rc() is
true) or from a wrapped PP or XS function (rpp_stack_is_rc() is false).
Note that you're unlikely to need to use this function, as in most places,
such as PP or XS functions, it is always RC or non-RC respectively. In
fact on debugging builds under C<PERL_RC_STACK>, PUSHs() and similar
macros include an C<assert(!rpp_stack_is_rc())>, while rpp_push_1() and
similar functions have C<assert(rpp_stack_is_rc())>.

The macros for pushing new stackinfos have been replaced with inline
functions which don't rely on C<dSP> being in scope, and which have less
ambiguous names: they make it clear that a new I<stackinfo> is being
pushed, rather than just some sort of I<stack>. push_stackinfo() also has
a boolean argument indicating whether the new argument stack should be
reference-counted or not. For backwards compatibility, PUSHSTACKi(type) is
defined to be push_stackinfo(type, 0).

Some test scripts check for things like leaks by testing that the
reference count of a particular variable has an expected value. If this
is different on a perl built with C<PERL_RC_STACK>, then the perl
function Internals::stack_refcounted() can be used. This returns an
integer, the lowest bit of which indicates that perl was built with
C<PERL_RC_STACK>. Other bits are reserved for future use and should be
masked out.

=head1 Slab-based operator allocation

B<Note:> this section describes a non-public internal API that is subject
to change without notice.

Perl's internal error-handling mechanisms implement C<die> (and its internal
equivalents) using longjmp. If this occurs during lexing, parsing or
compilation, we must ensure that any ops allocated as part of the compilation
process are freed. (Older Perl versions did not adequately handle this
situation: when failing a parse, they would leak ops that were stored in
C C<auto> variables and not linked anywhere else.)

To handle this situation, Perl uses I<op slabs> that are attached to the
currently-compiling CV. A slab is a chunk of allocated memory. New ops are
allocated as regions of the slab. If the slab fills up, a new one is created
(and linked from the previous one). When an error occurs and the CV is freed,
any ops remaining are freed.

Each op is preceded by two pointers: one points to the next op in the slab, and
the other points to the slab that owns it. The next-op pointer is needed so
that Perl can iterate over a slab and free all its ops. (Op structures are of
different sizes, so the slab's ops can't merely be treated as a dense array.)
The slab pointer is needed for accessing a reference count on the slab: when
the last op on a slab is freed, the slab itself is freed.

The slab allocator puts the ops at the end of the slab first. This will tend to
allocate the leaves of the op tree first, and the layout will therefore
hopefully be cache-friendly. In addition, this means that there's no need to
store the size of the slab (see below on why slabs vary in size), because Perl
can follow pointers to find the last op.

It might seem possible to eliminate slab reference counts altogether, by having
all ops implicitly attached to C<PL_compcv> when allocated and freed when the
CV is freed. That would also allow C<op_free> to skip C<FreeOp> altogether, and
thus free ops faster.
But that doesn't work in those cases where ops need to
survive beyond their CVs, such as re-evals.

The CV also has to have a reference count on the slab. Sometimes the first op
created is immediately freed. If the reference count of the slab reaches 0,
then it will be freed with the CV still pointing to it.

CVs use the C<CVf_SLABBED> flag to indicate that the CV has a reference count
on the slab. When this flag is set, the slab is accessible via C<CvSTART> when
C<CvROOT> is not set, or by subtracting two pointers C<(2*sizeof(I32 *))> from
C<CvROOT> when it is set. The alternative to this approach of sneaking the slab
into C<CvSTART> during compilation would be to enlarge the C<xpvcv> struct by
another pointer. But that would make all CVs larger, even though slab-based op
freeing is typically of benefit only for programs that make significant use of
string eval.

=for apidoc_section $concurrency
=for apidoc Cmnh| |CVf_SLABBED
=for apidoc_item |OP *|CvROOT|CV * sv
=for apidoc_item |OP *|CvSTART|CV * sv

When the C<CVf_SLABBED> flag is set, the CV takes responsibility for freeing
the slab. If C<CvROOT> is not set when the CV is freed or undeffed, it is
assumed that a compilation error has occurred, so the op slab is traversed and
all the ops are freed.

Under normal circumstances, the CV forgets about its slab (decrementing the
reference count) when the root is attached. So the slab reference counting that
happens when ops are freed takes care of freeing the slab. In some cases, the
CV is told to forget about the slab (C<cv_forget_slab>) precisely so that the
ops can survive after the CV is done away with.

Forgetting the slab when the root is attached is not strictly necessary, but
avoids potential problems with C<CvROOT> being written over.
There is code all
over the place, both in core and on CPAN, that does things with C<CvROOT>, so
forgetting the slab makes things more robust.

Since the CV takes ownership of its slab when flagged, that flag is never
copied when a CV is cloned, as one CV could free a slab that another CV still
points to, since forced freeing of ops ignores the reference count (but asserts
that it looks right).

To avoid slab fragmentation, freed ops are marked as freed and attached to the
slab's freed chain (an idea stolen from DBM::Deep). Those freed ops are reused
when possible. Not reusing freed ops would be simpler, but it would result in
significantly higher memory usage for programs with large C<if (DEBUG) {...}>
blocks.

C<SAVEFREEOP> is slightly problematic under this scheme. Sometimes it can cause
an op to be freed after its CV. If the CV has forcibly freed the ops on its
slab and the slab itself, then we will be fiddling with a freed slab. Making
C<SAVEFREEOP> a no-op doesn't help, as sometimes an op can be savefreed when
there is no compilation error, so the op would never be freed. It holds
a reference count on the slab, so the whole slab would leak. So C<SAVEFREEOP>
now sets a special flag on the op (C<< ->op_savefree >>). The forced freeing of
ops after a compilation error won't free any ops thus marked.

Since many pieces of code create tiny subroutines consisting of only a few ops,
and since a huge slab would be quite a bit of baggage for those to carry
around, the first slab is always very small. To avoid allocating too many
slabs for a single CV, each subsequent slab is twice the size of the previous.

Smartmatch expects to be able to allocate an op at run time, run it, and then
throw it away. For that to work the op is simply malloced when C<PL_compcv> hasn't
been set up.
So all slab-allocated ops are marked as such (C<< ->op_slabbed >>),
to distinguish them from malloced ops.


=head1 AUTHORS

Until May 1997, this document was maintained by Jeff Okamoto
E<lt>okamoto@corp.hp.comE<gt>. It is now maintained as part of Perl
itself by the Perl 5 Porters E<lt>perl5-porters@perl.orgE<gt>.

With lots of help and suggestions from Dean Roehrich, Malcolm Beattie,
Andreas Koenig, Paul Hudson, Ilya Zakharevich, Paul Marquess, Neil
Bowers, Matthew Green, Tim Bunce, Spider Boardman, Ulrich Pfeifer,
Stephen McCamant, and Gurusamy Sarathy.

=head1 SEE ALSO

L<perlapi>, L<perlintern>, L<perlxs>, L<perlembed>