=encoding utf8

=for comment
Consistent formatting of this file is achieved with:
  perl ./Porting/podtidy pod/perlinterp.pod

=head1 NAME

perlinterp - An overview of the Perl interpreter

=head1 DESCRIPTION

This document provides an overview of how the Perl interpreter works at
the level of C code, along with pointers to the relevant C source code
files.

=head1 ELEMENTS OF THE INTERPRETER

The work of the interpreter has two main stages: compiling the code
into the internal representation, or bytecode, and then executing it.
L<perlguts/Compiled code> explains exactly how the compilation stage
happens.

Here is a short breakdown of perl's operation:

=head2 Startup

The action begins in F<perlmain.c> (or F<miniperlmain.c> for miniperl).
This is very high-level code, enough to fit on a single screen, and it
resembles the code found in L<perlembed>; most of the real action takes
place in F<perl.c>.

F<perlmain.c> is generated by C<ExtUtils::Miniperl> from
F<miniperlmain.c> at make time, so you should build perl if you want to
follow along.

First, F<perlmain.c> allocates some memory and constructs a Perl
interpreter, along these lines:

     1 PERL_SYS_INIT3(&argc,&argv,&env);
     2
     3 if (!PL_do_undump) {
     4     my_perl = perl_alloc();
     5     if (!my_perl)
     6         exit(1);
     7     perl_construct(my_perl);
     8     PL_perl_destruct_level = 0;
     9 }

Line 1 is a macro, and its definition depends on your operating system.
Line 3 references C<PL_do_undump>, a global variable - all global
variables in Perl start with C<PL_>. This tells you whether the current
running program was created with the C<-u> flag to perl and then
F<undump>, which means it's going to be false in any sane context.

Line 4 calls a function in F<perl.c> to allocate memory for a Perl
interpreter. It's quite a simple function, and the guts of it look like
this:

    my_perl = (PerlInterpreter*)PerlMem_malloc(sizeof(PerlInterpreter));

Here you see an example of Perl's system abstraction, which we'll see
later: C<PerlMem_malloc> is either your system's C<malloc>, or Perl's
own C<malloc> as defined in F<malloc.c> if you selected that option at
configure time.

Next, in line 7, we construct the interpreter using C<perl_construct>,
also in F<perl.c>; this sets up all the special variables that Perl
needs, the stacks, and so on.

Now we pass Perl the command line options, and tell it to go:

    exitstatus = perl_parse(my_perl, xs_init, argc, argv, (char **)NULL);
    if (!exitstatus)
        perl_run(my_perl);

    exitstatus = perl_destruct(my_perl);

    perl_free(my_perl);

C<perl_parse> is actually a wrapper around C<S_parse_body>, as defined
in F<perl.c>, which processes the command line options, sets up any
statically linked XS modules, opens the program and calls C<yyparse> to
parse it.

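The same sequence is what an embedder writes by hand. Here is a minimal
sketch in the style of L<perlembed>; it assumes a normal perl build and
that you compile it with the flags reported by C<ExtUtils::Embed>:

    /* mini.c - a sketch of the startup/shutdown sequence above.
     * Build (assumed command, see perlembed):
     *   cc -o mini mini.c `perl -MExtUtils::Embed -e ccopts -e ldopts`
     */
    #include <EXTERN.h>
    #include <perl.h>

    static PerlInterpreter *my_perl;

    int main(int argc, char **argv, char **env)
    {
        int exitstatus;

        PERL_SYS_INIT3(&argc, &argv, &env);
        my_perl = perl_alloc();
        perl_construct(my_perl);
        PL_exit_flags |= PERL_EXIT_DESTRUCT_END;

        /* no statically linked XS modules, hence NULL for xs_init */
        exitstatus = perl_parse(my_perl, NULL, argc, argv, (char **)NULL);
        if (!exitstatus)
            exitstatus = perl_run(my_perl);

        perl_destruct(my_perl);
        perl_free(my_perl);
        PERL_SYS_TERM();
        return exitstatus;
    }
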
=head2 Parsing

The aim of this stage is to take the Perl source, and turn it into an
op tree. We'll see what one of those looks like later. Strictly
speaking, there are three things going on here.

C<yyparse>, the parser, lives in F<perly.c>, although you're better off
reading the original YACC input in F<perly.y>. (Yes, Virginia, there
B<is> a YACC grammar for Perl!) The job of the parser is to take your
code and "understand" it, splitting it into sentences, deciding which
operands go with which operators and so on.

The parser is nobly assisted by the lexer, which chunks up your input
into tokens, and decides what type of thing each token is: a variable
name, an operator, a bareword, a subroutine, a core function, and so
on. The main point of entry to the lexer is C<yylex>, and that and its
associated routines can be found in F<toke.c>. Perl isn't much like
other computer languages; it's highly context sensitive at times, and
it can be tricky to work out what sort of token something is, or where
a token ends. As such, there's a lot of interplay between the tokeniser
and the parser, which can get pretty frightening if you're not used to
it.

As the parser understands a Perl program, it builds up a tree of
operations for the interpreter to perform during execution. The
routines which construct and link together the various operations are
to be found in F<op.c>, and will be examined later.

=head2 Optimization

Now the parsing stage is complete, and the finished tree represents the
operations that the Perl interpreter needs to perform to execute our
program. Next, Perl does a dry run over the tree looking for
optimisations: constant expressions such as C<3 + 4> will be computed
now, and the optimizer will also see if any multiple operations can be
replaced with a single one. For instance, to fetch the variable
C<$foo>, instead of grabbing the glob C<*foo> and looking at the scalar
component, the optimizer fiddles the op tree to use a function which
directly looks up the scalar in question. The main optimizer is C<peep>
in F<op.c>, and many ops have their own optimizing functions.

=head2 Running

Now we're finally ready to go: we have compiled Perl bytecode, and all
that's left to do is run it. The actual execution is done by the
C<runops_standard> function in F<run.c>; more specifically, it's done
by these three innocent looking lines:

    while ((PL_op = PL_op->op_ppaddr(aTHX))) {
        PERL_ASYNC_CHECK();
    }

You may be more comfortable with the Perl version of that:

    PERL_ASYNC_CHECK() while $Perl::op = &{$Perl::op->{function}};

Well, maybe not. Anyway, each op contains a function pointer, which
stipulates the function which will actually carry out the operation.
This function will return the next op in the sequence - this allows for
things like C<if> which choose the next op dynamically at run time. The
C<PERL_ASYNC_CHECK> makes sure that things like signals interrupt
execution if required.

The actual functions called are known as PP code, and they're spread
between four files: F<pp_hot.c> contains the "hot" code, which is most
often used and highly optimized, F<pp_sys.c> contains all the
system-specific functions, F<pp_ctl.c> contains the functions which
implement control structures (C<if>, C<while> and the like) and F<pp.c>
contains everything else. These are, if you like, the C code for Perl's
built-in functions and operators.

Note that each C<pp_> function is expected to return a pointer to the
next op. Calls to perl subs (and eval blocks) are handled within the
same runops loop, and do not consume extra space on the C stack. For
example, C<pp_entersub> and C<pp_entertry> just push a C<CxSUB> or
C<CxEVAL> block struct onto the context stack which contains the
address of the op following the sub call or eval. They then return the
first op of that sub or eval block, and so execution continues in that
sub or block. Later, a C<pp_leavesub> or C<pp_leavetry> op pops the
C<CxSUB> or C<CxEVAL>, retrieves the return op from it, and returns it.

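To make the "return the next op" convention concrete, here is a
schematic sketch of the shape of a pp function. C<pp_example> is made
up for illustration; the macros (C<dSP>, C<POPs>, C<PUSHs>, C<RETURN>)
are the real ones described in L<perlapi>:

    /* A schematic pp function (pp_example is hypothetical): it pops
     * one SV off the argument stack, pushes a result back, and hands
     * control to the next op in the chain. */
    PP(pp_example)
    {
        dSP;            /* local copy of the stack pointer        */
        SV *arg = POPs; /* take the top SV off the argument stack */
        PUSHs(arg);     /* push a result back (here, unchanged)   */
        RETURN;         /* PUTBACK, then return PL_op->op_next    */
    }
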
=head2 Exception handling

Perl's exception handling (i.e. C<die> etc.) is built on top of the
low-level C<setjmp()>/C<longjmp()> C-library functions. These basically
provide a way to capture the current PC and SP registers and later
restore them; i.e. a C<longjmp()> continues at the point in code where
a previous C<setjmp()> was done, with anything further up on the C
stack being lost. This is why code should always save values using
C<SAVE_FOO> rather than in auto variables.

The perl core wraps C<setjmp()> etc. in the macros C<JMPENV_PUSH> and
C<JMPENV_JUMP>. The basic rule of perl exceptions is that C<exit> and
C<die> (in the absence of C<eval>) perform a C<JMPENV_JUMP(2)>, while
C<die> within C<eval> does a C<JMPENV_JUMP(3)>.

Entry points to perl, such as C<perl_parse()>, C<perl_run()> and
C<call_sv(cv, G_EVAL)>, each do a C<JMPENV_PUSH>, then enter a runops
loop or whatever, and handle possible exception returns. For a 2
return, final cleanup is performed, such as popping stacks and calling
C<CHECK> or C<END> blocks. Amongst other things, this is how scope
cleanup still occurs during an C<exit>.

If a C<die> can find a C<CxEVAL> block on the context stack, then the
stack is popped to that level and the return op in that block is
assigned to C<PL_restartop>; then a C<JMPENV_JUMP(3)> is performed.
This normally passes control back to the guard. In the case of
C<perl_run> and C<call_sv>, a non-null C<PL_restartop> triggers
re-entry to the runops loop. This is the normal way that C<die> or
C<croak> is handled within an C<eval>.

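In outline, the guard at such an entry point looks something like the
following. This is a simplified sketch of the pattern, not the exact
code from F<perl.c>:

    /* Sketch of a JMPENV guard around a runops loop: push a jump
     * environment, run ops, and on a die-within-eval (a 3 return)
     * restart from PL_restartop. */
    dJMPENV;
    int ret;

    JMPENV_PUSH(ret);
    switch (ret) {
    case 0:                       /* normal entry                     */
      redo_body:
        CALLRUNOPS(aTHX);         /* run ops until PL_op becomes NULL */
        break;
    case 3:                       /* die inside an eval               */
        if (PL_restartop) {
            PL_op = PL_restartop; /* continue at the eval's return op */
            PL_restartop = 0;
            goto redo_body;
        }
        JMPENV_POP;               /* no handler here: re-throw to an  */
        JMPENV_JUMP(3);           /* outer guard                      */
        break;
    }
    JMPENV_POP;
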
Sometimes ops are executed within an inner runops loop, such as tie,
sort or overload code. In this case, something like

    sub FETCH { eval { die } }

would cause a longjmp right back to the guard in C<perl_run>, popping
both runops loops, which is clearly incorrect. One way to avoid this is
for the tie code to do a C<JMPENV_PUSH> before executing C<FETCH> in
the inner runops loop, but for efficiency reasons, perl in fact just
sets a flag, using C<CATCH_SET(TRUE)>. The C<pp_require>,
C<pp_entereval> and C<pp_entertry> ops check this flag, and if true,
they call C<docatch>, which does a C<JMPENV_PUSH> and starts a new
runops level to execute the code, rather than doing it on the current
loop.

As a further optimisation, on exit from the eval block in the C<FETCH>,
execution of the code following the block is still carried on in the
inner loop. When an exception is raised, C<docatch> compares the
C<JMPENV> level of the C<CxEVAL> with C<PL_top_env> and if they differ,
just re-throws the exception. In this way any inner loops get popped.

Here's an example.

    1: eval { tie @a, 'A' };
    2: sub A::TIEARRAY {
    3:     eval { die };
    4:     die;
    5: }

To run this code, C<perl_run> is called, which does a C<JMPENV_PUSH>
then enters a runops loop. This loop executes the eval and tie ops on
line 1, with the eval pushing a C<CxEVAL> onto the context stack.

The C<pp_tie> does a C<CATCH_SET(TRUE)>, then starts a second runops
loop to execute the body of C<TIEARRAY>. When it executes the entertry
op on line 3, C<CATCH_GET> is true, so C<pp_entertry> calls C<docatch>
which does a C<JMPENV_PUSH> and starts a third runops loop, which then
executes the die op. At this point the C call stack looks like this:

    Perl_pp_die
    Perl_runops      # third loop
    S_docatch_body
    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

and the context and data stacks, as shown by C<-Dstv>, look like:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
      retop=(null)
      CX 1: EVAL   => *
      retop=nextstate

The die pops the first C<CxEVAL> off the context stack, sets
C<PL_restartop> from it, does a C<JMPENV_JUMP(3)>, and control returns
to the top C<docatch>. This then starts another third-level runops
loop, which executes the nextstate, pushmark and die ops on line 4. At
the point that the second C<pp_die> is called, the C call stack looks
exactly like that above, even though we are no longer within an inner
eval; this is because of the optimization mentioned earlier. However,
the context stack now looks like this, i.e. with the top C<CxEVAL>
popped:

    STACK 0: MAIN
      CX 0: BLOCK  =>
      CX 1: EVAL   => AV()  PV("A"\0)
      retop=leave
    STACK 1: MAGIC
      CX 0: SUB    =>
      retop=(null)

The die on line 4 pops the context stack back down to the C<CxEVAL>,
leaving it as:

    STACK 0: MAIN
      CX 0: BLOCK  =>

As usual, C<PL_restartop> is extracted from the C<CxEVAL>, and a
C<JMPENV_JUMP(3)> done, which pops the C stack back to the docatch:

    S_docatch
    Perl_pp_entertry
    Perl_runops      # second loop
    S_call_body
    Perl_call_sv
    Perl_pp_tie
    Perl_runops      # first loop
    S_run_body
    perl_run
    main

In this case, because the C<JMPENV> level recorded in the C<CxEVAL>
differs from the current one, C<docatch> just does a C<JMPENV_JUMP(3)>
and the C stack unwinds to:

    perl_run
    main

Because C<PL_restartop> is non-null, C<run_body> starts a new runops
loop and execution continues.

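Extension code rarely touches C<JMPENV> directly; instead it asks
C<call_sv> to set up the guard for it by passing the C<G_EVAL> flag, as
described in L<perlcall>. Here is a rough sketch of that pattern, where
C<code_sv> is assumed to be an SV holding a code reference:

    /* Sketch: call Perl code from C and trap a die via G_EVAL, along
     * the lines of the examples in perlcall. */
    dSP;

    ENTER;
    SAVETMPS;
    PUSHMARK(SP);                /* no arguments for the called code */
    PUTBACK;

    call_sv(code_sv, G_EVAL|G_DISCARD);

    SPAGAIN;
    if (SvTRUE(ERRSV)) {
        /* the die was caught by the G_EVAL guard; $@ is in ERRSV */
        PerlIO_printf(Perl_error_log, "caught: %s", SvPV_nolen(ERRSV));
    }
    PUTBACK;
    FREETMPS;
    LEAVE;
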
=head1 INTERNAL VARIABLE TYPES

You should by now have had a look at L<perlguts>, which tells you about
Perl's internal variable types: SVs, HVs, AVs and the rest. If not, do
that now.

These variables are used not only to represent Perl-space variables,
but also any constants in the code, as well as some structures
completely internal to Perl. The symbol table, for instance, is an
ordinary Perl hash. Your code is represented by an SV as it's read into
the parser; any program files you call are opened via ordinary Perl
filehandles, and so on.

The core L<Devel::Peek|Devel::Peek> module lets us examine SVs from a
Perl program. Let's see, for instance, how Perl treats the constant
C<"hello">.

    % perl -MDevel::Peek -e 'Dump("hello")'
    1 SV = PV(0xa041450) at 0xa04ecbc
    2   REFCNT = 1
    3   FLAGS = (POK,READONLY,pPOK)
    4   PV = 0xa0484e0 "hello"\0
    5   CUR = 5
    6   LEN = 6

Reading C<Devel::Peek> output takes a bit of practice, so let's go
through it line by line.

Line 1 tells us we're looking at an SV which lives at C<0xa04ecbc> in
memory. SVs themselves are very simple structures, but they contain a
pointer to a more complex structure. In this case, it's a PV, a
structure which holds a string value, at location C<0xa041450>. Line 2
is the reference count; there are no other references to this data, so
it's 1.

Line 3 shows the flags for this SV - it's OK to use it as a PV, it's a
read-only SV (because it's a constant) and the data is a PV internally.
Next we've got the contents of the string, starting at location
C<0xa0484e0>.

Line 5 gives us the current length of the string - note that this does
B<not> include the null terminator. Line 6 is not the length of the
string, but the length of the currently allocated buffer; as the string
grows, Perl automatically extends the available storage via a routine
called C<SvGROW>.

You can get at any of these quantities from C very easily; just add
C<Sv> to the name of the field shown in the snippet, and you've got a
macro which will return the value: C<SvCUR(sv)> returns the current
length of the string, C<SvREFCNT(sv)> returns the reference count,
C<SvPV(sv, len)> returns the string itself with its length, and so on.
More macros to manipulate these properties can be found in L<perlguts>.

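For instance, here is an illustrative fragment which reads the same
quantities from C that C<Devel::Peek> just showed us; C<sv> is assumed
to be a pointer to some SV you already have:

    /* Sketch: peeking at an SV with the macros described above. */
    STRLEN len;
    char *p = SvPV(sv, len);     /* the string and its length (CUR) */

    PerlIO_printf(Perl_debug_log, "PV=%s CUR=%d LEN=%d REFCNT=%d\n",
                  p, (int)SvCUR(sv), (int)SvLEN(sv),
                  (int)SvREFCNT(sv));
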
Let's take an example of manipulating a PV, from C<sv_catpvn>, in
F<sv.c>:

     1 void
     2 Perl_sv_catpvn(pTHX_ register SV *sv, register const char *ptr, register STRLEN len)
     3 {
     4     STRLEN tlen;
     5     char *junk;

     6     junk = SvPV_force(sv, tlen);
     7     SvGROW(sv, tlen + len + 1);
     8     if (ptr == junk)
     9         ptr = SvPVX(sv);
    10     Move(ptr,SvPVX(sv)+tlen,len,char);
    11     SvCUR(sv) += len;
    12     *SvEND(sv) = '\0';
    13     (void)SvPOK_only_UTF8(sv);          /* validate pointer */
    14     SvTAINT(sv);
    15 }

This is a function which adds a string, C<ptr>, of length C<len> onto
the end of the PV stored in C<sv>. The first thing we do in line 6 is
make sure that the SV B<has> a valid PV, by calling the C<SvPV_force>
macro to force a PV. As a side effect, C<tlen> gets set to the current
length of the PV, and the PV itself is returned to C<junk>.

In line 7, we make sure that the SV will have enough room to
accommodate the old string, the new string and the null terminator. If
C<LEN> isn't big enough, C<SvGROW> will reallocate space for us.

Now, if C<junk> is the same as the string we're trying to add, we can
grab the string directly from the SV; C<SvPVX> is the address of the PV
in the SV (which may have moved if C<SvGROW> reallocated the buffer).

Line 10 does the actual catenation: the C<Move> macro moves a chunk of
memory around; we move the string C<ptr> to the end of the PV - that's
the start of the PV plus its current length. We're moving C<len> bytes
of type C<char>. After doing so, we need to tell Perl we've extended
the string, by altering C<CUR> to reflect the new length. C<SvEND> is a
macro which gives us the end of the string, so that needs to be a
C<"\0">.

Line 13 manipulates the flags; since we've changed the PV, any IV or NV
values will no longer be valid: if we have C<$a=10; $a.="6";> we don't
want to use the old IV of 10. C<SvPOK_only_UTF8> is a special
UTF-8-aware version of C<SvPOK_only>, a macro which turns off the IOK
and NOK flags and turns on POK. The final C<SvTAINT> is a macro which
launders tainted data if taint mode is turned on.

AVs and HVs are more complicated, but SVs are by far the most common
variable type being thrown around. Having seen something of how we
manipulate these, let's go on and look at how the op tree is
constructed.

=head1 OP TREES

First, what is the op tree, anyway? The op tree is the parsed
representation of your program, as we saw in our section on parsing,
and it's the sequence of operations that Perl goes through to execute
your program, as we saw in L</Running>.

An op is a fundamental operation that Perl can perform: all the
built-in functions and operators are ops, and there are a series of ops
which deal with concepts the interpreter needs internally - entering
and leaving a block, ending a statement, fetching a variable, and so
on.

The op tree is connected in two ways: you can imagine that there are
two "routes" through it, two orders in which you can traverse the tree.
First, parse order reflects how the parser understood the code, and
secondly, execution order tells perl what order to perform the
operations in.

The easiest way to examine the op tree is to stop Perl after it has
finished parsing, and get it to dump out the tree. This is exactly what
the compiler backends L<B::Terse|B::Terse>, L<B::Concise|B::Concise>
and L<B::Debug|B::Debug> do.

Let's have a look at how Perl sees C<$a = $b + $c>:

     % perl -MO=Terse -e '$a=$b+$c'
     1 LISTOP (0x8179888) leave
     2     OP (0x81798b0) enter
     3     COP (0x8179850) nextstate
     4     BINOP (0x8179828) sassign
     5         BINOP (0x8179800) add [1]
     6             UNOP (0x81796e0) null [15]
     7                 SVOP (0x80fafe0) gvsv  GV (0x80fa4cc) *b
     8             UNOP (0x81797e0) null [15]
     9                 SVOP (0x8179700) gvsv  GV (0x80efeb0) *c
    10         UNOP (0x816b4f0) null [15]
    11             SVOP (0x816dcf0) gvsv  GV (0x80fa460) *a

Let's start in the middle, at line 4. This is a BINOP, a binary
operator, which is at location C<0x8179828>. The specific operator in
question is C<sassign> - scalar assignment - and you can find the code
which implements it in the function C<pp_sassign> in F<pp_hot.c>. As a
binary operator, it has two children: the add operator, providing the
result of C<$b+$c>, is uppermost on line 5, and the left hand side is
on line 10.

Line 10 is the null op: this does exactly nothing. What is that doing
there? If you see the null op, it's a sign that something has been
optimized away after parsing. As we mentioned in L</Optimization>, the
optimization stage sometimes converts two operations into one, for
example when fetching a scalar variable. When this happens, instead of
rewriting the op tree and cleaning up the dangling pointers, it's
easier just to replace the redundant operation with the null op.
Originally, the tree would have looked like this:

    10         SVOP (0x816b4f0) rv2sv [15]
    11             SVOP (0x816dcf0) gv  GV (0x80fa460) *a

That is, fetch the C<a> entry from the main symbol table, and then look
at the scalar component of it: C<gvsv> (C<pp_gvsv> in F<pp_hot.c>)
happens to do both these things.

The right hand side, starting at line 5, is similar to what we've just
seen: we have the C<add> op (C<pp_add>, also in F<pp_hot.c>) add
together two C<gvsv>s.

Now, what's this about?

     1 LISTOP (0x8179888) leave
     2     OP (0x81798b0) enter
     3     COP (0x8179850) nextstate

C<enter> and C<leave> are scoping ops, and their job is to perform any
housekeeping every time you enter and leave a block: lexical variables
are tidied up, unreferenced variables are destroyed, and so on. Every
program will have those first three lines: C<leave> is a list, and its
children are all the statements in the block. Statements are delimited
by C<nextstate>, so a block is a collection of C<nextstate> ops, with
the ops to be performed for each statement being the children of
C<nextstate>. C<enter> is a single op which functions as a marker.

That's how Perl parsed the program, from top to bottom:

                        Program
                           |
                       Statement
                           |
                           =
                          / \
                         /   \
                        $a    +
                             / \
                            $b  $c

However, it's impossible to B<perform> the operations in this order:
you have to find the values of C<$b> and C<$c> before you add them
together, for instance. So, the other thread that runs through the op
tree is the execution order: each op has a field C<op_next> which
points to the next op to be run, so following these pointers tells us
how perl executes the code. We can traverse the tree in this order
using the C<exec> option to C<B::Terse>:

     % perl -MO=Terse,exec -e '$a=$b+$c'
     1 OP (0x8179928) enter
     2 COP (0x81798c8) nextstate
     3 SVOP (0x81796c8) gvsv  GV (0x80fa4d4) *b
     4 SVOP (0x8179798) gvsv  GV (0x80efeb0) *c
     5 BINOP (0x8179878) add [1]
     6 SVOP (0x816dd38) gvsv  GV (0x80fa468) *a
     7 BINOP (0x81798a0) sassign
     8 LISTOP (0x8179900) leave

This probably makes more sense for a human: enter a block, start a
statement. Get the values of C<$b> and C<$c>, and add them together.
Find C<$a>, and assign one to the other. Then leave.

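You can follow the same C<op_next> chain from C, for example inside an
embedding like the one sketched in L</Startup>, after C<perl_parse()>
has built the tree. This is only a sketch: for straight-line code it
visits the ops in execution order, but where an op branches (C<if> and
friends) it simply follows the default C<op_next> path:

    /* Sketch: walk the main program's ops in execution order. */
    OP *o;
    for (o = PL_main_start; o; o = o->op_next)
        PerlIO_printf(Perl_debug_log, "%s\n", OP_NAME(o));
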
The way Perl builds up these op trees in the parsing process can be
unravelled by examining F<perly.y>, the YACC grammar. Let's take the
piece we need to construct the tree for C<$a = $b + $c>:

    1 term    :   term ASSIGNOP term
    2                { $$ = newASSIGNOP(OPf_STACKED, $1, $2, $3); }
    3         |   term ADDOP term
    4                { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

If you're not used to reading BNF grammars, this is how it works:
You're fed certain things by the tokeniser, which generally end up in
upper case. Here, C<ADDOP> is provided when the tokeniser sees C<+> in
your code. C<ASSIGNOP> is provided when C<=> is used for assigning.
These are "terminal symbols", because you can't get any simpler than
them.

The grammar, lines one and three of the snippet above, tells you how to
build up more complex forms. These complex forms, "non-terminal
symbols", are generally placed in lower case. C<term> here is a
non-terminal symbol, representing a single expression.

The grammar gives you the following rule: you can make the thing on the
left of the colon if you see all the things on the right in sequence.
This is called a "reduction", and the aim of parsing is to completely
reduce the input. There are several different ways you can perform a
reduction, separated by vertical bars: so, C<term> followed by C<=>
followed by C<term> makes a C<term>, and C<term> followed by C<+>
followed by C<term> can also make a C<term>.

So, if you see two terms with an C<=> or C<+> between them, you can
turn them into a single expression. When you do this, you execute the
code in the block on the next line: if you see C<=>, you'll do the code
in line 2. If you see C<+>, you'll do the code in line 4. It's this
code which contributes to the op tree.

    | term ADDOP term
    { $$ = newBINOP($2, 0, scalar($1), scalar($3)); }

What this does is create a new binary op and feed it a number of
variables. The variables refer to the tokens: C<$1> is the first token
in the input, C<$2> the second, and so on - think regular expression
backreferences. C<$$> is the op returned from this reduction. So, we
call C<newBINOP> to create a new binary operator. The first parameter
to C<newBINOP>, a function in F<op.c>, is the op type. It's an addition
operator, so we want the type to be C<ADDOP>. We could specify this
directly, but it's right there as the second token in the input, so we
use C<$2>. The second parameter is the op's flags: 0 means "nothing
special". Then the things to add: the left and right hand side of our
expression, in scalar context.

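You can call the same constructors yourself from C. Here is a rough
sketch of what such a grammar action effectively builds for a constant
expression like C<3 + 4>; in a real compile the peephole optimizer
would then fold this into a single constant:

    /* Sketch: hand-build the op tree for "3 + 4" with the public
     * op constructors from op.c. */
    OP *left  = newSVOP(OP_CONST, 0, newSViv(3));
    OP *right = newSVOP(OP_CONST, 0, newSViv(4));
    OP *add   = newBINOP(OP_ADD, 0, scalar(left), scalar(right));

    /* If the tree isn't linked into a larger one, free it when done. */
    op_free(add);
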
=head1 STACKS

When perl executes something like C<addop>, how does it pass on its
results to the next op? The answer is, through the use of stacks. Perl
has a number of stacks to store things it's currently working on, and
we'll look at the three most important ones here.

=head2 Argument stack

Arguments are passed to PP code and returned from PP code using the
argument stack, C<ST>. The typical way to handle arguments is to pop
them off the stack, deal with them how you wish, and then push the
result back onto the stack. This is how, for instance, the cosine
operator works:

    NV value;
    value = POPn;
    value = Perl_cos(value);
    XPUSHn(value);

We'll see a more tricky example of this when we consider Perl's macros
below. C<POPn> gives you the NV (floating point value) of the top SV on
the stack: the C<$x> in C<cos($x)>. Then we compute the cosine, and
push the result back as an NV. The C<X> in C<XPUSHn> means that the
stack should be extended if necessary - it can't be necessary here,
because we know there's room for one more item on the stack, since
we've just removed one! The C<XPUSH*> macros at least guarantee safety.

Alternatively, you can fiddle with the stack directly: C<SP> gives you
the first element in your portion of the stack, and C<TOP*> gives you
the top SV/IV/NV/etc. on the stack. So, for instance, to do unary
negation of an integer:

    SETi(-TOPi);

Just set the integer value of the top stack entry to its negation.

Argument stack manipulation in the core is exactly the same as it is in
XSUBs - see L<perlxstut>, L<perlxs> and L<perlguts> for a longer
description of the macros used in stack manipulation.

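As a bridge to those documents, here is a sketch of an XSUB that uses
the same stack discipline; C<my_sum> is invented for illustration:

    /* Sketch: an XSUB (see perlxs) adding its two arguments.  dXSARGS
     * sets up the stack pointer and ST(); XSRETURN says how many
     * values we leave on the stack. */
    XS(XS_my_sum)
    {
        dXSARGS;
        if (items != 2)
            croak("Usage: my_sum(a, b)");
        NV a = SvNV(ST(0));                 /* first argument        */
        NV b = SvNV(ST(1));                 /* second argument       */
        ST(0) = sv_2mortal(newSVnv(a + b)); /* overwrite argument 0  */
        XSRETURN(1);                        /* return one value      */
    }
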
=head2 Mark stack

I say "your portion of the stack" above because PP code doesn't
necessarily get the whole stack to itself: if your function calls
another function, you'll only want to expose the arguments aimed for
the called function, and not (necessarily) let it get at your own data.
The way we do this is to have a "virtual" bottom-of-stack, exposed to
each function. The mark stack keeps bookmarks to locations in the
argument stack usable by each function. For instance, when dealing with
a tied variable (internally, something with "P" magic), Perl has to
call methods for accesses to the tied variables. However, we need to
separate the arguments exposed to the method from the arguments exposed
to the original function - the store or fetch or whatever it may be.
Here's roughly how the tied C<push> is implemented; see C<av_push> in
F<av.c>:

     1 PUSHMARK(SP);
     2 EXTEND(SP,2);
     3 PUSHs(SvTIED_obj((SV*)av, mg));
     4 PUSHs(val);
     5 PUTBACK;
     6 ENTER;
     7 call_method("PUSH", G_SCALAR|G_DISCARD);
     8 LEAVE;

Let's examine the whole implementation, for practice:

     1 PUSHMARK(SP);

Push the current state of the stack pointer onto the mark stack. This
is so that when we've finished adding items to the argument stack, Perl
knows how many things we've added recently.

     2 EXTEND(SP,2);
     3 PUSHs(SvTIED_obj((SV*)av, mg));
     4 PUSHs(val);

We're going to add two more items onto the argument stack: when you
have a tied array, the C<PUSH> subroutine receives the object and the
value to be pushed, and that's exactly what we have here - the tied
object, retrieved with C<SvTIED_obj>, and the value, the SV C<val>.

     5 PUTBACK;

Next we tell Perl to update the global stack pointer from our internal
variable: C<dSP> only gave us a local copy, not a reference to the
global.

     6 ENTER;
     7 call_method("PUSH", G_SCALAR|G_DISCARD);
     8 LEAVE;

C<ENTER> and C<LEAVE> localise a block of code - they make sure that
all variables are tidied up, everything that has been localised gets
its previous value returned, and so on. Think of them as the C<{> and
C<}> of a Perl block.

To actually do the magic method call, we have to call a subroutine in
Perl space: C<call_method> takes care of that, and it's described in
L<perlcall>. We call the C<PUSH> method in scalar context, and we're
going to discard its return value. The C<call_method()> function
removes the top element of the mark stack, so there is nothing for the
caller to clean up.

=head2 Save stack

C doesn't have a concept of dynamic scope like Perl's C<local>, so perl
provides one. We've seen that C<ENTER> and C<LEAVE> are used as scoping
braces; the save stack implements the C equivalent of, for example:

    {
        local $foo = 42;
        ...
    }

See L<perlguts/"Localizing changes"> for how to use the save stack.

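For a flavour of it, here is a sketch of the C-level equivalent of that
block, using calls documented in L<perlguts>; the variable name C<foo>
is just an example:

    /* Sketch: the C side of  { local $foo = 42; ... }  */
    ENTER;                                    /* "{"                  */
    SAVETMPS;
    {
        GV *gv = gv_fetchpv("foo", GV_ADD, SVt_PV);
        SV *sv = save_scalar(gv);             /* local $foo           */
        sv_setiv(sv, 42);                     /* $foo = 42            */
        /* ... code here sees the localized value ... */
    }
    FREETMPS;
    LEAVE;                                    /* "}" - old $foo back  */
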
=head1 MILLIONS OF MACROS

One thing you'll notice about the Perl source is that it's full of
macros. Some have called the pervasive use of macros the hardest thing
to understand; others find it adds to clarity. Let's take an example,
the code which implements the addition operator:

    1  PP(pp_add)
    2  {
    3    dSP; dATARGET; tryAMAGICbin(add,opASSIGN);
    4    {
    5      dPOPTOPnnrl_ul;
    6      SETn( left + right );
    7      RETURN;
    8    }
    9  }

Every line here (apart from the braces, of course) contains a macro.
The first line sets up the function declaration as Perl expects for PP
code; line 3 sets up variable declarations for the argument stack and
the target, the return value of the operation. Finally, it tries to see
if the addition operation is overloaded; if so, the appropriate
subroutine is called.

Line 5 is another variable declaration - all variable declarations
start with C<d> - which pops from the top of the argument stack two NVs
(hence C<nn>) and puts them into the variables C<right> and C<left>,
hence the C<rl>. These are the two operands to the addition operator.
Next, we call C<SETn> to set the NV of the return value to the result
of adding the two values. This done, we return - the C<RETURN> macro
makes sure that our return value is properly handled, and we pass the
next operator to run back to the main run loop.

Most of these macros are explained in L<perlapi>, and some of the more
important ones are explained in L<perlxs> as well. Pay special
attention to L<perlguts/Background and PERL_IMPLICIT_CONTEXT> for
information on the C<[pad]THX_?> macros.

=head1 FURTHER READING

For more information on the Perl internals, please see the documents
listed at L<perl/Internals and C Language Interface>.