# Copyright (C) 2007-2014, Parrot Foundation.

=head1 PDD 19: Parrot Intermediate Representation (PIR)

=head2 Abstract

This document outlines the architecture and core syntax of Parrot
Intermediate Representation (PIR).

=head2 Description

PIR is a stable, middle-level language intended both as a target for the
generated output from high-level language compilers, and for human use
developing core features and extensions for Parrot.

=head3 Basic Syntax

A valid PIR program consists of a sequence of statements, directives, comments
and empty lines.

=head4 Statements

A statement starts with an optional label, contains an instruction, and is
terminated by a newline (<NL>). Each statement must be on its own line.

  [label:] [instruction] <NL>

An instruction may be either a low-level opcode or a higher-level PIR
operation, such as a subroutine call, a method call, a directive, or PIR
syntactic sugar.

=head4 Directives

A directive provides information for the PIR compiler that is outside the
normal flow of executable statements. Directives are all prefixed with a ".",
as in C<.local> or C<.sub>.

=head4 Comments

Comments start with C<#> and last until the following newline. PIR also allows
comments in Pod format. Comments, Pod content, and empty lines are ignored.

=head4 Identifiers

Identifiers start with a letter or underscore, and may additionally contain
letters, digits, and underscores. Identifiers don't have any limit on length
at the moment, but some sane-but-generous length limit may be imposed in the
future (256 chars, 1024 chars?). The following examples are all valid
identifiers.

    a
    _a
    A42

Opcode names are not reserved words in PIR, and may be used as variable names.
For example, you can define a local variable named C<print>.
Note that currently, by using an opcode name as a local variable name, the
variable will I<hide> the opcode name, effectively making the opcode unusable.
In the future this will be resolved.

The PIR language is designed to have as few reserved keywords as possible.
Currently, in contrast to opcode names, PIR keywords I<are> reserved, and
cannot be used as identifiers. Some opcode names are, in fact, PIR keywords,
which therefore cannot be used as identifiers. This, too, will be resolved
in a future re-implementation of the PIR compiler.

The following are PIR keywords, and cannot currently be used as identifiers:

  goto      if       int         null
  num       pmc      string      unless

=head4 Labels

A label declaration consists of a label name followed by a colon. A label name
conforms to the standard requirements for identifiers. A label declaration may
occur at the start of a statement, or stand alone on a line, but always within
a subroutine.

A reference to a label consists of only the label name, and is generally used
as an argument to an instruction or directive.

A PIR label is accessible only in the subroutine where it's defined. A label
name must be unique within a subroutine, but it can be reused in other
subroutines.

=begin PIR_FRAGMENT

  goto label1
     # ...
  label1:

=end PIR_FRAGMENT

=head4 Registers and Variables

There are two ways of referencing Parrot's registers. The first is
through named local variables declared with C<.local>.

=begin PIR_FRAGMENT

  .local pmc foo

=end PIR_FRAGMENT

The type of a named variable can be C<int>, C<num>, C<string> or C<pmc>,
corresponding to the types of registers. No other types are used.

The second way of referencing a register is through a register variable
C<$In>, C<$Sn>, C<$Nn>, or C<$Pn>. The capital letter indicates the type
of the register (integer, string, number, or PMC). I<n> consists of
digit(s) only.
There is no limit on the size of I<n>. There is no direct
correspondence between the value of I<n> and the position of the
register in the register set; C<$P42> may be stored in the zeroth PMC
register, if it is the only register in the subroutine.

=head3 Constants

Constants may be used in place of registers or variables. A constant is not
allowed on the left side of an assignment, or in any other context where the
variable would be modified.

=over 4

=item 'single-quoted string constant'

Single-quoted string constants are delimited by single quotes (C<'>). They
are taken to be ASCII encoded. No escape sequences are processed.

=item "double-quoted string constants"

Double-quoted string constants are delimited by double quotes (C<">). A C<">
inside a string must be escaped by C<\>. The default format for a
double-quoted string constant is 7-bit ASCII; other character sets and
encodings must be marked explicitly using a format flag.

=item <<"heredoc", <<'heredoc'

Heredocs work like single or double quoted strings. All lines up to
the terminating delimiter are slurped into the string. The delimiter
has to be on its own line, at the beginning of the line and with no
trailing whitespace.

Assignment of a heredoc:

=begin PIR_FRAGMENT

  $S0 = <<"EOS"
  ...
EOS

=end PIR_FRAGMENT

A heredoc as an argument:

=begin PIR_FRAGMENT

  .local pmc function, arg
  # ...

  function(<<"END_OF_HERE", arg)
  ...
END_OF_HERE

  .yield(<<'EOS')
  ...
EOS

  .return(<<'EOS')
  ...
EOS

=end PIR_FRAGMENT

Although currently not possible, a future implementation of the PIR
language will allow you to use multiple heredocs within a single
statement or directive:

=begin PIR_FRAGMENT_TODO

   function(<<'INPUT', <<'OUTPUT', 'some test')
   ...
INPUT
   ...
OUTPUT

=end PIR_FRAGMENT_TODO

=item format:"string constant"

Like above with a format attached to the string. Valid formats are
currently: C<ascii> (the default), C<binary>, C<iso-8859-1>, C<utf8>,
C<utf16>, C<ucs2>, and C<ucs4>.

The format is attached to the string constant, and
adopted by any string container the constant is assigned to.

The standard escape sequences are honored within strings with an
alternate format, so you can include a particular Unicode character
as either a literal sequence of bytes, or as an escape sequence.

=back

=head3 String escape sequences

Inside double-quoted strings the following escape sequences are processed.

  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e, \\, \"

=over 4

=item numeric constants

Both integers (C<42>) and numbers (C<3.14159>) may appear as constants.
C<0x> and C<0b> denote hex and binary constants respectively.

=back

=head3 Directives

=over 4

=item .local <type> <identifier>

Define a local name I<identifier> within a subroutine with the given
I<type>. You can define multiple identifiers of the same type by
separating them with commas:

  .local int i, j

=item .lex <string constant>, <reg>

Declare a lexical variable that is an alias for a PMC register.
For example,
the following two snippets have an identical effect:

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    $P0 = $P1

=end PIR_FRAGMENT

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    store_lex '$a', $P1

=end PIR_FRAGMENT

And these two snippets also have an identical effect:

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    $P1 = $P0

=end PIR_FRAGMENT

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    $P1 = find_lex '$a'

=end PIR_FRAGMENT

=item .const <type> <identifier> = <const>

Define a constant named I<identifier> of type I<type> and assign value
I<const> to it. The I<type> must be C<int>, C<num>, C<string> or a string
constant indicating the PMC type. This allows you to create PMC constants
representing subroutines; the value of the constant in that case is the
name of the subroutine. If the referenced subroutine has an C<:immediate>
modifier and it returns a value, then that value is stored instead of the
subroutine.

C<.const> declarations representing subroutines can only be written
within a C<.sub>. The constant is stored in the constant table of the
current bytecode file.

=item .globalconst <type> <identifier> = <const>

As C<.const> above, but the defined constant is globally accessible.
C<.globalconst> may only be used within a C<.sub>.

=item .sub

  .sub <identifier> [:<modifier> ...]
  .sub <quoted string> [:<modifier> ...]

Define a subroutine. All code in a PIR source file must be defined in a
subroutine. See the section L<Subroutine modifiers> for available
modifiers. Optional modifiers are a list separated by spaces.

The name of the sub may be either a bare identifier or a quoted string
constant.
Bare identifiers must be valid PIR identifiers (see L<Identifiers>
above), but string sub names can contain any characters, including characters
from different character sets (see L<Constants> above).

Always paired with C<.end>.

=item .end

End a subroutine. Always paired with C<.sub>.

=item .namespace [ <identifier> ; <identifier> ]

   .namespace [ <key>? ]

   key: <identifier> [';' <identifier>]*

Defines the namespace from this point onwards. By default the program is not
in any namespace. If you specify more than one identifier, separated by
semicolons, it creates nested namespaces, by storing the inner namespace
object in the outer namespace's global pad.

You can specify the root namespace by using empty brackets, such as:

=begin PIR

    .namespace [ ]

=end PIR

The brackets are not optional, although the key inside them is.

=item .loadlib 'lib_name'

Load the given library at compile time, that is, as soon as that line is
parsed. See also the C<loadlib> opcode, which does the same at run time.

A library loaded this way is also available at runtime, as if it had been
loaded again in C<:load>, so there is no need to call C<loadlib> at runtime.

=item .HLL <hll_name>

Define the HLL namespace from that point on in the file. Takes one string
constant, the name of the HLL. By default, the HLL namespace is 'parrot'.

=item .line <integer>

Set the current PIR line number to the value specified. This is useful in
case the PIR code is generated from some source PIR files, and error messages
should print the source file's line number, not the line number of the
generated file. Note that line numbers increment per line of PIR; if you
are trying to store High Level Language debug information, you should instead
be using the C<.annotate> directive.

=item .file <quoted_string>

Set the current PIR file name to the value specified. This is useful in case
the PIR code is generated from some source PIR files, and error messages
should print the source file's name, not the name of the generated file.

=item .annotate <key>, <value>

Makes an entry in the bytecode annotations table. This is used to store high
level language debug information. Examples:

=begin PIR_FRAGMENT

  .annotate "file", "aardvark.p6"
  .annotate "line", 5
  .annotate "column", 24

=end PIR_FRAGMENT

An annotation stays in effect until the next annotation with the same key or
the end of the current file (that is, if you use a tool such as C<pbc_merge>
to link multiple bytecode files, then annotations will not spill over from one
mergee's bytecode to another).

One annotation covers many PIR instructions. If the result of compiling one
line of HLL code is 15 lines of PIR, you only need to emit one annotation
before the first of those 15 lines to set the line number.

=begin PIR_FRAGMENT

  .annotate "line", 42

=end PIR_FRAGMENT

The key must always be a quoted string. The value may be an integer, a number
or a quoted string. Note that integer values are stored most compactly; if,
instead of the annotate directive above, you emit:

=begin PIR_FRAGMENT

  .annotate "line", "42"

=end PIR_FRAGMENT

then "42" is stored as a string, taking up more space in the resulting
bytecode file.

=back

=head4 Subroutine modifiers

=over 4

=item :main

Define the "main" entry point to start execution. If multiple subroutines are
marked as B<:main>, the B<last> marked subroutine is used. Only the first
file loaded or compiled counts; subs marked as B<:main> are ignored by the
B<load_bytecode> op.
If no B<:main> modifier is specified, execution
starts at the first subroutine in the file.

=item :load

Run this subroutine when loaded by the B<load_bytecode> op (i.e. neither in
the initial program file nor compiled from memory). This is complementary to
what B<:init> does (below); to get both behaviours, use B<:init :load>. If
multiple subs have the B<:load> pragma, the subs are run in source code order.

=item :init

Run the subroutine when the program is run directly (that is, not loaded as a
module), including when it is compiled from memory. This is complementary to
what B<:load> does (above); to get both behaviours, use B<:init :load>.

=item :anon

Do not install this subroutine in the namespace. Allows the subroutine
name to be reused.

=item :multi(type1, type2...)

Engage in multiple dispatch with the listed types.
See F<docs/pdds/pdd27_multi_dispatch.pod> for more information on the
multiple dispatch system.

When used in combination with B<:method> (below), the first type (C<type1>)
refers to the type of the invocant (C<self>).

=item :immediate

Execute this subroutine immediately after being compiled, which is analogous
to C<BEGIN> in Perl 5.

In addition, if the sub returns a PMC value, that value replaces the sub in
the constant table of the bytecode file. This makes it possible to build
constants at compile time, provided that (a) the generated constant can be
computed at compile time (i.e. doesn't depend on the runtime environment), and
(b) the constant value is of a PMC class that supports saving in a bytecode
file.

{{ TODO: need a freeze/thaw reference }}.

For instance, after compilation of the sub 'init', that sub is executed
immediately (hence the C<:immediate> modifier).
Instead of storing the sub
'init' in the constants table, the value returned by 'init' is stored,
which in this example is a FixedIntegerArray.

=begin PIR

    .sub main :main
      .const "Sub" initsub = "init"
    .end

    .sub init :immediate
      .local pmc array
      array = new 'FixedIntegerArray'
      array = 256 # set size to 256

      # code to initialize array
      .return (array)
    .end

=end PIR

=item :postcomp

Execute immediately after being compiled, but only if the subroutine is in the
initial file (i.e. not in PIR compiled as a result of a C<load_bytecode>
instruction from another file).

As an example, suppose file C<main.pir> contains:

=begin PIR

    .sub main
        load_bytecode 'foo.pir'
    .end

=end PIR

and the file C<foo.pir> contains:

=begin PIR

    .sub foo :immediate
        $I0 = 4
    .end

    .sub bar :postcomp
        $I0 = 3
    .end

=end PIR

Executing C<foo.pir> will run both C<foo> and C<bar>. On the other hand,
executing C<main.pir> will run only C<foo>. If C<foo.pir> is compiled to
bytecode, only C<foo> will be run, and loading C<foo.pbc> will not run either
C<foo> or C<bar>.

=item :method

=begin PIR

    .sub bar :method
      # ...
    .end

    .sub bar :method('foo')
      # ...
    .end

=end PIR

The marked C<.sub> is a method, added as a method in the class that
corresponds to the current namespace, and not stored in the namespace.
In the method body, the object PMC can be referred to with C<self>.

If a string argument is given to C<:method> the method is stored with
that name instead of the C<.sub> name.

=item :vtable

=begin PIR_INVALID

    .sub bar :vtable
      # ...
    .end

    .sub bar :vtable('foo')
      # ...
    .end

=end PIR_INVALID

The marked C<.sub> overrides a vtable function, and is not stored in the
namespace.
By default, it overrides a vtable function with the same name
as the C<.sub> name. To override a different vtable function, use
C<:vtable('...')>. For example, to have a C<.sub> named I<ToString> also
be the vtable function C<get_string>, use C<:vtable('get_string')>.

When the B<:vtable> modifier is set, the object PMC can be referred to with
C<self>, as with the B<:method> modifier.

=item :outer(subname)

The marked C<.sub> is lexically nested within the sub known by
I<subname>.

=item :subid( <string_constant> )

Specifies a unique string identifier for the subroutine. This is useful for
referring to a particular subroutine with C<:outer>, even though several
subroutines in the file may have the same name (because they are multi, or in
different namespaces).

=item :instanceof( <string_constant> )

The C<:instanceof> pragma is an experimental pragma that creates a sub as a
PMC type other than 'Sub'. However, as currently implemented it doesn't
work well with C<:outer> or existing PMC types such as C<Closure>,
C<Coroutine>, etc.

=item :nsentry( <string_constant> )

Specify the name by which the subroutine is stored in the namespace. The
default name by which a subroutine is stored in the namespace (if this
modifier is missing) is the subroutine's name as given after the
C<.sub> directive. This modifier allows you to override that default.

=back


=head4 Directives used for Parrot calling conventions

=over 4

=item .begin_call and .end_call

Directives to start and end a subroutine invocation, respectively.

=item .begin_return and .end_return

Directives to start and end a statement to return values.

=item .begin_yield and .end_yield

Directives to start and end a statement to yield values.

=item .call

Takes either 2 arguments: the sub and the return continuation, or the
sub only.
For the latter case an B<invokecc> gets emitted. Providing
an explicit return continuation is more efficient, if it's created
outside of a loop and the call is done inside a loop.

=item .invocant

Directive to specify the object for a method call. Use it in combination
with C<.call> to call methods.

=item .set_return <var> [:<modifier>]*

Between C<.begin_return> and C<.end_return>, specify one or
more of the return value(s) of the current subroutine. Available
modifiers: C<:flat>, C<:named>.

=item .set_yield <var> [:<modifier>]*

Between C<.begin_yield> and C<.end_yield>, specify one or
more of the yield value(s) of the current subroutine. Available
modifiers: C<:flat>, C<:named>.

=item .set_arg <var> [:<modifier>]*

Between C<.begin_call> and C<.call>, specify an argument to be
passed. Available modifiers: C<:flat>, C<:named>.

=item .get_result <var> [:<modifier>]*

Between C<.call> and C<.end_call>, specify where one or more return
value(s) should be stored. Available modifiers: C<:slurpy>, C<:named>,
C<:optional>, and C<:opt_flag>.

=back

=head4 Directives for subroutine parameters

=over 4

=item .param <type> <identifier> [:<modifier>]*

At the top of a subroutine, declare a local variable, in the manner
of C<.local>, into which parameter(s) of the current subroutine should
be stored. Available modifiers: C<:slurpy>, C<:named>, C<:optional>,
C<:opt_flag>.

=back

=head4 Parameter Passing and Getting Flags

See L<PDD03|pdds/pdd03_calling_conventions.pod> for a description of
the meaning of the flag bits C<SLURPY>, C<OPTIONAL>, C<OPT_FLAG>,
and C<FLAT>, which correspond to the calling convention modifiers
C<:slurpy>, C<:optional>, C<:opt_flag>, and C<:flat>.


=head4 Catching Exceptions

Using the C<push_eh> op you can install an exception handler.
If an exception
is thrown, Parrot will execute the installed exception handler. In order to
retrieve the thrown exception, use the C<.get_results> directive. This
directive always takes one argument: an exception object.

=begin PIR_FRAGMENT

    push_eh handler
    # ...
  handler:
    .local pmc exception
    .get_results (exception)
    # ...

=end PIR_FRAGMENT


This is syntactic sugar for the C<get_results> op, but any modifiers set
on the targets will be handled automatically by the PIR compiler. The
C<.get_results> directive must be the first instruction of the exception
handler; only declarations (C<.lex>, C<.local>) may precede it.

To resume execution after handling the exception, just invoke the continuation
stored in the exception.

=begin PIR_FRAGMENT

    .local pmc exception, continuation
    # ...
    .get_results(exception)
    # ...
    continuation = exception['resume']
    continuation()
    # ...

=end PIR_FRAGMENT

See L<PDD23|pdds/pdd23_exceptions.pod> for accessing the various attributes
of the exception object.

=head3 Syntactic Sugar

Any PASM opcode is a valid PIR instruction. In addition, PIR defines some
syntactic shortcuts. These are provided for ease of use by humans producing
and maintaining PIR code.

=over 4

=item goto <identifier>

C<branch> to I<identifier> (label or subroutine name).

Examples:

  goto END

=item if <var> goto <identifier>

If I<var> evaluates as true, jump to the named I<identifier>.

=item unless <var> goto <identifier>

Unless I<var> evaluates as true, jump to the named I<identifier>.

=item if null <var> goto <identifier>

If I<var> evaluates as null, jump to the named I<identifier>.

=item unless null <var> goto <identifier>

Unless I<var> evaluates as null, jump to the named I<identifier>.

=item if <var1> <relop> <var2> goto <identifier>

The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>, which translate
to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or C<gt>. If
I<var1 relop var2> evaluates as true, jump to the named I<identifier>.

=item unless <var1> <relop> <var2> goto <identifier>

The I<relop> can be: C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>. Unless
I<var1 relop var2> evaluates as true, jump to the named I<identifier>.

=item <var1> = <var2>

Assign a value.

=item <var1> = <unary> <var2>

Unary operations C<!> (NOT), C<-> (negation) and C<~> (bitwise NOT).

=item <var1> = <var2> <binary> <var3>

Binary arithmetic operations C<+> (addition), C<-> (subtraction), C<*>
(multiplication), C</> (division), C<%> (modulus) and C<**> (exponent).
Binary C<.> is concatenation and only valid for string arguments.

C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts left and right.
C<E<gt>E<gt>E<gt>> is the logical shift right.

Binary logic operations C<&&> (AND), C<||> (OR) and C<~~> (XOR).

Binary bitwise operations C<&> (bitwise AND), C<|> (bitwise OR) and C<~>
(bitwise XOR).

Binary relational operations C<E<lt>, E<lt>=, ==, !=, E<gt>=, E<gt>>.

=item <var1> <op>= <var2>

This is equivalent to
C<E<lt>var1E<gt> = E<lt>var1E<gt> E<lt>opE<gt> E<lt>var2E<gt>>, where
I<op> is called an assignment operator and can be any of the following
binary operators described earlier: C<+>, C<->, C<*>, C</>, C<%>, C<.>,
C<&>, C<|>, C<~>, C<E<lt>E<lt>>, C<E<gt>E<gt>> or C<E<gt>E<gt>E<gt>>.

=item <var> = <var> [ <var> ]

A keyed C<set> operation for PMCs to retrieve a value from an aggregate.
This maps to:

  set <var>, <var> [ <var> ]

=item <var> [ <var> ] = <var>

A keyed C<set> operation to set a value in an aggregate.
This maps to:

  set <var> [ <var> ], <var>

=item <var> = <opcode> <arguments>

Many opcodes can use this PIR syntactic sugar. The first argument for the
opcode is placed before the C<=>, and all remaining arguments go after the
opcode name. For example:

=begin PIR_FRAGMENT

  new $P0, 'Type'

=end PIR_FRAGMENT

becomes:

=begin PIR_FRAGMENT

  $P0 = new 'Type'

=end PIR_FRAGMENT

Note that this only works for opcodes that have a leading C<OUT>
parameter. [this restriction unimplemented: TT #906]

=item ([<var1> [:<mod1> ...], ...]) = <var2>([<arg1> [:<mod2> ...], ...])

This is short for:

  .begin_call
  .set_arg <arg1> <modifier2>
  ...
  .call <var2>
  .get_result <var1> <modifier1>
  ...
  .end_call

=item <var> = <var>([arg [:<modifier> ...], ...])

=item <var>([arg [:<modifier> ...], ...])

=item <var>."_method"([arg [:<modifier> ...], ...])

=item <var>.<var>([arg [:<modifier> ...], ...])

Function or method call. These notations are shorthand for a longer PCC
function call. I<var> can denote a global subroutine, a local I<identifier> or
a I<reg>.

=item .return ([<var> [:<modifier> ...], ...])

Return from the current subroutine with zero or more values.

=item .tailcall <var>(args)

=item .tailcall <var>.'somemethod'(args)

=item .tailcall <var>.<var>(args)


Tail call: call a function or method and return from the sub with the
function or method call return values.

Internally, the call stack doesn't increase because of a tail call, so
you can write recursive functions and not have stack overflows.

Whitespace surrounding the dot ('.') that separates the object from the
method is not allowed.

=back

=head3 Assignment and Morphing

The C<=> syntactic sugar in PIR, when used in the simple case of:

  <var1> = <var2>

directly corresponds to the C<set> opcode. So, two low-level arguments (int,
num, or string registers, variables, or constants) are a direct C assignment,
or a C-level conversion (int cast, float cast, a string copy, or a call to one
of the conversion functions like C<string_to_num>).

Assigning a PMC argument to a low-level argument calls the
C<get_integer>, C<get_number>, or C<get_string> vtable function on the
PMC. Assigning a low-level argument to a PMC argument calls the
C<set_integer_native>, C<set_number_native>, or C<set_string_native>
vtable function on the PMC (assign to value semantics). Two PMC
arguments are a direct C assignment (assign to container semantics).

For assign to value semantics for two PMC arguments use C<assign>, which calls
the C<assign_pmc> vtable function.

=head3 Macros

This section describes the macro layer of the PIR language. The macro layer of
the PIR compiler handles the following directives:

=over 4

=item * C<.include> '<filename>'

The C<.include> directive takes a string argument that contains the
name of the PIR file that is included. The contents of the included
file are inserted as if they were written at the point where the
C<.include> directive occurs.

The include file is searched for in the current directory and in
runtime/parrot/include, in that order. The first file of that name to
be found is included.

The C<.include> directive's search order is subject to change.

=item * C<.macro> <identifier> [<parameters>]

The C<.macro> directive starts a macro definition named by the specified
identifier. The optional parameter list is a comma-separated list of
identifiers, enclosed in parentheses. See C<.endm> for ending the macro
definition.

=item * C<.endm>

Closes a macro definition.

=item * C<.macro_const> <identifier> (<literal>|<reg>)

=begin PIR

  .macro_const PI 3.14

=end PIR

The C<.macro_const> directive is a special type of macro; it allows the user
to use a symbolic name for a constant value. Like C<.macro>, the substitution
occurs at compile time. It takes two arguments (not comma separated): the
first is an identifier, the second a constant value or a register.

=back

The macro layer is completely implemented in the lexical analysis phase.
The parser does not know anything about what happens in the lexical
analysis phase.

When the C<.include> directive is encountered, the specified file is opened
and the following tokens that are requested by the parser are read from
that file.

A macro expansion is a dot-prefixed identifier. For instance, if a macro
was defined as shown below:

=begin PIR

  .macro foo(bar)
    # ...
  .endm

=end PIR

this macro can be expanded by writing C<.foo(42)>. The body of the macro
will be inserted at the point where the macro expansion is written.

A C<.macro_const> expansion is more or less the same as a C<.macro> expansion,
except that a constant expansion cannot take any arguments, and the
substitution of a C<.macro_const> contains no newlines, so it can be used
within a line of code.

=head4 Macro parameter list

The parameter list for a macro is specified in parentheses after the name of
the macro. Macro parameters are not typed.

=begin PIR

  .macro foo(bar, baz, buz)
    # ...
  .endm

=end PIR

The number of arguments in the call to a macro must match the number of
parameters in the macro's parameter list. Macros do not perform multidispatch,
so you can't have two macros with the same name but different parameters.
Calling a macro with the wrong number of arguments gives the user an error.

If a macro defines no parameter list, parentheses are optional on both the
definition and the call. This means that a macro defined as:

=begin PIR

  .macro foo
    # ...
  .endm

=end PIR

can be expanded by writing either C<.foo> or C<.foo()>. And a macro definition
written as:

=begin PIR

  .macro foo()
    # ...
  .endm

=end PIR

can also be expanded by writing either C<.foo> or C<.foo()>.

B<Note: IMCC requires you to write parentheses if the macro was declared with
(empty) parentheses. Likewise, when no parentheses were written (implying an
empty parameter list), no parentheses may be used in the expansion.>

=over

=item * Heredoc arguments

Heredoc arguments are not allowed when expanding a macro.
This means that, currently, when using IMCC, the following is not allowed:

=begin PIR_TODO

   .macro foo(bar)
     ...
   .endm

   .foo(<<'EOS')
 This is a heredoc
    string.

EOS

=end PIR_TODO

Using braces, { }, allows you to span multiple lines for an argument.
See runtime/parrot/include/hllmacros.pir for examples and possible usage.
1026A simple example is this: 1027 1028=begin PIR 1029 1030 .macro foo(a,b) 1031 .a 1032 .b 1033 .endm 1034 1035 .sub main 1036 .foo({ print "1" 1037 print "2" 1038 }, { 1039 print "3" 1040 print "4" 1041 }) 1042 .end 1043 1044=end PIR 1045 1046This will expand the macro C<foo>, after which the input to the PIR parser is: 1047 1048=begin PIR 1049 1050 .sub main 1051 print "1" 1052 print "2" 1053 print "3" 1054 print "4" 1055 .end 1056 1057=end PIR 1058 1059which will result in the output: 1060 1061 1234 1062 1063=back 1064 1065=head4 Unique local labels 1066 1067Within the macro body, the user can declare a unique label identifier using 1068the value of a macro parameter, like so: 1069 1070=begin PIR 1071 1072 .macro foo(a) 1073 # ... 1074 .label $a: 1075 # ... 1076 .endm 1077 1078=end PIR 1079 1080=head4 Unique local variables I<(not yet implemented)> 1081 1082B<Note: This is not yet implemented in IMCC>. 1083 1084Within the macro body, the user can declare a local variable with a unique 1085name. 1086 1087=begin PIR 1088 1089 .macro foo() 1090 # ... 1091 .macro_local int b 1092 # ... 1093 .b = 42 1094 print .b # prints the value of the unique variable (42) 1095 # ... 1096 .endm 1097 1098=end PIR 1099 1100The C<.macro_local> directive declares a local variable with a unique name in 1101the macro. When the macro C<.foo()> is called, the resulting code that is 1102given to the parser will read as follows: 1103 1104=begin PIR 1105 1106 .sub main 1107 .local int local__foo__b__2 1108 # ... 1109 local__foo__b__2 = 42 1110 print local__foo__b__2 1111 1112 .end 1113 1114=end PIR 1115 1116The user can also declare a local variable with a unique name set to the 1117symbolic value of one of the macro parameters. 1118 1119=begin PIR 1120 1121 .macro foo(b) 1122 # ... 1123 .macro_local int $b 1124 # ... 1125 .$b = 42 1126 print .$b # prints the value of the unique variable (42) 1127 print .b # prints the value of parameter "b", which is 1128 # also the name of the variable. 
    # ...
  .endm

=end PIR

So, the special C<$> character determines how the symbol is interpreted:
C<.b> is just the value of the parameter, whereas C<.$b> refers to the
variable named by that value. Obviously, the value of C<b> should be a string.

The automatic name munging on C<.macro_local> variables allows for using
multiple macros, like so:

=begin PIR_TODO

  .macro foo(a)
    .macro_local int $a
  .endm

  .macro bar(b)
    .macro_local int $b
  .endm

  .sub main
    .foo("x")
    .bar("x")
  .end

=end PIR_TODO

This will result in code for the parser as follows:

=begin PIR

  .sub main
    .local int local__foo__x__2
    .local int local__bar__x__4
  .end

=end PIR

Each expansion is associated with a unique number. As a result, labels
declared with C<.macro_label> and locals declared with C<.macro_local> do not
conflict with each other, even when a macro is expanded multiple times.

=head4 Ordinary local variables

Defining a non-unique variable can still be done, using the normal syntax:

=begin PIR

  .macro foo(b)
    .local int b
    .macro_local int $b
  .endm

=end PIR

When invoking the macro C<foo> as follows:

=begin PIR_FRAGMENT

  .macro foo(b)
    #...
  .endm

  .foo("x")

=end PIR_FRAGMENT

there will be two variables: C<b> and C<x>.
When the macro is invoked twice:

=begin PIR_TODO

  .sub main
    .foo("x")
    .foo("y")
  .end

=end PIR_TODO

the resulting code that is given to the parser will read as follows:

=begin PIR

  .sub main
    .local int b
    .local int local__foo__x
    .local int b
    .local int local__foo__y
  .end

=end PIR

Obviously, this will result in an error, as the variable C<b> is defined
twice. If you intend the macro to create unique variable names, use
C<.macro_local> instead of C<.local> to take advantage of the name munging.

=head2 Examples

=head3 Subroutine Definition

A simple subroutine, marked with C<:main> to indicate that it is the entry
point in the file. Other sub modifiers include C<:load>, C<:init>, etc.

=begin PIR

  .sub sub_label :main
    .param int a
    .param int b
    .param int c

    # ...
    .local pmc xy
    .return(xy)
  .end

=end PIR

=head3 Subroutine Call

Invocation of a subroutine. In this case a continuation subroutine is
created.

=begin PIR_FRAGMENT

  .const "Sub" $P0 = "sub_label"
  $P1 = new 'Continuation'
  set_addr $P1, ret_addr
  # ...
  .local int x
  .local num y
  .local string z
  .begin_call
    .set_arg x
    .set_arg y
    .set_arg z
    .call $P0, $P1 # r = sub_label(x, y, z)
  ret_addr:
    .local int r # optional - new result var
    .get_result r
  .end_call

=end PIR_FRAGMENT

=head3 Subroutine Call Syntactic Sugar

Below there are three different ways to invoke the subroutine C<sub_label>.
The first retrieves a single return value, the second retrieves three return
values, whereas the last does not save any return values.

=begin PIR_FRAGMENT

  .local int r0, r1, r2
  r0 = sub_label($I0, $I1, $I2)
  (r0, r1, r2) = sub_label($I0, $I1, $I2)
  sub_label($I0, $I1, $I2)

=end PIR_FRAGMENT

This also works for NCI calls, as the subroutine PMC will be
an NCI sub, and on invocation will do the Right Thing.

A subroutine object can be used instead of the label:

=begin PIR_FRAGMENT_TODO

  get_global $P0, "sub_label"
  $P0(args)

=end PIR_FRAGMENT_TODO

=head3 Methods

=begin PIR_TODO

  .namespace [ "Foo" ]

  .sub _sub_label :method [,Subpragma, ...]
    .param int a
    .param int b
    .param int c
    # ...
    self."_other_meth"()
    # ...
    .begin_return
    .set_return xy
    .end_return
    ...
  .end

=end PIR_TODO

The variable C<self> automatically refers to the invoking object if the
subroutine declaration contains the C<:method> modifier.

=head3 Calling Methods

The syntax is very similar to subroutine calls. The call is done with
C<.call> immediately preceded by C<.invocant>:

=begin PIR_FRAGMENT_TODO

  .local int x, y, z
  .local pmc class, obj
  newclass class, "Foo"
  new obj, class
  .begin_call
    .set_arg x
    .set_arg y
    .set_arg z
    .invocant obj
    .call "method" [, $P1 ] # r = obj."method"(x, y, z)
    .local int r # optional - new result var
    .get_result r
  .end_call
  ...

=end PIR_FRAGMENT_TODO

The return continuation is optional. The method can be a string
constant or a string variable.
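
The long-form call above is annotated with a shorter method-call form, the
same form used for C<self."_other_meth"()> in the Methods example. The
following is only a sketch of that shorthand: the class C<Foo>, the method
name C<method>, and the locals are the same illustrative names used above, and
it assumes that C<Foo> actually provides such a method.

=begin PIR_FRAGMENT_TODO

  .local int x, y, z, r
  .local pmc class, obj
  newclass class, "Foo"        # hypothetical class, as in the example above
  new obj, class
  r = obj."method"(x, y, z)    # shorthand for the .begin_call/.end_call form

=end PIR_FRAGMENT_TODO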

=head3 Returning and Yielding

  .return ( a, b )      # return the values of a and b

  .return ()            # return no value

  .tailcall func_call() # tail call function

  .tailcall o."meth"()  # tail method call

Similarly, one can yield using the C<.yield> directive:

  .yield ( a, b )       # yield with the values of a and b

  .yield ()             # yield with no value

=head2 Implementation

There are multiple implementations of PIR, each of which will meet this
specification for the syntax. Currently there are the following
implementations:

=over 4

=item * compilers/imcc

This is the current implementation being used in Parrot. Some of the
specified syntactic constructs in this PDD are not implemented in
IMCC; these constructs are marked with notes saying so.

=back

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End: