1# Copyright (C) 2007-2014, Parrot Foundation.
2
3=head1 PDD 19: Parrot Intermediate Representation (PIR)
4
5=head2 Abstract
6
7This document outlines the architecture and core syntax of Parrot
8Intermediate Representation (PIR).
9
10=head2 Description
11
12PIR is a stable, middle-level language intended both as a target for the
13generated output from high-level language compilers, and for human use
14developing core features and extensions for Parrot.
15
16=head3 Basic Syntax
17
18A valid PIR program consists of a sequence of statements, directives, comments
19and empty lines.
20
21=head4 Statements
22
23A statement starts with an optional label, contains an instruction, and is
24terminated by a newline (<NL>). Each statement must be on its own line.
25
26  [label:] [instruction] <NL>
27
28An instruction may be either a low-level opcode or a higher-level PIR
29operation, such as a subroutine call, a method call, a directive, or PIR
30syntactic sugar.
31
32=head4 Directives
33
34A directive provides information for the PIR compiler that is outside the
35normal flow of executable statements. Directives are all prefixed with a ".",
36as in C<.local> or C<.sub>.
37
38=head4 Comments
39
40Comments start with C<#> and last until the following newline. PIR also allows
41comments in Pod format. Comments, Pod content, and empty lines are ignored.
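
The following minimal sketch is one complete program that combines directives, statements, a label, and comments. It assumes a Parrot build that provides the C<say> opcode. When run, it should print C<hello>.

=begin PIR

  .sub main :main                  # directive: open a subroutine
      .local string greeting       # directive: declare a named variable
      greeting = 'hello'           # statement: assignment sugar for 'set'
  done:                            # a label may stand alone on its line
      say greeting                 # statement: print with a trailing newline
  .end

=end PIR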
42
43=head4 Identifiers
44
Identifiers start with a letter or underscore, and may then contain any
number of letters, digits, and underscores. Identifiers currently have no
length limit, but a generous limit (such as 256 or 1024 characters) may be
imposed in the future. The following examples are all valid identifiers.
50
51    a
52    _a
53    A42
54
55Opcode names are not reserved words in PIR, and may be used as variable names.
56For example, you can define a local variable named C<print>.
Note that, currently, a local variable named after an opcode will I<hide> the
opcode name, effectively making the opcode unusable. This will be resolved in
the future.
60
61The PIR language is designed to have as few reserved keywords as possible.
62Currently, in contrast to opcode names, PIR keywords I<are> reserved, and
63cannot be used as identifiers. Some opcode names are, in fact, PIR keywords,
64which therefore cannot be used as identifiers. This, too, will be resolved
65in a future re-implementation of the PIR compiler.
66
67The following are PIR keywords, and cannot currently be used as identifiers:
68
69 goto      if       int         null
70 num       pmc      string      unless
71
72=head4 Labels
73
74A label declaration consists of a label name followed by a colon. A label name
75conforms to the standard requirements for identifiers. A label declaration may
76occur at the start of a statement, or stand alone on a line, but always within
77a subroutine.
78
79A reference to a label consists of only the label name, and is generally used
80as an argument to an instruction or directive.
81
82A PIR label is accessible only in the subroutine where it's defined. A label
83name must be unique within a subroutine, but it can be reused in other
84subroutines.
85
86=begin PIR_FRAGMENT
87
88  goto label1
89     # ...
90  label1:
91
92=end PIR_FRAGMENT
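
The sketch below uses a label declared on its own line as a loop target, and references labels as arguments to C<if> and C<goto>. It assumes the C<say> and C<inc> opcodes are available. It should print the numbers 1 through 3. Because labels are local to their subroutine, another sub could declare its own C<loop> and C<done> labels without conflict.

=begin PIR

  .sub count_to_three :main
      .local int i
      i = 1
    loop:                          # label declaration
      if i > 3 goto done           # label reference as a branch target
      say i
      inc i                        # i = i + 1
      goto loop
    done:
  .end

=end PIR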
93
94=head4 Registers and Variables
95
96There are two ways of referencing Parrot's registers. The first is
97through named local variables declared with C<.local>.
98
99=begin PIR_FRAGMENT
100
101  .local pmc foo
102
103=end PIR_FRAGMENT
104
105The type of a named variable can be C<int>, C<num>, C<string> or C<pmc>,
106corresponding to the types of registers. No other types are used.
107
108The second way of referencing a register is through a register variable
109C<$In>, C<$Sn>, C<$Nn>, or C<$Pn>. The capital letter indicates the type
110of the register (integer, string, number, or PMC). I<n> consists of
digit(s) only. There is no limit on the size of I<n>. There is no direct
correspondence between the value of I<n> and the position of the
register in the register set; for example, C<$P42> may be stored in the
zeroth PMC register if it is the only register in the subroutine.
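
Here is a short sketch that mixes both styles. The variable names are arbitrary. The C<$P42> register variable shows that the number after the type letter is only a name. The program should print C<answer: 42>.

=begin PIR

  .sub main :main
      .local int count             # named variable of type int
      count = 2
      $I0 = count + 40             # integer register variable
      $S0 = 'answer: '             # string register variable
      $P42 = new 'String'          # PMC register variable; 42 is just a name
      $P42 = $S0                   # store a string value in the String PMC
      print $P42
      say $I0
  .end

=end PIR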
115
116=head3 Constants
117
118Constants may be used in place of registers or variables. A constant is not
119allowed on the left side of an assignment, or in any other context where the
120variable would be modified.
121
122=over 4
123
124=item 'single-quoted string constant'
125
126Are delimited by single-quotes (C<'>). They are taken to be ASCII encoded. No
127escape sequences are processed.
128
129=item "double-quoted string constants"
130
131Are delimited by double-quotes (C<">). A C<"> inside a string must be escaped
by C<\>. The default format for a double-quoted string constant is 7-bit
ASCII; other character sets and encodings must be marked explicitly using a
format flag.
135
136=item <<"heredoc",  <<'heredoc'
137
138Heredocs work like single or double quoted strings. All lines up to
139the terminating delimiter are slurped into the string. The delimiter
140has to be on its own line, at the beginning of the line and with no
141trailing whitespace.
142
143Assignment of a heredoc:
144
145=begin PIR_FRAGMENT
146
147  $S0 = <<"EOS"
148  ...
149EOS
150
151=end PIR_FRAGMENT
152
153A heredoc as an argument:
154
155=begin PIR_FRAGMENT
156
157  .local pmc function, arg
158  # ...
159
160  function(<<"END_OF_HERE", arg)
161  ...
162END_OF_HERE
163
164  .yield(<<'EOS')
165  ...
166EOS
167
168  .return(<<'EOS')
169  ...
170EOS
171
172=end PIR_FRAGMENT
173
174Although currently not possible, a future implementation of the PIR
175language will allow you to use multiple heredocs within a single
176statement or directive:
177
178=begin PIR_FRAGMENT_TODO
179
180   function(<<'INPUT', <<'OUTPUT', 'some test')
181   ...
182INPUT
183   ...
184OUTPUT
185
186=end PIR_FRAGMENT_TODO
187
188=item format:"string constant"
189
Like the above, but with a format attached to the string. Valid formats are
191currently: C<ascii> (the default), C<binary>, C<iso-8859-1>, C<utf8>,
192C<utf16>, C<ucs2>, and C<ucs4>.
193
194The format is attached to the string constant, and
195adopted by any string container the constant is assigned to.
196
197The standard escape sequences are honored within strings with an
198alternate format, so you can include a particular Unicode character
199as either a literal sequence of bytes, or as an escape sequence.
200
201=back
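
The following minimal sketch exercises the string constant forms described above. It uses the C<utf8:> prefix as documented and assumes the C<length> opcode counts characters. How a non-ASCII string appears when printed depends on the terminal, so the sketch prints only that string's length.

=begin PIR

  .sub main :main
      $S0 = 'single: \n is not an escape here'
      $S1 = "double: a\tb"             # \t becomes a real tab
      $S2 = utf8:"caf\x{e9}"           # format flag marks the encoding
      $I0 = length $S2                 # length in characters
      $S3 = <<"EOS"
  heredoc line one
  heredoc line two
EOS
      say $S0
      say $S1
      say $I0
      print $S3
  .end

=end PIR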
202
203=head3 String escape sequences
204
205Inside double-quoted strings the following escape sequences are processed.
206
207  \xhh        1..2 hex digits
208  \ooo        1..3 oct digits
209  \cX         control char X
210  \x{h..h}    1..8 hex digits
211  \uhhhh      4 hex digits
212  \Uhhhhhhhh  8 hex digits
213  \a, \b, \t, \n, \v, \f, \r, \e, \\, \"
214
215=over 4
216
217=item numeric constants
218
219Both integers (C<42>) and numbers (C<3.14159>) may appear as constants.
220C<0x> and C<0b> denote hex and binary constants respectively.
221
222=back
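
The sketch below exercises a few of the escape sequences and numeric constant forms listed above. The values in the comments follow directly from the hex, binary, and octal notation.

=begin PIR

  .sub main :main
      $I0 = 0x1F                  # hexadecimal: 31
      $I1 = 0b101                 # binary: 5
      $N0 = 3.14159               # floating-point number constant
      $S0 = "\x41\102\x{43}"      # "ABC": hex, octal, and braced hex escapes
      say $I0
      say $I1
      say $N0
      say $S0
  .end

=end PIR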
223
224=head3 Directives
225
226=over 4
227
228=item .local <type> <identifier>
229
230Define a local name I<identifier> within a subroutine with the given
231I<type>. You can define multiple identifiers of the same type by
232separating them with commas:
233
234  .local int i, j
235
236=item .lex <string constant>, <reg>
237
Declare a lexical variable that is an alias for a PMC register. For example,
239the following two snippets have an identical effect:
240
241=begin PIR_FRAGMENT
242
243    .lex '$a', $P0
244    $P1 = new 'Integer'
245    $P0 = $P1
246
247=end PIR_FRAGMENT
248
249=begin PIR_FRAGMENT
250
251    .lex '$a', $P0
252    $P1 = new 'Integer'
253    store_lex '$a', $P1
254
255=end PIR_FRAGMENT
256
And these two snippets also have an identical effect:
258
259=begin PIR_FRAGMENT
260
261    .lex '$a', $P0
262    $P1 = new 'Integer'
263    $P1 = $P0
264
265=end PIR_FRAGMENT
266
267=begin PIR_FRAGMENT
268
269    .lex '$a', $P0
270    $P1 = new 'Integer'
271    $P1 = find_lex '$a'
272
273=end PIR_FRAGMENT
274
275=item .const <type> <identifier> = <const>
276
277Define a constant named I<identifier> of type I<type> and assign value
278I<const> to it. The I<type> must be C<int>, C<num>, C<string> or a string
279constant indicating the PMC type. This allows you to create PMC constants
280representing subroutines; the value of the constant in that case is the
281name of the subroutine. If the referred subroutine has an C<:immediate>
282modifier and it returns a value, then that value is stored instead of the
283subroutine.
284
285C<.const> declarations representing subroutines can only be written
286within a C<.sub>.  The constant is stored in the constant table of the
287current bytecode file.
288
289=item .globalconst <type> <identifier> = <const>
290
291As C<.const> above, but the defined constant is globally accessible.
292C<.globalconst> may only be used within a C<.sub>.
293
294=item .sub
295
296  .sub <identifier> [:<modifier> ...]
297  .sub <quoted string> [:<modifier> ...]
298
299Define a subroutine. All code in a PIR source file must be defined in a
300subroutine. See the section L<Subroutine modifiers> for available
modifiers. Modifiers are optional and are separated by spaces.
302
303The name of the sub may be either a bare identifier or a quoted string
304constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
305above), but string sub names can contain any characters, including characters
306from different character sets (see L<Constants> above).
307
308Always paired with C<.end>.
309
310=item .end
311
312End a subroutine. Always paired with C<.sub>.
313
314=item .namespace [ <identifier> ; <identifier> ]
315
316   .namespace [ <key>? ]
317
318   key: <identifier> [';' <identifier>]*
319
320Defines the namespace from this point onwards.  By default the program is not
in any namespace.  If you specify more than one identifier, separated by
semicolons, it creates nested namespaces by storing the inner namespace object
in the outer namespace's global pad.
324
325You can specify the root namespace by using empty brackets, such as:
326
327=begin PIR
328
329    .namespace [ ]
330
331=end PIR
332
333The brackets are not optional, although the key inside them is.
334
335=item .loadlib 'lib_name'
336
Load the given library at compile time, that is, as soon as that line is
parsed.  See also the C<loadlib> opcode, which does the same at run time.
339
A library loaded this way is also available at runtime, as if it had been
loaded again in a C<:load> subroutine, so there is no need to call C<loadlib>
at runtime.
342
343=item .HLL <hll_name>
344
345Define the HLL namespace from that point on in the file. Takes one string
346constant, the name of the HLL. By default, the HLL namespace is 'parrot'.
347
348=item .line <integer>
349
350Set the current PIR line number to the value specified. This is useful in
351case the PIR code is generated from some source PIR files, and error messages
352should print the source file's line number, not the line number of the
353generated file. Note that line numbers increment per line of PIR; if you
354are trying to store High Level Language debug information, you should instead
355be using the C<.annotate> directive.
356
357=item .file <quoted_string>
358
359Set the current PIR file name to the value specified. This is useful in case
360the PIR code is generated from some source PIR files, and error messages
361should print the source file's name, not the name of the generated file.
362
363=item .annotate <key>, <value>
364
365Makes an entry in the bytecode annotations table. This is used to store high
366level language debug information. Examples:
367
368=begin PIR_FRAGMENT
369
370  .annotate "file", "aardvark.p6"
371  .annotate "line", 5
372  .annotate "column", 24
373
374=end PIR_FRAGMENT
375
376An annotation stays in effect until the next annotation with the same key or
377the end of the current file (that is, if you use a tool such as C<pbc_merge>
378to link multiple bytecode files, then annotations will not spill over from one
379mergee's bytecode to another).
380
381One annotation covers many PIR instructions. If the result of compiling one
382line of HLL code is 15 lines of PIR, you only need to emit one annotation
383before the first of those 15 lines to set the line number.
384
385=begin PIR_FRAGMENT
386
387  .annotate "line", 42
388
389=end PIR_FRAGMENT
390
The key must always be a quoted string. The value may be an integer, a number
or a quoted string. Note that integer values are stored most compactly; if you
emit the following instead of the directive above:

=begin PIR_FRAGMENT

  .annotate "line", "42"

=end PIR_FRAGMENT

then "42" is stored as a string, taking up more space in the resulting
bytecode file.
403
404=back
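
The following sketch combines C<.const>, C<.lex>, and C<.annotate> in one subroutine. The names C<LIMIT>, C<NAME>, and C<$total> are arbitrary. C<$P0> is the register behind the lexical C<$total>, so C<find_lex> returns the same PMC. The program should print C<counter: 3>.

=begin PIR

  .sub main :main
      .const int    LIMIT = 3          # compile-time integer constant
      .const string NAME  = 'counter'  # compile-time string constant
      .lex '$total', $P0               # '$total' aliases register $P0
      .annotate 'line', 10             # HLL debug info for what follows
      $P0 = new 'Integer'
      $P0 = LIMIT
      $P1 = find_lex '$total'          # same PMC as $P0
      print NAME
      print ': '
      say $P1
  .end

=end PIR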
405
406=head4 Subroutine modifiers
407
408=over 4
409
410=item :main
411
412Define "main" entry point to start execution.  If multiple subroutines are
413marked as B<:main>, the B<last> marked subroutine is used.  Only the first
414file loaded or compiled counts; subs marked as B<:main> are ignored by the
415B<load_bytecode> op. If no B<:main> modifier is specified, execution
416starts at the first subroutine in the file.
417
418=item :load
419
420Run this subroutine when loaded by the B<load_bytecode> op (i.e. neither in
421the initial program file nor compiled from memory).  This is complementary to
422what B<:init> does (below); to get both behaviours, use B<:init :load>.  If
423multiple subs have the B<:load> pragma, the subs are run in source code order.
424
425=item :init
426
427Run the subroutine when the program is run directly (that is, not loaded as a
428module), including when it is compiled from memory.  This is complementary to
429what B<:load> does (above); to get both behaviours, use B<:init :load>.
430
431=item :anon
432
433Do not install this subroutine in the namespace. Allows the subroutine
434name to be reused.
435
436=item :multi(type1, type2...)
437
438Engage in multiple dispatch with the listed types.
439See F<docs/pdds/pdd27_multi_dispatch.pod> for more information on the
440multiple dispatch system.
441
442When used in combination with B<:method> (below), the first type (C<type1>)
443refers to the type of the invocant (C<self>).
444
445=item :immediate
446
447Execute this subroutine immediately after being compiled, which is analogous
448to C<BEGIN> in Perl 5.
449
450In addition, if the sub returns a PMC value, that value replaces the sub in
451the constant table of the bytecode file.  This makes it possible to build
452constants at compile time, provided that (a) the generated constant can be
453computed at compile time (i.e. doesn't depend on the runtime environment), and
454(b) the constant value is of a PMC class that supports saving in a bytecode
455file.
456
457{{ TODO: need a freeze/thaw reference }}.
458
459For instance, after compilation of the sub 'init', that sub is executed
460immediately (hence the C<:immediate> modifier). Instead of storing the sub
461'init' in the constants table, the value returned by 'init' is stored,
which in this example is a FixedIntegerArray.
463
464=begin PIR
465
466    .sub main :main
467      .const "Sub" initsub = "init"
468    .end
469
470    .sub init :immediate
471      .local pmc array
472      array = new 'FixedIntegerArray'
473      array = 256 # set size to 256
474
475      # code to initialize array
476      .return (array)
477    .end
478
479=end PIR
480
481=item :postcomp
482
483Execute immediately after being compiled, but only if the subroutine is in the
484initial file (i.e. not in PIR compiled as result of a C<load_bytecode>
485instruction from another file).
486
487As an example, suppose file C<main.pir> contains:
488
489=begin PIR
490
491    .sub main
492        load_bytecode 'foo.pir'
493    .end
494
495=end PIR
496
497and the file C<foo.pir> contains:
498
499=begin PIR
500
501    .sub foo :immediate
502        $I0 = 4
503    .end
504
505    .sub bar :postcomp
506        $I0 = 3
507    .end
508
509=end PIR
510
511Executing C<foo.pir> will run both C<foo> and C<bar>.  On the other hand,
512executing C<main.pir> will run only C<foo>.  If C<foo.pir> is compiled to
513bytecode, only C<foo> will be run, and loading C<foo.pbc> will not run either
514C<foo> or C<bar>.
515
516=item :method
517
518=begin PIR
519
520  .sub bar :method
521    # ...
522  .end
523
524  .sub bar :method('foo')
525    # ...
526  .end
527
528=end PIR
529
530The marked C<.sub> is a method, added as a method in the class that
531corresponds to the current namespace, and not stored in the namespace.
532In the method body, the object PMC can be referred to with C<self>.
533
534If a string argument is given to C<:method> the method is stored with
535that name instead of the C<.sub> name.
536
537=item :vtable
538
539=begin PIR_INVALID
540
541  .sub bar :vtable
542    # ...
543  .end
544
545  .sub bar :vtable('foo')
546    # ...
547  .end
548
549=end PIR_INVALID
550
551The marked C<.sub> overrides a vtable function, and is not stored in the
552namespace. By default, it overrides a vtable function with the same name
553as the C<.sub> name.  To override a different vtable function, use
C<:vtable('...')>. For example, to have a C<.sub> named I<ToString> also
be the vtable function C<get_string>, use C<:vtable('get_string')>.
556
557When the B<:vtable> modifier is set, the object PMC can be referred to with
558C<self>, as with the B<:method> modifier.
559
560=item :outer(subname)
561
562The marked C<.sub> is lexically nested within the sub known by
563I<subname>.
564
565=item :subid( <string_constant> )
566
567Specifies a unique string identifier for the subroutine. This is useful for
568referring to a particular subroutine with C<:outer>, even though several
569subroutines in the file may have the same name (because they are multi, or in
570different namespaces).
571
572=item :instanceof( <string_constant> )
573
574The C<:instanceof> pragma is an experimental pragma that creates a sub as a
575PMC type other than 'Sub'.  However, as currently implemented it doesn't
576work well with C<:outer> or existing PMC types such as C<Closure>,
577C<Coroutine>, etc.
578
579=item :nsentry( <string_constant> )
580
581Specify the name by which the subroutine is stored in the namespace. The
582default name by which a subroutine is stored in the namespace (if this
583modifier is missing), is the subroutine's name as given after the
C<.sub> directive.  This modifier overrides that default.
585
586=back
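
The sketch below uses C<:method> and C<:vtable> together with C<:anon :load :init> for class setup. The class name C<Point>, its attribute C<x>, and the subs C<set_x> and C<describe> are hypothetical. The C<newclass>, C<addattribute>, C<setattribute>, C<getattribute>, and C<box> opcodes are assumed to be available as in recent Parrot releases. Because C<describe> is marked C<:vtable('get_string')>, converting a C<Point> to a string calls it, so the program should print C<Point x=7>.

=begin PIR

  .namespace [ 'Point' ]

  .sub init_class :anon :load :init   # runs before main; not in the namespace
      $P0 = newclass 'Point'
      addattribute $P0, 'x'
  .end

  .sub set_x :method                  # becomes a method of Point
      .param int value
      $P0 = box value                 # wrap the int in a PMC
      setattribute self, 'x', $P0
  .end

  .sub describe :vtable('get_string') # overrides get_string for Point
      $P0 = getattribute self, 'x'
      $S0 = $P0
      $S1 = 'Point x='
      $S1 .= $S0                      # concatenate in place
      .return ($S1)
  .end

  .namespace [ ]

  .sub main :main
      $P0 = new 'Point'
      $P0.'set_x'(7)
      $S0 = $P0                       # calls the get_string override
      say $S0
  .end

=end PIR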
587
588
=head4 Directives used for Parrot calling conventions
590
591=over 4
592
593=item .begin_call and .end_call
594
595Directives to start and end a subroutine invocation, respectively.
596
597=item .begin_return and .end_return
598
599Directives to start and end a statement to return values.
600
601=item .begin_yield and .end_yield
602
603Directives to start and end a statement to yield values.
604
605=item .call
606
Takes either two arguments, the sub and the return continuation, or the
sub only. In the latter case an B<invokecc> is emitted. Providing
an explicit return continuation is more efficient if it is created
outside of a loop and the call is made inside the loop.
611
612=item .invocant
613
614Directive to specify the object for a method call. Use it in combination
615with C<.call> to call methods.
616
617=item .set_return <var> [:<modifier>]*
618
619Between C<.begin_return> and C<.end_return>, specify one or
620more of the return value(s) of the current subroutine.  Available
621modifiers: C<:flat>, C<:named>.
622
623=item .set_yield <var> [:<modifier>]*
624
625Between C<.begin_yield> and C<.end_yield>, specify one or
626more of the yield value(s) of the current subroutine.  Available
627modifiers: C<:flat>, C<:named>.
628
629=item .set_arg <var> [:<modifier>]*
630
631Between C<.begin_call> and C<.call>, specify an argument to be
632passed.  Available modifiers: C<:flat>, C<:named>.
633
634=item .get_result <var> [:<modifier>]*
635
636Between C<.call> and C<.end_call>, specify where one or more return
637value(s) should be stored.  Available modifiers: C<:slurpy>, C<:named>,
638C<:optional>, and C<:opt_flag>.
639
640=back
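
The sketch below spells out, with these directives, a call that the shorthand syntax would write as C<sum = add(x, y)>. The callee returns its value with the explicit return directives. The sub name C<add> is hypothetical. The program should print 42.

=begin PIR

  .sub main :main
      .local pmc adder
      .local int x, y, sum
      x = 2
      y = 40
      adder = get_global 'add'     # look the sub up in the current namespace
      .begin_call
      .set_arg x
      .set_arg y
      .call adder                  # no continuation given: invokecc is emitted
      .get_result sum
      .end_call
      say sum
  .end

  .sub add
      .param int a
      .param int b
      $I0 = a + b
      .begin_return
      .set_return $I0
      .end_return
  .end

=end PIR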
641
642=head4 Directives for subroutine parameters
643
644=over 4
645
646=item .param <type> <identifier> [:<modifier>]*
647
648At the top of a subroutine, declare a local variable, in the manner
649of C<.local>, into which parameter(s) of the current subroutine should
650be stored. Available modifiers: C<:slurpy>, C<:named>, C<:optional>,
651C<:opt_flag>.
652
653=back
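
The following sketch illustrates the parameter modifiers. The hypothetical sub C<greet> takes an optional second parameter, and the C<:opt_flag> parameter right after it reports whether that argument was passed. The hypothetical C<count_args> collects every positional argument into one PMC with C<:slurpy>. The program should print C<Hello, world>, C<Hi, world>, and C<3> on separate lines.

=begin PIR

  .sub main :main
      greet('world')
      greet('world', 'Hi')
      $I0 = count_args(1, 2, 3)
      say $I0
  .end

  .sub greet
      .param string who
      .param string greeting     :optional
      .param int    has_greeting :opt_flag   # true if greeting was passed
      if has_greeting goto have_it
      greeting = 'Hello'                     # default value
    have_it:
      print greeting
      print ', '
      say who
  .end

  .sub count_args
      .param pmc rest :slurpy                # all remaining positional args
      $I0 = elements rest
      .return ($I0)
  .end

=end PIR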
654
655=head4 Parameter Passing and Getting Flags
656
657See L<PDD03|pdds/pdd03_calling_conventions.pod> for a description of
658the meaning of the flag bits C<SLURPY>, C<OPTIONAL>, C<OPT_FLAG>,
659and C<FLAT>, which correspond to the calling convention modifiers
660C<:slurpy>, C<:optional>, C<:opt_flag>, and C<:flat>.
661
662
663=head4 Catching Exceptions
664
665Using the C<push_eh> op you can install an exception handler. If an exception
666is thrown, Parrot will execute the installed exception handler. In order to
667retrieve the thrown exception, use the C<.get_results> directive. This
668directive always takes one argument: an exception object.
669
670=begin PIR_FRAGMENT
671
672   push_eh handler
673   # ...
674 handler:
675   .local pmc exception
676   .get_results (exception)
677   # ...
678
679=end PIR_FRAGMENT
680
681
682This is syntactic sugar for the C<get_results> op, but any modifiers set
683on the targets will be handled automatically by the PIR compiler.  The
684C<.get_results> directive must be the first instruction of the exception
handler; only declarations (C<.lex>, C<.local>) may come first.
686
687To resume execution after handling the exception, just invoke the continuation
688stored in the exception.
689
690=begin PIR_FRAGMENT
691
692   .local pmc exception, continuation
693   # ...
694   .get_results(exception)
695   # ...
696   continuation = exception['resume']
697   continuation()
698   # ...
699
700=end PIR_FRAGMENT
701
702See L<PDD23|pdds/pdd23_exceptions.pod> for accessing the various attributes
703of the exception object.
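
The following minimal sketch puts the pieces together in a complete handler. C<die> throws an exception carrying the given message. The handler reads the C<message> attribute by keyed access; that attribute name is assumed from PDD23. The program should print C<caught: something went wrong>.

=begin PIR

  .sub main :main
      push_eh handler               # install the handler
      die 'something went wrong'    # throw an exception with a message
      pop_eh
      say 'not reached'
      .return ()
    handler:
      .local pmc exception
      .local string message
      .get_results (exception)      # first instruction after declarations
      message = exception['message']
      print 'caught: '
      say message
  .end

=end PIR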
704
705=head3 Syntactic Sugar
706
707Any PASM opcode is a valid PIR instruction. In addition, PIR defines some
708syntactic shortcuts. These are provided for ease of use by humans producing
709and maintaining PIR code.
710
711=over 4
712
713=item goto <identifier>
714
715C<branch> to I<identifier> (label or subroutine name).
716
717Examples:
718
719  goto END
720
721=item if <var> goto <identifier>
722
723If I<var> evaluates as true, jump to the named I<identifier>.
724
725=item unless <var> goto <identifier>
726
727Unless I<var> evaluates as true, jump to the named I<identifier>.
728
729=item if null <var> goto <identifier>
730
731If I<var> evaluates as null, jump to the named I<identifier>.
732
733=item unless null <var> goto <identifier>
734
735Unless I<var> evaluates as null, jump to the named I<identifier>.
736
737=item if <var1> <relop> <var2> goto <identifier>
738
The I<relop> can be C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>=>, or C<E<gt>>,
which translate to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>, C<ge> or
C<gt>, respectively. If I<var1 relop var2> evaluates as true, jump to the
named I<identifier>.
743
744=item unless <var1> <relop> <var2> goto <identifier>
745
The I<relop> can be C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>=>, or C<E<gt>>.
Unless I<var1 relop var2> evaluates as true, jump to the named I<identifier>.
748
749=item <var1> = <var2>
750
751Assign a value.
752
753=item <var1> = <unary> <var2>
754
755Unary operations C<!> (NOT), C<-> (negation) and C<~> (bitwise NOT).
756
757=item <var1> = <var2> <binary> <var3>
758
759Binary arithmetic operations C<+> (addition), C<-> (subtraction), C<*>
760(multiplication), C</> (division), C<%> (modulus) and C<**> (exponent).
761Binary C<.> is concatenation and only valid for string arguments.
762
763C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts left and right.
764C<E<gt>E<gt>E<gt>> is the logical shift right.
765
766Binary logic operations C<&&> (AND), C<||> (OR) and C<~~> (XOR).
767
768Binary bitwise operations C<&> (bitwise AND), C<|> (bitwise OR) and C<~>
769(bitwise XOR).
770
Binary relational operations C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>=>, and C<E<gt>>.
772
773=item <var1> <op>= <var2>
774
775This is equivalent to
C<E<lt>var1E<gt> = E<lt>var1E<gt> E<lt>opE<gt> E<lt>var2E<gt>>, where
777I<op> is called an assignment operator and can be any of the following
778binary operators described earlier: C<+>, C<->, C<*>, C</>, C<%>, C<.>,
779C<&>, C<|>, C<~>, C<E<lt>E<lt>>, C<E<gt>E<gt>> or C<E<gt>E<gt>E<gt>>.
780
781=item <var> = <var> [ <var> ]
782
783A keyed C<set> operation for PMCs to retrieve a value from an aggregate.
784This maps to:
785
786  set <var>, <var> [ <var> ]
787
788=item <var> [ <var> ] = <var>
789
790A keyed C<set> operation to set a value in an aggregate. This maps to:
791
792  set <var> [ <var> ], <var>
793
794=item <var> = <opcode> <arguments>
795
796Many opcodes can use this PIR syntactic sugar. The first argument for the
797opcode is placed before the C<=>, and all remaining arguments go after the
798opcode name. For example:
799
800=begin PIR_FRAGMENT
801
802  new $P0, 'Type'
803
804=end PIR_FRAGMENT
805
806becomes:
807
808=begin PIR_FRAGMENT
809
810  $P0 = new 'Type'
811
812=end PIR_FRAGMENT
813
814Note that this only works for opcodes that have a leading C<OUT>
815parameter. [this restriction unimplemented: TT #906]
816
817=item ([<var1> [:<mod1> ...], ...]) = <var2>([<arg1> [:<mod2> ...], ...])
818
819This is short for:
820
821  .begin_call
822  .set_arg <arg1> <modifier2>
823  ...
824  .call <var2>
825  .get_result <var1> <modifier1>
826  ...
827  .end_call
828
829=item <var> = <var>([arg [:<modifier> ...], ...])
830
831=item <var>([arg [:<modifier> ...], ...])
832
833=item <var>."_method"([arg [:<modifier> ...], ...])
834
835=item <var>.<var>([arg [:<modifier> ...], ...])
836
837Function or method call. These notations are shorthand for a longer PCC
838function call. I<var> can denote a global subroutine, a local I<identifier> or
839a I<reg>.
840
841=item .return ([<var> [:<modifier> ...], ...])
842
843Return from the current subroutine with zero or more values.
844
845=item .tailcall <var>(args)
846
847=item .tailcall <var>.'somemethod'(args)
848
849=item .tailcall <var>.<var>(args)
850
851
852Tail call: call a function or method and return from the sub with the
853function or method call return values.
854
Internally, the call stack doesn't grow because of a tail call, so
you can write recursive functions without overflowing the stack.
857
858Whitespace surrounding the dot ('.') that separates the object from the
859method is not allowed.
860
861=back
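
The sketch below exercises several of the shortcuts above: keyed access on a C<Hash>, an assignment operator, a relational C<unless ... goto>, and a tail-recursive factorial. The sub name C<fact> is hypothetical. The program should print 100 and then 120.

=begin PIR

  .sub main :main
      .local pmc hash
      .local int n
      hash = new 'Hash'
      hash['answer'] = 42          # keyed set
      n = hash['answer']           # keyed get
      n += 8                       # same as n = n + 8
      n = n * 2                    # binary operation
      unless n > 90 goto small     # relational sugar
      say n
    small:
      $I0 = fact(5, 1)
      say $I0
  .end

  .sub fact
      .param int k
      .param int acc               # running product
      if k > 1 goto recurse
      .return (acc)
    recurse:
      $I0 = acc * k
      $I1 = k - 1
      .tailcall fact($I1, $I0)     # reuses the current frame
  .end

=end PIR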
862
863=head3 Assignment and Morphing
864
865The C<=> syntactic sugar in PIR, when used in the simple case of:
866
867  <var1> = <var2>
868
869directly corresponds to the C<set> opcode. So, two low-level arguments (int,
870num, or string registers, variables, or constants) are a direct C assignment,
871or a C-level conversion (int cast, float cast, a string copy, or a call to one
872of the conversion functions like C<string_to_num>).
873
874Assigning a PMC argument to a low-level argument calls the
875C<get_integer>, C<get_number>, or C<get_string> vtable function on the
876PMC. Assigning a low-level argument to a PMC argument calls the
877C<set_integer_native>, C<set_number_native>, or C<set_string_native>
878vtable function on the PMC (assign to value semantics). Two PMC
879arguments are a direct C assignment (assign to container semantics).
880
881For assign to value semantics for two PMC arguments use C<assign>, which calls
882the C<assign_pmc> vtable function.
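
The difference between container and value semantics is easiest to see with two PMC registers. In this sketch, C<$P1 = $P0> makes both registers refer to one C<Integer>, while C<assign> copies only the value. The program should therefore print 2 twice.

=begin PIR

  .sub main :main
      $P0 = new 'Integer'
      $P0 = 1
      $P1 = $P0          # set: $P1 and $P0 now name the same PMC
      $P1 = 2            # set_integer_native on the shared PMC
      say $P0            # the change is visible through $P0

      $P2 = new 'Integer'
      assign $P2, $P0    # assign_pmc: copy the value, not the container
      $P2 = 3
      say $P0            # $P0 is unaffected
  .end

=end PIR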
883
884=head3 Macros
885
886This section describes the macro layer of the PIR language. The macro layer of
887the PIR compiler handles the following directives:
888
889=over 4
890
891=item * C<.include> '<filename>'
892
893The C<.include> directive takes a string argument that contains the
894name of the PIR file that is included. The contents of the included
895file are inserted as if they were written at the point where the
896C<.include> directive occurs.
897
898The include file is searched for in the current directory and in
899runtime/parrot/include, in that order. The first file of that name to
900be found is included.
901
902The C<.include> directive's search order is subject to change.
903
904=item * C<.macro> <identifier> [<parameters>]
905
The C<.macro> directive starts a macro definition named by the specified
907identifier. The optional parameter list is a comma-separated list of
908identifiers, enclosed in parentheses.  See C<.endm> for ending the macro
909definition.
910
911=item * C<.endm>
912
913Closes a macro definition.
914
915=item * C<.macro_const> <identifier> (<literal>|<reg>)
916
917=begin PIR
918
919 .macro_const   PI  3.14
920
921=end PIR
922
923The C<.macro_const> directive is a special type of macro; it allows the user
924to use a symbolic name for a constant value. Like C<.macro>, the substitution
925occurs at compile time. It takes two arguments (not comma separated), the
926first is an identifier, the second a constant value or a register.
927
928=back
929
930The macro layer is completely implemented in the lexical analysis phase.
931The parser does not know anything about what happens in the lexical
932analysis phase.
933
934When the C<.include> directive is encountered, the specified file is opened
935and the following tokens that are requested by the parser are read from
936that file.
937
938A macro expansion is a dot-prefixed identifier. For instance, if a macro
939was defined as shown below:
940
941=begin PIR
942
943 .macro foo(bar)
944   # ...
945 .endm
946
947=end PIR
948
949this macro can be expanded by writing C<.foo(42)>. The body of the macro
950will be inserted at the point where the macro expansion is written.
951
952A C<.macro_const> expansion is more or less the same as a C<.macro> expansion,
953except that a constant expansion cannot take any arguments, and the
954substitution of a C<.macro_const> contains no newlines, so it can be used
955within a line of code.
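
The following sketch shows both kinds of expansion. C<.ANSWER> is substituted inline within a statement. C<.say_twice('hello')> inserts the macro body with C<.msg> replaced by the argument. The macro names are hypothetical. The program should print 42 followed by C<hello> twice.

=begin PIR

  .macro_const ANSWER 42

  .macro say_twice(msg)
      say .msg
      say .msg
  .endm

  .sub main :main
      $I0 = .ANSWER          # expands inline to the constant 42
      say $I0
      .say_twice('hello')    # body inserted here with msg = 'hello'
  .end

=end PIR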
956
957=head4 Macro parameter list
958
959The parameter list for a macro is specified in parentheses after the name of
960the macro. Macro parameters are not typed.
961
962=begin PIR
963
964 .macro foo(bar, baz, buz)
965   # ...
966 .endm
967
968=end PIR
969
970The number of arguments in the call to a macro must match the number of
971parameters in the macro's parameter list. Macros do not perform multidispatch,
972so you can't have two macros with the same name but different parameters.
973Calling a macro with the wrong number of arguments gives the user an error.
974
975If a macro defines no parameter list, parentheses are optional on both the
976definition and the call.  This means that a macro defined as:
977
978=begin PIR
979
980 .macro foo
981   # ...
982 .endm
983
984=end PIR
985
986can be expanded by writing either C<.foo> or C<.foo()>. And a macro definition
987written as:
988
989=begin PIR
990
991 .macro foo()
992   # ...
993 .endm
994
995=end PIR
996
997can also be expanded by writing either C<.foo> or C<.foo()>.
998
999B<Note: IMCC requires you to write parentheses if the macro was declared with
1000(empty) parentheses. Likewise, when no parentheses were written (implying an
1001empty parameter list), no parentheses may be used in the expansion.>
1002
1003=over
1004
1005=item * Heredoc arguments
1006
1007Heredoc arguments are not allowed when expanding a macro.
1008This means that, currently, when using IMCC, the following is not allowed:
1009
1010=begin PIR_TODO
1011
1012   .macro foo(bar)
1013   ...
1014   .endm
1015
1016   .foo(<<'EOS')
1017 This is a heredoc
1018    string.
1019
1020EOS
1021
1022=end PIR_TODO
1023
1024Using braces, { }, allows you to span multiple lines for an argument.
1025See runtime/parrot/include/hllmacros.pir for examples and possible usage.
1026A simple example is this:
1027
1028=begin PIR
1029
1030 .macro foo(a,b)
1031   .a
1032   .b
1033 .endm
1034
1035 .sub main
1036   .foo({ print "1"
1037          print "2"
1038        }, {
1039          print "3"
1040          print "4"
1041        })
1042 .end
1043
1044=end PIR
1045
1046This will expand the macro C<foo>, after which the input to the PIR parser is:
1047
1048=begin PIR
1049
1050 .sub main
1051   print "1"
1052   print "2"
1053   print "3"
1054   print "4"
1055 .end
1056
1057=end PIR
1058
1059which will result in the output:
1060
1061 1234
1062
1063=back
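
Since a heredoc cannot be passed directly as a macro argument, one possible
workaround (a sketch, not a documented idiom) is to assign the heredoc to a
string register first and pass the register instead. The macro name
C<say_it> is hypothetical.

=begin PIR

 .macro say_it(s)
     say .s
 .endm

 .sub main :main
     $S0 = <<'EOS'
 This is a heredoc
    string.
EOS
     .say_it($S0)     # the macro only sees the register name $S0
 .end

=end PIR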
1064
1065=head4 Unique local labels
1066
1067Within the macro body, the user can declare a unique label identifier using
1068the value of a macro parameter, like so:
1069
1070=begin PIR
1071
1072  .macro foo(a)
1073    # ...
 .macro_label $a:
1075    # ...
1076  .endm
1077
1078=end PIR
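
A label declared this way is referred to inside the macro body as
C<.$name>. The sketch below expands the same macro twice; because each
expansion gets its own copy of the label, the two loops do not collide. The
macro name C<count_up> and the label name C<loop> are hypothetical, and the
sketch assumes IMCC also accepts a unique label whose name is not one of the
macro's parameters.

=begin PIR

 .macro count_up(limit)
     $I0 = 0
 .macro_label $loop:
     inc $I0
     if $I0 < .limit goto .$loop    # jumps to this expansion's own label
     say $I0
 .endm

 .sub main :main
     .count_up(3)     # prints 3
     .count_up(5)     # prints 5
 .end

=end PIR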
1079
1080=head4 Unique local variables I<(not yet implemented)>
1081
1082B<Note: This is not yet implemented in IMCC>.
1083
1084Within the macro body, the user can declare a local variable with a unique
1085name.
1086
1087=begin PIR
1088
1089  .macro foo()
1090    # ...
1091  .macro_local int b
1092    # ...
1093  .b = 42
1094  print .b # prints the value of the unique variable (42)
1095    # ...
1096  .endm
1097
1098=end PIR
1099
1100The C<.macro_local> directive declares a local variable with a unique name in
1101the macro. When the macro C<.foo()> is called, the resulting code that is
1102given to the parser will read as follows:
1103
1104=begin PIR
1105
1106  .sub main
1107    .local int local__foo__b__2
1108      # ...
1109    local__foo__b__2 = 42
1110    print local__foo__b__2
1111
1112  .end
1113
1114=end PIR
1115
1116The user can also declare a local variable with a unique name set to the
1117symbolic value of one of the macro parameters.
1118
1119=begin PIR
1120
1121  .macro foo(b)
1122    # ...
1123  .macro_local int $b
1124    # ...
1125  .$b = 42
1126  print .$b # prints the value of the unique variable (42)
1127  print .b  # prints the value of parameter "b", which is
1128            # also the name of the variable.
1129  #  ...
1130  .endm
1131
1132=end PIR
1133
The C<$> character thus distinguishes the two uses: C<.b> expands to the
value of the parameter C<b>, while C<.$b> refers to the unique variable whose
name is derived from that value. For this reason, the argument passed for
C<b> should be a string.
1137
1138The automatic name munging on C<.macro_local> variables allows for using
1139multiple macros, like so:
1140
1141=begin PIR_TODO
1142
1143  .macro foo(a)
1144  .macro_local int $a
1145  .endm
1146
1147  .macro bar(b)
1148  .macro_local int $b
1149  .endm
1150
1151  .sub main
1152    .foo("x")
1153    .bar("x")
1154  .end
1155
1156=end PIR_TODO
1157
1158This will result in code for the parser as follows:
1159
1160=begin PIR
1161
1162  .sub main
1163    .local int local__foo__x__2
1164    .local int local__bar__x__4
1165  .end
1166
1167=end PIR
1168
Each expansion is associated with a unique number, which becomes part of the
names of labels declared with C<.macro_label> and locals declared with
C<.macro_local>. As a result, multiple expansions of a macro do not produce
conflicting label or local names.
1173
1174=head4 Ordinary local variables
1175
1176Defining a non-unique variable can still be done, using the normal syntax:
1177
1178=begin PIR
1179
1180  .macro foo(b)
1181  .local int b
1182  .macro_local int $b
1183  .endm
1184
1185=end PIR
1186
1187When invoking the macro C<foo> as follows:
1188
1189=begin PIR_FRAGMENT
1190
1191  .macro foo(b)
1192    #...
1193  .endm
1194
1195  .foo("x")
1196
1197=end PIR_FRAGMENT
1198
1199there will be two variables: C<b> and C<x>. When the macro is invoked twice:
1200
1201=begin PIR_TODO
1202
1203  .sub main
1204    .foo("x")
1205    .foo("y")
1206  .end
1207
1208=end PIR_TODO
1209
1210the resulting code that is given to the parser will read as follows:
1211
1212=begin PIR
1213
1214  .sub main
1215    .local int b
1216    .local int local__foo__x
1217    .local int b
1218    .local int local__foo__y
1219  .end
1220
1221=end PIR
1222
This results in an error, because the variable C<b> is declared twice. If
you intend the macro to create unique variable names, use C<.macro_local>
instead of C<.local> to take advantage of the name munging.
1226
1227=head2 Examples
1228
1229=head3 Subroutine Definition
1230
A simple subroutine, marked with C<:main> to indicate that it is the entry
point of the file. Other sub modifiers include C<:load>, C<:init>, etc.
1233
1234=begin PIR
1235
1236    .sub sub_label :main
1237      .param int a
1238      .param int b
1239      .param int c
1240
1241      # ...
1242      .local pmc xy
1243      .return(xy)
1244    .end
1245
1246=end PIR
1247
1248=head3 Subroutine Call
1249
Invocation of a subroutine using the long-form call directives. In this case
a return continuation is created explicitly and passed to C<.call>.
1252
1253=begin PIR_FRAGMENT
1254
1255    .const "Sub" $P0 = "sub_label"
1256    $P1 = new 'Continuation'
1257    set_addr $P1, ret_addr
1258    # ...
1259    .local int x
1260    .local num y
1261    .local string z
1262    .begin_call
1263      .set_arg x
1264      .set_arg y
1265      .set_arg z
      .call $P0, $P1    # r = sub_label(x, y, z)
1267  ret_addr:
1268      .local int r      # optional - new result var
1269      .get_result r
1270    .end_call
1271
1272=end PIR_FRAGMENT
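
When no explicit return continuation is needed, it can be left out of the
C<.call> directive. The following complete program is a sketch of that
shorter long-form call; the sub name C<add_ints> is hypothetical.

=begin PIR

 .sub main :main
     .const 'Sub' $P0 = "add_ints"
     .local int x, y, r
     x = 2
     y = 3
     .begin_call
       .set_arg x
       .set_arg y
       .call $P0            # no return continuation given
       .get_result r
     .end_call
     say r                  # prints 5
 .end

 .sub add_ints
     .param int a
     .param int b
     $I0 = a + b
     .return ($I0)
 .end

=end PIR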
1273
1274=head3 Subroutine Call Syntactic Sugar
1275
The statements below show three different ways to invoke the subroutine
C<sub_label>. The first retrieves a single return value, the second
retrieves three return values, and the last discards any return values.
1279
1280=begin PIR_FRAGMENT
1281
1282  .local int r0, r1, r2
1283  r0 = sub_label($I0, $I1, $I2)
1284  (r0, r1, r2) = sub_label($I0, $I1, $I2)
1285  sub_label($I0, $I1, $I2)
1286
1287=end PIR_FRAGMENT
1288
This also works for NCI calls, as the subroutine PMC will then be an NCI
sub, which does the Right Thing on invocation.
1291
Instead of the subroutine's name, a subroutine object can be used too:
1293
1294=begin PIR_FRAGMENT_TODO
1295
1296   get_global $P0, "sub_label"
   $P0($I0, $I1, $I2)
1298
1299=end PIR_FRAGMENT_TODO
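
Putting the three call forms together, here is a complete sketch. The body
of C<sub_label> is a hypothetical stand-in that returns its arguments in
reverse order. The first call keeps only the first returned value, which
assumes Parrot's default of not treating surplus return values as an error.

=begin PIR

 .sub main :main
     .local int r0, r1, r2
     r0 = sub_label(1, 2, 3)
     say r0                             # prints 3
     (r0, r1, r2) = sub_label(1, 2, 3)
     say r2                             # prints 1
     sub_label(1, 2, 3)                 # return values are discarded
 .end

 .sub sub_label
     .param int a
     .param int b
     .param int c
     .return (c, b, a)
 .end

=end PIR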
1300
1301=head3 Methods
1302
1303=begin PIR_TODO
1304
1305  .namespace [ "Foo" ]
1306
  .sub _sub_label :method   # further sub pragmas may follow :method
    .param int a
    .param int b
    .param int c
    # ...
    self."_other_meth"()
    # ...
    .local pmc xy
    .begin_return
    .set_return xy
    .end_return
1318  .end
1319
1320=end PIR_TODO
1321
The variable C<self> automatically refers to the invocant object if the
subroutine declaration includes the C<:method> flag.
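
The following complete sketch defines a method in the C<Foo> namespace,
uses C<self> inside it, and calls it on an instance. The method name
C<greet> is hypothetical, and the sketch assumes that C<newclass "Foo">
picks up the C<:method> subs declared in the C<Foo> namespace.

=begin PIR

 .namespace [ "Foo" ]

 .sub greet :method
     .param string name
     $S0 = typeof self      # type name of the invocant
     $S0 .= " greets "
     $S0 .= name
     say $S0
 .end

 .namespace [ ]

 .sub main :main
     $P0 = newclass "Foo"
     $P1 = new $P0
     $P1."greet"("world")   # inside greet, self is $P1
 .end

=end PIR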
1324
1325=head3 Calling Methods
1326
The syntax is very similar to subroutine calls. The call is done with
C<.meth_call>, which must be immediately preceded by C<.invocant>:

=begin PIR_FRAGMENT_TODO

   .local int x, y, z
   .local pmc class, obj
   newclass class, "Foo"
   new obj, class
   .begin_call
   .set_arg x
   .set_arg y
   .set_arg z
   .invocant obj
   .meth_call "method" [, $P1 ] # r = obj."method"(x, y, z)
1342   .local int r  # optional - new result var
1343   .get_result r
1344   .end_call
1345   ...
1346
1347=end PIR_FRAGMENT_TODO
1348
1349
The return continuation (C<$P1> above) is optional. The method name can be
a string constant or a string variable.
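
For comparison, a method call can also be written with the short
method-call syntax, giving the method name either as a string constant or
in a string variable. This is a sketch; the method name C<add3> is
hypothetical.

=begin PIR

 .namespace [ "Foo" ]

 .sub add3 :method
     .param int x
     .param int y
     .param int z
     $I0 = x + y
     $I0 += z
     .return ($I0)
 .end

 .namespace [ ]

 .sub main :main
     .local pmc foo_class, obj
     .local int r
     foo_class = newclass "Foo"
     obj = new foo_class
     r = obj."add3"(1, 2, 3)     # method name as a string constant
     say r                       # prints 6
     $S0 = "add3"
     r = obj.$S0(4, 5, 6)        # method name in a string variable
     say r                       # prints 15
 .end

=end PIR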
1352
1353=head3 Returning and Yielding
1354
1355  .return ( a, b )      # return the values of a and b
1356
1357  .return ()            # return no value
1358
1359  .tailcall func_call()   # tail call function
1360
1361  .tailcall o."meth"()    # tail method call
1362
Similarly, a coroutine can yield values using the C<.yield> directive:
1364
1365  .yield ( a, b )      # yield with the values of a and b
1366
1367  .yield ()            # yield with no value
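
As a sketch of C<.yield> in use, the hypothetical sub C<counter> below
yields twice. It assumes that IMCC compiles a sub containing C<.yield> as a
coroutine, so that each call resumes execution after the previous
C<.yield>.

=begin PIR

 .sub main :main
     $I0 = counter()    # runs up to the first .yield
     say $I0
     $I0 = counter()    # resumes after the first .yield
     say $I0
 .end

 .sub counter
     .yield (1)
     .yield (2)
 .end

=end PIR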
1368
1369
1370=head2 Implementation
1371
Any implementation of PIR is expected to meet this specification for the
syntax. Currently there is the following implementation:
1375
1376=over 4
1377
1378=item * compilers/imcc
1379
1380This is the current implementation being used in Parrot. Some of the
1381specified syntactic constructs in this PDD are not implemented in
1382IMCC; these constructs are marked with notes saying so.
1383
1384=back
1385
1386=head2 References
1387
1388None.
1389
1390=cut
1391
1392__END__
1393Local Variables:
1394  fill-column:78
1395End:
1396