docs/pdds/pdd19_pir.pod

# Copyright (C) 2007-2014, Parrot Foundation.

=head1 PDD 19: Parrot Intermediate Representation (PIR)

=head2 Abstract

This document outlines the architecture and core syntax of Parrot
Intermediate Representation (PIR).

=head2 Description

PIR is a stable, middle-level language intended both as a target for the
output of high-level language compilers and for direct use by humans
developing core features and extensions for Parrot.

=head3 Basic Syntax

A valid PIR program consists of a sequence of statements, directives, comments
and empty lines.

=head4 Statements

A statement starts with an optional label, contains an instruction, and is
terminated by a newline (<NL>). Each statement must be on its own line.

  [label:] [instruction] <NL>

An instruction may be either a low-level opcode or a higher-level PIR
operation, such as a subroutine call, a method call, a directive, or PIR
syntactic sugar.

=head4 Directives

A directive provides information for the PIR compiler that is outside the
normal flow of executable statements. Directives are all prefixed with a ".",
as in C<.local> or C<.sub>.

=head4 Comments

Comments start with C<#> and last until the following newline. PIR also allows
comments in Pod format. Comments, Pod content, and empty lines are ignored.

=head4 Identifiers

Identifiers start with a letter or underscore, and may then contain
letters, digits, and underscores. There is currently no limit on the length
of an identifier, but a sane-but-generous limit may be imposed in the
future (256 chars, 1024 chars?). The following examples are all valid
identifiers.

    a
    _a
    A42

Opcode names are not reserved words in PIR, and may be used as variable names.
For example, you can define a local variable named C<print>.
Note that currently a local variable named after an opcode I<hides> that
opcode, effectively making the opcode unusable.
In the future this will be resolved.

The PIR language is designed to have as few reserved keywords as possible.
Currently, in contrast to opcode names, PIR keywords I<are> reserved, and
cannot be used as identifiers. Some opcode names are, in fact, PIR keywords,
which therefore cannot be used as identifiers. This, too, will be resolved
in a future re-implementation of the PIR compiler.

The following are PIR keywords, and cannot currently be used as identifiers:

 goto      if       int         null
 num       pmc      string      unless

=head4 Labels

A label declaration consists of a label name followed by a colon. A label name
conforms to the standard requirements for identifiers. A label declaration may
occur at the start of a statement, or stand alone on a line, but always within
a subroutine.

A reference to a label consists of only the label name, and is generally used
as an argument to an instruction or directive.

A PIR label is accessible only in the subroutine where it's defined. A label
name must be unique within a subroutine, but it can be reused in other
subroutines.

=begin PIR_FRAGMENT

  goto label1
     # ...
  label1:

=end PIR_FRAGMENT

=head4 Registers and Variables

There are two ways of referencing Parrot's registers. The first is
through named local variables declared with C<.local>.

=begin PIR_FRAGMENT

  .local pmc foo

=end PIR_FRAGMENT

The type of a named variable can be C<int>, C<num>, C<string> or C<pmc>,
corresponding to the types of registers. No other types are used.

The second way of referencing a register is through a register variable
C<$In>, C<$Sn>, C<$Nn>, or C<$Pn>. The capital letter indicates the type
of the register (integer, string, number, or PMC). I<n> consists of
digit(s) only. There is no limit on the size of I<n>. There is no direct
correspondence between the value of I<n> and the position of the
register in the register set; for example, C<$P42> may be stored in the
zeroth PMC register if it is the only PMC register used in the subroutine.
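
As a minimal sketch of the two styles side by side, a named variable and
register variables can be mixed in one subroutine. The sub name C<main>
and the variable name C<count> are illustrative only.

=begin PIR

    .sub 'main' :main
        .local int count          # named variable, backed by an integer register
        count = 3

        $S0 = 'count is '         # string register variable
        $P42 = new 'Integer'      # PMC register variable; 42 is only a name
        $P42 = count              # store the integer value in the PMC

        print $S0
        say $P42
    .end

=end PIR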

=head3 Constants

Constants may be used in place of registers or variables. A constant is not
allowed on the left side of an assignment, or in any other context where the
variable would be modified.

=over 4

=item 'single-quoted string constant'

Single-quoted string constants are delimited by single quotes (C<'>). They
are taken to be ASCII encoded. No escape sequences are processed.

=item "double-quoted string constants"

Double-quoted string constants are delimited by double quotes (C<">). A
C<"> inside a string must be escaped with C<\>. The default format for a
double-quoted string constant is 7-bit ASCII; other character sets and
encodings must be marked explicitly using a format flag.

=item <<"heredoc",  <<'heredoc'

Heredocs work like single- or double-quoted strings. All lines up to
the terminating delimiter are slurped into the string. The delimiter
must be on a line of its own, at the beginning of the line and with no
trailing whitespace.

Assignment of a heredoc:

=begin PIR_FRAGMENT

  $S0 = <<"EOS"
  ...
EOS

=end PIR_FRAGMENT

A heredoc as an argument:

=begin PIR_FRAGMENT

  .local pmc function, arg
  # ...

  function(<<"END_OF_HERE", arg)
  ...
END_OF_HERE

  .yield(<<'EOS')
  ...
EOS

  .return(<<'EOS')
  ...
EOS

=end PIR_FRAGMENT

Although currently not possible, a future implementation of the PIR
language will allow you to use multiple heredocs within a single
statement or directive:

=begin PIR_FRAGMENT_TODO

   function(<<'INPUT', <<'OUTPUT', 'some test')
   ...
INPUT
   ...
OUTPUT

=end PIR_FRAGMENT_TODO

=item format:"string constant"

Like the double-quoted string constant above, but with a format attached to
the string. Valid formats are currently: C<ascii> (the default), C<binary>,
C<iso-8859-1>, C<utf8>, C<utf16>, C<ucs2>, and C<ucs4>.

The format is attached to the string constant, and
adopted by any string container the constant is assigned to.

The standard escape sequences are honored within strings with an
alternate format, so you can include a particular Unicode character
as either a literal sequence of bytes, or as an escape sequence.

=back

=head3 String escape sequences

Inside double-quoted strings the following escape sequences are processed.

  \xhh        1..2 hex digits
  \ooo        1..3 oct digits
  \cX         control char X
  \x{h..h}    1..8 hex digits
  \uhhhh      4 hex digits
  \Uhhhhhhhh  8 hex digits
  \a, \b, \t, \n, \v, \f, \r, \e, \\, \"

=over 4

=item numeric constants

Both integers (C<42>) and numbers (C<3.14159>) may appear as constants.
C<0x> and C<0b> denote hex and binary constants respectively.

=back
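
The constant forms described above can be sketched together in one
fragment. The register numbers and string contents are arbitrary and serve
only as an illustration.

=begin PIR_FRAGMENT

    $S0 = 'no \n escape here'     # single-quoted: the backslash is kept as-is
    $S1 = "tab:\there\n"          # double-quoted: \t and \n are processed
    $S2 = utf8:"caf\x{e9}"        # format flag combined with a \x{h..h} escape
    $I0 = 0x2A                    # hexadecimal integer constant
    $I1 = 0b101010                # binary integer constant
    $N0 = 3.14159                 # number constant

=end PIR_FRAGMENT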

=head3 Directives

=over 4

=item .local <type> <identifier>

Define a local name I<identifier> within a subroutine with the given
I<type>. You can define multiple identifiers of the same type by
separating them with commas:

  .local int i, j

=item .lex <string constant>, <reg>

Declare a lexical variable that is an alias for a PMC register. For example
the following two snippets have an identical effect:

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    $P0 = $P1

=end PIR_FRAGMENT

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    store_lex '$a', $P1

=end PIR_FRAGMENT

And these two snippets also have an identical effect:

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    $P1 = $P0

=end PIR_FRAGMENT

=begin PIR_FRAGMENT

    .lex '$a', $P0
    $P1 = new 'Integer'
    $P1 = find_lex '$a'

=end PIR_FRAGMENT

=item .const <type> <identifier> = <const>

Define a constant named I<identifier> of type I<type> and assign value
I<const> to it. The I<type> must be C<int>, C<num>, C<string> or a string
constant indicating the PMC type. This allows you to create PMC constants
representing subroutines; the value of the constant in that case is the
name of the subroutine. If the referred subroutine has an C<:immediate>
modifier and it returns a value, then that value is stored instead of the
subroutine.

C<.const> declarations representing subroutines can only be written
within a C<.sub>.  The constant is stored in the constant table of the
current bytecode file.

=item .globalconst <type> <identifier> = <const>

As C<.const> above, but the defined constant is globally accessible.
C<.globalconst> may only be used within a C<.sub>.

=item .sub

  .sub <identifier> [:<modifier> ...]
  .sub <quoted string> [:<modifier> ...]

Define a subroutine. All code in a PIR source file must be defined in a
subroutine. See the section L<Subroutine modifiers> for available
modifiers. Modifiers are optional and are given as a space-separated list.

The name of the sub may be either a bare identifier or a quoted string
constant. Bare identifiers must be valid PIR identifiers (see L<Identifiers>
above), but string sub names can contain any characters, including characters
from different character sets (see L<Constants> above).

Always paired with C<.end>.

=item .end

End a subroutine. Always paired with C<.sub>.

=item .namespace [ <identifier> ; <identifier> ]

   .namespace [ <key>? ]

   key: <identifier> [';' <identifier>]*

Defines the namespace from this point onwards.  By default the program is not
in any namespace.  If you specify more than one, separated by semicolons, it
creates nested namespaces, by storing the inner namespace object in the outer
namespace's global pad.

You can specify the root namespace by using empty brackets, such as:

=begin PIR

    .namespace [ ]

=end PIR

The brackets are not optional, although the key inside them is.

=item .loadlib 'lib_name'

Load the given library at compile time, that is, as soon as that line is
parsed.  See also the C<loadlib> opcode, which does the same at run time.

A library loaded this way is also available at runtime, as if it had been
loaded again in a C<:load> sub, so there is no need to call C<loadlib> at
runtime.

=item .HLL <hll_name>

Define the HLL namespace from that point on in the file. Takes one string
constant, the name of the HLL. By default, the HLL namespace is 'parrot'.

=item .line <integer>

Set the current PIR line number to the value specified. This is useful in
case the PIR code is generated from some source PIR files, and error messages
should print the source file's line number, not the line number of the
generated file. Note that line numbers increment per line of PIR; if you
are trying to store High Level Language debug information, you should instead
be using the C<.annotate> directive.

=item .file <quoted_string>

Set the current PIR file name to the value specified. This is useful in case
the PIR code is generated from some source PIR files, and error messages
should print the source file's name, not the name of the generated file.

=item .annotate <key>, <value>

Makes an entry in the bytecode annotations table. This is used to store high
level language debug information. Examples:

=begin PIR_FRAGMENT

  .annotate "file", "aardvark.p6"
  .annotate "line", 5
  .annotate "column", 24

=end PIR_FRAGMENT

An annotation stays in effect until the next annotation with the same key or
the end of the current file (that is, if you use a tool such as C<pbc_merge>
to link multiple bytecode files, then annotations will not spill over from one
mergee's bytecode to another).

One annotation covers many PIR instructions. If the result of compiling one
line of HLL code is 15 lines of PIR, you only need to emit one annotation
before the first of those 15 lines to set the line number.

=begin PIR_FRAGMENT

  .annotate "line", 42

=end PIR_FRAGMENT

The key must always be a quoted string. The value may be an integer, a number
or a quoted string. Note that integer values are stored most compactly. If,
instead of the above C<.annotate> directive, you emit:

=begin PIR_FRAGMENT

  .annotate "line", "42"

=end PIR_FRAGMENT

then C<"42"> is stored as a string, taking up more space in the resulting
bytecode file.

=back
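
A minimal sketch combining C<.sub>, C<.const> and C<.local> follows. The
names C<describe>, C<ANSWER>, C<GREETING>, C<i> and C<j> are hypothetical
and chosen only for this illustration.

=begin PIR

    .sub 'describe' :main
        .const int ANSWER = 42            # integer constant, local to this sub
        .const string GREETING = "hello"  # string constant
        .local int i, j                   # two named integer variables

        i = ANSWER
        j = i + 1

        print GREETING
        print ' '
        say j
    .end

=end PIR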

=head4 Subroutine modifiers

=over 4

=item :main

Define the "main" entry point at which execution starts.  If multiple
subroutines are marked as B<:main>, the B<last> marked subroutine is used.
Only the first file loaded or compiled counts; subs marked as B<:main> are
ignored by the B<load_bytecode> op. If no B<:main> modifier is specified,
execution starts at the first subroutine in the file.

=item :load

Run this subroutine when loaded by the B<load_bytecode> op (i.e. neither in
the initial program file nor compiled from memory).  This is complementary to
what B<:init> does (below); to get both behaviours, use B<:init :load>.  If
multiple subs have the B<:load> modifier, the subs are run in source code
order.

=item :init

Run the subroutine when the program is run directly (that is, not loaded as a
module), including when it is compiled from memory.  This is complementary to
what B<:load> does (above); to get both behaviours, use B<:init :load>.

=item :anon

Do not install this subroutine in the namespace. Allows the subroutine
name to be reused.

=item :multi(type1, type2...)

Engage in multiple dispatch with the listed types.
See F<docs/pdds/pdd27_multi_dispatch.pod> for more information on the
multiple dispatch system.

When used in combination with B<:method> (below), the first type (C<type1>)
refers to the type of the invocant (C<self>).

=item :immediate

Execute this subroutine immediately after being compiled, which is analogous
to C<BEGIN> in Perl 5.

In addition, if the sub returns a PMC value, that value replaces the sub in
the constant table of the bytecode file.  This makes it possible to build
constants at compile time, provided that (a) the generated constant can be
computed at compile time (i.e. doesn't depend on the runtime environment), and
(b) the constant value is of a PMC class that supports saving in a bytecode
file.

{{ TODO: need a freeze/thaw reference }}.

For instance, after compilation of the sub 'init', that sub is executed
immediately (hence the C<:immediate> modifier). Instead of storing the sub
'init' in the constants table, the value returned by 'init' is stored,
which in this example is a FixedIntegerArray.

=begin PIR

    .sub main :main
      .const "Sub" initsub = "init"
    .end

    .sub init :immediate
      .local pmc array
      array = new 'FixedIntegerArray'
      array = 256 # set size to 256

      # code to initialize array
      .return (array)
    .end

=end PIR

=item :postcomp

Execute immediately after being compiled, but only if the subroutine is in the
initial file (i.e. not in PIR compiled as the result of a C<load_bytecode>
instruction from another file).

As an example, suppose file C<main.pir> contains:

=begin PIR

    .sub main
        load_bytecode 'foo.pir'
    .end

=end PIR

and the file C<foo.pir> contains:

=begin PIR

    .sub foo :immediate
        $I0 = 4
    .end

    .sub bar :postcomp
        $I0 = 3
    .end

=end PIR

Executing C<foo.pir> will run both C<foo> and C<bar>.  On the other hand,
executing C<main.pir> will run only C<foo>.  If C<foo.pir> is compiled to
bytecode, only C<foo> will be run, and loading C<foo.pbc> will not run either
C<foo> or C<bar>.

=item :method

=begin PIR

  .sub bar :method
    # ...
  .end

  .sub bar :method('foo')
    # ...
  .end

=end PIR

The marked C<.sub> is a method, added as a method in the class that
corresponds to the current namespace, and not stored in the namespace.
In the method body, the object PMC can be referred to with C<self>.

If a string argument is given to C<:method> the method is stored with
that name instead of the C<.sub> name.

=item :vtable

=begin PIR_INVALID

  .sub bar :vtable
    # ...
  .end

  .sub bar :vtable('foo')
    # ...
  .end

=end PIR_INVALID

The marked C<.sub> overrides a vtable function, and is not stored in the
namespace. By default, it overrides a vtable function with the same name
as the C<.sub> name.  To override a different vtable function, use
C<:vtable('...')>. For example, to have a C<.sub> named I<ToString> also
be the vtable function C<get_string>, use C<:vtable('get_string')>.

When the B<:vtable> modifier is set, the object PMC can be referred to with
C<self>, as with the B<:method> modifier.

=item :outer(subname)

The marked C<.sub> is lexically nested within the sub known by
I<subname>.

=item :subid( <string_constant> )

Specifies a unique string identifier for the subroutine. This is useful for
referring to a particular subroutine with C<:outer>, even though several
subroutines in the file may have the same name (because they are multi, or in
different namespaces).

=item :instanceof( <string_constant> )

The C<:instanceof> modifier is experimental. It creates a sub as a
PMC type other than 'Sub'.  However, as currently implemented it doesn't
work well with C<:outer> or existing PMC types such as C<Closure>,
C<Coroutine>, etc.

=item :nsentry( <string_constant> )

Specify the name by which the subroutine is stored in the namespace. If
this modifier is missing, the subroutine is stored under its own name, as
given after the C<.sub> directive.  This modifier overrides that default.

=back
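
As a minimal sketch of how C<:main> and C<:multi> combine, the following
program defines two variants of a sub and lets the argument type select
between them. The sub name C<show> and the parameter name C<value> are
hypothetical.

=begin PIR

    .sub 'main' :main
        $P0 = new 'Integer'
        $P0 = 7
        'show'($P0)               # dispatches to the Integer variant

        $P1 = new 'String'
        $P1 = 'seven'
        'show'($P1)               # dispatches to the String variant
    .end

    .sub 'show' :multi(Integer)
        .param pmc value
        print 'integer: '
        say value
    .end

    .sub 'show' :multi(String)
        .param pmc value
        print 'string: '
        say value
    .end

=end PIR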


=head4 Directives used for Parrot calling conventions

=over 4

=item .begin_call and .end_call

Directives to start and end a subroutine invocation, respectively.

=item .begin_return and .end_return

Directives to start and end a statement to return values.

=item .begin_yield and .end_yield

Directives to start and end a statement to yield values.

=item .call

Takes either two arguments, the sub and the return continuation, or the
sub only. In the latter case an B<invokecc> is emitted. Providing
an explicit return continuation is more efficient if it is created
outside a loop and the call is made inside the loop.

=item .invocant

Directive to specify the object for a method call. Use it in combination
with C<.call> to call methods.

=item .set_return <var> [:<modifier>]*

Between C<.begin_return> and C<.end_return>, specify one or
more of the return value(s) of the current subroutine.  Available
modifiers: C<:flat>, C<:named>.

=item .set_yield <var> [:<modifier>]*

Between C<.begin_yield> and C<.end_yield>, specify one or
more of the yield value(s) of the current subroutine.  Available
modifiers: C<:flat>, C<:named>.

=item .set_arg <var> [:<modifier>]*

Between C<.begin_call> and C<.call>, specify an argument to be
passed.  Available modifiers: C<:flat>, C<:named>.

=item .get_result <var> [:<modifier>]*

Between C<.call> and C<.end_call>, specify where one or more return
value(s) should be stored.  Available modifiers: C<:slurpy>, C<:named>,
C<:optional>, and C<:opt_flag>.

=back
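
The directives above can be sketched together in a small program that calls
a sub and returns a value in long form. The sub name C<add> and the
variable names are hypothetical, and the one-argument form of C<.call> is
used, so no explicit return continuation is passed.

=begin PIR

    .sub 'main' :main
        .local int x, y, sum
        x = 2
        y = 3
        .const 'Sub' adder = 'add'

        .begin_call
          .set_arg x
          .set_arg y
          .call adder             # sub only, so invokecc is emitted
          .get_result sum
        .end_call

        say sum
    .end

    .sub 'add'
        .param int a
        .param int b
        $I0 = a + b
        .begin_return
          .set_return $I0
        .end_return
    .end

=end PIR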

=head4 Directives for subroutine parameters

=over 4

=item .param <type> <identifier> [:<modifier>]*

At the top of a subroutine, declare a local variable, in the manner
of C<.local>, into which parameter(s) of the current subroutine should
be stored. Available modifiers: C<:slurpy>, C<:named>, C<:optional>,
C<:opt_flag>.

=back
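
A minimal sketch of the parameter modifiers follows. The sub name C<greet>
and all parameter names are hypothetical. The C<:opt_flag> parameter
reports whether the preceding C<:optional> parameter was supplied, and the
C<:slurpy> parameter collects any remaining positional arguments.

=begin PIR

    .sub 'greet'
        .param string name
        .param string greeting     :optional
        .param int    has_greeting :opt_flag
        .param pmc    rest         :slurpy

        if has_greeting goto emit
        greeting = 'Hello'         # default used when no greeting was passed
      emit:
        print greeting
        print ', '
        say name
    .end

=end PIR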

=head4 Parameter Passing and Getting Flags

See L<PDD03|pdds/pdd03_calling_conventions.pod> for a description of
the meaning of the flag bits C<SLURPY>, C<OPTIONAL>, C<OPT_FLAG>,
and C<FLAT>, which correspond to the calling convention modifiers
C<:slurpy>, C<:optional>, C<:opt_flag>, and C<:flat>.


=head4 Catching Exceptions

Using the C<push_eh> op you can install an exception handler. If an exception
is thrown, Parrot will execute the installed exception handler. In order to
retrieve the thrown exception, use the C<.get_results> directive. This
directive always takes one argument: an exception object.

=begin PIR_FRAGMENT

   push_eh handler
   # ...
 handler:
   .local pmc exception
   .get_results (exception)
   # ...

=end PIR_FRAGMENT


This is syntactic sugar for the C<get_results> op, but any modifiers set
on the targets will be handled automatically by the PIR compiler.  The
C<.get_results> directive must be the first instruction of the exception
handler; only declarations (C<.lex>, C<.local>) may come first.

To resume execution after handling the exception, just invoke the continuation
stored in the exception.

=begin PIR_FRAGMENT

   .local pmc exception, continuation
   # ...
   .get_results(exception)
   # ...
   continuation = exception['resume']
   continuation()
   # ...

=end PIR_FRAGMENT

See L<PDD23|pdds/pdd23_exceptions.pod> for accessing the various attributes
of the exception object.
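
Put together, installing a handler and retrieving the exception might look
like the following sketch. It assumes that the C<die> op is used to throw
the exception and that the exception's C<message> attribute is readable by
key, as described in PDD23. The label names are hypothetical.

=begin PIR

    .sub 'main' :main
        push_eh handler
        die 'something went wrong' # throws an exception
        pop_eh                     # only reached if nothing was thrown
        goto done

      handler:
        .local pmc exception
        .get_results (exception)
        $S0 = exception['message']
        say $S0

      done:
        .return ()
    .end

=end PIR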

=head3 Syntactic Sugar

Any PASM opcode is a valid PIR instruction. In addition, PIR defines some
syntactic shortcuts. These are provided for ease of use by humans producing
and maintaining PIR code.

=over 4

=item goto <identifier>

C<branch> to I<identifier> (label or subroutine name).

Examples:

  goto END

=item if <var> goto <identifier>

If I<var> evaluates as true, jump to the named I<identifier>.

=item unless <var> goto <identifier>

Unless I<var> evaluates as true, jump to the named I<identifier>.

=item if null <var> goto <identifier>

If I<var> evaluates as null, jump to the named I<identifier>.

=item unless null <var> goto <identifier>

Unless I<var> evaluates as null, jump to the named I<identifier>.

=item if <var1> <relop> <var2> goto <identifier>

The I<relop> can be: C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>=> or
C<E<gt>>, which translate to the PASM opcodes C<lt>, C<le>, C<eq>, C<ne>,
C<ge> and C<gt>, respectively. If I<var1 relop var2> evaluates as true,
jump to the named I<identifier>.

=item unless <var1> <relop> <var2> goto <identifier>

The I<relop> can be: C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>=> or
C<E<gt>>. Unless I<var1 relop var2> evaluates as true, jump to the named
I<identifier>.

=item <var1> = <var2>

Assign a value.

=item <var1> = <unary> <var2>

Unary operations C<!> (NOT), C<-> (negation) and C<~> (bitwise NOT).

=item <var1> = <var2> <binary> <var3>

Binary arithmetic operations C<+> (addition), C<-> (subtraction), C<*>
(multiplication), C</> (division), C<%> (modulus) and C<**> (exponent).
Binary C<.> is concatenation and only valid for string arguments.

C<E<lt>E<lt>> and C<E<gt>E<gt>> are arithmetic shifts left and right.
C<E<gt>E<gt>E<gt>> is the logical shift right.

Binary logic operations C<&&> (AND), C<||> (OR) and C<~~> (XOR).

Binary bitwise operations C<&> (bitwise AND), C<|> (bitwise OR) and C<~>
(bitwise XOR).

Binary relational operations C<E<lt>>, C<E<lt>=>, C<==>, C<!=>, C<E<gt>=>
and C<E<gt>>.

=item <var1> <op>= <var2>

This is equivalent to
C<E<lt>var1E<gt> = E<lt>var1E<gt> E<lt>opE<gt> E<lt>var2E<gt>>, where
I<op> is called an assignment operator and can be any of the following
binary operators described earlier: C<+>, C<->, C<*>, C</>, C<%>, C<.>,
C<&>, C<|>, C<~>, C<E<lt>E<lt>>, C<E<gt>E<gt>> or C<E<gt>E<gt>E<gt>>.

=item <var> = <var> [ <var> ]

A keyed C<set> operation for PMCs to retrieve a value from an aggregate.
This maps to:

  set <var>, <var> [ <var> ]

=item <var> [ <var> ] = <var>

A keyed C<set> operation to set a value in an aggregate. This maps to:

  set <var> [ <var> ], <var>

=item <var> = <opcode> <arguments>

Many opcodes can use this PIR syntactic sugar. The first argument for the
opcode is placed before the C<=>, and all remaining arguments go after the
opcode name. For example:

=begin PIR_FRAGMENT

  new $P0, 'Type'

=end PIR_FRAGMENT

becomes:

=begin PIR_FRAGMENT

  $P0 = new 'Type'

=end PIR_FRAGMENT

Note that this only works for opcodes that have a leading C<OUT>
parameter. [this restriction unimplemented: TT #906]

=item ([<var1> [:<mod1> ...], ...]) = <var2>([<arg1> [:<mod2> ...], ...])

This is short for:

  .begin_call
  .set_arg <arg1> <modifier2>
  ...
  .call <var2>
  .get_result <var1> <modifier1>
  ...
  .end_call

=item <var> = <var>([arg [:<modifier> ...], ...])

=item <var>([arg [:<modifier> ...], ...])

=item <var>."_method"([arg [:<modifier> ...], ...])

=item <var>.<var>([arg [:<modifier> ...], ...])

Function or method call. These notations are shorthand for a longer PCC
function call. I<var> can denote a global subroutine, a local I<identifier> or
a I<reg>.

=item .return ([<var> [:<modifier> ...], ...])

Return from the current subroutine with zero or more values.

=item .tailcall <var>(args)

=item .tailcall <var>.'somemethod'(args)

=item .tailcall <var>.<var>(args)


Tail call: call a function or method and return from the sub with the
function or method call return values.

Internally, the call stack doesn't increase because of a tail call, so
you can write recursive functions and not have stack overflows.

Whitespace surrounding the dot ('.') that separates the object from the
method is not allowed.

=back
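
Several of these shortcuts can be sketched together in a counting loop. The
label and variable names are hypothetical. The loop adds the integers 0
through 4, so the program prints 10.

=begin PIR

    .sub 'main' :main
        .local int i, total
        i = 0
        total = 0

      loop:
        if i >= 5 goto done       # relational test and conditional branch
        total += i                # same as: total = total + i
        i = i + 1
        goto loop                 # unconditional branch

      done:
        say total
    .end

=end PIR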

=head3 Assignment and Morphing

The C<=> syntactic sugar in PIR, when used in the simple case of:

  <var1> = <var2>

directly corresponds to the C<set> opcode. So, two low-level arguments (int,
num, or string registers, variables, or constants) are a direct C assignment,
or a C-level conversion (int cast, float cast, a string copy, or a call to one
of the conversion functions like C<string_to_num>).

Assigning a PMC argument to a low-level argument calls the
C<get_integer>, C<get_number>, or C<get_string> vtable function on the
PMC. Assigning a low-level argument to a PMC argument calls the
C<set_integer_native>, C<set_number_native>, or C<set_string_native>
vtable function on the PMC (assign to value semantics). Two PMC
arguments are a direct C assignment (assign to container semantics).

For assign to value semantics for two PMC arguments use C<assign>, which calls
the C<assign_pmc> vtable function.
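
The cases above can be sketched in one fragment. The register numbers are
arbitrary, and the comments name the vtable function that the text
associates with each assignment.

=begin PIR_FRAGMENT

    $P0 = new 'Integer'
    $P0 = 5                # low-level value into a PMC: set_integer_native
    $I0 = $P0              # PMC into an integer register: get_integer
    $S0 = $P0              # PMC into a string register: get_string

    $P1 = $P0              # two PMCs: $P1 now refers to the same PMC as $P0

    $P2 = new 'Integer'
    assign $P2, $P0        # value semantics: assign_pmc is called on $P2

=end PIR_FRAGMENT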

=head3 Macros

This section describes the macro layer of the PIR language. The macro layer of
the PIR compiler handles the following directives:

=over 4

=item * C<.include> '<filename>'

The C<.include> directive takes a string argument that contains the
name of the PIR file that is included. The contents of the included
file are inserted as if they were written at the point where the
C<.include> directive occurs.

The include file is searched for in the current directory and in
runtime/parrot/include, in that order. The first file of that name to
be found is included.

The C<.include> directive's search order is subject to change.

=item * C<.macro> <identifier> [<parameters>]

The C<.macro> directive starts a macro definition named by the specified
identifier. The optional parameter list is a comma-separated list of
identifiers, enclosed in parentheses.  See C<.endm> for ending the macro
definition.

=item * C<.endm>

Closes a macro definition.

=item * C<.macro_const> <identifier> (<literal>|<reg>)

=begin PIR

 .macro_const   PI  3.14

=end PIR

The C<.macro_const> directive is a special type of macro; it allows the user
to use a symbolic name for a constant value. Like C<.macro>, the substitution
occurs at compile time. It takes two arguments (not comma separated), the
first is an identifier, the second a constant value or a register.

=back

The macro layer is completely implemented in the lexical analysis phase.
The parser does not know anything about what happens in the lexical
analysis phase.

When the C<.include> directive is encountered, the specified file is opened
and the following tokens that are requested by the parser are read from
that file.

A macro expansion is a dot-prefixed identifier. For instance, if a macro
was defined as shown below:

=begin PIR

 .macro foo(bar)
   # ...
 .endm

=end PIR

this macro can be expanded by writing C<.foo(42)>. The body of the macro
will be inserted at the point where the macro expansion is written.

A C<.macro_const> expansion is more or less the same as a C<.macro> expansion,
except that a constant expansion cannot take any arguments, and the
substitution of a C<.macro_const> contains no newlines, so it can be used
within a line of code.
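
As a minimal sketch, a C<.macro_const> and a one-parameter macro can be
defined and expanded as follows. The names C<ANSWER>, C<say_twice> and
C<text> are hypothetical. Inside the macro body, the parameter is
referenced with a leading dot.

=begin PIR

    .macro_const ANSWER 42

    .macro say_twice(text)
        say .text
        say .text
    .endm

    .sub 'main' :main
        $I0 = .ANSWER             # expands in place, within the line
        say $I0
        .say_twice('hello')       # expands to two say instructions
    .end

=end PIR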

=head4 Macro parameter list

The parameter list for a macro is specified in parentheses after the name of
the macro. Macro parameters are not typed.

=begin PIR

 .macro foo(bar, baz, buz)
   # ...
 .endm

=end PIR

The number of arguments in the call to a macro must match the number of
parameters in the macro's parameter list. Macros do not perform multidispatch,
so you can't have two macros with the same name but different parameters.
Calling a macro with the wrong number of arguments gives the user an error.

If a macro defines no parameter list, parentheses are optional on both the
definition and the call.  This means that a macro defined as:

=begin PIR

 .macro foo
   # ...
 .endm

=end PIR

can be expanded by writing either C<.foo> or C<.foo()>. And a macro definition
written as:

=begin PIR

 .macro foo()
   # ...
 .endm

=end PIR

can also be expanded by writing either C<.foo> or C<.foo()>.

B<Note: IMCC requires you to write parentheses if the macro was declared with
(empty) parentheses. Likewise, when no parentheses were written (implying an
empty parameter list), no parentheses may be used in the expansion.>

=over

=item * Heredoc arguments

Heredoc arguments are not allowed when expanding a macro.
This means that, currently, when using IMCC, the following is not allowed:

=begin PIR_TODO

   .macro foo(bar)
   ...
   .endm

   .foo(<<'EOS')
 This is a heredoc
    string.

EOS

=end PIR_TODO

Using braces, { }, allows you to span multiple lines for an argument.
See runtime/parrot/include/hllmacros.pir for examples and possible usage.
A simple example is this:

=begin PIR

 .macro foo(a,b)
   .a
   .b
 .endm

 .sub main
   .foo({ print "1"
          print "2"
        }, {
          print "3"
          print "4"
        })
 .end

=end PIR

This will expand the macro C<foo>, after which the input to the PIR parser is:

=begin PIR

 .sub main
   print "1"
   print "2"
   print "3"
   print "4"
 .end

=end PIR

which will result in the output:

 1234

=back

=head4 Unique local labels

Within the macro body, the user can declare a unique label identifier using
the value of a macro parameter, like so:

=begin PIR

  .macro foo(a)
    # ...
 .label $a:
    # ...
  .endm

=end PIR

=head4 Unique local variables I<(not yet implemented)>

B<Note: This is not yet implemented in IMCC>.

Within the macro body, the user can declare a local variable with a unique
name.

=begin PIR

  .macro foo()
    # ...
  .macro_local int b
    # ...
  .b = 42
  print .b # prints the value of the unique variable (42)
    # ...
  .endm

=end PIR

The C<.macro_local> directive declares a local variable with a unique name in
the macro. When the macro C<.foo()> is called, the resulting code that is
given to the parser will read as follows:

=begin PIR

  .sub main
    .local int local__foo__b__2
      # ...
    local__foo__b__2 = 42
    print local__foo__b__2

  .end

=end PIR

The user can also declare a local variable with a unique name set to the
symbolic value of one of the macro parameters.

=begin PIR

  .macro foo(b)
    # ...
  .macro_local int $b
    # ...
  .$b = 42
  print .$b # prints the value of the unique variable (42)
  print .b  # prints the value of parameter "b", which is
            # also the name of the variable.
  #  ...
  .endm

=end PIR

So, the special C<$> character distinguishes between two interpretations:
without it, the symbol stands for the value of the parameter; with it, the
symbol stands for the variable named by that value.
Obviously, the value of C<b> should be a string.

The automatic name munging on C<.macro_local> variables allows for using
multiple macros, like so:

=begin PIR_TODO

  .macro foo(a)
  .macro_local int $a
  .endm

  .macro bar(b)
  .macro_local int $b
  .endm

  .sub main
    .foo("x")
    .bar("x")
  .end

=end PIR_TODO

This will result in code for the parser as follows:

=begin PIR

  .sub main
    .local int local__foo__x__2
    .local int local__bar__x__4
  .end

=end PIR

Each expansion is associated with a unique number; for labels declared with
C<.macro_label> and locals declared with C<.macro_local> expansions, this
means that multiple expansions of a macro will not result in conflicting
label or local names.

=head4 Ordinary local variables

Defining a non-unique variable can still be done, using the normal syntax:

=begin PIR

  .macro foo(b)
  .local int b
  .macro_local int $b
  .endm

=end PIR

When invoking the macro C<foo> as follows:

=begin PIR_FRAGMENT

  .macro foo(b)
    #...
  .endm

  .foo("x")

=end PIR_FRAGMENT

there will be two variables: C<b> and a uniquely named variable derived from
C<x>. When the macro is invoked twice:

=begin PIR_TODO

  .sub main
    .foo("x")
    .foo("y")
  .end

=end PIR_TODO

the resulting code that is given to the parser will read as follows:

=begin PIR

  .sub main
    .local int b
    .local int local__foo__x
    .local int b
    .local int local__foo__y
  .end

=end PIR

This will result in an error, because the variable C<b> is defined twice. If
you intend the macro to create unique variable names, use C<.macro_local>
instead of C<.local> to take advantage of the name munging.

=head2 Examples

=head3 Subroutine Definition

A simple subroutine marked with C<:main>, which indicates that it is the
entry point in the file. Other sub modifiers include C<:load> and C<:init>.

=begin PIR

    .sub sub_label :main
      .param int a
      .param int b
      .param int c

      # ...
      .local pmc xy
      .return(xy)
    .end

=end PIR

=head3 Subroutine Call

Invocation of a subroutine. In this case a continuation is created
explicitly and passed to C<.call> as the return continuation.

=begin PIR_FRAGMENT

    .const "Sub" $P0 = "sub_label"
    $P1 = new 'Continuation'
    set_addr $P1, ret_addr
    # ...
    .local int x
    .local num y
    .local string z
    .begin_call
      .set_arg x
      .set_arg y
      .set_arg z
      .call $P0, $P1    # r = sub_label(x, y, z)
  ret_addr:
      .local int r      # optional - new result var
      .get_result r
    .end_call

=end PIR_FRAGMENT

=head3 Subroutine Call Syntactic Sugar

Below there are three different ways to invoke the subroutine C<sub_label>.
The first retrieves a single return value, the second retrieves 3 return
values, whereas the last does not save any return values.

=begin PIR_FRAGMENT

  .local int r0, r1, r2
  r0 = sub_label($I0, $I1, $I2)
  (r0, r1, r2) = sub_label($I0, $I1, $I2)
  sub_label($I0, $I1, $I2)

=end PIR_FRAGMENT

This also works for NCI calls, as the subroutine PMC will be
an NCI sub, and on invocation will do the Right Thing.

Instead of the label, a subroutine object can be used:

=begin PIR_FRAGMENT_TODO

   get_global $P0, "sub_label"
   $P0(args)

=end PIR_FRAGMENT_TODO

=head3 Methods

=begin PIR_TODO

  .namespace [ "Foo" ]

  .sub _sub_label :method [,Subpragma, ...]
    .param int a
    .param int b
    .param int c
    # ...
    self."_other_meth"()
    # ...
    .begin_return
    .set_return xy
    .end_return
    ...
  .end

=end PIR_TODO

The variable C<self> automatically refers to the invocant if the subroutine
declaration contains the C<:method> modifier.

=head3 Calling Methods

The syntax is very similar to subroutine calls. The call is done with
C<.call> immediately preceded by C<.invocant>:

=begin PIR_FRAGMENT_TODO

   .local int x, y, z
   .local pmc class, obj
   newclass class, "Foo"
   new obj, class
   .begin_call
   .set_arg x
   .set_arg y
   .set_arg z
   .invocant obj
   .call "method" [, $P1 ] # r = obj."method"(x, y, z)
   .local int r  # optional - new result var
   .get_result r
   .end_call
   ...

=end PIR_FRAGMENT_TODO


The return continuation is optional. The method can be a string
constant or a string variable.
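
For comparison, the call above might also be written with the method call
syntactic sugar shown in the comment on the C<.call> line. The following is a
minimal sketch, assuming that C<obj>, C<x>, C<y>, C<z> and C<r> are declared
as in the example above; the use of C<$S0> to hold the method name is an
illustrative choice.

   r = obj."method"(x, y, z)   # method name given as a string constant

   $S0 = "method"
   r = obj.$S0(x, y, z)        # method name taken from a string variable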

=head3 Returning and Yielding

  .return ( a, b )      # return the values of a and b

  .return ()            # return no value

  .tailcall func_call()   # tail call function

  .tailcall o."meth"()    # tail method call

Similarly, one can yield using the C<.yield> directive:

  .yield ( a, b )      # yield with the values of a and b

  .yield ()            # yield with no value
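
To show these directives in context, here is a small sketch of a tail call.
The sub names C<outer> and C<inner> are hypothetical and are not used
elsewhere in this document.

=begin PIR

  .sub outer
    .param int n
    .tailcall inner(n)    # reuse outer's return continuation for the call
  .end

  .sub inner
    .param int n
    n = n + 1
    .return (n)           # the value goes to the caller of outer
  .end

=end PIR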


=head2 Implementation

There are multiple implementations of PIR, each of which will meet this
specification for the syntax. Currently there are the following
implementations:

=over 4

=item * compilers/imcc

This is the current implementation being used in Parrot. Some of the
specified syntactic constructs in this PDD are not implemented in
IMCC; these constructs are marked with notes saying so.

=back

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End: