1# Copyright (C) 2001-2014, Parrot Foundation.
2
3=head1 [DRAFT] PDD 13: Bytecode
4
5=head2 Abstract
6
7This PDD describes the file format for Parrot Bytecode (PBC) files and the
8interface through which they may be manipulated programmatically.
9
10=head2 Synopsis
11
12Parrot bytecode is a binary representation of instructions and data for
13execution on the virtual machine.
14
15=head2 Description
16
17PBC, Parrot bytecode, is the binary format used internally by the Parrot VM to
18store the data necessary to execute a compiled PIR program.  The sequence of
19instructions making up a Parrot program, a constants table, an annotations
20table and any ancillary data are stored in a PBC.  These files usually have
21the extension C<.pbc>.
22
23The PBC format is designed so that any valid PBC file can be read and executed
24by Parrot on any platform, but may be encoded more optimally for a particular
25platform.
26
27It is possible to add arbitrary annotations to the instruction sequence, for
28example line numbers in high level languages and other debug data.
29
PMCs are used to represent packfiles and packfile segments to provide a
programmatic interface, both to Parrot programs and Parrot internals.
32
33=head2 Implementation
34
35=head3 Packfiles
36
37This section of the documentation describes the format of Parrot packfiles.
38These contain the bytecode (sequence of instructions), constants table, fixup
39table, debug data, annotations and possibly more.
40
Note that, unless otherwise stated, all offsets and lengths are given in terms
of Parrot opcodes, not bytes. An opcode occupies one word, and a word is
defined as a C C<long>. The pointer size (ptrsize) is silently assumed to be
the same as the opcode size.
44
45
46=head4 Packfile Header
47
48PBC files start with a variable length header. All data in this header is
49stored as strings or in a single byte so endianness and word size need not be
50considered when reading it.
51
52Note that in this section only, offsets and lengths are in bytes.
53
54  +--------+--------+--------------------------------------------------------+
55  | Offset | Length | Description                                            |
56  +--------+--------+--------------------------------------------------------+
57  | 0      | 8      | 0xFE 0x50 0x42 0x43 0x0D 0x0A 0x1A 0x0A                |
58  |        |        | Parrot "Magic String" to identify a PBC file. In C,    |
59  |        |        | this is the string C<\376PBC\r\n\032\n> or             |
60  |        |        | C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a>.                   |
61  +--------+--------+--------------------------------------------------------+
62  | 8      | 1      | Word size in bytes of words making up the segments of  |
63  |        |        | the PBC file. Must be one of:                          |
64  |        |        |    0x04 - 4 byte (32-bit) words                        |
65  |        |        |    0x08 - 8 byte (64-bit) words                        |
66  +--------+--------+--------------------------------------------------------+
67  | 9      | 1      | Byte order within the words making up the segments of  |
68  |        |        | the PBC file. Must be one of:                          |
69  |        |        |    0x00 - Little Endian                                |
70  |        |        |    0x01 - Big Endian                                   |
71  +--------+--------+--------------------------------------------------------+
72  | 10     | 1      | The encoding of floating point numbers in the file.    |
73  |        |        | Must be one of:                                        |
74  |        |        |    0x00 - IEEE 754 8 byte double                       |
75  |        |        |    0x01 - i386 little endian 12 byte long double       |
76  |        |        |    0x02 - IEEE 754 16 byte long double                 |
77  +--------+--------+--------------------------------------------------------+
78  | 11     | 1      | Major version number of the version of Parrot that     |
79  |        |        | wrote this bytecode file. For example, if Parrot 0.9.5 |
80  |        |        | wrote it, this byte would have the value 0.            |
81  +--------+--------+--------------------------------------------------------+
82  | 12     | 1      | Minor version number of the version of Parrot that     |
83  |        |        | wrote this bytecode file. For example, if Parrot 0.9.5 |
84  |        |        | wrote it, this byte would have the value 9.            |
85  +--------+--------+--------------------------------------------------------+
86  | 13     | 1      | Patch version number of the version of Parrot that     |
87  |        |        | wrote this bytecode file. For example, if Parrot 0.9.5 |
88  |        |        | wrote it, this byte would have the value 5.            |
89  +--------+--------+--------------------------------------------------------+
90  | 14     | 1      | Major version number of the bytecode file format. See  |
91  |        |        | the section below on bytecode file format version      |
92  |        |        | numbers.                                               |
93  +--------+--------+--------------------------------------------------------+
94  | 15     | 1      | Minor version number of the bytecode file format. See  |
95  |        |        | the section below on bytecode file format version      |
96  |        |        | numbers.                                               |
97  +--------+--------+--------------------------------------------------------+
98  | 16     | 1      | The type of the UUID associated with this packfile.    |
99  |        |        | Must be one of:                                        |
100  |        |        |    0x00 - No UUID                                      |
101  |        |        |    0x01 - MD5                                          |
102  +--------+--------+--------------------------------------------------------+
103  | 17     | 1      | Length of the UUID associated with this packfile. May  |
104  |        |        | be zero if the type of the UUID is 0x00. Maximum       |
105  |        |        | value is 255.                                          |
106  +--------+--------+--------------------------------------------------------+
107  | 18     | u      | A UUID of u bytes in length, where u was specified as  |
108  |        |        | the length of the UUID in the previous field. Be sure  |
109  |        |        | that UUIDs are stored and read as strings. The UUID is |
110  |        |        | computed by applying the hash function specified in    |
111  |        |        | the UUID type field over the entire packfile not       |
112  |        |        | including this header and its trailing zero padding.   |
113  +--------+--------+--------------------------------------------------------+
114  | 18 + u | n      | Zero-padding to make the total header length a         |
115  |        |        | multiple of 16 bytes in length.                        |
  |        |        |    n = (18 + u) % 16 ? 16 - ((18 + u) % 16) : 0        |
117  +--------+--------+--------------------------------------------------------+
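
The header layout above can be decoded byte by byte. The following is a
minimal sketch in Python, not Parrot's own loader: the function name
C<parse_packfile_header> and the dictionary keys are hypothetical, chosen only
for illustration. It assumes the whole header is available in memory and
derives the padded header length by rounding C<18 + u> up to a multiple of 16.

```python
MAGIC = b"\xfePBC\r\n\x1a\n"  # the 8-byte magic string from the table
BYTE_ORDERS = {0x00: "little", 0x01: "big"}

def parse_packfile_header(data: bytes) -> dict:
    """Decode the byte-oriented packfile header (hypothetical helper)."""
    if len(data) < 18:
        raise ValueError("truncated header")
    if data[:8] != MAGIC:
        raise ValueError("bad magic string, not a PBC file")
    if data[8] not in (4, 8):
        raise ValueError("unsupported word size")
    if data[9] not in BYTE_ORDERS:
        raise ValueError("unknown byte order")
    uuid_len = data[17]
    # Round 18 + u up to the next multiple of 16 to find the padded length.
    total = (18 + uuid_len + 15) // 16 * 16
    if len(data) < total:
        raise ValueError("truncated UUID or padding")
    return {
        "wordsize": data[8],
        "byteorder": BYTE_ORDERS[data[9]],
        "floattype": data[10],
        "parrot_version": tuple(data[11:14]),
        "format_version": tuple(data[14:16]),
        "uuid_type": data[16],
        "uuid": data[18:18 + uuid_len],
        "header_length": total,
    }
```

With no UUID the header occupies 32 bytes; with a 16 byte UUID it occupies 48.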
118
119Everything beyond the header is an opcode, with word length and byte ordering
120as defined in the header.  If the word length and byte ordering of the machine
121that is reading the PBC file do not match these, it needs to transform the
122words making up the rest of the packfile.
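
As an illustration of that transformation, the sketch below decodes words from
the raw file bytes using the word size and byte order taken from the header.
It is written in Python with the standard C<struct> module; the helper name
C<read_words> is hypothetical, and treating words as signed integers is an
assumption rather than something this document specifies.

```python
import struct

def read_words(data: bytes, offset: int, count: int,
               wordsize: int, byteorder: int) -> list:
    """Decode `count` words starting at byte `offset` into Python ints."""
    order = {0x00: "<", 0x01: ">"}[byteorder]  # header byte order field
    code = {4: "i", 8: "q"}[wordsize]          # assumed signed 32/64-bit
    return list(struct.unpack_from(f"{order}{count}{code}", data, offset))
```

Because the format string spells out both size and byte order, the same call
works regardless of the word size and endianness of the reading machine.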
123
124=over 4
125
126=item * Bytecode File Version Numbers
127
128The bytecode file version number exists to decouple the format of the bytecode
129file from the version of the Parrot implementation that is reading/writing it.
130It has a major and a minor part.
131
132The major version number should be incremented whenever there is a change to
133the layout of bytecode files. This includes new segments, changes to segment
134headers or changes to the format of the data held within a segment.
135
136The minor version number should be incremented in all other cases when a
137change is made that means a previous version of Parrot would not be able to
138run the program encoded in the packfile. This includes:
139
140=over 4
141
142=item Opcode renumbering
143
144=item Addition of new opcodes and removal of existing ones
145
146=item Addition of new core PMCs and removal of existing ones
147
148=item Changes to the interface (externally visible behaviour) of an opcode or
149PMC
150
151=back
152
153Parrot currently exits when reading an incompatible bytecode file
154version number. It is possible for a single version of Parrot to support
155reading and writing more than one bytecode file format, but this is not
156currently implemented. Future versions of Parrot may also provide a
157bytecode migration tool, to convert a bytecode file to a more recent
158format.
159
160The bytecode format versions are listed in the PBC_COMPAT file, sorted
161with the latest version first in the file:
162
163  MAJOR.MINOR DATE NAME DESCRIPTION
164
165=back
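
A reader of such a listing might look like the sketch below. It assumes that
the four fields are separated by whitespace, that the description runs to the
end of the line, and that lines starting with C<#> are comments; none of these
details are stated above, and both function names are hypothetical.

```python
def parse_compat_line(line: str):
    """Split one 'MAJOR.MINOR DATE NAME DESCRIPTION' line into its fields."""
    line = line.strip()
    if not line or line.startswith("#"):   # assumed comment convention
        return None
    version, date, name, description = line.split(None, 3)
    major, minor = (int(part) for part in version.split("."))
    return (major, minor), date, name, description

def latest_format(lines):
    """Return the (major, minor) pair of the first entry, i.e. the newest."""
    for line in lines:
        entry = parse_compat_line(line)
        if entry is not None:
            return entry[0]
    return None
```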
166
Some systems, such as 64-bit Sparc and PPC, enforce strict 8-byte pointer
alignment by default, so every C<(opcode_t*)cursor++> or
C<(opcode_t*)cursor +=> advance must leave the cursor pointer 8-byte aligned.
Parrot enforces 16-byte alignment at the start and end of all segments and
ptrsize alignment for all items (strings, integers, and opcode_t ops), but not
in between, especially with 4-byte integers and 4-byte opcode_t pointers.

Parrot therefore relaxes pointer alignment strictness on Sparc64, but a
C<--64compat> option may be added in the future to produce 8-byte aligned
data. Operations on aligned pointers are much faster than on unaligned
pointers.
177
178
179=head4 Directory Format Header
180
Every packfile contains a directory describing the segments that it contains.
This header specifies the format of the directory.
183
184  +--------+--------+--------------------------------------------------------+
185  | Offset | Length | Description                                            |
186  +--------+--------+--------------------------------------------------------+
187  | 0      | 1      | The format of the directory. Must be:                  |
188  |        |        |    0x01 - Directory Format 1                           |
189  +--------+--------+--------------------------------------------------------+
190  | 1      | 3      | Must be:                                               |
191  |        |        |    0x00 0x00 0x00 - Reserved                           |
192  +--------+--------+--------------------------------------------------------+
193
194Currently only C<Format 1> exists. In the future, the format of the
195directory may change. A single version of Parrot may then become capable of
196generating and reading files of more than one directory format. This header
197enables Parrot to detect whether it is able to read the directory segment in
198the packfile.
199
200This header must be followed immediately by a directory segment.
201
202
203=head4 Packfile Segment Header
204
All segments, regardless of type, start with a 4 opcode segment header. All
other segments below are prefixed with this.
207
208  +--------+--------+--------------------------------------------------------+
209  | Offset | Length | Description                                            |
210  +--------+--------+--------------------------------------------------------+
211  | 0      | 1      | The total size of the segment in opcodes, including    |
212  |        |        | this header.                                           |
213  +--------+--------+--------------------------------------------------------+
214  | 1      | 1      | Internal type of the segment                           |
215  +--------+--------+--------------------------------------------------------+
216  | 2      | 1      | Internal id                                            |
217  +--------+--------+--------------------------------------------------------+
218  | 3      | 1      | Size of the following op array, 0 if none              |
219  +--------+--------+--------------------------------------------------------+
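
Once the file has been decoded into a sequence of words, the four fields in
the table above can be read directly. The sketch below is illustrative only:
C<SegmentHeader> and C<parse_segment_header> are hypothetical names, and the
single sanity check is an assumption drawn from the statement that the total
size includes the header.

```python
from typing import NamedTuple, Sequence

class SegmentHeader(NamedTuple):
    total_size: int  # size of the whole segment in words, header included
    itype: int       # internal type of the segment
    id: int          # internal id
    op_count: int    # size of the following op array, 0 if none

def parse_segment_header(words: Sequence[int], start: int = 0) -> SegmentHeader:
    """Read the 4-word segment header that begins at index `start`."""
    if len(words) - start < 4:
        raise ValueError("truncated segment header")
    header = SegmentHeader(*words[start:start + 4])
    if header.total_size < 4:
        raise ValueError("segment size must cover its own header")
    return header
```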
220
221
222=head4 Segment Padding
223
224All segments must have trailing zero (NULL) values appended so they are a
225multiple of 16 bytes in length. (This allows wordsize support of up to
226128 bits.)
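
The padding rule amounts to rounding the segment length in bytes up to the
next multiple of 16. A short sketch, with hypothetical helper names:

```python
def padded_length(nbytes: int, alignment: int = 16) -> int:
    """Round `nbytes` up to the next multiple of `alignment`."""
    return -(-nbytes // alignment) * alignment

def pad_segment(payload: bytes) -> bytes:
    """Append trailing zero bytes so the segment is a multiple of 16 bytes."""
    return payload + bytes(padded_length(len(payload)) - len(payload))
```

A segment that is already a multiple of 16 bytes long is returned unchanged.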
227
228
229=head4 Directory Segment
230
231This segment lists the other segments that make up the packfile and where in
232the file they are located. It must occur immediately after the directory
233format header. Only one of these segments may occur in a packfile. In the
234future, a hierarchy of directories may be allowed.
235
The directory segment adds one additional header field after the standard
packfile segment header, which specifies the number of entries in the
directory.
238
239  +--------+--------+--------------------------------------------------------+
240  | Offset | Length | Description                                            |
241  +--------+--------+--------------------------------------------------------+
242  | 1      | 1      | The number of entries in the directory.                |
243  +--------+--------+--------------------------------------------------------+
244
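Following this are C<n> variable length entries, where C<n> is the entry count
given above. Each entry is formatted as described in the following table.
Offsets are in words, but are given relative to the start of an individual
entry. Note that the C<n> in the table below is a different quantity: the
length of the padded segment name in words.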
248
249  +--------+--------+--------------------------------------------------------+
250  | Offset | Length | Description                                            |
251  +--------+--------+--------------------------------------------------------+
252  | 0      | 1      | The type of the segment. Must be one of the following: |
253  |        |        |    0x00 - Reserved (Directory Segment)                 |
254  |        |        |    0x01 - Default Segment                              |
255  |        |        |    0x02 - Fixup Segment                                |
256  |        |        |    0x03 - Constant Table Segment                       |
257  |        |        |    0x04 - Bytecode Segment                             |
258  |        |        |    0x05 - PIR Debug Segment                            |
259  |        |        |    0x06 - Annotations Segment                          |
260  +--------+--------+--------------------------------------------------------+
261  | 1      | n      | The name of the segment, as a (NULL terminated) ASCII  |
262  |        |        | C string. This must be padded with trailing NULL       |
263  |        |        | (zero) values to be a full word in size.               |
264  +--------+--------+--------------------------------------------------------+
265  | n + 1  | 1      | The offset to the segment, relative to the start of    |
266  |        |        | the packfile. Specified as a number of words, where    |
267  |        |        | the word size is that specified in the header. (Parrot |
268  |        |        | may need to do some computation to transform this to   |
269  |        |        | an offset in terms of its own word size.) As segments  |
270  |        |        | must always be aligned on 16-byte boundaries, this     |
271  |        |        | scheme scales up to 128-bit platforms.                 |
272  +--------+--------+--------------------------------------------------------+
273  | n + 2  | 1      | The length of the segment, including its header, in    |
274  |        |        | words. This must match the length stored at the start  |
275  |        |        | of the header of the segment the entry is describing.  |
276  +--------+--------+--------------------------------------------------------+
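
Because the name field has variable length, an entry has to be walked from
left to right. The following sketch reads one entry from raw bytes. It is not
Parrot's implementation: the function name, the returned dictionary and the
default of 4-byte little endian words are assumptions made for illustration.

```python
import struct

def parse_directory_entry(data: bytes, pos: int,
                          wordsize: int = 4, order: str = "<"):
    """Decode one directory entry at byte `pos`; return (entry, next_pos)."""
    code = {4: "i", 8: "q"}[wordsize]
    (seg_type,) = struct.unpack_from(order + code, data, pos)
    pos += wordsize
    end = data.index(b"\0", pos)           # terminating NUL of the name
    name = data[pos:end].decode("ascii")
    # The name plus its NUL is zero padded to a whole number of words.
    name_words = (end - pos + 1 + wordsize - 1) // wordsize
    pos += name_words * wordsize
    offset, length = struct.unpack_from(order + "2" + code, data, pos)
    pos += 2 * wordsize
    entry = {"type": seg_type, "name": name, "offset": offset,
             "length": length, "byte_offset": offset * wordsize}
    return entry, pos
```

The C<byte_offset> value shows the conversion from a word offset to a byte
offset within the file, using the word size declared in the packfile header.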
277
278
279=head4 Default Segment
280
281The default segment has no additional headers. It will, if possible, be memory
282mapped. More than one may exist in the packfile, and they are identified by
283name. They may be used for storing any data that does not fit into any other
284segment, for example the source code from a high level language (HLL).
285
286
287=head4 Bytecode Segment
288
289This segment has no additional headers. It stores a stream of instructions in
290bytecode format, with the length given in the last field of the segment
291header.
292
293Instructions have variable length. Each instruction starts with an operation
294code (opcode).
295
296  +--------+--------+--------------------------------------------------------+
297  | Offset | Length | Description                                            |
298  +--------+--------+--------------------------------------------------------+
299  | 0      | 1      | A valid Parrot opcode, as specified in the opcode      |
300  |        |        | list include/parrot/oplib/ops.h.                       |
301  +--------+--------+--------------------------------------------------------+
302
303Zero or more operands follow the opcode. All opcodes take a fixed number of
304operands.  An individual operand is always one word in length and may be of
305one of the following forms.
306
307  +------------------+-------------------------------------------------------+
308  | Operand Type     | Description                                           |
309  +------------------+-------------------------------------------------------+
310  | Register         | An integer specifying a register number.              |
311  +------------------+-------------------------------------------------------+
312  | Integer Constant | An integer that is the constant itself. That is, the  |
313  |                  | constant is stored directly in the instruction        |
314  |                  | stream. Storing integer constants of length greater   |
315  |                  | than 32 bits has undefined behaviour and should be    |
316  |                  | considered unportable.                                |
317  +------------------+-------------------------------------------------------+
318  | Number Constant  | An index into the constants table.                    |
319  +------------------+-------------------------------------------------------+
320  | String Constant  | An index into the constants table.                    |
321  +------------------+-------------------------------------------------------+
322  | PMC Constant     | An index into the constants table.                    |
323  +------------------+-------------------------------------------------------+
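
Since every opcode takes a fixed number of one-word operands, the instruction
stream can be split into instructions with a simple loop. The sketch below
assumes a mapping C<operand_counts> from opcode number to operand count. That
mapping is hypothetical here; in practice it would have to be derived from the
opcode list, and the opcode numbers in any example are made up.

```python
def decode_instructions(words, operand_counts):
    """Split a word stream into (offset, opcode, operands) tuples."""
    pc = 0
    instructions = []
    while pc < len(words):
        opcode = words[pc]
        if opcode not in operand_counts:
            raise ValueError(f"unknown opcode {opcode} at offset {pc}")
        count = operand_counts[opcode]
        if pc + 1 + count > len(words):
            raise ValueError(f"truncated instruction at offset {pc}")
        instructions.append((pc, opcode, list(words[pc + 1:pc + 1 + count])))
        pc += 1 + count   # one word for the opcode plus its operands
    return instructions
```

The recorded offsets are in words, matching the convention used for offsets
elsewhere in the packfile.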
324
325
326=head4 Constants Segment
327
328This segment stores number, string and PMC constants.
329
330The first element is the number of constants contained.
331
332  +--------+--------+--------------------------------------------------------+
333  | Offset | Length | Description                                            |
334  +--------+--------+--------------------------------------------------------+
335  | 2      | 1      | The number of constants in the table.                  |
336  +--------+--------+--------------------------------------------------------+
337
338Following this are C<n> constants, each with a single word header specifying
339the type of constant that follows.
340
341  +--------+--------+--------------------------------------------------------+
342  | Offset | Length | Description                                            |
343  +--------+--------+--------------------------------------------------------+
344  | 0      | 1      | The type of the constant. Must be one of:              |
345  |        |        |    0x00 - No constant                                  |
346  |        |        |    0x6E - Number constant (ASCII 'n')                  |
347  |        |        |    0x73 - String constant (ASCII 's')                  |
348  |        |        |    0x70 - PMC constant (ASCII 'p')                     |
349  |        |        |    0x6B - Key constant (ASCII 'k')                     |
350  +--------+--------+--------------------------------------------------------+
351
352All constants that are not a multiple of the word size in length must be
353padded with trailing zero bytes up to a word size boundary.
354
355=over 4
356
357=item * Number Constants
358
359The number is stored in the format defined in the Packfile header. Any padding
360that is needed will follow.
361
362=item * String Constants
363
364String constants are stored in the following format, with offsets relative to
365the start of the constant including its type.
366
367  +--------+--------+--------------------------------------------------------+
368  | Offset | Length | Description                                            |
369  +--------+--------+--------------------------------------------------------+
370  | 1      | 1      | Flags, copied from the string structure.               |
371  +--------+--------+--------------------------------------------------------+
372  | 2      | 1      | Character set; either the index of a built-in one or a |
373  |        |        | dynamically loaded one whose index is in a range given |
374  |        |        | in the dependencies table. Note that dynamically       |
375  |        |        | loaded character sets are not currently supported.     |
376  +--------+--------+--------------------------------------------------------+
377  | 3      | 1      | Encoding, either the index of a built-in one or a      |
378  |        |        | dynamically loaded one whose index is in a range given |
379  |        |        | in the dependencies table. Note that dynamically       |
380  |        |        | loaded encodings are not currently supported.          |
381  +--------+--------+--------------------------------------------------------+
382  | 4      | 1      | Length of the string data in bytes.                    |
383  +--------+--------+--------------------------------------------------------+
384  | 5      | n      | String data with trailing zero padding as required.    |
385  +--------+--------+--------------------------------------------------------+
386
Note: The encoding and charset are currently packed together with the flags
in a single field of length 1.
389
390
391=item * PMC Constants
392
393PMCs that can be saved in packfiles as constants implement the freeze and thaw
394vtable functions. Their frozen data is placed in a string, stored in the same
395format as a string constant.
396
397=item * Key Constants
398
Key constants are made up of a number of components, where one component is a
"dimension" in the key. The number of components in the key is stored at the
start of the constant.
402
403  +--------+--------+--------------------------------------------------------+
404  | Offset | Length | Description                                            |
405  +--------+--------+--------------------------------------------------------+
406  | 1      | 1      | Number of key components that follow.                  |
407  +--------+--------+--------------------------------------------------------+
408
409Following this are C<n> entries of two words each that specify the key's
410type and value. The key value may be a register or another constant, but not
411another key constant. All constants other than integer constants are indexes
412into the constants table.
413
414  +--------+--------+--------------------------------------------------------+
415  | Offset | Length | Description                                            |
416  +--------+--------+--------------------------------------------------------+
417  | 0      | 1      | Type of the key. Must be one of:                       |
418  |        |        |    0x00 - Integer register                             |
419  |        |        |    0x01 - String register                              |
420  |        |        |    0x02 - PMC register                                 |
421  |        |        |    0x03 - Number register                              |
422  |        |        |    0x10 - Integer constant                             |
423  |        |        |    0x11 - String constant (constant table index)       |
424  |        |        |    0x12 - PMC constant (constant table index)          |
425  |        |        |    0x13 - Number constant (constant table index)       |
426  +--------+--------+--------------------------------------------------------+
427  | 1      | 1      | Value of the key.                                      |
428  +--------+--------+--------------------------------------------------------+
429
430=back
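
To make the key constant layout concrete, the sketch below decodes one key
constant from a sequence of words. It assumes the constant starts with its
type word (0x6B), followed by the component count and then the two-word
components. The function name and the textual labels are hypothetical.

```python
KEY_TYPES = {
    0x00: "integer register", 0x01: "string register",
    0x02: "PMC register",     0x03: "number register",
    0x10: "integer constant", 0x11: "string constant",
    0x12: "PMC constant",     0x13: "number constant",
}

def parse_key_constant(words, start=0):
    """Decode a key constant; return (components, index of the next word)."""
    if words[start] != 0x6B:               # ASCII 'k'
        raise ValueError("not a key constant")
    count = words[start + 1]
    pos = start + 2
    components = []
    for _ in range(count):
        key_type, value = words[pos], words[pos + 1]
        if key_type not in KEY_TYPES:
            raise ValueError(f"unknown key type {key_type:#x}")
        components.append((KEY_TYPES[key_type], value))
        pos += 2
    return components, pos
```

For every type other than the integer constant, the returned value is either a
register number or an index into the constants table.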
431
432=head4 Fixup Segment
433
434The fixup segment maps names of subs to offsets in the bytecode stream.
435
436The number of fixup table entries, n, is given by the last field of the
437segment header.
438
439This is followed by n fixup table entries, of variable length, that take the
440following form.
441
442  +--------+--------+--------------------------------------------------------+
443  | Offset | Length | Description                                            |
444  +--------+--------+--------------------------------------------------------+
445  | 0      | 1      | Type of the fixup. Must be:                            |
446  |        |        |    0x01 - Subroutine fixup constant string             |
447  |        |        |    0x02 - Subroutine fixup ascii string                |
448  +--------+--------+--------------------------------------------------------+
  | 1      | -      | The label that is being fixed up. In the 0x01 case, a  |
  |        |        | string constant stored as an index into the constants  |
  |        |        | table; in the 0x02 case, a NULL terminated ASCII       |
  |        |        | string padded to word length with zeroes.              |
453  +--------+--------+--------------------------------------------------------+
454  | -      | 1      | This is an index into the constants table for the sub  |
455  |        |        | PMC corresponding to the label.                        |
456  +--------+--------+--------------------------------------------------------+
457
458
459=head4 PIR Debug Segment
460
This segment stores a list of mappings between offsets in the bytecode and
filenames, indicating that the bytecode from that point on until the next
entry was generated from the PIR found in the given filename.
464
The segment begins with a single opcode-sized word holding n, the number of
file mappings. Then come n mappings:
467
468  +--------+--------+--------------------------------------------------------+
469  | Offset | Length | Description                                            |
470  +--------+--------+--------------------------------------------------------+
471  | 0      | 1      | Offset in the bytecode.                                |
472  +--------+--------+--------------------------------------------------------+
473  | 1      | 1      | A string constant holding the filename, stored as an   |
474  |        |        | index into the constants table.                        |
475  +--------+--------+--------------------------------------------------------+
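
The sketch below shows how such mappings could be decoded and then used to
find the filename for a given bytecode offset. It assumes the mappings are
stored in ascending order of offset, which this document does not state, and
that C<constants> is an already decoded constants table. Both function names
are hypothetical.

```python
from bisect import bisect_right

def parse_debug_mappings(words, constants):
    """Decode [n, offset, index, ...] into a list of (offset, filename)."""
    count = words[0]
    return [(words[1 + 2 * i], constants[words[2 + 2 * i]])
            for i in range(count)]

def filename_for(mappings, pc):
    """Return the filename in effect at bytecode offset `pc`, or None."""
    offsets = [offset for offset, _ in mappings]
    index = bisect_right(offsets, pc) - 1   # last mapping at or before pc
    return mappings[index][1] if index >= 0 else None
```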
476
477
478=head4 Annotations Segment
479
480Annotations allow any instruction in the bytecode stream to have zero or more
481key/value pairs associated with it. These can be retrieved at runtime. High
482level languages can use annotations to store file names, line numbers, column
483numbers and any other data, for debug purposes or otherwise, that they need.
484
485The segment comes in three parts:
486
487=over 4
488
489=item A list of annotation keys (for example, "line" and "file").
490
491=item An annotation groups table, used to group together annotations for a
492particular HLL source file (an annotation group starting clears all active
493annotations, so they will not spill over between source files; it also
494allows for faster lookup of annotations).
495
496{{ TODO: Does it clear all annotations, or all annotation groups? }}
497
498=item A list of indexes into the bytecode stream and key/value pairings (for
499example, starting at instruction 235, the annotation "line" has value "42").
500
501=back
502
503The last field of the segment header is not used.
504
505The first word in the segment supplies the number of keys.
506
507  +--------+--------+--------------------------------------------------------+
508  | Offset | Length | Description                                            |
509  +--------+--------+--------------------------------------------------------+
510  | 1      | 1      | Number of annotation key entries that follow.          |
511  |        |        |    n                                                   |
512  +--------+--------+--------------------------------------------------------+
513
514Following this are C<n> annotation key entries. There is one entry per key
515(such as "line" or "file"), but the bytecode may be annotated many times
516with that key. Key entries take the following format.
517
518  +--------+--------+--------------------------------------------------------+
519  | Offset | Length | Description                                            |
520  +--------+--------+--------------------------------------------------------+
521  | 0      | 1      | Index into the constants table of a string containing  |
522  |        |        | the name of the key.                                   |
523  +--------+--------+--------------------------------------------------------+
524  | 1      | 1      | The type of value that is stored with the key.         |
525  |        |        |    0x00 - Integer                                      |
526  |        |        |    0x01 - String Constant                              |
527  |        |        |    0x02 - Number Constant                              |
528  |        |        |    0x03 - PMC Constant                                 |
529  +--------+--------+--------------------------------------------------------+
530
531The annotation groups table comes next. This starts with a single integer to
532specify the number of entries in the table.
533
534  +--------+--------+--------------------------------------------------------+
535  | Offset | Length | Description                                            |
536  +--------+--------+--------------------------------------------------------+
537  | 1      | 1      | Number of annotation group entries that follow.        |
538  +--------+--------+--------------------------------------------------------+
539
540A group entry maps an offset in the bytecode segment to an offset in the list
541of annotations (that is, offset 0 refers to the first word following this
542table). The list of offsets into the bytecode segment (and by the definition
543of this segment, the offsets into the annotations list) must be in ascending
544order.
545
546  +--------+--------+--------------------------------------------------------+
547  | Offset | Length | Description                                            |
548  +--------+--------+--------------------------------------------------------+
549  | 0      | 1      | Offset into the bytecode segment where the             |
550  |        |        | instructions for a particular high level source file   |
551  |        |        | start.                                                 |
552  +--------+--------+--------------------------------------------------------+
553  | 1      | 1      | Offset into the annotations list specifying where the  |
554  |        |        | annotations for the given instruction start.           |
555  +--------+--------+--------------------------------------------------------+
556
557The rest of the segment is made up of a sequence of bytecode offset to key and
558value mappings. First comes the number of them that follow:
559
560  +--------+--------+--------------------------------------------------------+
561  | Offset | Length | Description                                            |
562  +--------+--------+--------------------------------------------------------+
563  | 1      | 1      | Number of bytecode to keypair mappings that follow.    |
564  |        |        |    n                                                   |
565  +--------+--------+--------------------------------------------------------+
566
567Then there are n entries of the following format:
568
569  +--------+--------+--------------------------------------------------------+
570  | Offset | Length | Description                                            |
571  +--------+--------+--------------------------------------------------------+
572  | 0      | 1      | Offset into the bytecode segment, in words, of the     |
573  |        |        | instruction being annotated. At runtime, this will     |
574  |        |        | correspond to the program counter.                     |
575  +--------+--------+--------------------------------------------------------+
576  | 1      | 1      | The key of the annotation, specified as an index into  |
577  |        |        | the zero-based list of keys specified in the first     |
578  |        |        | part of the segment. That is, if key "line" was the    |
579  |        |        | first entry and "file" the second, they would have     |
580  |        |        | indices 0 and 1 respectively.                          |
581  +--------+--------+--------------------------------------------------------+
582  | 2      | 2      | The value of the annotation. If the annotation type    |
583  |        |        | (specified with the key) is an integer, the value is   |
584  |        |        | placed directly into this word. Otherwise, an index    |
585  |        |        | into the constants table is used.                      |
586  +--------+--------+--------------------------------------------------------+
587
588Note that the value of an annotation with a particular key is taken to apply
589to all following instructions up to the point of a new value being specified
590for that key with another annotation. This means that if 20 instructions make
591up the compiled form of a single line of code, only one line annotation is
592required. Note that this also implies that annotations must be placed in
593the same order as the instructions.
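
The "applies until overridden" rule can be sketched as a binary search over the offsets recorded for each key. The Python below is only a model of the lookup, assuming the entries have already been decoded into C<(offset, key_index, value)> tuples sorted by offset; C<build_index> and C<lookup> are hypothetical names, not part of Parrot.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple

# Each entry is (bytecode_offset, key_index, value), already decoded and
# sorted by bytecode_offset, as the segment requires.
Entry = Tuple[int, int, int]
Index = Dict[int, Tuple[List[int], List[int]]]


def build_index(entries: List[Entry]) -> Index:
    """Group entries by key into parallel (offsets, values) lists."""
    index: Index = {}
    for offset, key, value in entries:
        offsets, values = index.setdefault(key, ([], []))
        offsets.append(offset)
        values.append(value)
    return index


def lookup(index: Index, key: int, pc: int) -> Optional[int]:
    """Return the value of `key` in effect at program counter `pc`."""
    if key not in index:
        return None
    offsets, values = index[key]
    # Number of entries for this key located at or before pc.
    pos = bisect_right(offsets, pc)
    return values[pos - 1] if pos else None
```

With a "line" key at index 0 annotated at offsets 0, 20 and 45, a lookup for any program counter from 20 to 44 yields the value recorded at offset 20.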
594
595=head3 Packfile PMCs
596
597A packfile can be represented in memory by Parrot as a tree of PMCs. These
598provide a programmatic way to construct and walk packfiles, both for the
599Parrot internals and from programs running on the Parrot VM.
600
601{{ TODO... ManagedStruct and UnmanagedStruct may be helpful for these;
602consider switching these PMCs over to use them at some point. }}
603
604
605=head4 Packfile.pmc
606
607This PMC represents the packfile overall. It will be constructed by the VM
608when reading a packfile. It implements the following methods and vtable
609functions.
610
611=over 4
612
613=item * C<get_string> (vtable)
614
615Serializes this packfile data structure into a bytestream ready to be written
616to disk (that is, maps from PMCs to on-disk representation).
617
618=item * C<set_string_native> (vtable)
619
620Takes a string containing an entire packfile in the on-disk format, attempts
621to unpack it into a tree of Packfile PMCs and sets this Packfile PMC to
622represent the top of that tree (that is, maps from on-disk representation to a
623tree of PMCs).
624
625=item * C<get_integer_keyed_str> (vtable)
626
627Used to get data about fields in the header that have an integer value. Valid
628keys are:
629
630=over 4
631
632=item wordsize
633
634=item byteorder
635
636=item fptype
637
638=item version_major
639
640=item version_minor
641
642=item version_patch
643
644=item bytecode_major
645
646=item bytecode_minor
647
648=item uuid_type
649
650=back
651
652=item * C<get_string_keyed_str> (vtable)
653
654Used to get data about fields in the header that have a string value. Valid
655keys are:
656
657=over 4
658
659=item uuid
660
661=back
662
663=item * C<set_integer_keyed_str> (vtable)
664
665Used to set fields in the packfile header. Some fields are not allowed to be
666written since they are determined by the VM when serializing the packfile for
667storage on disk. The fields that may be set are:
668
669=over 4
670
671=item version_major
672
673=item version_minor
674
675=item version_patch
676
677=item uuid_type
678
679=back
680
681Be very careful when setting a version number; you should usually trust the VM
682to do the right thing with this.
683
684Setting the uuid_type will not result in immediate re-computation of the
685UUID, but rather will only cause it to be computed using the selected
686algorithm when the packfile is serialized (by calling the C<get_string>
687vtable function). Setting an invalid uuid_type value will cause an exception
688to be thrown immediately.
689
690=item * C<get_directory()>
691
692Returns the PackfileDirectory PMC that represents the directory segment at the
693start of the packfile.
694
695=back
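
The split between readable and writable header keys can be modelled as below. This is a toy Python stand-in, not the Packfile PMC itself. The class name, the use of C<KeyError> for a rejected key, and the set of valid C<uuid_type> values are all assumptions made for illustration.

```python
class PackfileHeaderModel:
    """Toy model of the integer header keys; not the real Packfile PMC."""

    READABLE = {
        "wordsize", "byteorder", "fptype",
        "version_major", "version_minor", "version_patch",
        "bytecode_major", "bytecode_minor", "uuid_type",
    }
    WRITABLE = {"version_major", "version_minor", "version_patch", "uuid_type"}
    # Assumption: placeholder set of valid uuid_type values; the real set
    # is defined by the packfile header format.
    VALID_UUID_TYPES = {0, 1}

    def __init__(self):
        self._fields = {name: 0 for name in self.READABLE}

    def get_integer_keyed_str(self, key):
        if key not in self.READABLE:
            raise KeyError(key)
        return self._fields[key]

    def set_integer_keyed_str(self, key, value):
        if key not in self.WRITABLE:
            # Assumption: the text does not say how a read-only key is
            # rejected; this model raises KeyError.
            raise KeyError(f"{key!r} may not be set")
        if key == "uuid_type" and value not in self.VALID_UUID_TYPES:
            # Rejected immediately; the UUID itself would only be
            # recomputed when the packfile is serialized.
            raise ValueError(f"invalid uuid_type: {value}")
        self._fields[key] = value
```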
696
697=head4 PackfileSegment.pmc
698
699An abstract PMC that is the base class for all other segments. It has two
700abstract methods, which are to be implemented by all subclasses. They will not
701be listed under the method list for other segment PMCs to save space.
702
703=over 4
704
705=item * C<STRING* pack()>
706
707Packs the segment into the on-disk format and returns a string holding it.
708
709=item * C<unpack(STRING*)>
710
711Takes the packed representation for a segment of the given type and then
712unpacks it, setting this PMC to represent that segment as a result of the
713unpacking. If an error occurs during the unpacking process, an exception will
714be thrown.
715
716=back
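
The pack/unpack contract can be sketched in Python as an abstract base class with one concrete subclass. The names C<SegmentModel> and C<WordSegmentModel> are hypothetical, and the fixed 8-byte little-endian word is an assumption; a real packfile takes its word size and byte order from the packfile header.

```python
import struct
from abc import ABC, abstractmethod
from typing import List


class SegmentModel(ABC):
    """Stand-in for the abstract PackfileSegment interface."""

    @abstractmethod
    def pack(self) -> bytes:
        """Return the on-disk form of this segment."""

    @abstractmethod
    def unpack(self, data: bytes) -> None:
        """Replace this segment's contents with the decoded data."""


class WordSegmentModel(SegmentModel):
    """A segment that is just a sequence of words.

    Assumption: 8-byte little-endian signed words.
    """

    WORD = struct.Struct("<q")

    def __init__(self) -> None:
        self.words: List[int] = []

    def pack(self) -> bytes:
        return b"".join(self.WORD.pack(w) for w in self.words)

    def unpack(self, data: bytes) -> None:
        # Mirror the "exception on error" behaviour described above.
        if len(data) % self.WORD.size:
            raise ValueError("data is not a whole number of words")
        self.words = [w for (w,) in self.WORD.iter_unpack(data)]
```

Packing a segment and unpacking the result into a fresh instance should reproduce the original list of words.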
717
718=head4 PackfileDirectory.pmc (isa PackfileSegment)
719
This PMC represents a directory segment. Essentially it is a hash of
721PackfileSegment PMCs. It implements the following methods:
722
723=over 4
724
725=item * C<elements> (vtable)
726
727Gets the number of segments listed in the directory.
728
729=item * C<get_pmc_keyed_str> (vtable)
730
731Searches the directory for a segment with the given name and, if one exists,
732returns a PackfileSegment PMC (or one of its subclasses) representing it.
733
734=item * C<set_pmc_keyed_str> (vtable)
735
736Adds a PackfileSegment PMC (or a subclass of it) to the directory with the
737name specified by the key. This is the only way to add another segment to the
738directory. If a segment of the given name already exists in the directory, it
739will be replaced with the supplied PMC.
740
741=item * C<delete_keyed_str> (vtable)
742
743Removes the PackfileSegment PMC from the directory which has the name
744specified by the key.  This is the only way to remove a segment from the
745directory.
746
747=item * C<get_iter> (vtable)
748
Returns an iterator over the keys (segment names) present in the directory.
750
751=back
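
The hash-like behaviour described above can be modelled with a plain mapping. This Python sketch is illustrative only: C<DirectoryModel> is a hypothetical name, the segment names in the usage are made up, and returning C<None> for a missing name is an assumption, since the text does not say what the real PMC does in that case.

```python
class DirectoryModel:
    """Toy model of PackfileDirectory: a name -> segment mapping."""

    def __init__(self):
        self._segments = {}

    def elements(self):
        return len(self._segments)

    def get_pmc_keyed_str(self, name):
        # Assumption: a missing name yields None.
        return self._segments.get(name)

    def set_pmc_keyed_str(self, name, segment):
        # Adds the segment, or replaces an existing one of the same name.
        self._segments[name] = segment

    def delete_keyed_str(self, name):
        self._segments.pop(name, None)

    def get_iter(self):
        # Iterates over the segment names (the keys).
        return iter(self._segments)
```

Setting a name that already exists replaces the stored segment without changing the element count.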
752
753=head4 PackfileRawSegment.pmc (isa PackfileSegment)
754
755This PMC presents a segment of a packfile as an array of integers. This is the
756lowest possible level of access to a segment, and covers both the default and
757bytecode segment types. It implements the following methods:
758
759=over 4
760
761=item * C<get_type>
762
Gets the type of this raw segment.
764
765=item * C<set_type>
766
Sets the type of this raw segment.
768
769=item * C<get_iter>
770
Returns an iterator over the contents of the segment.
772
773=item * C<get_integer_keyed_int> (vtable)
774
775Reads the integer at the specified offset into the segment, excluding the data
776in the common segment header but including the data making up additional
777fields in the header for a specific type of segment.
778
779=item * C<set_integer_keyed_int> (vtable)
780
781Stores an integer at the specified offset into the segment. Will throw an
782exception if the segment is memory mapped.
783
784=item * C<elements> (vtable)
785
Gets the length of the segment in words, excluding the length of the common
segment header but including the data making up additional fields in the
header for a specific type of segment.
789
790=back
791
792=head4 PackfileConstantTable.pmc (isa PackfileSegment)
793
794This PMC represents a constants table. It provides access to constants through
795the keyed integer interface (the interpreter may choose to access underlying
796structures directly to improve performance, however).
797
798The table of constants can be added to using the keyed set methods; it will
799grow automatically.
800
801The PMC implements the following methods:
802
803=over 4
804
805=item * C<get_iter>
806
Returns an iterator over the stored constants.
808
809=item * C<elements> (vtable)
810
811Gets the number of constants contained in the table.
812
813=item * C<get_number_keyed_int> (vtable)
814
815Gets the value of the number constant at the specified index in the constants
816table. If the constant at that position in the table is not a number, an
817exception will be thrown.
818
819=item * C<get_string_keyed_int> (vtable)
820
821Gets the value of the string constant at the specified index in the constants
822table. If the constant at that position in the table is not a string, an
823exception will be thrown.
824
825=item * C<get_pmc_keyed_int> (vtable)
826
827Gets the value of the PMC or key constant at the specified index in the
828constants table. If the constant at that position in the table is not a PMC
829or key, an exception will be thrown.
830
831=item * C<set_number_keyed_int> (vtable)
832
833Sets the value of the number constant at the specified index in the constants
834table. If the constant at that position in the table is not already a number
835constant, an exception will be thrown. If it does not exist, the table will be
836extended.
837
838=item * C<set_string_keyed_int> (vtable)
839
840Sets the value of the string constant at the specified index in the constants
841table. If the constant at that position in the table is not already a string
842constant, an exception will be thrown. If it does not exist, the table will be
843extended.
844
845=item * C<set_pmc_keyed_int> (vtable)
846
847Sets the value of the PMC or key constant at the specified index in the
848constants table. If the constant at that position in the table is not already
849a PMC or key constant, an exception will be thrown. If it does not exist, the
850table will be extended.
851
852=item * C<int get_type(int)>
853
854Returns an integer value denoting the type of the constant at the specified
855index. Possible values are:
856
857  +--------+-----------------------------------------------------------------+
858  | Value  | Constant Type                                                   |
859  +--------+-----------------------------------------------------------------+
860  | 0x00   | No Constant                                                     |
861  +--------+-----------------------------------------------------------------+
862  | 0x6E   | Number Constant                                                 |
863  +--------+-----------------------------------------------------------------+
864  | 0x73   | String Constant                                                 |
865  +--------+-----------------------------------------------------------------+
866  | 0x70   | PMC Constant                                                    |
867  +--------+-----------------------------------------------------------------+
868  | 0x6B   | Key Constant                                                    |
869  +--------+-----------------------------------------------------------------+
870
871=back
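
The type-checked, self-extending behaviour of the setters can be modelled as below. This is a Python sketch, not the PMC: C<ConstantTableModel> is a hypothetical name, only the number and string accessors are shown, and filling skipped slots with "No Constant" when the table grows is an assumption. The type codes are those from the table above; the non-zero ones are the ASCII codes of 'n', 's', 'p' and 'k'.

```python
NO_CONSTANT, NUMBER, STRING, PMC, KEY = 0x00, 0x6E, 0x73, 0x70, 0x6B


class ConstantTableModel:
    """Toy model of the constants table; not the real PMC."""

    def __init__(self):
        self._slots = []  # list of (type_code, value) pairs

    def elements(self):
        return len(self._slots)

    def get_type(self, index):
        return self._slots[index][0]

    def _set(self, index, code, value):
        # Grow the table with empty slots if the index is past the end
        # (assumption: skipped slots hold "No Constant").
        while len(self._slots) <= index:
            self._slots.append((NO_CONSTANT, None))
        existing = self._slots[index][0]
        if existing not in (NO_CONSTANT, code):
            raise TypeError("slot already holds a constant of another type")
        self._slots[index] = (code, value)

    def set_number_keyed_int(self, index, value):
        self._set(index, NUMBER, float(value))

    def set_string_keyed_int(self, index, value):
        self._set(index, STRING, str(value))

    def get_number_keyed_int(self, index):
        code, value = self._slots[index]
        if code != NUMBER:
            raise TypeError("constant is not a number")
        return value

    def get_string_keyed_int(self, index):
        code, value = self._slots[index]
        if code != STRING:
            raise TypeError("constant is not a string")
        return value
```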
872
873=head4 PackfileFixupTable.pmc (isa PackfileSegment)
874
875This PMC provides a keyed integer interface to the fixup table. Each entry in
876the table is represented by a PackfileFixupEntry PMC. It implements the
877following methods:
878
879=over 4
880
881=item * C<get_iter> (vtable)
882
Returns an iterator over the stored fixup entries.
884
885=item * C<elements> (vtable)
886
887Gets the number of entries in the fixup table.
888
889=item * C<get_pmc_keyed_int> (vtable)
890
891Gets a PackfileFixupEntry PMC for the fixup entry at the position given in
892the key. If the index is out of range, an exception will be thrown.
893
894=item * C<set_pmc_keyed_int> (vtable)
895
896Used to add a PackfileFixupEntry PMC to the fixups table or to replace an
897existing one. If the PMC that is supplied is not of type PackfileFixupEntry,
an exception will be thrown.
899
900=back
901
902=head4 PackfileFixupEntry.pmc
903
904This PMC represents an entry in the fixup table. It implements the following
905methods.
906
907=over 4
908
909=item * C<get_string> (vtable)
910
911Gets the label field of the fixup entry.
912
913=item * C<set_string_native> (vtable)
914
915Sets the label field of the fixup entry.
916
917=item * C<get_integer> (vtable)
918
919Gets the offset field of the fixup entry.
920
921=item * C<set_integer_native> (vtable)
922
923Sets the offset field of the fixup entry.
924
925=item * C<int get_type()>
926
927Gets the type of the fixup entry. See the entries table for possible fixup
928types.
929
930=item * C<set_type(int)>
931
932Sets the type of the fixup entry. See the entries table for possible fixup
933types. Specifying an invalid type will result in an exception.
934
935=back
936
937=head4 PackfileAnnotations.pmc (isa PackfileSegment)
938
939This PMC represents the bytecode annotations table. The following methods are
940implemented:
941
942=over 4
943
944=item * C<elements> (vtable)
945
946Gets the number of annotations in the table.
947
948=item * C<get_iter> (vtable)
949
Gets an iterator over the stored annotations.
951
952=item * C<get_pmc_keyed_int> (vtable)
953
954Gets the annotation at the specified index. If there is no annotation at that
955index, an exception will be thrown. The PMC that is returned will always be a
956PackfileAnnotation PMC.
957
958=item * C<set_pmc_keyed_int> (vtable)
959
960Sets the annotation at the specified index. If there is no annotation at that
961index, it is added to the list of annotations. An exception will be thrown
962unless all of the following conditions are met:
963
964=over 4
965
966=item - The type of the PMC passed is PackfileAnnotation
967
968=item - The entry at the previous index is defined
969
=item - The offset of the previous entry is less than the offset of this entry

=item - The offset of the next entry, if it exists, is greater than the offset of this entry
973
974=back
975
976=back
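
The four conditions on C<set_pmc_keyed_int> can be expressed as a sequence of checks. The Python below is a model only: C<AnnotationModel> and C<AnnotationsModel> are hypothetical stand-ins for the PMCs, and the particular exception types are assumptions, since the text says only that an exception is thrown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotationModel:
    offset: int    # bytecode offset of the annotated instruction
    name: str      # annotation key, for example "line"
    value: object


class AnnotationsModel:
    """Toy model of the ordering rules; not the real PMC."""

    def __init__(self) -> None:
        self._entries: List[AnnotationModel] = []

    def elements(self) -> int:
        return len(self._entries)

    def set_pmc_keyed_int(self, index: int, entry: AnnotationModel) -> None:
        if not isinstance(entry, AnnotationModel):
            raise TypeError("entry must be an AnnotationModel")
        if index < 0 or index > len(self._entries):
            raise IndexError("the entry at the previous index is not defined")
        if index > 0 and self._entries[index - 1].offset >= entry.offset:
            raise ValueError("offset must exceed the previous entry's offset")
        if (index + 1 < len(self._entries)
                and self._entries[index + 1].offset <= entry.offset):
            raise ValueError("offset must be below the next entry's offset")
        if index == len(self._entries):
            self._entries.append(entry)   # new entry at the end
        else:
            self._entries[index] = entry  # replace an existing entry
```

Replacing a middle entry succeeds only while its offset stays strictly between those of its neighbours.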
977
978=head4 PackfileAnnotation.pmc
979
980This PMC represents an individual bytecode annotation entry in the annotations
981segment. It implements the following methods:
982
983=over 4
984
985=item * C<int get_offset()>
986
987Gets the offset into the bytecode of the instruction that is being annotated.
988
989=item * C<set_offset(int)>
990
991Sets the offset into the bytecode of the instruction that is being annotated.
992
=item * C<STRING* get_name()>
994
995Gets the name of the annotation.
996
=item * C<set_name(STRING*)>
998
999Sets the name of the annotation.
1000
1001=item * C<get_integer> (vtable)
1002
1003Gets the integer value of the annotation.
1004
=item * C<set_integer_native> (vtable)
1006
1007Sets the integer value of the annotation.
1008
1009=item * C<get_string> (vtable)
1010
1011Gets the string value of the annotation.
1012
=item * C<set_string_native> (vtable)
1014
1015Sets the string value of the annotation.
1016
1017=item * C<get_number> (vtable)
1018
1019Gets the number value of the annotation.
1020
=item * C<set_number_native> (vtable)
1022
1023Sets the number value of the annotation.
1024
1025=back
1026
1027=head2 Language Notes
1028
1029None.
1030
1031=head2 Attachments
1032
1033None.
1034
1035=head2 Footnotes
1036
1037=head3 Changes From Previous Versions
1038
A number of things in this PDD differ from the older implementation, and a
few of the items offering more convenient PMC access are not yet implemented.
This section details these changes from the old implementation and some of
the reasoning behind them.
1043
1044=head4 Packfile Header
1045
1046The format of the packfile header changed completely, based upon a
1047proposal at
1048L<http://groups.google.com/group/perl.perl6.internals/browse_thread/thread/1f1af615edec7449/ebfdbb5180a9d813?lnk=gst>
and the requirement to have a UUID. The old INT field in the previous header
format was used nowhere in Parrot and has been removed. The Parrot patch
version number was added alongside the major and minor version numbers. The
opcode type field is also gone due to non-use; the opcode type is always long.
1053
1054The version number now reflects the earliest version of Parrot that is capable
1055of running the bytecode file, to enable cross-version compatibility that will
1056be needed in the future.
1057
1058
1059=head4 Segment Header
1060
1061Having the type associated with the segment inside the VM is fine, but since
1062it is in the directory segment anyway it seems odd to duplicate it here. Also
1063removed the id (did not seem to be used anywhere) and the second size (always
1064computable by knowing the size of this header, so it appears redundant).
1065
1066
1067=head4 Fixup Segment
1068
We need to support Unicode sub names, so fixup labels should be an index into
1070the constants table to the relevant string instead of just a C string as they
1071are now.
1072
1073
1074=head4 Annotations Segment
1075
This segment is new; it replaces and builds upon the debug segment. See the
following on-list discussion:
1078
1079L<http://groups.google.com/group/perl.perl6.internals/browse_thread/thread/b0d36dafb42d96c4/4d6ad2ad2243e677?lnk=gst&rnum=2#4d6ad2ad2243e677>
1080
1081
1082=head4 Packfile PMCs
1083
1084This idea will see packfiles and segments within them being represented by
1085PMCs, easing memory management and providing an interface to packfiles for
1086Parrot programs.
1087
The following mailing list thread provides some of the motivation behind the
original proposal.
1090
1091L<http://groups.google.com/group/perl.perl6.internals/browse_thread/thread/778ea0ac4c8676f7/b249306b543b040a?lnk=gst&q=packfile+PMCs&rnum=2#b249306b543b040a>
1092
1093=head2 References
1094
1095None.
1096
1097=cut
1098
1099__END__
1100Local Variables:
1101  fill-column:78
1102End:
1103vim: expandtab shiftwidth=4:
1104