# Copyright (C) 2001-2014, Parrot Foundation.

=head1 [DRAFT] PDD 13: Bytecode

=head2 Abstract

This PDD describes the file format for Parrot Bytecode (PBC) files and the
interface through which they may be manipulated programmatically.

=head2 Synopsis

Parrot bytecode is a binary representation of instructions and data for
execution on the virtual machine.

=head2 Description

PBC, Parrot bytecode, is the binary format used internally by the Parrot VM to
store the data necessary to execute a compiled PIR program. The sequence of
instructions making up a Parrot program, a constants table, an annotations
table and any ancillary data are stored in a PBC file. These files usually
have the extension C<.pbc>.

The PBC format is designed so that any valid PBC file can be read and executed
by Parrot on any platform, but may be encoded more optimally for a particular
platform.

It is possible to add arbitrary annotations to the instruction sequence, for
example line numbers in high level languages and other debug data.

PMCs are used to represent packfiles and packfile segments to provide a
programmatic interface, both to Parrot programs and Parrot internals.

=head2 Implementation

=head3 Packfiles

This section of the documentation describes the format of Parrot packfiles.
These contain the bytecode (sequence of instructions), constants table, fixup
table, debug data, annotations and possibly more.

Note that, unless otherwise stated, all offsets and lengths are given in terms
of Parrot opcodes, not bytes. An opcode corresponds to the word size, defined
as long. The ptrsize is silently assumed to be the same as the opcode size.


=head4 Packfile Header

PBC files start with a variable length header. All data in this header is
stored as strings or in a single byte so endianness and word size need not be
considered when reading it.

Note that in this section only, offsets and lengths are in bytes.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 8      | 0xFE 0x50 0x42 0x43 0x0D 0x0A 0x1A 0x0A                |
  |        |        | Parrot "Magic String" to identify a PBC file. In C,    |
  |        |        | this is the string C<\376PBC\r\n\032\n> or             |
  |        |        | C<\xfe\x50\x42\x43\x0d\x0a\x1a\x0a>.                   |
  +--------+--------+--------------------------------------------------------+
  | 8      | 1      | Word size in bytes of words making up the segments of  |
  |        |        | the PBC file. Must be one of:                          |
  |        |        |     0x04 - 4 byte (32-bit) words                       |
  |        |        |     0x08 - 8 byte (64-bit) words                       |
  +--------+--------+--------------------------------------------------------+
  | 9      | 1      | Byte order within the words making up the segments of  |
  |        |        | the PBC file. Must be one of:                          |
  |        |        |     0x00 - Little Endian                               |
  |        |        |     0x01 - Big Endian                                  |
  +--------+--------+--------------------------------------------------------+
  | 10     | 1      | The encoding of floating point numbers in the file.    |
  |        |        | Must be one of:                                        |
  |        |        |     0x00 - IEEE 754 8 byte double                      |
  |        |        |     0x01 - i386 little endian 12 byte long double      |
  |        |        |     0x02 - IEEE 754 16 byte long double                |
  +--------+--------+--------------------------------------------------------+
  | 11     | 1      | Major version number of the version of Parrot that     |
  |        |        | wrote this bytecode file. For example, if Parrot 0.9.5 |
  |        |        | wrote it, this byte would have the value 0.            |
  +--------+--------+--------------------------------------------------------+
  | 12     | 1      | Minor version number of the version of Parrot that     |
  |        |        | wrote this bytecode file. For example, if Parrot 0.9.5 |
  |        |        | wrote it, this byte would have the value 9.            |
  +--------+--------+--------------------------------------------------------+
  | 13     | 1      | Patch version number of the version of Parrot that     |
  |        |        | wrote this bytecode file. For example, if Parrot 0.9.5 |
  |        |        | wrote it, this byte would have the value 5.            |
  +--------+--------+--------------------------------------------------------+
  | 14     | 1      | Major version number of the bytecode file format. See  |
  |        |        | the section below on bytecode file format version      |
  |        |        | numbers.                                               |
  +--------+--------+--------------------------------------------------------+
  | 15     | 1      | Minor version number of the bytecode file format. See  |
  |        |        | the section below on bytecode file format version      |
  |        |        | numbers.                                               |
  +--------+--------+--------------------------------------------------------+
  | 16     | 1      | The type of the UUID associated with this packfile.    |
  |        |        | Must be one of:                                        |
  |        |        |     0x00 - No UUID                                     |
  |        |        |     0x01 - MD5                                         |
  +--------+--------+--------------------------------------------------------+
  | 17     | 1      | Length of the UUID associated with this packfile. May  |
  |        |        | be zero if the type of the UUID is 0x00. Maximum       |
  |        |        | value is 255.                                          |
  +--------+--------+--------------------------------------------------------+
  | 18     | u      | A UUID of u bytes in length, where u was specified as  |
  |        |        | the length of the UUID in the previous field. Be sure  |
  |        |        | that UUIDs are stored and read as strings. The UUID is |
  |        |        | computed by applying the hash function specified in    |
  |        |        | the UUID type field over the entire packfile not       |
  |        |        | including this header and its trailing zero padding.   |
  +--------+--------+--------------------------------------------------------+
  | 18 + u | n      | Zero-padding to make the total header length a         |
  |        |        | multiple of 16 bytes in length.                        |
  |        |        |     n = (18 + u) % 16 ? 16 - ((18 + u) % 16) : 0       |
  +--------+--------+--------------------------------------------------------+

Everything beyond the header is an opcode, with word length and byte ordering
as defined in the header. If the word length and byte ordering of the machine
that is reading the PBC file do not match these, the reader needs to transform
the words making up the rest of the packfile.

=over 4

=item * Bytecode File Version Numbers

The bytecode file version number exists to decouple the format of the bytecode
file from the version of the Parrot implementation that is reading/writing it.
It has a major and a minor part.

The major version number should be incremented whenever there is a change to
the layout of bytecode files. This includes new segments, changes to segment
headers or changes to the format of the data held within a segment.

The minor version number should be incremented in all other cases when a
change is made that means a previous version of Parrot would not be able to
run the program encoded in the packfile. This includes:

=over 4

=item Opcode renumbering

=item Addition of new opcodes and removal of existing ones

=item Addition of new core PMCs and removal of existing ones

=item Changes to the interface (externally visible behaviour) of an opcode or
PMC

=back

Parrot currently exits when reading an incompatible bytecode file
version number. It is possible for a single version of Parrot to support
reading and writing more than one bytecode file format, but this is not
currently implemented. Future versions of Parrot may also provide a
bytecode migration tool, to convert a bytecode file to a more recent
format.

The bytecode format versions are listed in the PBC_COMPAT file, sorted
with the latest version first in the file:

  MAJOR.MINOR DATE NAME DESCRIPTION

=back

Some systems, such as 64-bit Sparc and PPC, use strict 8-byte pointer
alignment by default, so every C<(opcode_t*)cursor++> or
C<(opcode_t*)cursor +=> advance must leave the cursor pointer 8-byte
aligned. Parrot enforces 16-byte alignment at the start and end of all
segments and ptrsize alignment for all items (strings, integers, and opcode_t
ops), but not in between, especially with 4-byte integers and 4-byte opcode_t
pointers.

Parrot therefore relaxes pointer alignment strictness on Sparc64, but a
C<--64compat> option may be added in the future to produce 8-byte aligned
data. Operations on aligned pointers are much faster than on unaligned
pointers.


=head4 Directory Format Header

A packfile contains a directory describing the segments that it contains.
This header specifies the format of the directory.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | The format of the directory. Must be:                  |
  |        |        |     0x01 - Directory Format 1                          |
  +--------+--------+--------------------------------------------------------+
  | 1      | 3      | Must be:                                               |
  |        |        |     0x00 0x00 0x00 - Reserved                          |
  +--------+--------+--------------------------------------------------------+

Currently only C<Format 1> exists. In the future, the format of the
directory may change. A single version of Parrot may then become capable of
generating and reading files of more than one directory format. This header
enables Parrot to detect whether it is able to read the directory segment in
the packfile.

This header must be followed immediately by a directory segment.
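
As a rough illustration of the packfile header layout, the following Python
sketch decodes the byte-oriented header and then reads opcode words from the
rest of the file. It is not Parrot's own loader. The names C<parse_header> and
C<read_words> and the dictionary keys C<uuid_length> and C<header_length> are
invented for this example. The padding is derived from the stated requirement
that the total header length be a multiple of 16 bytes, and opcode words are
assumed to be signed integers.

```python
import struct

MAGIC = b"\xfePBC\r\n\x1a\n"   # 0xFE 0x50 0x42 0x43 0x0D 0x0A 0x1A 0x0A
FIXED_BYTES = 18               # header bytes that precede the UUID

# Field names for header bytes 8..17; "uuid_length" is a hypothetical name.
FIELDS = ("wordsize", "byteorder", "fptype",
          "version_major", "version_minor", "version_patch",
          "bytecode_major", "bytecode_minor", "uuid_type", "uuid_length")


def parse_header(data):
    """Decode the byte-oriented packfile header into a dict."""
    if len(data) < FIXED_BYTES or data[:8] != MAGIC:
        raise ValueError("not a PBC file")
    # Every field after the magic string is a single byte.
    header = dict(zip(FIELDS, struct.unpack_from("10B", data, 8)))
    if header["wordsize"] not in (4, 8):
        raise ValueError("unsupported word size")
    if header["byteorder"] not in (0, 1):
        raise ValueError("unsupported byte order")
    end = FIXED_BYTES + header["uuid_length"]
    if len(data) < end:
        raise ValueError("truncated UUID")
    header["uuid"] = data[FIXED_BYTES:end]
    # Assumption: round the whole header up to a multiple of 16 bytes.
    header["header_length"] = (end + 15) // 16 * 16
    return header


def read_words(data, offset, count, header):
    """Read `count` words at byte `offset`, honouring the header's
    word size and byte order (words assumed to be signed)."""
    order = "<" if header["byteorder"] == 0 else ">"
    code = "i" if header["wordsize"] == 4 else "q"
    return struct.unpack_from(f"{order}{count}{code}", data, offset)
```

On a well-formed file, C<read_words(data, header["header_length"], 4, header)>
would be expected to yield the directory format word followed by three
reserved zero words, assuming the directory format header is measured in
opcodes like the rest of the file.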


=head4 Packfile Segment Header

All segments, regardless of type, start with a 4 opcode segment header. All
other segments below are prefixed with this.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | The total size of the segment in opcodes, including    |
  |        |        | this header.                                           |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Internal type of the segment                           |
  +--------+--------+--------------------------------------------------------+
  | 2      | 1      | Internal id                                            |
  +--------+--------+--------------------------------------------------------+
  | 3      | 1      | Size of the following op array, 0 if none              |
  +--------+--------+--------------------------------------------------------+


=head4 Segment Padding

All segments must have trailing zero (NULL) values appended so they are a
multiple of 16 bytes in length. (This allows wordsize support of up to
128 bits.)


=head4 Directory Segment

This segment lists the other segments that make up the packfile and where in
the file they are located. It must occur immediately after the directory
format header. Only one of these segments may occur in a packfile. In the
future, a hierarchy of directories may be allowed.

The directory segment adds one additional header after the standard packfile
header data, which specifies the number of entries in the directory.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | The number of entries in the directory.                |
  +--------+--------+--------------------------------------------------------+

Following this are C<n> variable length entries formatted as described in the
following table. Offsets are in words, but are given relative to the start of
an individual entry.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | The type of the segment. Must be one of the following: |
  |        |        |     0x00 - Reserved (Directory Segment)                |
  |        |        |     0x01 - Default Segment                             |
  |        |        |     0x02 - Fixup Segment                               |
  |        |        |     0x03 - Constant Table Segment                      |
  |        |        |     0x04 - Bytecode Segment                            |
  |        |        |     0x05 - PIR Debug Segment                           |
  |        |        |     0x06 - Annotations Segment                         |
  +--------+--------+--------------------------------------------------------+
  | 1      | n      | The name of the segment, as a (NULL terminated) ASCII  |
  |        |        | C string. This must be padded with trailing NULL       |
  |        |        | (zero) values to be a full word in size.               |
  +--------+--------+--------------------------------------------------------+
  | n + 1  | 1      | The offset to the segment, relative to the start of    |
  |        |        | the packfile. Specified as a number of words, where    |
  |        |        | the word size is that specified in the header. (Parrot |
  |        |        | may need to do some computation to transform this to   |
  |        |        | an offset in terms of its own word size.) As segments  |
  |        |        | must always be aligned on 16-byte boundaries, this     |
  |        |        | scheme scales up to 128-bit platforms.                 |
  +--------+--------+--------------------------------------------------------+
  | n + 2  | 1      | The length of the segment, including its header, in    |
  |        |        | words. This must match the length stored at the start  |
  |        |        | of the header of the segment the entry is describing.  |
  +--------+--------+--------------------------------------------------------+


=head4 Default Segment

The default segment has no additional headers. It will, if possible, be memory
mapped. More than one may exist in the packfile, and they are identified by
name. They may be used for storing any data that does not fit into any other
segment, for example the source code from a high level language (HLL).


=head4 Bytecode Segment

This segment has no additional headers. It stores a stream of instructions in
bytecode format, with the length given in the last field of the segment
header.

Instructions have variable length. Each instruction starts with an operation
code (opcode).

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | A valid Parrot opcode, as specified in the opcode      |
  |        |        | list include/parrot/oplib/ops.h.                       |
  +--------+--------+--------------------------------------------------------+

Zero or more operands follow the opcode. All opcodes take a fixed number of
operands. An individual operand is always one word in length and may be of
one of the following forms.

  +------------------+-------------------------------------------------------+
  | Operand Type     | Description                                           |
  +------------------+-------------------------------------------------------+
  | Register         | An integer specifying a register number.              |
  +------------------+-------------------------------------------------------+
  | Integer Constant | An integer that is the constant itself. That is, the  |
  |                  | constant is stored directly in the instruction        |
  |                  | stream. Storing integer constants of length greater   |
  |                  | than 32 bits has undefined behaviour and should be    |
  |                  | considered unportable.                                |
  +------------------+-------------------------------------------------------+
  | Number Constant  | An index into the constants table.                    |
  +------------------+-------------------------------------------------------+
  | String Constant  | An index into the constants table.                    |
  +------------------+-------------------------------------------------------+
  | PMC Constant     | An index into the constants table.                    |
  +------------------+-------------------------------------------------------+


=head4 Constants Segment

This segment stores number, string and PMC constants.

The first element is the number of constants contained.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 2      | 1      | The number of constants in the table.                  |
  +--------+--------+--------------------------------------------------------+

Following this are C<n> constants, each with a single word header specifying
the type of constant that follows.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | The type of the constant. Must be one of:              |
  |        |        |     0x00 - No constant                                 |
  |        |        |     0x6E - Number constant (ASCII 'n')                 |
  |        |        |     0x73 - String constant (ASCII 's')                 |
  |        |        |     0x70 - PMC constant (ASCII 'p')                    |
  |        |        |     0x6B - Key constant (ASCII 'k')                    |
  +--------+--------+--------------------------------------------------------+

All constants that are not a multiple of the word size in length must be
padded with trailing zero bytes up to a word size boundary.

=over 4

=item * Number Constants

The number is stored in the format defined in the Packfile header. Any padding
that is needed will follow.

=item * String Constants

String constants are stored in the following format, with offsets relative to
the start of the constant including its type.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Flags, copied from the string structure.               |
  +--------+--------+--------------------------------------------------------+
  | 2      | 1      | Character set; either the index of a built-in one or a |
  |        |        | dynamically loaded one whose index is in a range given |
  |        |        | in the dependencies table. Note that dynamically       |
  |        |        | loaded character sets are not currently supported.     |
  +--------+--------+--------------------------------------------------------+
  | 3      | 1      | Encoding; either the index of a built-in one or a      |
  |        |        | dynamically loaded one whose index is in a range given |
  |        |        | in the dependencies table. Note that dynamically       |
  |        |        | loaded encodings are not currently supported.          |
  +--------+--------+--------------------------------------------------------+
  | 4      | 1      | Length of the string data in bytes.                    |
  +--------+--------+--------------------------------------------------------+
  | 5      | n      | String data with trailing zero padding as required.    |
  +--------+--------+--------------------------------------------------------+

Note: The encoding and charset are currently packed together with the flags
into a single field of length 1.


=item * PMC Constants

PMCs that can be saved in packfiles as constants implement the freeze and thaw
vtable functions. Their frozen data is placed in a string, stored in the same
format as a string constant.

=item * Key Constants

Key constants are made up of a number of components, where each component is
a "dimension" in the key. The number of components in the key is stored at the
start of the constant.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Number of key components that follow.                  |
  +--------+--------+--------------------------------------------------------+

Following this are C<n> entries of two words each that specify the key's
type and value. The key value may be a register or another constant, but not
another key constant. All constants other than integer constants are indexes
into the constants table.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | Type of the key. Must be one of:                       |
  |        |        |     0x00 - Integer register                            |
  |        |        |     0x01 - String register                             |
  |        |        |     0x02 - PMC register                                |
  |        |        |     0x03 - Number register                             |
  |        |        |     0x10 - Integer constant                            |
  |        |        |     0x11 - String constant (constant table index)      |
  |        |        |     0x12 - PMC constant (constant table index)         |
  |        |        |     0x13 - Number constant (constant table index)      |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Value of the key.                                      |
  +--------+--------+--------------------------------------------------------+

=back

=head4 Fixup Segment

The fixup segment maps names of subs to offsets in the bytecode stream.

The number of fixup table entries, n, is given by the last field of the
segment header.

This is followed by n fixup table entries, of variable length, that take the
following form.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | Type of the fixup. Must be:                            |
  |        |        |     0x01 - Subroutine fixup constant string            |
  |        |        |     0x02 - Subroutine fixup ASCII string               |
  +--------+--------+--------------------------------------------------------+
  | 1      | -      | The label that is being fixed up. In the 0x01 case, a  |
  |        |        | string constant stored as an index into the constants  |
  |        |        | table; in the 0x02 case, a NULL terminated ASCII       |
  |        |        | string padded to word length with zeroes.              |
  +--------+--------+--------------------------------------------------------+
  | -      | 1      | This is an index into the constants table for the sub  |
  |        |        | PMC corresponding to the label.                        |
  +--------+--------+--------------------------------------------------------+


=head4 PIR Debug Segment

This segment stores a list of mappings between offsets in the bytecode and
filenames, indicating that the bytecode from that point on until the next
entry was generated from the PIR found in the given filename.

The segment begins with an opcode holding n, the number of file mappings. Then
come n mappings:

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | Offset in the bytecode.                                |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | A string constant holding the filename, stored as an   |
  |        |        | index into the constants table.                        |
  +--------+--------+--------------------------------------------------------+


=head4 Annotations Segment

Annotations allow any instruction in the bytecode stream to have zero or more
key/value pairs associated with it.
These can be retrieved at runtime. High level languages can use annotations
to store file names, line numbers, column numbers and any other data, for
debug purposes or otherwise, that they need.

The segment comes in three parts:

=over 4

=item A list of annotation keys (for example, "line" and "file").

=item An annotation groups table, used to group together annotations for a
particular HLL source file (an annotation group starting clears all active
annotations, so they will not spill over between source files; it also
allows for faster lookup of annotations).

{{ TODO: Does it clear all annotations, or all annotation groups? }}

=item A list of indexes into the bytecode stream and key/value pairings (for
example, starting at instruction 235, the annotation "line" has value "42").

=back

The last field of the segment header is not used.

The first word in the segment supplies the number of keys.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Number of annotation key entries that follow.          |
  |        |        |     n                                                  |
  +--------+--------+--------------------------------------------------------+

Following this are C<n> annotation key entries. There is one entry per key
(such as "line" or "file"), but the bytecode may be annotated many times
with that key. Key entries take the following format.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | Index into the constants table of a string containing  |
  |        |        | the name of the key.                                   |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | The type of value that is stored with the key.         |
  |        |        |     0x00 - Integer                                     |
  |        |        |     0x01 - String Constant                             |
  |        |        |     0x02 - Number Constant                             |
  |        |        |     0x03 - PMC Constant                                |
  +--------+--------+--------------------------------------------------------+

The annotation groups table comes next. This starts with a single integer to
specify the number of entries in the table.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Number of annotation group entries that follow.        |
  +--------+--------+--------------------------------------------------------+

A group entry maps an offset in the bytecode segment to an offset in the list
of annotations (that is, offset 0 refers to the first word following this
table). The list of offsets into the bytecode segment (and by the definition
of this segment, the offsets into the annotations list) must be in ascending
order.

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | Offset into the bytecode segment where the             |
  |        |        | instructions for a particular high level source file   |
  |        |        | start.                                                 |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Offset into the annotations list specifying where the  |
  |        |        | annotations for the given instruction start.           |
  +--------+--------+--------------------------------------------------------+

The rest of the segment is made up of a sequence of bytecode offset to key and
value mappings.
First comes the number of them that follow:

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | Number of bytecode to keypair mappings that follow.    |
  |        |        |     n                                                  |
  +--------+--------+--------------------------------------------------------+

Then there are n entries of the following format:

  +--------+--------+--------------------------------------------------------+
  | Offset | Length | Description                                            |
  +--------+--------+--------------------------------------------------------+
  | 0      | 1      | Offset into the bytecode segment, in words, of the     |
  |        |        | instruction being annotated. At runtime, this will     |
  |        |        | correspond to the program counter.                     |
  +--------+--------+--------------------------------------------------------+
  | 1      | 1      | The key of the annotation, specified as an index into  |
  |        |        | the zero-based list of keys specified in the first     |
  |        |        | part of the segment. That is, if key "line" was the    |
  |        |        | first entry and "file" the second, they would have     |
  |        |        | indices 0 and 1 respectively.                          |
  +--------+--------+--------------------------------------------------------+
  | 2      | 2      | The value of the annotation. If the annotation type    |
  |        |        | (specified with the key) is an integer, the value is   |
  |        |        | placed directly into this word. Otherwise, an index    |
  |        |        | into the constants table is used.                      |
  +--------+--------+--------------------------------------------------------+

Note that the value of an annotation with a particular key is taken to apply
to all following instructions up to the point of a new value being specified
for that key with another annotation. This means that if 20 instructions make
up the compiled form of a single line of code, only one line annotation is
required.
Note that this also implies that annotations must be placed in the same order
as the instructions.

=head3 Packfile PMCs

A packfile can be represented in memory by Parrot as a tree of PMCs. These
provide a programmatic way to construct and walk packfiles, both for the
Parrot internals and from programs running on the Parrot VM.

{{ TODO... ManagedStruct and UnmanagedStruct may be helpful for these;
consider switching these PMCs over to use them at some point. }}


=head4 Packfile.pmc

This PMC represents the packfile overall. It will be constructed by the VM
when reading a packfile. It implements the following methods and vtable
functions.

=over 4

=item * C<get_string> (vtable)

Serializes this packfile data structure into a bytestream ready to be written
to disk (that is, maps from PMCs to on-disk representation).

=item * C<set_string_native> (vtable)

Takes a string containing an entire packfile in the on-disk format, attempts
to unpack it into a tree of Packfile PMCs and sets this Packfile PMC to
represent the top of that tree (that is, maps from on-disk representation to a
tree of PMCs).

=item * C<get_integer_keyed_str> (vtable)

Used to get data about fields in the header that have an integer value. Valid
keys are:

=over 4

=item wordsize

=item byteorder

=item fptype

=item version_major

=item version_minor

=item version_patch

=item bytecode_major

=item bytecode_minor

=item uuid_type

=back

=item * C<get_string_keyed_str> (vtable)

Used to get data about fields in the header that have a string value. Valid
keys are:

=over 4

=item uuid

=back

=item * C<set_integer_keyed_str> (vtable)

Used to set fields in the packfile header.
Some fields are not allowed to be written since they are determined by the VM
when serializing the packfile for storage on disk. The fields that may be set
are:

=over 4

=item version_major

=item version_minor

=item version_patch

=item uuid_type

=back

Be very careful when setting a version number; you should usually trust the VM
to do the right thing with this.

Setting the uuid_type will not result in immediate re-computation of the
UUID, but rather will only cause it to be computed using the selected
algorithm when the packfile is serialized (by calling the C<get_string>
vtable function). Setting an invalid uuid_type value will cause an exception
to be thrown immediately.

=item * C<get_directory()>

Returns the PackfileDirectory PMC that represents the directory segment at the
start of the packfile.

=back

=head4 PackfileSegment.pmc

An abstract PMC that is the base class for all other segments. It has two
abstract methods, which are to be implemented by all subclasses. They will not
be listed under the method list for other segment PMCs to save space.

=over 4

=item * C<STRING* pack()>

Packs the segment into the on-disk format and returns a string holding it.

=item * C<unpack(STRING*)>

Takes the packed representation for a segment of the given type and then
unpacks it, setting this PMC to represent that segment as a result of the
unpacking. If an error occurs during the unpacking process, an exception will
be thrown.

=back

=head4 PackfileDirectory.pmc (isa PackfileSegment)

This PMC represents a directory segment. Essentially it is a hash of
PackfileSegment PMCs. It implements the following methods:

=over 4

=item * C<elements> (vtable)

Gets the number of segments listed in the directory.

=item * C<get_pmc_keyed_str> (vtable)

Searches the directory for a segment with the given name and, if one exists,
returns a PackfileSegment PMC (or one of its subclasses) representing it.

=item * C<set_pmc_keyed_str> (vtable)

Adds a PackfileSegment PMC (or a subclass of it) to the directory with the
name specified by the key. This is the only way to add another segment to the
directory. If a segment of the given name already exists in the directory, it
will be replaced with the supplied PMC.

=item * C<delete_keyed_str> (vtable)

Removes from the directory the PackfileSegment PMC whose name is specified by
the key. This is the only way to remove a segment from the directory.

=item * C<get_iter> (vtable)

Returns an iterator over the existing keys (segment names).

=back

=head4 PackfileRawSegment.pmc (isa PackfileSegment)

This PMC presents a segment of a packfile as an array of integers. This is the
lowest possible level of access to a segment, and covers both the default and
bytecode segment types. It implements the following methods:

=over 4

=item * C<get_type>

Gets the type of the raw segment.

=item * C<set_type>

Sets the type of the raw segment.

=item * C<get_iter>

Returns an iterator over the segment.

=item * C<get_integer_keyed_int> (vtable)

Reads the integer at the specified offset into the segment, excluding the data
in the common segment header but including the data making up additional
fields in the header for a specific type of segment.

=item * C<set_integer_keyed_int> (vtable)

Stores an integer at the specified offset into the segment. Throws an
exception if the segment is memory mapped.

=item * C<elements> (vtable)

Gets the length of the segment in words, excluding the length of the common
segment header but including the data making up additional fields in the
header for a specific type of segment.

=back

=head4 PackfileConstantTable.pmc (isa PackfileSegment)

This PMC represents a constants table. It provides access to constants through
the keyed integer interface (the interpreter may choose to access underlying
structures directly to improve performance, however).

The table of constants can be added to using the keyed set methods; it will
grow automatically.

The PMC implements the following methods:

=over 4

=item * C<get_iter>

Returns an iterator over the stored constants.

=item * C<elements> (vtable)

Gets the number of constants contained in the table.

=item * C<get_number_keyed_int> (vtable)

Gets the value of the number constant at the specified index in the constants
table. If the constant at that position in the table is not a number, an
exception will be thrown.

=item * C<get_string_keyed_int> (vtable)

Gets the value of the string constant at the specified index in the constants
table. If the constant at that position in the table is not a string, an
exception will be thrown.

=item * C<get_pmc_keyed_int> (vtable)

Gets the value of the PMC or key constant at the specified index in the
constants table. If the constant at that position in the table is not a PMC
or key, an exception will be thrown.

=item * C<set_number_keyed_int> (vtable)

Sets the value of the number constant at the specified index in the constants
table. If the constant at that position in the table is not already a number
constant, an exception will be thrown. If no constant exists at that index,
the table will be extended.

=item * C<set_string_keyed_int> (vtable)

Sets the value of the string constant at the specified index in the constants
table. If the constant at that position in the table is not already a string
constant, an exception will be thrown. If no constant exists at that index,
the table will be extended.

=item * C<set_pmc_keyed_int> (vtable)

Sets the value of the PMC or key constant at the specified index in the
constants table. If the constant at that position in the table is not already
a PMC or key constant, an exception will be thrown. If no constant exists at
that index, the table will be extended.

=item * C<int get_type(int)>

Returns an integer value denoting the type of the constant at the specified
index. Possible values are:

 +--------+-----------------------------------------------------------------+
 | Value  | Constant Type                                                   |
 +--------+-----------------------------------------------------------------+
 | 0x00   | No Constant                                                     |
 +--------+-----------------------------------------------------------------+
 | 0x6E   | Number Constant                                                 |
 +--------+-----------------------------------------------------------------+
 | 0x73   | String Constant                                                 |
 +--------+-----------------------------------------------------------------+
 | 0x70   | PMC Constant                                                    |
 +--------+-----------------------------------------------------------------+
 | 0x6B   | Key Constant                                                    |
 +--------+-----------------------------------------------------------------+

=back

=head4 PackfileFixupTable.pmc (isa PackfileSegment)

This PMC provides a keyed integer interface to the fixup table. Each entry in
the table is represented by a PackfileFixupEntry PMC. It implements the
following methods:

=over 4

=item * C<get_iter> (vtable)

Returns an iterator over the stored fixup entries.

=item * C<elements> (vtable)

Gets the number of entries in the fixup table.

=item * C<get_pmc_keyed_int> (vtable)

Gets a PackfileFixupEntry PMC for the fixup entry at the position given in
the key. If the index is out of range, an exception will be thrown.

=item * C<set_pmc_keyed_int> (vtable)

Used to add a PackfileFixupEntry PMC to the fixup table or to replace an
existing one. If the PMC that is supplied is not of type PackfileFixupEntry,
an exception will be thrown.

=back

=head4 PackfileFixupEntry.pmc

This PMC represents an entry in the fixup table. It implements the following
methods.

=over 4

=item * C<get_string> (vtable)

Gets the label field of the fixup entry.

=item * C<set_string_native> (vtable)

Sets the label field of the fixup entry.

=item * C<get_integer> (vtable)

Gets the offset field of the fixup entry.

=item * C<set_integer_native> (vtable)

Sets the offset field of the fixup entry.

=item * C<int get_type()>

Gets the type of the fixup entry. See the fixup entries table for possible
fixup types.

=item * C<set_type(int)>

Sets the type of the fixup entry. See the fixup entries table for possible
fixup types. Specifying an invalid type will result in an exception.

=back

=head4 PackfileAnnotations.pmc (isa PackfileSegment)

This PMC represents the bytecode annotations table. The following methods are
implemented:

=over 4

=item * C<elements> (vtable)

Gets the number of annotations in the table.

=item * C<get_iter> (vtable)

Gets an iterator over the stored annotations.

=item * C<get_pmc_keyed_int> (vtable)

Gets the annotation at the specified index. If there is no annotation at that
index, an exception will be thrown. The PMC that is returned will always be a
PackfileAnnotation PMC.

=item * C<set_pmc_keyed_int> (vtable)

Sets the annotation at the specified index.
If there is no annotation at that
index, it is added to the list of annotations. An exception will be thrown
unless all of the following conditions are met:

=over 4

=item - The type of the PMC passed is PackfileAnnotation

=item - The entry at the previous index is defined

=item - The offset of the previous entry is less than the offset of this entry

=item - The offset of the next entry, if it exists, is greater than the offset of this entry

=back

=back

=head4 PackfileAnnotation.pmc

This PMC represents an individual bytecode annotation entry in the annotations
segment. It implements the following methods:

=over 4

=item * C<int get_offset()>

Gets the offset into the bytecode of the instruction that is being annotated.

=item * C<set_offset(int)>

Sets the offset into the bytecode of the instruction that is being annotated.

=item * C<STRING* get_name()>

Gets the name of the annotation.

=item * C<set_name(STRING*)>

Sets the name of the annotation.

=item * C<get_integer> (vtable)

Gets the integer value of the annotation.

=item * C<set_integer> (vtable)

Sets the integer value of the annotation.

=item * C<get_string> (vtable)

Gets the string value of the annotation.

=item * C<set_string> (vtable)

Sets the string value of the annotation.

=item * C<get_number> (vtable)

Gets the number value of the annotation.

=item * C<set_number> (vtable)

Sets the number value of the annotation.

=back

=head2 Language Notes

None.

=head2 Attachments

None.

=head2 Footnotes

=head3 Changes From Previous Versions

A number of things in this PDD differ from the older implementation,
and a few items offering the more convenient PMC access are not yet implemented.
This section details these changes from the old implementation
and some of the reasoning behind them.

=head4 Packfile Header

The format of the packfile header changed completely, based upon a
proposal at
L<http://groups.google.com/group/perl.perl6.internals/browse_thread/thread/1f1af615edec7449/ebfdbb5180a9d813?lnk=gst>
and the requirement to have a UUID. The old INT field in the previous header
format was used nowhere in Parrot and was removed; the Parrot patch version
number was added alongside the major and minor numbers. The opcode type field
is also gone due to non-use; the opcode type is always long.

The version number now reflects the earliest version of Parrot that is capable
of running the bytecode file, to enable cross-version compatibility that will
be needed in the future.


=head4 Segment Header

Having the type associated with the segment inside the VM is fine, but since
it is in the directory segment anyway it seems odd to duplicate it here. The
id was also removed (it did not seem to be used anywhere), along with the
second size (always computable from the size of this header, so it appears
redundant).


=head4 Fixup Segment

We need to support Unicode sub names, so fixup labels should be an index into
the constants table to the relevant string instead of just a C string as they
are now.


=head4 Annotations Segment

This is new and replaces and builds upon the debug segment. See here for some
on-list discussion:

L<http://groups.google.com/group/perl.perl6.internals/browse_thread/thread/b0d36dafb42d96c4/4d6ad2ad2243e677?lnk=gst&rnum=2#4d6ad2ad2243e677>


=head4 Packfile PMCs

This idea will see packfiles and segments within them being represented by
PMCs, easing memory management and providing an interface to packfiles for
Parrot programs.
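
To illustrate the kind of checking such an interface performs, the ordering
conditions listed for C<set_pmc_keyed_int> under PackfileAnnotations.pmc can
be modelled outside Parrot. The following is a minimal Python sketch, not
Parrot code: C<Annotation>, C<AnnotationTable>, C<set_at> and C<get_at> are
hypothetical names, and the sketch assumes that index 0 has no predecessor to
check and that negative indices are rejected.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Annotation:
    """Hypothetical stand-in for a PackfileAnnotation PMC."""
    offset: int    # offset into the bytecode, in opcodes
    name: str      # annotation name
    value: object  # integer, string or number value


class AnnotationTable:
    """Hypothetical model of the PackfileAnnotations ordering rules."""

    def __init__(self) -> None:
        self._entries: List[Annotation] = []

    def __len__(self) -> int:
        return len(self._entries)

    def get_at(self, index: int) -> Annotation:
        # Assumption: a missing entry is reported as IndexError.
        if index < 0 or index >= len(self._entries):
            raise IndexError("no annotation at that index")
        return self._entries[index]

    def set_at(self, index: int, entry: Annotation) -> None:
        # Condition 1: the value passed must be an annotation.
        if not isinstance(entry, Annotation):
            raise TypeError("entry must be an Annotation")
        # Condition 2: the entry at the previous index must be defined,
        # so only existing slots or the next free slot are accepted.
        if index < 0 or index > len(self._entries):
            raise IndexError("entry at the previous index is not defined")
        # Condition 3: the previous offset must be strictly smaller.
        if index > 0 and self._entries[index - 1].offset >= entry.offset:
            raise ValueError("offset is not greater than the previous entry")
        # Condition 4: the next offset, if any, must be strictly greater.
        if (index + 1 < len(self._entries)
                and self._entries[index + 1].offset <= entry.offset):
            raise ValueError("offset is not less than the next entry")
        if index == len(self._entries):
            self._entries.append(entry)    # add a new annotation
        else:
            self._entries[index] = entry   # replace an existing one
```

Keeping the offsets strictly increasing matches the requirement that
annotations be placed in the same order as the instructions.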

Here are mailing list comments that provide one of the motivations or hints
of the original proposal.

L<http://groups.google.com/group/perl.perl6.internals/browse_thread/thread/778ea0ac4c8676f7/b249306b543b040a?lnk=gst&q=packfile+PMCs&rnum=2#b249306b543b040a>

=head2 References

None.

=cut

__END__
Local Variables:
  fill-column:78
End:
vim: expandtab shiftwidth=4: