1# Copyright (C) 2001-2010, Parrot Foundation. 2 3=head1 [DRAFT] PDD 8: PMC Keys 4 5=head2 Abstract 6 7This PDD aims to clear up the confusion regarding the implementation of keyed 8access to PMCs in Parrot. 9 10=head2 Description 11 12First, let's define some terminology. An C<aggregate PMC> is one which stores 13or references other values as elements. The aggregate PMC allows indexed 14access to element by implementing some of the C<_keyed> variants of VTABLE 15functions. These variants are called C<indexing> operations, as they act on a 16specific indexed element of an aggregate PMC. Examples of a aggregate PMCs 17include C<Hash>, C<FixedIntegerArray> and C<ResizablePMCArray>. 18 19Non-aggregates may also support C<_keyed> variants of the VTABLE functions, 20but they not do anything particularly clever. For instance, PMC types 21implementing Perl references will merely pass the index on to the referent. 22These aren't aggregates because they don't directly store or reference 23elements. 24 25Indexing operations take one or more aggregate B<keys>. At runtime these 26operations will index into the B<aggregate> using the C<key> and return a 27B<value>. Here is a well-known indexing operation in Perl 6: 28 29 @a[12] = $b; 30 31The B<key> here is the constant integer C<12> The aggregate is the 32C<Perl6Array> C<@a>. In the process of this assignment, Parrot will have to 33extract the PMC in element 12 of the array, producing a C<value>. 34C<$b> is then assigned to this value. 35 36Now, how does this all get implemented? 37 38=head2 Implementation 39 40=head3 The key structure 41 42The key structure must bundle multiple keys. This is to allow indexing into 43multidimensional aggregate PMCs. These keys may be specified as integer, 44string, number or PMC. 45 46For this reason the following structure was produced. Individual keys (e.g. a 47single C<Integer> or C<String>) are stored in a C<Key> PMC. The type of the 48key is encoded in the private flags of the PMC as specified below. The value 49of the C<Key> PMC is stored in the PMC's data structure internally and can be 50accessed using the appropriate get_* VTABLE functions. 51 52For example, indexing the multidimensional array C<@foo[$a,12;"hi"]> 53produces three PMCs; one with a PMC type, one with an integer type 54and one with a string type. 55 56The key type is encoded in the PMC flags using 8 bits based on the following 57scheme (from includes/parrot/key.h): 58 59 typedef enum { 60 KEY_integer_FLAG = PObj_private0_FLAG, 61 KEY_number_FLAG = PObj_private1_FLAG, 62 KEY_hash_iterator_FLAGS = PObj_private0_FLAG | PObj_private1_FLAG, 63 KEY_string_FLAG = PObj_private2_FLAG, 64 KEY_pmc_FLAG = PObj_private3_FLAG, 65 KEY_register_FLAG = PObj_private4_FLAG, 66 67 KEY_type_FLAGS = KEY_integer_FLAG | 68 KEY_number_FLAG | 69 KEY_string_FLAG | 70 KEY_pmc_FLAG | 71 KEY_register_FLA G | 72 KEY_hash_iterator_FLAGS 73 } KEY_flags 74 75 76The C<KEY_register_FLAG> is used to indicate that value if the key is in a 77register. In this case, the Key PMC's data contains the integer number of the 78appropriate register in the current context. 79 80Parrot must also have a way to combine multiple keys into a key that can be 81treated as a single unit. This is done by forming a singly linked list such 82that each key points at the next. Within a single Key PMC, the pointer to the 83next key is stored in C<PMC_data(key)>. The linked list structure allows the 84use of partial keys in multidimensional lookups, since the next key can be 85generated while the aggregate PMC is being traversed. 86 87These definitions, along with declarations of support routines used to 88manipulate keys, can be found in F<include/parrot/key.h> 89 90=head3 Aggregate and non-aggregate PMCs 91 92We've already said that what separates the aggregate PMCs from the 93non-aggregates is their implementation of the C<_keyed> vtable functions. So 94it is Hereby Decreed that the default vtable which everyone inherits from 95defines the C<_keyed> forms to throw an exception. 96 97=over 3 98 99=item Todo 100 101Need discussion on whether C<EXCEPTION_OUT_OF_BOUNDS> is a good exception for 102this, or whether something else should be used. It's really a compiler 103screw-up, since code which indexes a non-aggregate shouldn't be generated. 104 105=back 106 107=head3 C<_keyed> vtable functions 108 109So what of these magical C<_keyed> vtable functions? They are generated when 110you add the C<keyed> tag to the appropriate entry in F<src/vtable.tbl>. They 111are constructed by following B<every> C<PMC> argument with a second C<PMC> 112argument which acts as the key for that argument; the name of the second 113C<PMC> argument is formed by adding C<_key> onto the end of the first C<PMC> 114argument. 115 116The reason why every PMC argument has an associated key is twofold. Firstly, 117it means that 118 119 @a[$b] = $c 120 121and 122 123 $a = @b[$c] 124 125use the same vtable function, reducing the multiplicity of methods. Secondly, 126a three-argument C<assign> as suggested by the code above would be ambiguous - 127the code above uses 3 PMCs in different ways. 128 129Also, operations which take an aggregate key for one of their arguments should 130take aggregate keys for B<all> of their arguments. This is to avoid the 131following: 132 133 void foo_keyed_i(PMC* x, PMC* x_key, INT a) 134 void foo_keyed_n(PMC* x, PMC* x_key, NUM a) 135 void foo_keyed_p(PMC* x, PMC* x_key, PMC a) 136 void foo_keyed_p_keyed(PMC* x, PMC* x_key, PMC* a, PMC* a_key) 137 138These are all replaced with the single entry 139 140 void foo_keyed(PMC* x, PMC* a_key, PMC* a, PMC* a_key) 141 142(Think how much worse it gets when there are three or more PMCs in an 143entry...) 144 145Yes. This means that you may need to turn some things into C<PMC>s that you 146didn't want to. Since the alternative is mega pollution and duplication in the 147vtable table, and since the majority of things that you'll deal with in a real 148world situation are expected to be C<PMC>s anyway, this shouldn't be too much 149of a problem. 150 151So, if you have a PMC in a C<_keyed> method which you don't want to index, 152pass in C<NULL> instead of a real key. Code implementing these methods should 153understand C<PMC* foo, PMC* NULL> as meaning the entirety of C<foo> in some 154sense; this is trivial to understand if C<foo> is non-aggregate, and 155implementation-defined if C<foo> is aggregate. If you remember that a key PMC 156is really a linked list, you'll notice that after traversing down through the 157list, you'll reach a C<NULL> which again means the entirety of whatever object 158you traversed to. 159 160Similarly, non-C<_keyed> methods on aggregates are implementation defined; for 161instance, a C<set_integer> on a C<PerlArray> may be understood as setting 162C<@array.length>. 163 164Historically, we first implemented keys as two separate keyed methods per 165applicable method - C<..._index> and C<..._index_s> for integer and string 166indexing respectively. However, this didn't give us the flexibility and 167scalability that key structures give us. 168 169=head3 Input to the assembler 170 171There are several different valid specifications of an aggregate key to the 172assembler. These are: 173 174 op arg, P1[1234] # Constant integer key 175 op arg, P1[I1] # Integer key 176 177 op arg, P1[12.34] # Constant number key - handled as constant key 178 op arg, P1["foo"] # Constant string key - handled as constant key 179 op arg, P1[I1;I2] # Multi-level key - handled as constant key 180 181 op arg, P1[P1] # Register key 182 183(Rationale: fits programmer's expectation, easier to understand at a glance 184than C<op P1, P2, P3>. Also, is C<op P1, P2, P3> the same as C<op P1[P2], P3> 185or C<op P1, P2[P3]>, or are these three separate PMCs?) 186 187In all there are four types of key. The first two are integer keys and 188constant integer keys which are optimisations for the common case of single 189level integer keys. 190 191The other two are constant keys, which can handle any combination of constants 192and registers with any number of levels; and register keys which are 193represented by a single PMC register that is assumed to contain a PMC of the 194Key class. 195 196=head3 What the assembler did next 197 198When the assembler sees an aggregate key, it "detaches" the key to form a 199separate argument. It then decides on the type of key. For integer keys (both 200constant and register) the data is encoded in the same way as an ordinary 201integer argument. For register keys the data is encoded as for an ordinary PMC 202register argument, while for constant keys a key constant is generated that 203encodes the list of constants and registers that make up the key and an 204appropriate index into the constant table is encoded as the argument. 205 206Next it selects the appropriate op. Register keys have the signature C<k> and 207constant keys have the signature C<kc>. Integer register and constant keys are 208encoded as C<ki> and C<kic> respectively. 209 210=begin PIR_FRAGMENT 211 212 set $P1["hi"], 1234 213 214=end PIR_FRAGMENT 215 216finds an op named C<set_p_kc_i>. On the other hand, 217 218=begin PIR_FRAGMENT 219 220 set $P1[$P1], 1234 221 222=end PIR_FRAGMENT 223 224produces an op named C<set_p_k_i>. Likewise, this: 225 226=begin PIR_FRAGMENT 227 228 set $P1[1], 1234 229 230=end PIR_FRAGMENT 231 232produces an op named C<set_p_kic>, and this: 233 234=begin PIR_FRAGMENT 235 236 set $P1[$I1], 1234 237 238=end PIR_FRAGMENT 239 240produces an op named C<set_p_ki>. 241 242=head3 Bytecode representation 243 244The bytecode representation of these keys are as follows: constant keys are 245treated just like another constant, and are an index into the packfile's 246constant table. 247 248Each key in that constant table consists of one word specifying its length in 249terms of number of keys. For instance, C<["hi"]> has length 1; 250C<["hi";P1;S1;123]> has length 4. Next, each key is specified using two words. 251The first word is a type specifier: 252 253 1 - Integer constant 254 2 - Number constant 255 4 - String constant 256 7 - Integer register 257 8 - Number register 258 9 - PMC register 259 10 - String register 260 261and the second word is either a value (for integer constants), a register 262number (for registers) or an index into the appropriate constant table. 263 264The type values shown above are actually the C<PARROT_ARG_*> values taken from 265F<include/parrot/op.h>. 266 267=head2 References 268 269None. 270 271=cut 272 273__END__ 274Local Variables: 275 fill-column:78 276End: 277vim: expandtab shiftwidth=4: 278