1# Copyright (C) 2001-2010, Parrot Foundation.
2
3=head1 [DRAFT] PDD 8: PMC Keys
4
5=head2 Abstract
6
7This PDD aims to clear up the confusion regarding the implementation of keyed
8access to PMCs in Parrot.
9
10=head2 Description
11
12First, let's define some terminology. An C<aggregate PMC> is one which stores
13or references other values as elements.  The aggregate PMC allows indexed
14access to element by implementing some of the C<_keyed> variants of VTABLE
15functions.  These variants are called C<indexing> operations, as they act on a
16specific indexed element of an aggregate PMC.  Examples of a aggregate PMCs
17include C<Hash>, C<FixedIntegerArray> and C<ResizablePMCArray>.
18
19Non-aggregates may also support C<_keyed> variants of the VTABLE functions,
20but they not do anything particularly clever.  For instance, PMC types
21implementing Perl references will merely pass the index on to the referent.
22These aren't aggregates because they don't directly store or reference
23elements.
24
25Indexing operations take one or more aggregate B<keys>.  At runtime these
26operations will index into the B<aggregate> using the C<key> and return a
27B<value>.  Here is a well-known indexing operation in Perl 6:
28
29    @a[12] = $b;
30
31The B<key> here is the constant integer C<12>  The aggregate is the
32C<Perl6Array> C<@a>.  In the process of this assignment, Parrot will have to
33extract the PMC in element 12 of the array, producing a C<value>.
34C<$b> is then assigned to this value.
35
36Now, how does this all get implemented?
37
38=head2 Implementation
39
40=head3 The key structure
41
42The key structure must bundle multiple keys.  This is to allow indexing into
43multidimensional aggregate PMCs.  These keys may be specified as integer,
44string, number or PMC.
45
46For this reason the following structure was produced.  Individual keys (e.g. a
47single C<Integer> or C<String>) are stored in a C<Key> PMC.  The type of the
48key is encoded in the private flags of the PMC as specified below.   The value
49of the C<Key> PMC is stored in the PMC's data structure internally and can be
50accessed using the appropriate get_* VTABLE functions.
51
52For example, indexing the multidimensional array C<@foo[$a,12;"hi"]>
53produces three PMCs; one with a PMC type, one with an integer type
54and one with a string type.
55
56The key type is encoded in the PMC flags using 8 bits based on the following
57scheme (from includes/parrot/key.h):
58
59    typedef enum {
60        KEY_integer_FLAG        = PObj_private0_FLAG,
61        KEY_number_FLAG         = PObj_private1_FLAG,
62        KEY_hash_iterator_FLAGS = PObj_private0_FLAG | PObj_private1_FLAG,
63        KEY_string_FLAG         = PObj_private2_FLAG,
64        KEY_pmc_FLAG            = PObj_private3_FLAG,
65        KEY_register_FLAG       = PObj_private4_FLAG,
66
67        KEY_type_FLAGS          = KEY_integer_FLAG   |
68                                  KEY_number_FLAG    |
69                                  KEY_string_FLAG    |
70                                  KEY_pmc_FLAG       |
71                                  KEY_register_FLA G |
72                                  KEY_hash_iterator_FLAGS
73    } KEY_flags
74
75
76The C<KEY_register_FLAG> is used to indicate that value if the key is in a
77register.  In this case, the Key PMC's data contains the integer number of the
78appropriate register in the current context.
79
80Parrot must also have a way to combine multiple keys into a key that can be
81treated as a single unit.  This is done by forming a singly linked list such
82that each key points at the next.  Within a single Key PMC, the pointer to the
83next key is stored in C<PMC_data(key)>.  The linked list structure allows the
84use of partial keys in multidimensional lookups, since the next key can be
85generated while the aggregate PMC is being traversed.
86
87These definitions, along with declarations of support routines used to
88manipulate keys, can be found in F<include/parrot/key.h>
89
90=head3 Aggregate and non-aggregate PMCs
91
92We've already said that what separates the aggregate PMCs from the
93non-aggregates is their implementation of the C<_keyed> vtable functions. So
94it is Hereby Decreed that the default vtable which everyone inherits from
95defines the C<_keyed> forms to throw an exception.
96
97=over 3
98
99=item Todo
100
101Need discussion on whether C<EXCEPTION_OUT_OF_BOUNDS> is a good exception for
102this, or whether something else should be used. It's really a compiler
103screw-up, since code which indexes a non-aggregate shouldn't be generated.
104
105=back
106
107=head3 C<_keyed> vtable functions
108
109So what of these magical C<_keyed> vtable functions? They are generated when
110you add the C<keyed> tag to the appropriate entry in F<src/vtable.tbl>. They
111are constructed by following B<every> C<PMC> argument with a second C<PMC>
112argument which acts as the key for that argument; the name of the second
113C<PMC> argument is formed by adding C<_key> onto the end of the first C<PMC>
114argument.
115
116The reason why every PMC argument has an associated key is twofold. Firstly,
117it means that
118
119    @a[$b] = $c
120
121and
122
123    $a = @b[$c]
124
125use the same vtable function, reducing the multiplicity of methods. Secondly,
126a three-argument C<assign> as suggested by the code above would be ambiguous -
127the code above uses 3 PMCs in different ways.
128
129Also, operations which take an aggregate key for one of their arguments should
130take aggregate keys for B<all> of their arguments. This is to avoid the
131following:
132
133    void foo_keyed_i(PMC* x, PMC* x_key, INT a)
134    void foo_keyed_n(PMC* x, PMC* x_key, NUM a)
135    void foo_keyed_p(PMC* x, PMC* x_key, PMC a)
136    void foo_keyed_p_keyed(PMC* x, PMC* x_key, PMC* a, PMC* a_key)
137
138These are all replaced with the single entry
139
140    void foo_keyed(PMC* x, PMC* a_key, PMC* a, PMC* a_key)
141
142(Think how much worse it gets when there are three or more PMCs in an
143entry...)
144
145Yes. This means that you may need to turn some things into C<PMC>s that you
146didn't want to. Since the alternative is mega pollution and duplication in the
147vtable table, and since the majority of things that you'll deal with in a real
148world situation are expected to be C<PMC>s anyway, this shouldn't be too much
149of a problem.
150
151So, if you have a PMC in a C<_keyed> method which you don't want to index,
152pass in C<NULL> instead of a real key. Code implementing these methods should
153understand C<PMC* foo, PMC* NULL> as meaning the entirety of C<foo> in some
154sense; this is trivial to understand if C<foo> is non-aggregate, and
155implementation-defined if C<foo> is aggregate. If you remember that a key PMC
156is really a linked list, you'll notice that after traversing down through the
157list, you'll reach a C<NULL> which again means the entirety of whatever object
158you traversed to.
159
160Similarly, non-C<_keyed> methods on aggregates are implementation defined; for
161instance, a C<set_integer> on a C<PerlArray> may be understood as setting
162C<@array.length>.
163
164Historically, we first implemented keys as two separate keyed methods per
165applicable method - C<..._index> and C<..._index_s> for integer and string
166indexing respectively. However, this didn't give us the flexibility and
167scalability that key structures give us.
168
169=head3 Input to the assembler
170
171There are several different valid specifications of an aggregate key to the
172assembler. These are:
173
174    op arg, P1[1234]  # Constant integer key
175    op arg, P1[I1]    # Integer key
176
177    op arg, P1[12.34] # Constant number key - handled as constant key
178    op arg, P1["foo"] # Constant string key - handled as constant key
179    op arg, P1[I1;I2] # Multi-level key - handled as constant key
180
181    op arg, P1[P1]    # Register key
182
183(Rationale: fits programmer's expectation, easier to understand at a glance
184than C<op P1, P2, P3>. Also, is C<op P1, P2, P3> the same as C<op P1[P2], P3>
185or C<op P1, P2[P3]>, or are these three separate PMCs?)
186
187In all there are four types of key. The first two are integer keys and
188constant integer keys which are optimisations for the common case of single
189level integer keys.
190
191The other two are constant keys, which can handle any combination of constants
192and registers with any number of levels; and register keys which are
193represented by a single PMC register that is assumed to  contain a PMC of the
194Key class.
195
196=head3 What the assembler did next
197
198When the assembler sees an aggregate key, it "detaches" the key to form a
199separate argument. It then decides on the type of key. For  integer keys (both
200constant and register) the data is encoded in the same way as an ordinary
201integer argument. For register keys the data is encoded as for an ordinary PMC
202register argument, while for constant keys a key constant is generated that
203encodes the list of constants and registers that make up the key and an
204appropriate index into the constant table is encoded as the argument.
205
206Next it selects the appropriate op. Register keys have the signature C<k> and
207constant keys have the signature C<kc>. Integer register and constant keys are
208encoded as C<ki> and C<kic> respectively.
209
210=begin PIR_FRAGMENT
211
212    set $P1["hi"], 1234
213
214=end PIR_FRAGMENT
215
216finds an op named C<set_p_kc_i>. On the other hand,
217
218=begin PIR_FRAGMENT
219
220    set $P1[$P1], 1234
221
222=end PIR_FRAGMENT
223
224produces an op named C<set_p_k_i>. Likewise, this:
225
226=begin PIR_FRAGMENT
227
228    set $P1[1], 1234
229
230=end PIR_FRAGMENT
231
232produces an op named C<set_p_kic>, and this:
233
234=begin PIR_FRAGMENT
235
236    set $P1[$I1], 1234
237
238=end PIR_FRAGMENT
239
240produces an op named C<set_p_ki>.
241
242=head3 Bytecode representation
243
244The bytecode representation of these keys are as follows: constant keys are
245treated just like another constant, and are an index into the packfile's
246constant table.
247
248Each key in that constant table consists of one word specifying its length in
249terms of number of keys. For instance, C<["hi"]> has length 1;
250C<["hi";P1;S1;123]> has length 4. Next, each key is specified using two words.
251The first word is a type specifier:
252
253    1 - Integer constant
254    2 - Number constant
255    4 - String constant
256    7 - Integer register
257    8 - Number register
258    9 - PMC register
259   10 - String register
260
261and the second word is either a value (for integer constants), a register
262number (for registers) or an index into the appropriate constant table.
263
264The type values shown above are actually the C<PARROT_ARG_*> values taken from
265F<include/parrot/op.h>.
266
267=head2 References
268
269None.
270
271=cut
272
273__END__
274Local Variables:
275  fill-column:78
276End:
277vim: expandtab shiftwidth=4:
278