=head1 NAME

perlclassguts - Internals of how C<feature 'class'> and class syntax work

=head1 DESCRIPTION

This document provides in-depth information about the way in which the perl
interpreter implements the C<feature 'class'> syntax and overall behaviour.
It is not intended as an end-user guide on how to use the feature. For that,
see L<perlclass>.

The reader is assumed to be generally familiar with the internals of the perl
interpreter. For a more general overview of these details, see also
L<perlguts>.

=head1 DATA STORAGE

=head2 Classes

A class is fundamentally a package, and exists in the symbol table as an HV
with an aux structure in exactly the same way as a non-class package. It is
distinguished from a non-class package by the fact that the
C<HvSTASH_IS_CLASS()> macro will return true on it.

Extra information relating to it being a class is stored in the
C<struct xpvhv_aux> structure attached to the stash, in the following fields:

    HV          *xhv_class_superclass;
    CV          *xhv_class_initfields_cv;
    AV          *xhv_class_adjust_blocks;
    PADNAMELIST *xhv_class_fields;
    PADOFFSET    xhv_class_next_fieldix;
    HV          *xhv_class_param_map;

=over 4

=item *

C<xhv_class_superclass> will be C<NULL> for a class with no superclass. It
will point directly to the stash of the parent class if one has been set with
the C<:isa()> class attribute.

=item *

C<xhv_class_initfields_cv> will contain a C<CV *> pointing to a function to be
invoked as part of the constructor of this class or any subclass thereof. This
CV is responsible for initializing all the fields defined by this class for a
new instance. This CV will be an anonymous real function; that is, while it
has no name and no GV, it is I<not> a protosub and may be directly invoked.

=item *

C<xhv_class_adjust_blocks> may point to an AV containing CV pointers to each of
the C<ADJUST> blocks defined on the class. If the class has a superclass, this
array will additionally contain duplicate pointers to the CVs from its parent
class's array. The AV is created lazily the first time an element is pushed to
it; it is valid for there not to be one, and this pointer will be C<NULL> in
that case.

The CVs are stored directly, not via RVs. Each CV will be an anonymous real
function.

=item *

C<xhv_class_fields> will point to a C<PADNAMELIST> containing C<PADNAME>s,
each being one defined field of the class. They are stored in order of
declaration. Note, however, that the index into this array will not
necessarily be equal to the C<fieldix> of each field: in a subclass the array
index still begins at zero, but the C<fieldix> of its first field will be
non-zero if its parent class contains any fields at all.

For more information on how individual fields are represented, see L</Fields>.

=item *

C<xhv_class_next_fieldix> gives the field index that will be assigned to the
next field to be added to the class. It is only useful at compile-time.

=item *

C<xhv_class_param_map> may point to an HV which maps field C<:param> attribute
names to the field index of the field with that name. This mapping is copied
from parent classes; each class's map contains all the entries of its parents
in addition to its own.

=back
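
The relationship between C<xhv_class_next_fieldix>, a class's own
C<xhv_class_fields> list, and each field's C<fieldix> can be illustrated
outside the interpreter. The following C is a minimal sketch, not perl's
implementation: the C<model_class> type and C<model_*> functions are
hypothetical, the per-class list records only field indexes rather than
C<PADNAME>s, and it assumes that a subclass begins numbering its fields at its
parent's C<xhv_class_next_fieldix>.

```c

    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical model of the class-related aux fields; this is not
     * perl's real struct xpvhv_aux. */
    #define MODEL_MAX_FIELDS 16

    struct model_class {
        const struct model_class *superclass; /* NULL if no :isa() */
        size_t fieldix[MODEL_MAX_FIELDS];     /* own fields, declaration order */
        size_t nfields;                       /* entries used in fieldix[] */
        size_t next_fieldix;                  /* index for the next new field */
    };

    /* Start a class. Assumption: a subclass continues the field numbering
     * where its (already sealed) parent stopped. */
    void model_class_setup(struct model_class *c,
                           const struct model_class *super)
    {
        c->superclass = super;
        c->nfields = 0;
        c->next_fieldix = super ? super->next_fieldix : 0;
    }

    /* Record a new field in this class's own list; return its fieldix. */
    size_t model_class_add_field(struct model_class *c)
    {
        assert(c->nfields < MODEL_MAX_FIELDS);
        size_t ix = c->next_fieldix++;
        c->fieldix[c->nfields++] = ix;
        return ix;
    }

```

With a parent that declares two fields (indexes 0 and 1), the first field
declared in a subclass sits at position 0 of the subclass's own list but is
assigned C<fieldix> 2.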

=head2 Fields

A field is still fundamentally a lexical variable declared in a scope, and
exists in the C<PADNAMELIST> of its corresponding CV. Methods and other
method-like CVs can still capture them exactly as they can with regular
lexicals. A field is distinguished from other kinds of pad entry in that the
C<PadnameIsFIELD()> macro will return true on it.

Extra information relating to it being a field is stored in an additional
structure accessible via the C<PadnameFIELDINFO()> macro on the padname. This
structure has the following fields:

    PADOFFSET  fieldix;
    HV        *fieldstash;
    OP        *defop;
    SV        *paramname;
    bool       def_if_undef;
    bool       def_if_false;

=over 4

=item *

C<fieldix> stores the "field index" of the field; that is, the index into the
instance field array where this field's value will be stored. Note that the
first index in the array is not specially reserved: in a class that inherits
no fields from a superclass, the first field has field index 0.

=item *

C<fieldstash> stores a pointer to the stash of the class that defined this
field. This is necessary in case there are multiple classes defined within the
same scope; it is used to disambiguate the fields of each.

    {
        class C1; field $x;
        class C2; field $x;
    }

=item *

C<defop> may store a pointer to a defaulting expression optree for this field.
Defaulting expressions are optional; this field may be C<NULL>.

=item *

C<paramname> may point to a regular string SV containing the C<:param> name
attribute given to the field. If none, it will be C<NULL>.

=item *

C<def_if_undef> is true if the defaulting expression was set using the C<//=>
operator, and C<def_if_false> is true if it was set using C<||=>. Both are
false for a plain C<=> default.

=back
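
Taken together, C<defop> and the two flags determine when a field's default
is used. The sketch below assumes the user-level rules of L<perlclass> for a
C<:param> field: a plain C<=> default is used only when the parameter is
missing, C<//=> also replaces an undefined value, and C<||=> also replaces any
false value. The C<model_param> type and C<model_use_default()> function are
hypothetical and do not show how the generated optree actually performs the
test.

```c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical summary of the constructor argument for one :param field. */
    struct model_param {
        bool present;  /* was the named parameter passed at all? */
        bool defined;  /* if present: is the value defined? */
        bool truthy;   /* if present: is the value true? (undef is false) */
    };

    /* Should the field's defaulting expression (defop) be evaluated? */
    bool model_use_default(struct model_param p,
                           bool def_if_undef, bool def_if_false)
    {
        assert(!(def_if_undef && def_if_false)); /* at most one mode is set */
        if (!p.present)
            return true;       /* every default covers a missing parameter */
        if (def_if_undef && !p.defined)
            return true;       /* //= also replaces undef */
        if (def_if_false && !p.truthy)
            return true;       /* ||= also replaces any false value */
        return false;          /* otherwise keep the caller's value */
    }

```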

=head2 Methods

A method is still fundamentally a CV, and has the same basic representation as
one. It has an optree and a pad, and is stored via a GV in the stash of its
containing package. It is distinguished from a non-method CV by the fact that
the C<CvIsMETHOD()> macro will return true on it.

(Note: This macro should not be confused with the one that was previously
called C<CvMETHOD()>. That one does not relate to the class system, and was
renamed to C<CvNOWARN_AMBIGUOUS()> to avoid this confusion.)

There is currently no extra information that needs to be stored about a method
CV, so no new fields are added to the CV structure.

=head2 Instances

Object instances are represented by an entirely new SV type, whose base type
is C<SVt_PVOBJ>. An instance is still blessed into its class stash and wrapped
in an RV, in the usual manner for classical objects.

As these are their own unique container type, distinct from hashes or arrays,
the core C<builtin::reftype> function returns a new value when asked about
these. That value is C<"OBJECT">.

Internally, such an object is an array of SV pointers whose size is fixed at
creation time (because the number of fields in a class is known after
compilation). An object instance stores the max field index within it (for
basic error-checking on access), and a fixed-size array of SV pointers storing
the individual field values.

Fields of array and hash type directly store AV or HV pointers into the array;
they are not stored via an intervening RV.
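
A rough model of the instance layout described above is a maximum field index
plus a fixed-size array of value pointers, with accesses checked against that
bound. The C<model_object> type and its functions are hypothetical
illustrations rather than the real C<SVt_PVOBJ> body; C<void *> stands in for
C<SV *> and C<ptrdiff_t> for C<SSize_t>.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical model of an object body, not perl's real layout. */
    struct model_object {
        ptrdiff_t maxfield;  /* highest valid field index; -1 if no fields */
        void **fields;       /* fixed-size array of maxfield + 1 slots */
    };

    /* Allocate an instance for a class whose field count is already known
     * (it is fixed once the class is sealed). Returns 0 on success. */
    int model_object_new(struct model_object *obj, size_t nfields)
    {
        obj->maxfield = (ptrdiff_t)nfields - 1;
        obj->fields = NULL;
        if (nfields > 0) {
            obj->fields = calloc(nfields, sizeof *obj->fields); /* zero-filled */
            if (!obj->fields)
                return -1;
        }
        return 0;
    }

    /* Bounds-checked slot lookup; NULL for an out-of-range field index. */
    void **model_object_slot(struct model_object *obj, ptrdiff_t fieldix)
    {
        assert(obj != NULL);
        if (fieldix < 0 || fieldix > obj->maxfield)
            return NULL;
        return &obj->fields[fieldix];
    }

    void model_object_free(struct model_object *obj)
    {
        free(obj->fields);
        obj->fields = NULL;
        obj->maxfield = -1;
    }

```

The bound is inclusive: an instance of a class with three fields has a
maximum field index of 2, the same convention described for
C<ObjectMAXFIELD> under L</API>.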

=head1 API

The data structures described above are supported by the following API
functions.

=head2 Class Manipulation

=head3 class_setup_stash

    void class_setup_stash(HV *stash);

Called by the parser on encountering the C<class> keyword. It upgrades the
stash into being a class and prepares it for receiving class-specific items
like methods and fields.

=head3 class_seal_stash

    void class_seal_stash(HV *stash);

Called by the parser at the end of a C<class> block or, for unit classes, at
the end of the containing scope. This function performs various finalisation
activities that are required before instances of the class can be constructed,
but could not have been done until all the information about the members of
the class is known.

Any additions to or modifications of the class under compilation must be
performed between these two function calls. Classes cannot be modified once
they have been sealed.
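
The ordering rule can be pictured as a small state machine. This is a
hypothetical sketch: the state names and C<model_*> functions are invented for
illustration, and misuse is reported with return values and C<assert()>
rather than by modelling what the real functions do.

```c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical lifecycle states for a stash. */
    enum model_class_state {
        MODEL_PLAIN_PACKAGE,  /* ordinary package, not a class */
        MODEL_CLASS_OPEN,     /* after class_setup_stash(): accepts members */
        MODEL_CLASS_SEALED    /* after class_seal_stash(): frozen */
    };

    struct model_stash {
        enum model_class_state state;
        unsigned nmembers;    /* fields, methods, ADJUST blocks added so far */
    };

    void model_setup_stash(struct model_stash *s)
    {
        assert(s->state == MODEL_PLAIN_PACKAGE);
        s->state = MODEL_CLASS_OPEN;
        s->nmembers = 0;
    }

    /* Adding a member is only valid between setup and seal. */
    bool model_add_member(struct model_stash *s)
    {
        if (s->state != MODEL_CLASS_OPEN)
            return false;
        s->nmembers++;
        return true;
    }

    /* In this model, sealing is allowed once, from the open state. */
    void model_seal_stash(struct model_stash *s)
    {
        assert(s->state == MODEL_CLASS_OPEN);
        s->state = MODEL_CLASS_SEALED;
    }

    /* Instances may only be constructed once finalisation has happened. */
    bool model_can_construct(const struct model_stash *s)
    {
        return s->state == MODEL_CLASS_SEALED;
    }

```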

=head3 class_add_field

    void class_add_field(HV *stash, PADNAME *pn);

Called by F<pad.c> as part of defining a new field name in the current pad.
Note that this function does I<not> create the padname; that must already be
done by F<pad.c>. This API function simply informs the class that the new
field name has been created and is now available for it.

=head3 class_add_ADJUST

    void class_add_ADJUST(HV *stash, CV *cv);

Called by the parser once it has parsed and constructed a CV for a new
C<ADJUST> block. This gets added to the list stored by the class.
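
The lazily created C<xhv_class_adjust_blocks> array can be sketched as a
growable list that stays C<NULL> until the first push. In this hypothetical
model C<void *> stands in for C<CV *>, the C<model_*> names are invented,
freeing is omitted, and the parent's entries are assumed to be copied in
before the subclass adds any of its own blocks.

```c

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for the AV; the owning pointer stays NULL
     * until the first push. */
    struct model_adjust_list {
        void **cvs;        /* CV pointers stored directly, no RVs */
        size_t count;
        size_t capacity;
    };

    /* Push one CV, creating the list on first use. Returns 0 on success. */
    int model_adjust_push(struct model_adjust_list **listp, void *cv)
    {
        assert(listp != NULL);
        struct model_adjust_list *list = *listp;
        if (!list) {
            list = malloc(sizeof *list);
            if (!list)
                return -1;
            list->cvs = NULL;
            list->count = list->capacity = 0;
            *listp = list;
        }
        if (list->count == list->capacity) {
            size_t newcap = list->capacity ? list->capacity * 2 : 4;
            void **grown = realloc(list->cvs, newcap * sizeof *grown);
            if (!grown)
                return -1;
            list->cvs = grown;
            list->capacity = newcap;
        }
        list->cvs[list->count++] = cv;
        return 0;
    }

    /* Seed a subclass's list with duplicate pointers from its parent's. */
    int model_adjust_inherit(struct model_adjust_list **childp,
                             const struct model_adjust_list *parent)
    {
        if (!parent)
            return 0;      /* parent has no ADJUST blocks: child stays NULL */
        for (size_t i = 0; i < parent->count; i++)
            if (model_adjust_push(childp, parent->cvs[i]) != 0)
                return -1;
        return 0;
    }

```

Because inherited entries are pushed first in this model, walking the list
from front to back visits the parent's C<ADJUST> blocks before the
subclass's own.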

=head2 Field Manipulation

=head3 class_prepare_initfield_parse

    void class_prepare_initfield_parse();

Called by the parser just before parsing an initializing expression for a
field variable. This makes use of a suspended compcv to combine all the field
initializing expressions into the same CV.

=head3 class_set_field_defop

    void class_set_field_defop(PADNAME *pn, OPCODE defmode, OP *defop);

Called by the parser after it has parsed an initializing expression for the
field. Sets the defaulting expression and mode of application. C<defmode>
should be zero for a plain C<=> default, or C<OP_ORASSIGN> or C<OP_DORASSIGN>
for a C<||=> or C<//=> default respectively.

=head3 padadd_FIELD

    #define padadd_FIELD

This flag constant tells the C<pad_add_name_*> family of functions that the
new name should be added as a field. There is no need to call
C<class_add_field()>; this will be done automatically.

=head2 Method Manipulation

=head3 class_prepare_method_parse

    void class_prepare_method_parse(CV *cv);

Called by the parser after C<start_subparse()> but immediately before doing
anything else. This prepares the C<PL_compcv> for parsing a method: it
arranges for the C<CvIsMETHOD> test to be true, adds the C<$self> lexical, and
performs any other activities that may be required.

=head3 class_wrap_method_body

    OP *class_wrap_method_body(OP *o);

Called by the parser at the end of parsing a method body into an optree but
just before wrapping it in the eventual CV. This function inserts extra ops
into the optree, such as the initial C<OP_METHSTART> described under
L</OP_METHSTART>, which the method needs in order to work correctly.

=head2 Object Instances

=head3 SVt_PVOBJ

    #define SVt_PVOBJ

An SV type constant used for comparison with the C<SvTYPE()> macro.

=head3 ObjectMAXFIELD

    SSize_t ObjectMAXFIELD(sv);

A function-like macro that obtains the maximum valid field index that can be
accessed from the C<ObjectFIELDS> array.

=head3 ObjectFIELDS

    SV **ObjectFIELDS(sv);

A function-like macro that obtains the fields array directly out of an object
instance. Fields can be accessed by their field index, from 0 up to and
including the maximum valid index given by C<ObjectMAXFIELD>.

=head1 OPCODES

=head2 OP_METHSTART

    newUNOP_AUX(OP_METHSTART, ...);

An C<OP_METHSTART> is an C<UNOP_AUX> which must be present at the start of a
method CV in order to make it work properly. This is inserted by
C<class_wrap_method_body()>, and appears even before any optree fragment
associated with signature argument checking or extraction.

This op is responsible for shifting the value of C<$self> out of the arguments
list and binding any field variables that the method requires access to into
the pad. The AUX vector will contain details of the field/pad index pairings
required.

This op also performs sanity checking on the invocant value. It checks that it
is definitely an object reference of a compatible class type. If not, an
exception is thrown.

If the C<op_private> field includes the C<OPpINITFIELDS> flag, this indicates
that the op begins the special C<xhv_class_initfields_cv> CV. In this case it
additionally takes the second value from the arguments list, which should be a
plain HV pointer (I<directly>, not via RV), and binds it to pad index 2, the
slot after C<$self>, where the generated optree will expect to find it.
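
The two jobs of C<OP_METHSTART>, checking the invocant and binding fields into
the pad, can be modelled in plain C. This is a hypothetical sketch: the
C<model_*> types stand in for stashes, instances and the AUX vector; it
assumes that a "compatible class type" means the object's class is the
method's class or inherits from it through C<xhv_class_superclass>; and it
returns C<false> where the real op throws an exception. Shifting C<$self> off
the argument list is reduced to storing it in pad index 1.

```c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct model_class {
        const struct model_class *superclass;  /* NULL at the root */
    };

    struct model_object {
        const struct model_class *cls;
        void **fields;                          /* indexed by fieldix */
    };

    /* One entry of the hypothetical AUX vector. */
    struct model_fieldbind {
        size_t fieldix;                         /* source slot in the object */
        size_t padix;                           /* destination pad slot */
    };

    /* Is objclass the same as, or a subclass of, methclass? */
    static bool model_isa(const struct model_class *objclass,
                          const struct model_class *methclass)
    {
        for (; objclass; objclass = objclass->superclass)
            if (objclass == methclass)
                return true;
        return false;
    }

    /* Check the invocant, store it as $self in pad index 1, then bind each
     * requested field. Returns false where the real op would throw. */
    bool model_methstart(const struct model_object *self,
                         const struct model_class *methclass,
                         const struct model_fieldbind *binds, size_t nbinds,
                         void **pad)
    {
        assert(pad != NULL);
        if (!self || !model_isa(self->cls, methclass))
            return false;
        pad[1] = (void *)self;
        for (size_t i = 0; i < nbinds; i++)
            pad[binds[i].padix] = self->fields[binds[i].fieldix];
        return true;
    }

```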

=head2 OP_INITFIELD

An C<OP_INITFIELD> is only invoked as part of the C<xhv_class_initfields_cv>
CV during the construction phase of an instance. This is the time that the
individual SVs that make up the mutable fields of the instance (including AVs
and HVs) are actually assigned into the C<ObjectFIELDS> array. The
C<OPpINITFIELD_AV> and C<OPpINITFIELD_HV> private flags indicate whether it is
creating an AV or HV; if neither is set then an SV is created.

If the op has the C<OPf_STACKED> flag, it expects to find an initializing
value on the stack. For SVs this is the topmost SV on the data stack. For AVs
and HVs it expects a marked list.
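
The choices C<OP_INITFIELD> makes from its flags can be sketched as two small
functions. The bit values and C<model_*> names are hypothetical, not perl's
real definitions; only the rules come from the text above. Stack positions are
modelled as plain integers, with the number of items in a marked list taken to
be the distance between the stack top and the mark.

```c

    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical bit values, not perl's real ones. */
    #define MODEL_OPpINITFIELD_AV 0x01
    #define MODEL_OPpINITFIELD_HV 0x02

    enum model_kind { MODEL_SV, MODEL_AV, MODEL_HV };

    /* Pick the container type to create for the new field slot. */
    enum model_kind model_initfield_kind(unsigned char op_private)
    {
        if (op_private & MODEL_OPpINITFIELD_AV)
            return MODEL_AV;
        if (op_private & MODEL_OPpINITFIELD_HV)
            return MODEL_HV;
        return MODEL_SV;
    }

    /* Number of stack items used as the initial value: none without
     * OPf_STACKED, the topmost item for an SV, and everything above the
     * mark for an AV or HV. */
    int model_initfield_nargs(enum model_kind kind, bool stacked,
                              int stack_top, int mark)
    {
        if (!stacked)
            return 0;
        if (kind == MODEL_SV)
            return 1;
        assert(stack_top >= mark);
        return stack_top - mark;
    }

```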

=head1 COMPILE-TIME BEHAVIOUR

=head2 C<ADJUST> Phasers

At compile time, parsing of an C<ADJUST> phaser is handled in a fundamentally
different way to the existing perl phasers (C<BEGIN>, etc.).

Rather than taking the usual route, the tokenizer recognises that the
C<ADJUST> keyword introduces a phaser block. The parser then parses the body
of this block similarly to how it would parse an (anonymous) method body,
creating a CV that has no name GV. This is then inserted directly into the
class information by calling C<class_add_ADJUST>, entirely bypassing the
symbol table.

=head2 Attributes

During compilation, attributes of both classes and fields are handled in a
different way to existing perl attributes on subroutines and lexical
variables.

The parser still forms an C<OP_LIST> optree of C<OP_CONST> nodes, but these
are passed to the C<class_apply_attributes> or C<class_apply_field_attributes>
functions. Rather than looking up a method in the class being parsed, these
functions use a fixed internal list of known attributes to find the function
that applies each attribute to the class or field. In future this may support
user-supplied extension attributes, though at present it only recognises those
defined by the core itself.
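
Dispatch through a fixed table of known attributes, rather than a method
lookup, might look like the sketch below. The table layout, handler names and
return codes are hypothetical. C<isa> and C<param> are used because they are
the class and field attributes mentioned in this document, and each
attribute's parenthesised value, if any, is passed as a plain string.

```c

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    /* Hypothetical handler type: apply one attribute with an optional value. */
    typedef int (*model_attr_fn)(void *target, const char *value);

    static int model_apply_isa(void *cls, const char *value)
    {
        (void)cls;
        return value ? 0 : -1;   /* :isa() must name a superclass */
    }

    static int model_apply_param(void *field, const char *value)
    {
        (void)field;
        (void)value;             /* :param may omit its name */
        return 0;
    }

    struct model_attr {
        const char *name;
        model_attr_fn apply;
    };

    /* Fixed internal lists: one for classes, one for fields. */
    static const struct model_attr model_class_attrs[] = {
        { "isa", model_apply_isa },
    };
    static const struct model_attr model_field_attrs[] = {
        { "param", model_apply_param },
    };

    /* Find the name in a table and apply it; unknown names give -2. */
    static int model_apply_attr(const struct model_attr *table, size_t n,
                                void *target, const char *name,
                                const char *value)
    {
        assert(table != NULL && name != NULL);
        for (size_t i = 0; i < n; i++)
            if (strcmp(table[i].name, name) == 0)
                return table[i].apply(target, value);
        return -2;
    }

```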

=head2 Field Initializing Expressions

During compilation, the parser makes use of a suspended compcv when parsing
the defaulting expression for a field. All the expressions for all the fields
in the class share the same suspended compcv, which is then compiled up into
the same internal CV called by the constructor to initialize all the fields
provided by that class.

=head1 RUNTIME BEHAVIOUR

=head2 Constructor

The generated constructor for a class itself is an XSUB which performs three
tasks in order: it creates the instance SV itself, invokes the field
initializers, then invokes the C<ADJUST> block CVs. The constructor for any
class is always the same basic shape, regardless of whether the class has a
superclass or not.

The field initializers are collected into a generated optree-based CV called
the field initializer CV. This is the CV which contains all the optree
fragments for the field initializing expressions. When invoked, the field
initializer CV first makes a chained call to the superclass's field
initializer CV, if there is one, before invoking all of the individual field
initialization ops. The field initializer CV is invoked with two items on the
stack: the instance SV and a direct HV containing the constructor parameters.
Note carefully: this HV is passed I<directly>, not via an RV. This is
permitted because both the caller and the callee are directly generated code
and not arbitrary pure-perl subroutines.

The C<ADJUST> block CVs are all collected into a single flat list, merging all
of the ones defined by the superclass as well. They are all invoked in order,
after the field initializer CV.
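
The ordering guarantees above can be traced with a small model in which each
class's field initializer chains to its superclass's before running, and the
flattened C<ADJUST> list is run afterwards. The C<model_*> types are
hypothetical, string labels stand in for the real CVs, and the superclass's
C<ADJUST> entries are assumed to come first in the merged list.

```c

    #include <assert.h>
    #include <stddef.h>

    #define MODEL_MAX_STEPS 32

    static const char model_create_step[] = "create instance";

    /* Hypothetical class description: labels stand in for CVs. */
    struct model_class {
        const struct model_class *superclass;
        const char *initfields;   /* this class's field initializer CV */
        const char *adjust[4];    /* flattened list, inherited entries first */
        size_t nadjust;
    };

    /* Records the order in which the steps run. */
    struct model_trace {
        const char *steps[MODEL_MAX_STEPS];
        size_t n;
    };

    static void model_record(struct model_trace *t, const char *step)
    {
        assert(t->n < MODEL_MAX_STEPS);
        t->steps[t->n++] = step;
    }

    /* Field initializer CV: chain to the superclass first, then own fields. */
    static void model_run_initfields(const struct model_class *c,
                                     struct model_trace *t)
    {
        if (c->superclass)
            model_run_initfields(c->superclass, t);
        model_record(t, c->initfields);
    }

    /* Same shape whether or not the class has a superclass. */
    void model_construct(const struct model_class *c, struct model_trace *t)
    {
        model_record(t, model_create_step);
        model_run_initfields(c, t);
        for (size_t i = 0; i < c->nadjust; i++)
            model_record(t, c->adjust[i]);
    }

```

For a two-level hierarchy the recorded order is: create the instance, the
parent's field initializers, the subclass's field initializers, the parent's
C<ADJUST> block, then the subclass's.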

=head2 C<$self> Access During Methods

When C<class_prepare_method_parse()> is called, it arranges that the pad of
the new CV body will begin with a lexical called C<$self>. Because the pad
should be freshly created at this point, this lexical will have pad index 1.
The function checks this and aborts if that is not true.

Because of this fact, code within the body of a method or method-like CV can
reliably use pad index 1 to obtain the invocant reference. The C<OP_INITFIELD>
opcode also relies on this fact.

Similarly, within the C<xhv_class_initfields_cv> CV, the next pad slot, at pad
index 2, is relied on to store the constructor parameters HV.
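
The fixed pad positions can be expressed as a check performed while preparing
the pad of a method-like CV. In this hypothetical model, C<model_pad> and its
functions are invented, slot 0 is simply treated as unavailable (what perl
keeps there is not modelled), and C<false> is returned where the real code
aborts.

```c

    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define MODEL_PAD_MAX 8

    /* Hypothetical pad: only names are tracked, and slot 0 is never
     * handed out. */
    struct model_pad {
        const char *names[MODEL_PAD_MAX];
        size_t next;                 /* next free pad index */
    };

    void model_pad_fresh(struct model_pad *p)
    {
        for (size_t i = 0; i < MODEL_PAD_MAX; i++)
            p->names[i] = NULL;
        p->next = 1;                 /* a fresh pad hands out index 1 first */
    }

    size_t model_pad_add(struct model_pad *p, const char *name)
    {
        assert(p->next < MODEL_PAD_MAX);
        p->names[p->next] = name;
        return p->next++;
    }

    /* $self must land in pad index 1; an initfields CV also expects the
     * constructor parameters HV at index 2. Returns false where the real
     * code would abort. */
    bool model_prepare_pad(struct model_pad *p, bool is_initfields)
    {
        if (model_pad_add(p, "$self") != 1)
            return false;
        if (is_initfields && model_pad_add(p, "params HV") != 2)
            return false;
        return true;
    }

```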

=head1 AUTHORS

Paul Evans

=cut
