1@c Copyright (C) 2002-2021 Free Software Foundation, Inc.
2@c This is part of the GCC manual.
3@c For copying conditions, see the file gcc.texi.
4
5@node Type Information
6@chapter Memory Management and Type Information
7@cindex GGC
8@findex GTY
9
10GCC uses some fairly sophisticated memory management techniques, which
11involve determining information about GCC's data structures from GCC's
12source code and using this information to perform garbage collection and
13implement precompiled headers.
14
15A full C++ parser would be too complicated for this task, so a limited
16subset of C++ is interpreted and special markers are used to determine
17what parts of the source to look at.  All @code{struct}, @code{union}
18and @code{template} structure declarations that define data structures
19that are allocated under control of the garbage collector must be
20marked.  All global variables that hold pointers to garbage-collected
21memory must also be marked.  Finally, all global variables that need
22to be saved and restored by a precompiled header must be marked.  (The
23precompiled header mechanism can only save static variables if they're
scalar.  Complex data structures must be allocated in garbage-collected
25memory to be saved in a precompiled header.)
26
27The full format of a marker is
28@smallexample
29GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{}))
30@end smallexample
31@noindent
32but in most cases no options are needed.  The outer double parentheses
33are still necessary, though: @code{GTY(())}.  Markers can appear:
34
35@itemize @bullet
36@item
37In a structure definition, before the open brace;
38@item
39In a global variable declaration, after the keyword @code{static} or
40@code{extern}; and
41@item
42In a structure field definition, before the name of the field.
43@end itemize
44
45Here are some examples of marking simple data structures and globals.
46
47@smallexample
48struct GTY(()) @var{tag}
49@{
50  @var{fields}@dots{}
51@};
52
53typedef struct GTY(()) @var{tag}
54@{
55  @var{fields}@dots{}
56@} *@var{typename};
57
58static GTY(()) struct @var{tag} *@var{list};   /* @r{points to GC memory} */
59static GTY(()) int @var{counter};        /* @r{save counter in a PCH} */
60@end smallexample
61
62The parser understands simple typedefs such as
63@code{typedef struct @var{tag} *@var{name};} and
64@code{typedef int @var{name};}.
65These don't need to be marked.
66
67Since @code{gengtype}'s understanding of C++ is limited, there are
68several constructs and declarations that are not supported inside
69classes/structures marked for automatic GC code generation.  The
70following C++ constructs produce a @code{gengtype} error on
71structures/classes marked for automatic GC code generation:
72
73@itemize @bullet
74@item
75Type definitions inside classes/structures are not supported.
76@item
77Enumerations inside classes/structures are not supported.
78@end itemize
79
80If you have a class or structure using any of the above constructs,
81you need to mark that class as @code{GTY ((user))} and provide your
82own marking routines (see section @ref{User GC} for details).
83
84It is always valid to include function definitions inside classes.
85Those are always ignored by @code{gengtype}, as it only cares about
86data members.
87
88@menu
89* GTY Options::         What goes inside a @code{GTY(())}.
90* Inheritance and GTY:: Adding GTY to a class hierarchy.
91* User GC::		Adding user-provided GC marking routines.
92* GGC Roots::           Making global variables GGC roots.
93* Files::               How the generated files work.
94* Invoking the garbage collector::   How to invoke the garbage collector.
95* Troubleshooting::     When something does not work as expected.
96@end menu
97
98@node GTY Options
99@section The Inside of a @code{GTY(())}
100
101Sometimes the C code is not enough to fully describe the type
102structure.  Extra information can be provided with @code{GTY} options
103and additional markers.  Some options take a parameter, which may be
104either a string or a type name, depending on the parameter.  If an
105option takes no parameter, it is acceptable either to omit the
106parameter entirely, or to provide an empty string as a parameter.  For
107example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are
108equivalent.
109
110When the parameter is a string, often it is a fragment of C code.  Four
111special escapes may be used in these strings, to refer to pieces of
112the data structure being marked:
113
114@cindex % in GTY option
115@table @code
116@item %h
117The current structure.
118@item %1
119The structure that immediately contains the current structure.
120@item %0
121The outermost structure that contains the current structure.
122@item %a
123A partial expression of the form @code{[i1][i2]@dots{}} that indexes
124the array item currently being marked.
125@end table
126
127For instance, suppose that you have a structure of the form
128@smallexample
129struct A @{
130  @dots{}
131@};
132struct B @{
133  struct A foo[12];
134@};
135@end smallexample
136@noindent
137and @code{b} is a variable of type @code{struct B}.  When marking
138@samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]},
139@code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a}
140would expand to @samp{[11]}.
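
As a rough illustration of the substitution described above, the
following C++ sketch replaces the four escapes in an option string with
caller-supplied text.  It is not taken from @code{gengtype}; the names
@code{escape_values} and @code{expand_escapes} are hypothetical and exist
only for this example.

@verbatim
```cpp
#include <cstddef>
#include <string>

// Hypothetical container for the four substitution values; the names are
// illustrative and do not come from gengtype.
struct escape_values
{
  std::string h;     // replaces %h: the current structure
  std::string one;   // replaces %1: the immediately containing structure
  std::string zero;  // replaces %0: the outermost containing structure
  std::string a;     // replaces %a: the array index suffix
};

// Return a copy of TMPL with %h, %1, %0 and %a substituted.  Any other
// character, including a '%' that does not start a known escape, is
// copied unchanged.
std::string
expand_escapes (const std::string &tmpl, const escape_values &v)
{
  std::string out;
  for (std::size_t i = 0; i < tmpl.size (); ++i)
    {
      if (tmpl[i] == '%' && i + 1 < tmpl.size ())
        {
          const std::string *repl = nullptr;
          switch (tmpl[i + 1])
            {
            case 'h': repl = &v.h; break;
            case '1': repl = &v.one; break;
            case '0': repl = &v.zero; break;
            case 'a': repl = &v.a; break;
            default: break;
            }
          if (repl != nullptr)
            {
              out += *repl;
              ++i;  // skip the escape letter as well
              continue;
            }
        }
      out += tmpl[i];
    }
  return out;
}
```
@end verbatim

With the values from the @code{struct B} example, the template
@samp{%h.x + %1.n%a} would become @samp{b.foo[11].x + b.n[11]}.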
141
142As in ordinary C, adjacent strings will be concatenated; this is
143helpful when you have a complicated expression.
144@smallexample
145@group
146GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE"
147                  " ? TYPE_NEXT_VARIANT (&%h.generic)"
148                  " : TREE_CHAIN (&%h.generic)")))
149@end group
150@end smallexample
151
152The available options are:
153
154@table @code
155@findex length
156@item length ("@var{expression}")
157
158There are two places the type machinery will need to be explicitly told
159the length of an array of non-atomic objects.  The first case is when a
160structure ends in a variable-length array, like this:
161@smallexample
162struct GTY(()) rtvec_def @{
163  int num_elem;         /* @r{number of elements} */
164  rtx GTY ((length ("%h.num_elem"))) elem[1];
165@};
166@end smallexample
167
168In this case, the @code{length} option is used to override the specified
169array length (which should usually be @code{1}).  The parameter of the
170option is a fragment of C code that calculates the length.
171
172The second case is when a structure or a global variable contains a
173pointer to an array, like this:
174@smallexample
175struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter;
176@end smallexample
177In this case, @code{iter} has been allocated by writing something like
178@smallexample
179  x->iter = ggc_alloc_cleared_vec_gimple_omp_for_iter (collapse);
180@end smallexample
and the @code{collapse} field provides the length of the array.
182
183This second use of @code{length} also works on global variables, like:
184@verbatim
185static GTY((length("reg_known_value_size"))) rtx *reg_known_value;
186@end verbatim
187
188Note that the @code{length} option is only meant for use with arrays of
189non-atomic objects, that is, objects that contain pointers pointing to
190other GTY-managed objects.  For other GC-allocated arrays and strings
191you should use @code{atomic}.
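
The second case can be pictured with the following C++ sketch, which
hand-writes the kind of loop that a @code{length} expression makes
possible.  It is only a model: @code{node}, @code{iter_entry},
@code{loop_nest} and both marking functions are hypothetical, a plain
@code{marked} flag stands in for the collector's mark bits, and
@code{GTY} is assumed to be a macro that expands to nothing when the
compiler proper reads the source.

@verbatim
```cpp
#include <cassert>

// Stand-in so the sketch compiles outside GCC; it is assumed here that the
// compiler proper sees GTY as a macro that expands to nothing.
#define GTY(x)

struct GTY(()) node
{
  bool marked;
};

// Each array element holds pointers to GC objects, so it is non-atomic.
struct GTY(()) iter_entry
{
  node *index;
  node *bound;
};

struct GTY(()) loop_nest
{
  int collapse;   // number of elements that 'iter' points to
  iter_entry *GTY((length ("%h.collapse"))) iter;
};

// Hypothetical helper: mark N if it is non-null and not yet marked.
// Returns 1 when the mark was newly set, otherwise 0.
static int
mark_node (node *n)
{
  if (n == nullptr || n->marked)
    return 0;
  n->marked = true;
  return 1;
}

// Visit exactly 'collapse' elements, which is what the length
// expression "%h.collapse" evaluates to for this object.
int
mark_loop_nest (loop_nest *x)
{
  assert (x != nullptr);
  int newly_marked = 0;
  for (int i = 0; i < x->collapse; ++i)
    {
      newly_marked += mark_node (x->iter[i].index);
      newly_marked += mark_node (x->iter[i].bound);
    }
  return newly_marked;
}
```
@end verbatim

Without the length, a marker could only assume that @code{iter} points
to a single element.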
192
193@findex skip
194@item skip
195
196If @code{skip} is applied to a field, the type machinery will ignore it.
197This is somewhat dangerous; the only safe use is in a union when one
198field really isn't ever used.
199
200@findex for_user
201@item for_user
202
Use this to mark types that need to be marked by user GC routines, but are
not referred to in a template argument.  So if you have some user GC type
@code{T1} and a non-user GC type @code{T2}, you can give @code{T2} the
@code{for_user} option so that the marking functions for @code{T1} can call
non-mangled functions to mark @code{T2}.
207
208@findex desc
209@findex tag
210@findex default
211@item desc ("@var{expression}")
212@itemx tag ("@var{constant}")
213@itemx default
214
215The type machinery needs to be told which field of a @code{union} is
216currently active.  This is done by giving each field a constant
217@code{tag} value, and then specifying a discriminator using @code{desc}.
218The value of the expression given by @code{desc} is compared against
219each @code{tag} value, each of which should be different.  If no
220@code{tag} is matched, the field marked with @code{default} is used if
221there is one, otherwise no field in the union will be marked.
222
223In the @code{desc} option, the ``current structure'' is the union that
224it discriminates.  Use @code{%1} to mean the structure containing it.
225There are no escapes available to the @code{tag} option, since it is a
226constant.
227
228For example,
229@smallexample
230struct GTY(()) tree_binding
231@{
232  struct tree_common common;
233  union tree_binding_u @{
234    tree GTY ((tag ("0"))) scope;
235    struct cp_binding_level * GTY ((tag ("1"))) level;
236  @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope;
237  tree value;
238@};
239@end smallexample
240
In this example, the value of @code{BINDING_HAS_LEVEL_P} when applied to a
@code{struct tree_binding *} is presumed to be 0 or 1.  If 1, the type
mechanism will treat the field @code{level} as being present and if 0,
will treat the field @code{scope} as being present.
245
246The @code{desc} and @code{tag} options can also be used for inheritance
247to denote which subclass an instance is.  See @ref{Inheritance and GTY}
248for more information.
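
The effect of @code{desc} and @code{tag} can be sketched in C++ as a
@code{switch} on the discriminator.  The code below is a hand-written
model, not @code{gengtype} output: the types, the @code{has_level} field
and @code{mark_binding} are hypothetical, a @code{marked} flag stands in
for the real mark bits, and @code{GTY} is assumed to expand to nothing.

@verbatim
```cpp
#include <cassert>

// Stand-in so the sketch compiles outside GCC (assumption).
#define GTY(x)

struct GTY(()) scope_info { bool marked; };
struct GTY(()) level_info { bool marked; };

struct GTY(()) binding
{
  int has_level;   // discriminator: 0 selects 'scope', 1 selects 'level'
  union binding_u
  {
    scope_info *GTY((tag ("0"))) scope;
    level_info *GTY((tag ("1"))) level;
  } GTY((desc ("%1.has_level"))) xscope;  // %1 is the enclosing struct
};

// Mark only the union member whose tag equals the discriminator value.
void
mark_binding (binding *b)
{
  assert (b != nullptr);
  switch (b->has_level)
    {
    case 0:
      if (b->xscope.scope != nullptr)
        b->xscope.scope->marked = true;
      break;
    case 1:
      if (b->xscope.level != nullptr)
        b->xscope.level->marked = true;
      break;
    default:
      // No tag matched and there is no 'default' member: mark nothing.
      break;
    }
}
```
@end verbatim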
249
250@findex cache
251@item cache
252
When the @code{cache} option is applied to a global variable,
@code{gt_clear_cache} is called on that variable between the mark and
sweep phases of garbage collection.  The @code{gt_clear_cache} function
is free to mark blocks as used, or to clear pointers in the variable.
257
258@findex deletable
259@item deletable
260
@code{deletable}, when applied to a global variable, indicates that when
garbage collection runs, there's no need to mark anything pointed to
by this variable; it can just be set to @code{NULL} instead.  This is used
to keep a list of free structures around for re-use.
265
266@findex maybe_undef
267@item maybe_undef
268
269When applied to a field, @code{maybe_undef} indicates that it's OK if
the structure that this field points to is never defined, so long as
271this field is always @code{NULL}.  This is used to avoid requiring
272backends to define certain optional structures.  It doesn't work with
273language frontends.
274
275@findex nested_ptr
276@item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}")
277
278The type machinery expects all pointers to point to the start of an
279object.  Sometimes for abstraction purposes it's convenient to have
280a pointer which points inside an object.  So long as it's possible to
281convert the original object to and from the pointer, such pointers
282can still be used.  @var{type} is the type of the original object,
283the @var{to expression} returns the pointer given the original object,
284and the @var{from expression} returns the original object given
285the pointer.  The pointer will be available using the @code{%h}
286escape.
287
288@findex chain_next
289@findex chain_prev
290@findex chain_circular
291@item chain_next ("@var{expression}")
292@itemx chain_prev ("@var{expression}")
293@itemx chain_circular ("@var{expression}")
294
295It's helpful for the type machinery to know if objects are often
296chained together in long lists; this lets it generate code that uses
297less stack space by iterating along the list instead of recursing down
298it.  @code{chain_next} is an expression for the next item in the list,
299@code{chain_prev} is an expression for the previous item.  For singly
300linked lists, use only @code{chain_next}; for doubly linked lists, use
301both.  The machinery requires that taking the next item of the
previous item gives the original item.  @code{chain_circular} is similar
to @code{chain_next}, but can be used for circular singly linked lists.
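
The stack-space argument can be seen in the following C++ sketch, which
marks a singly linked list with a loop instead of one recursive call per
element.  It is a hand-written model under stated assumptions:
@code{list_node} and @code{mark_chain} are hypothetical, a @code{marked}
flag stands in for the collector's mark bits, and @code{GTY} is assumed
to expand to nothing.

@verbatim
```cpp
#include <cstddef>
#include <vector>

// Stand-in so the sketch compiles outside GCC (assumption).
#define GTY(x)

struct GTY((chain_next ("%h.next"))) list_node
{
  list_node *next;
  bool marked;
};

// Walk the chain iteratively.  Stack use stays constant however long the
// list is, and stopping at an already marked node also ends the walk on
// a circular list.
std::size_t
mark_chain (list_node *head)
{
  std::size_t newly_marked = 0;
  for (list_node *p = head; p != nullptr && !p->marked; p = p->next)
    {
      p->marked = true;
      ++newly_marked;
    }
  return newly_marked;
}
```
@end verbatim

A recursive marker would need one stack frame per list element, which
is what the chain options are meant to avoid.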
304
305@findex reorder
306@item reorder ("@var{function name}")
307
308Some data structures depend on the relative ordering of pointers.  If
309the precompiled header machinery needs to change that ordering, it
310will call the function referenced by the @code{reorder} option, before
311changing the pointers in the object that's pointed to by the field the
312option applies to.  The function must take four arguments, with the
313signature @samp{@w{void *, void *, gt_pointer_operator, void *}}.
314The first parameter is a pointer to the structure that contains the
315object being updated, or the object itself if there is no containing
316structure.  The second parameter is a cookie that should be ignored.
317The third parameter is a routine that, given a pointer, will update it
to its correct new value.  The fourth parameter is a cookie that must
be passed as the second argument to that routine.
320
321PCH cannot handle data structures that depend on the absolute values
322of pointers.  @code{reorder} functions can be expensive.  When
323possible, it is better to depend on properties of the data, like an ID
324number or the hash of a string instead.
325
326@findex atomic
327@item atomic
328
329The @code{atomic} option can only be used with pointers.  It informs
330the GC machinery that the memory that the pointer points to does not
331contain any pointers, and hence it should be treated by the GC and PCH
332machinery as an ``atomic'' block of memory that does not need to be
333examined when scanning memory for pointers.  In particular, the
334machinery will not scan that memory for pointers to mark them as
335reachable (when marking pointers for GC) or to relocate them (when
336writing a PCH file).
337
338The @code{atomic} option differs from the @code{skip} option.
339@code{atomic} keeps the memory under Garbage Collection, but makes the
340GC ignore the contents of the memory.  @code{skip} is more drastic in
341that it causes the pointer and the memory to be completely ignored by
342the Garbage Collector.  So, memory marked as @code{atomic} is
343automatically freed when no longer reachable, while memory marked as
344@code{skip} is not.
345
The @code{atomic} option must be used with great care, because all
sorts of problems can occur if it is used incorrectly, that is, if the
memory the pointer points to does actually contain a pointer.
349
350Here is an example of how to use it:
351@smallexample
352struct GTY(()) my_struct @{
353  int number_of_elements;
354  unsigned int * GTY ((atomic)) elements;
355@};
356@end smallexample
357In this case, @code{elements} is a pointer under GC, and the memory it
358points to needs to be allocated using the Garbage Collector, and will
359be freed automatically by the Garbage Collector when it is no longer
360referenced.  But the memory that the pointer points to is an array of
361@code{unsigned int} elements, and the GC must not try to scan it to
362find pointers to mark or relocate, which is why it is marked with the
363@code{atomic} option.
364
365Note that, currently, global variables cannot be marked with
366@code{atomic}; only fields of a struct can.  This is a known
367limitation.  It would be useful to be able to mark global pointers
368with @code{atomic} to make the PCH machinery aware of them so that
369they are saved and restored correctly to PCH files.
370
371@findex special
372@item special ("@var{name}")
373
374The @code{special} option is used to mark types that have to be dealt
375with by special case machinery.  The parameter is the name of the
376special case.  See @file{gengtype.c} for further details.  Avoid
377adding new special cases unless there is no other alternative.
378
379@findex user
380@item user
381
382The @code{user} option indicates that the code to mark structure
383fields is completely handled by user-provided routines.  See section
384@ref{User GC} for details on what functions need to be provided.
385@end table
386
387@node Inheritance and GTY
388@section Support for inheritance
@code{gengtype} has some support for simple class hierarchies.  You can use
this to have @code{gengtype} autogenerate marking routines, provided:
391
392@itemize @bullet
393@item
394There must be a concrete base class, with a discriminator expression
395that can be used to identify which subclass an instance is.
396@item
397Only single inheritance is used.
398@item
399None of the classes within the hierarchy are templates.
400@end itemize
401
402If your class hierarchy does not fit in this pattern, you must use
403@ref{User GC} instead.
404
The base class and its discriminator must be identified using the @code{desc}
option.  Each concrete subclass must use the @code{tag} option to identify
which value of the discriminator it corresponds to.
408
Every class in the hierarchy must have a @code{GTY(())} marker, as
@code{gengtype} will only attempt to parse classes that have such a marker
@footnote{Classes lacking such a marker will not be identified as being
part of the hierarchy, and so the marking routines will not handle them,
leading to an assertion failure within the marking routines due to an
unknown tag value (assuming that assertions are enabled).}.
415
416@smallexample
417class GTY((desc("%h.kind"), tag("0"))) example_base
418@{
419public:
420    int kind;
421    tree a;
422@};
423
424class GTY((tag("1"))) some_subclass : public example_base
425@{
426public:
427    tree b;
428@};
429
430class GTY((tag("2"))) some_other_subclass : public example_base
431@{
432public:
433    tree c;
434@};
435@end smallexample
436
The generated marking routines for the above will contain a @code{switch}
on @code{kind}, visiting all appropriate fields.  For example, if @code{kind}
is 2, it will cast to @code{some_other_subclass} and visit fields @code{a}
and @code{c}.
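
A hand-written C++ approximation of such a routine is shown below.  It
is a sketch rather than actual @code{gengtype} output: @code{tree_node},
@code{mark_tree} and @code{mark_example} are hypothetical, a
@code{marked} flag stands in for the real mark bits, and @code{GTY} is
assumed to expand to nothing.

@verbatim
```cpp
#include <cassert>

// Stand-in so the sketch compiles outside GCC (assumption).
#define GTY(x)

// Minimal stand-in for GCC's tree type.
struct tree_node { bool marked; };
typedef tree_node *tree;

class GTY((desc ("%h.kind"), tag ("0"))) example_base
{
public:
  int kind;
  tree a;
};

class GTY((tag ("1"))) some_subclass : public example_base
{
public:
  tree b;
};

class GTY((tag ("2"))) some_other_subclass : public example_base
{
public:
  tree c;
};

static void
mark_tree (tree t)
{
  if (t != nullptr)
    t->marked = true;
}

// Visit the base-class field, then the fields of the subclass that the
// discriminator selects.
void
mark_example (example_base *p)
{
  assert (p != nullptr);
  mark_tree (p->a);
  switch (p->kind)
    {
    case 0:
      break;
    case 1:
      mark_tree (static_cast<some_subclass *> (p)->b);
      break;
    case 2:
      mark_tree (static_cast<some_other_subclass *> (p)->c);
      break;
    default:
      // A class without a marker would end up here.
      assert (false && "unknown tag value");
    }
}
```
@end verbatim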
440
441@node User GC
442@section Support for user-provided GC marking routines
443@cindex user gc
444The garbage collector supports types for which no automatic marking
445code is generated.  For these types, the user is required to provide
446three functions: one to act as a marker for garbage collection, and
447two functions to act as marker and pointer walker for pre-compiled
448headers.
449
450Given a structure @code{struct GTY((user)) my_struct}, the following functions
451should be defined to mark @code{my_struct}:
452
453@smallexample
454void gt_ggc_mx (my_struct *p)
455@{
456  /* This marks field 'fld'.  */
457  gt_ggc_mx (p->fld);
458@}
459
void gt_pch_nx (my_struct *p)
@{
  /* This marks field 'fld'.  */
  gt_pch_nx (p->fld);
@}

void gt_pch_nx (my_struct *p, gt_pointer_operator op, void *cookie)
@{
  /* For every field 'fld', call the given pointer operator.  */
  op (&(p->fld), cookie);
@}
471@end smallexample
472
473In general, each marker @code{M} should call @code{M} for every
474pointer field in the structure.  Fields that are not allocated in GC
475or are not pointers must be ignored.
476
477For embedded lists (e.g., structures with a @code{next} or @code{prev}
478pointer), the marker must follow the chain and mark every element in
479it.
480
481Note that the rules for the pointer walker @code{gt_pch_nx (my_struct
482*, gt_pointer_operator, void *)} are slightly different.  In this
483case, the operation @code{op} must be applied to the @emph{address} of
484every pointer field.
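
The difference between the markers and the pointer walker can be
sketched for a small embedded list.  Everything below is a stand-alone
model with hypothetical names (@code{tree_node}, @code{my_list} and the
stub markers for @code{tree_node}).  The two-argument form of
@code{gt_pointer_operator} is an assumption that matches the calls shown
above, and @code{GTY} is assumed to expand to nothing.

@verbatim
```cpp
#include <cassert>

// Stand-ins so the sketch compiles outside GCC (assumptions).
#define GTY(x)
typedef void (*gt_pointer_operator) (void *, void *);

// Stub GC-managed type with stub markers.
struct tree_node { bool ggc_marked; bool pch_noted; };
void gt_ggc_mx (tree_node *t) { if (t != nullptr) t->ggc_marked = true; }
void gt_pch_nx (tree_node *t) { if (t != nullptr) t->pch_noted = true; }

struct GTY((user)) my_list
{
  tree_node *payload;
  my_list *next;
};

// GC marker: follow the chain and mark the payload of every element.
// This sketch assumes the chain is not circular.
void
gt_ggc_mx (my_list *p)
{
  for (; p != nullptr; p = p->next)
    gt_ggc_mx (p->payload);
}

// PCH marker: same traversal, calling the PCH marker on each payload.
void
gt_pch_nx (my_list *p)
{
  for (; p != nullptr; p = p->next)
    gt_pch_nx (p->payload);
}

// Pointer walker: apply OP to the address of every pointer field of
// this one object, so that OP can overwrite the stored pointers.
void
gt_pch_nx (my_list *p, gt_pointer_operator op, void *cookie)
{
  assert (p != nullptr && op != nullptr);
  op (&(p->payload), cookie);
  op (&(p->next), cookie);
}
```
@end verbatim

The markers receive pointer values, whereas the walker hands out the
addresses of the fields themselves.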
485
486@subsection User-provided marking routines for template types
487When a template type @code{TP} is marked with @code{GTY}, all
488instances of that type are considered user-provided types.  This means
489that the individual instances of @code{TP} do not need to be marked
490with @code{GTY}.  The user needs to provide template functions to mark
491all the fields of the type.
492
The following code snippets represent all the functions that need to
be provided.  Note that type @code{TP} may refer to more than one
type.  In these snippets, there is only one type @code{T}, but there
could be more.
497
498@smallexample
499template<typename T>
500void gt_ggc_mx (TP<T> *tp)
501@{
502  extern void gt_ggc_mx (T&);
503
504  /* This marks field 'fld' of type 'T'.  */
505  gt_ggc_mx (tp->fld);
506@}
507
508template<typename T>
509void gt_pch_nx (TP<T> *tp)
510@{
511  extern void gt_pch_nx (T&);
512
513  /* This marks field 'fld' of type 'T'.  */
514  gt_pch_nx (tp->fld);
515@}
516
517template<typename T>
518void gt_pch_nx (TP<T *> *tp, gt_pointer_operator op, void *cookie)
519@{
520  /* For every field 'fld' of 'tp' with type 'T *', call the given
521     pointer operator.  */
522  op (&(tp->fld), cookie);
523@}
524
525template<typename T>
void gt_pch_nx (TP<T> *tp, gt_pointer_operator op, void *cookie)
527@{
528  extern void gt_pch_nx (T *, gt_pointer_operator, void *);
529
530  /* For every field 'fld' of 'tp' with type 'T', call the pointer
531     walker for all the fields of T.  */
532  gt_pch_nx (&(tp->fld), op, cookie);
533@}
534@end smallexample
535
Support for user-defined types is currently limited.  The following
restrictions apply:
538
539@enumerate
540@item Type @code{TP} and all the argument types @code{T} must be
541marked with @code{GTY}.
542
543@item Type @code{TP} can only have type names in its argument list.
544
545@item The pointer walker functions are different for @code{TP<T>} and
@code{TP<T *>}.  In the case of @code{TP<T>}, references to
547@code{T} must be handled by calling @code{gt_pch_nx} (which
548will, in turn, walk all the pointers inside fields of @code{T}).
549In the case of @code{TP<T *>}, references to @code{T *} must be
550handled by calling the @code{op} function on the address of the
551pointer (see the code snippets above).
552@end enumerate
553
554@node GGC Roots
555@section Marking Roots for the Garbage Collector
556@cindex roots, marking
557@cindex marking roots
558
559In addition to keeping track of types, the type machinery also locates
560the global variables (@dfn{roots}) that the garbage collector starts
561at.  Roots must be declared using one of the following syntaxes:
562
563@itemize @bullet
564@item
565@code{extern GTY(([@var{options}])) @var{type} @var{name};}
566@item
567@code{static GTY(([@var{options}])) @var{type} @var{name};}
568@end itemize
569@noindent
570The syntax
571@itemize @bullet
572@item
573@code{GTY(([@var{options}])) @var{type} @var{name};}
574@end itemize
575@noindent
576is @emph{not} accepted.  There should be an @code{extern} declaration
577of such a variable in a header somewhere---mark that, not the
578definition.  Or, if the variable is only used in one file, make it
579@code{static}.
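
The accepted forms can be sketched in one C++ translation unit as
follows.  In real code the @code{extern} declaration would live in a
header.  @code{symbol}, @code{symbol_table}, @code{symbol_count} and
@code{push_symbol} are hypothetical names, and @code{GTY} is assumed to
be a macro that expands to nothing, so the compiler sees ordinary
declarations.

@verbatim
```cpp
#include <cassert>

// Stand-in so the sketch compiles outside GCC (assumption).
#define GTY(x)

struct GTY(()) symbol
{
  symbol *next;
  int id;
};

// Would normally sit in a header: the marked 'extern' declaration.
extern GTY(()) symbol *symbol_table;

// The definition itself carries no marker.
symbol *symbol_table = nullptr;

// A variable used in one file only is made 'static' and marked directly.
static GTY(()) int symbol_count = 0;

// Link S at the head of the list reachable from the root.
void
push_symbol (symbol *s)
{
  assert (s != nullptr);
  s->next = symbol_table;
  symbol_table = s;
  ++symbol_count;
}
```
@end verbatim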
580
581@node Files
582@section Source Files Containing Type Information
583@cindex generated files
584@cindex files, generated
585
586Whenever you add @code{GTY} markers to a source file that previously
587had none, or create a new source file containing @code{GTY} markers,
there are two things you need to do:
589
590@enumerate
591@item
592You need to add the file to the list of source files the type
593machinery scans.  There are four cases:
594
595@enumerate a
596@item
597For a back-end file, this is usually done
598automatically; if not, you should add it to @code{target_gtfiles} in
599the appropriate port's entries in @file{config.gcc}.
600
601@item
602For files shared by all front ends, add the filename to the
603@code{GTFILES} variable in @file{Makefile.in}.
604
605@item
606For files that are part of one front end, add the filename to the
607@code{gtfiles} variable defined in the appropriate
608@file{config-lang.in}.
609Headers should appear before non-headers in this list.
610
611@item
612For files that are part of some but not all front ends, add the
613filename to the @code{gtfiles} variable of @emph{all} the front ends
614that use it.
615@end enumerate
616
617@item
618If the file was a header file, you'll need to check that it's included
619in the right place to be visible to the generated files.  For a back-end
620header file, this should be done automatically.  For a front-end header
621file, it needs to be included by the same file that includes
622@file{gtype-@var{lang}.h}.  For other header files, it needs to be
623included in @file{gtype-desc.c}, which is a generated file, so add it to
624@code{ifiles} in @code{open_base_file} in @file{gengtype.c}.
625
626For source files that aren't header files, the machinery will generate a
627header file that should be included in the source file you just changed.
628The file will be called @file{gt-@var{path}.h} where @var{path} is the
629pathname relative to the @file{gcc} directory with slashes replaced by
630@verb{|-|}, so for example the header file to be included in
@file{cp/parser.c} is called @file{gt-cp-parser.h}.  The
632generated header file should be included after everything else in the
633source file.  Don't forget to mention this file as a dependency in the
634@file{Makefile}!
635
636@end enumerate
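
The naming rule for the generated header can be written down as a small
C++ helper.  This is only a sketch of the rule as stated above, with the
source extension dropped as in the @file{cp/parser.c} example; it is not
the logic @code{gengtype} uses, which may treat some directories
differently, and @code{gt_header_name} is a hypothetical name.

@verbatim
```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Derive the generated header name from a path relative to the gcc
// directory: drop the extension, turn '/' into '-', and wrap the result
// in "gt-" and ".h".
std::string
gt_header_name (const std::string &source_path)
{
  assert (!source_path.empty ());
  std::string stem = source_path;
  const std::string::size_type dot = stem.rfind ('.');
  const std::string::size_type slash = stem.rfind ('/');
  // Only strip a dot that belongs to the file name, not to a directory.
  if (dot != std::string::npos
      && (slash == std::string::npos || dot > slash))
    stem.erase (dot);
  std::replace (stem.begin (), stem.end (), '/', '-');
  return "gt-" + stem + ".h";
}
```
@end verbatim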
637
638For language frontends, there is another file that needs to be included
639somewhere.  It will be called @file{gtype-@var{lang}.h}, where
640@var{lang} is the name of the subdirectory the language is contained in.
641
642Plugins can add additional root tables.  Run the @code{gengtype}
643utility in plugin mode as @code{gengtype -P pluginout.h @var{source-dir}
644@var{file-list} @var{plugin*.c}} with your plugin files
645@var{plugin*.c} using @code{GTY} to generate the @var{pluginout.h} file.
The GCC build tree must be present in that mode.
647
648
649@node Invoking the garbage collector
650@section How to invoke the garbage collector
651@cindex garbage collector, invocation
652@findex ggc_collect
653
The GCC garbage collector GGC is only invoked explicitly.  In contrast
with many other garbage collectors, it is not implicitly invoked by
allocation routines when a lot of memory has been consumed.  So the
only way to have GGC reclaim storage is to call the @code{ggc_collect}
function explicitly.  This call is an expensive operation, as it may
have to scan the entire heap.  GGC is an exact mark and sweep garbage
collector, so, unlike many other garbage collectors, it does not scan
the call stack for pointers: objects referenced only from local
variables are not treated as reachable.  You should therefore reference
all your data from static or external @code{GTY}-ed variables, and it is
advised to call @code{ggc_collect} with a shallow call stack.  In
practice GCC passes don't often call @code{ggc_collect} themselves,
because it is called by the pass manager between passes.
667
668At the time of the @code{ggc_collect} call all pointers in the GC-marked
669structures must be valid or @code{NULL}.  In practice this means that
670there should not be uninitialized pointer fields in the structures even
671if your code never reads or writes those fields at a particular
672instance.  One way to ensure this is to use cleared versions of
673allocators unless all the fields are initialized manually immediately
674after allocation.
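
The consequence of not scanning the call stack can be shown with a toy
mark and sweep collector in C++.  This is a deliberately simplified
model, not GGC itself: @code{object}, @code{toy_heap} and its member
functions are hypothetical, each object holds at most one outgoing
pointer, and roots are registered by hand instead of being found through
@code{GTY} markers.

@verbatim
```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct object
{
  object *ref = nullptr;   // single outgoing pointer; may be null
  bool marked = false;
};

class toy_heap
{
public:
  toy_heap () = default;
  toy_heap (const toy_heap &) = delete;
  toy_heap &operator= (const toy_heap &) = delete;

  ~toy_heap ()
  {
    for (object *o : objects_)
      delete o;
  }

  object *alloc ()
  {
    object *o = new object ();
    objects_.push_back (o);
    return o;
  }

  // Register the address of a root variable.
  void add_root (object **slot)
  {
    assert (slot != nullptr);
    roots_.push_back (slot);
  }

  // Mark everything reachable from the registered roots, then free the
  // rest.  Local variables of the caller are never consulted.  Returns
  // the number of surviving objects.
  std::size_t collect ()
  {
    for (object **slot : roots_)
      for (object *o = *slot; o != nullptr && !o->marked; o = o->ref)
        o->marked = true;

    std::vector<object *> survivors;
    for (object *o : objects_)
      {
        if (o->marked)
          {
            o->marked = false;   // reset for the next collection
            survivors.push_back (o);
          }
        else
          delete o;
      }
    objects_.swap (survivors);
    return objects_.size ();
  }

private:
  std::vector<object *> objects_;
  std::vector<object **> roots_;
};
```
@end verbatim

In this model, an object that is referenced only from a local variable
is freed by @code{collect}, and the local variable is left dangling.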
675
676@node Troubleshooting
677@section Troubleshooting the garbage collector
678@cindex garbage collector, troubleshooting
679
680With the current garbage collector implementation, most issues should
681show up as GCC compilation errors.  Some of the most commonly
682encountered issues are described below.
683
684@itemize @bullet
685@item Gengtype does not produce allocators for a @code{GTY}-marked type.
686Gengtype checks if there is at least one possible path from GC roots to
687at least one instance of each type before outputting allocators.  If
688there is no such path, the @code{GTY} markers will be ignored and no
689allocators will be output.  Solve this by making sure that there exists
at least one such path.  If creating it is infeasible or raises a ``code
smell'', consider whether you really must use GC for allocating such a type.
692
693@item Link-time errors about undefined @code{gt_ggc_r_foo_bar} and
694similarly-named symbols.  Check if your @file{foo_bar} source file has
695@code{#include "gt-foo_bar.h"} as its very last line.
696
697@end itemize
698