1@c Copyright (C) 2002, 2003, 2004, 2007, 2008, 2009, 2010 2@c Free Software Foundation, Inc. 3@c This is part of the GCC manual. 4@c For copying conditions, see the file gcc.texi. 5 6@node Type Information 7@chapter Memory Management and Type Information 8@cindex GGC 9@findex GTY 10 11GCC uses some fairly sophisticated memory management techniques, which 12involve determining information about GCC's data structures from GCC's 13source code and using this information to perform garbage collection and 14implement precompiled headers. 15 16A full C parser would be too complicated for this task, so a limited 17subset of C is interpreted and special markers are used to determine 18what parts of the source to look at. All @code{struct} and 19@code{union} declarations that define data structures that are 20allocated under control of the garbage collector must be marked. All 21global variables that hold pointers to garbage-collected memory must 22also be marked. Finally, all global variables that need to be saved 23and restored by a precompiled header must be marked. (The precompiled 24header mechanism can only save static variables if they're scalar. 25Complex data structures must be allocated in garbage-collected memory 26to be saved in a precompiled header.) 27 28The full format of a marker is 29@smallexample 30GTY (([@var{option}] [(@var{param})], [@var{option}] [(@var{param})] @dots{})) 31@end smallexample 32@noindent 33but in most cases no options are needed. The outer double parentheses 34are still necessary, though: @code{GTY(())}. Markers can appear: 35 36@itemize @bullet 37@item 38In a structure definition, before the open brace; 39@item 40In a global variable declaration, after the keyword @code{static} or 41@code{extern}; and 42@item 43In a structure field definition, before the name of the field. 44@end itemize 45 46Here are some examples of marking simple data structures and globals. 47 48@smallexample 49struct GTY(()) @var{tag} 50@{ 51 @var{fields}@dots{} 52@}; 53 54typedef struct GTY(()) @var{tag} 55@{ 56 @var{fields}@dots{} 57@} *@var{typename}; 58 59static GTY(()) struct @var{tag} *@var{list}; /* @r{points to GC memory} */ 60static GTY(()) int @var{counter}; /* @r{save counter in a PCH} */ 61@end smallexample 62 63The parser understands simple typedefs such as 64@code{typedef struct @var{tag} *@var{name};} and 65@code{typedef int @var{name};}. 66These don't need to be marked. 67 68@menu 69* GTY Options:: What goes inside a @code{GTY(())}. 70* GGC Roots:: Making global variables GGC roots. 71* Files:: How the generated files work. 72* Invoking the garbage collector:: How to invoke the garbage collector. 73* Troubleshooting:: When something does not work as expected. 74@end menu 75 76@node GTY Options 77@section The Inside of a @code{GTY(())} 78 79Sometimes the C code is not enough to fully describe the type 80structure. Extra information can be provided with @code{GTY} options 81and additional markers. Some options take a parameter, which may be 82either a string or a type name, depending on the parameter. If an 83option takes no parameter, it is acceptable either to omit the 84parameter entirely, or to provide an empty string as a parameter. For 85example, @code{@w{GTY ((skip))}} and @code{@w{GTY ((skip ("")))}} are 86equivalent. 87 88When the parameter is a string, often it is a fragment of C code. Four 89special escapes may be used in these strings, to refer to pieces of 90the data structure being marked: 91 92@cindex % in GTY option 93@table @code 94@item %h 95The current structure. 96@item %1 97The structure that immediately contains the current structure. 98@item %0 99The outermost structure that contains the current structure. 100@item %a 101A partial expression of the form @code{[i1][i2]@dots{}} that indexes 102the array item currently being marked. 103@end table 104 105For instance, suppose that you have a structure of the form 106@smallexample 107struct A @{ 108 @dots{} 109@}; 110struct B @{ 111 struct A foo[12]; 112@}; 113@end smallexample 114@noindent 115and @code{b} is a variable of type @code{struct B}. When marking 116@samp{b.foo[11]}, @code{%h} would expand to @samp{b.foo[11]}, 117@code{%0} and @code{%1} would both expand to @samp{b}, and @code{%a} 118would expand to @samp{[11]}. 119 120As in ordinary C, adjacent strings will be concatenated; this is 121helpful when you have a complicated expression. 122@smallexample 123@group 124GTY ((chain_next ("TREE_CODE (&%h.generic) == INTEGER_TYPE" 125 " ? TYPE_NEXT_VARIANT (&%h.generic)" 126 " : TREE_CHAIN (&%h.generic)"))) 127@end group 128@end smallexample 129 130The available options are: 131 132@table @code 133@findex length 134@item length ("@var{expression}") 135 136There are two places the type machinery will need to be explicitly told 137the length of an array. The first case is when a structure ends in a 138variable-length array, like this: 139@smallexample 140struct GTY(()) rtvec_def @{ 141 int num_elem; /* @r{number of elements} */ 142 rtx GTY ((length ("%h.num_elem"))) elem[1]; 143@}; 144@end smallexample 145 146In this case, the @code{length} option is used to override the specified 147array length (which should usually be @code{1}). The parameter of the 148option is a fragment of C code that calculates the length. 149 150The second case is when a structure or a global variable contains a 151pointer to an array, like this: 152@smallexample 153struct gimple_omp_for_iter * GTY((length ("%h.collapse"))) iter; 154@end smallexample 155In this case, @code{iter} has been allocated by writing something like 156@smallexample 157 x->iter = ggc_alloc_cleared_vec_gimple_omp_for_iter (collapse); 158@end smallexample 159and the @code{collapse} provides the length of the field. 160 161This second use of @code{length} also works on global variables, like: 162@verbatim 163static GTY((length("reg_known_value_size"))) rtx *reg_known_value; 164@end verbatim 165 166@findex skip 167@item skip 168 169If @code{skip} is applied to a field, the type machinery will ignore it. 170This is somewhat dangerous; the only safe use is in a union when one 171field really isn't ever used. 172 173@findex desc 174@findex tag 175@findex default 176@item desc ("@var{expression}") 177@itemx tag ("@var{constant}") 178@itemx default 179 180The type machinery needs to be told which field of a @code{union} is 181currently active. This is done by giving each field a constant 182@code{tag} value, and then specifying a discriminator using @code{desc}. 183The value of the expression given by @code{desc} is compared against 184each @code{tag} value, each of which should be different. If no 185@code{tag} is matched, the field marked with @code{default} is used if 186there is one, otherwise no field in the union will be marked. 187 188In the @code{desc} option, the ``current structure'' is the union that 189it discriminates. Use @code{%1} to mean the structure containing it. 190There are no escapes available to the @code{tag} option, since it is a 191constant. 192 193For example, 194@smallexample 195struct GTY(()) tree_binding 196@{ 197 struct tree_common common; 198 union tree_binding_u @{ 199 tree GTY ((tag ("0"))) scope; 200 struct cp_binding_level * GTY ((tag ("1"))) level; 201 @} GTY ((desc ("BINDING_HAS_LEVEL_P ((tree)&%0)"))) xscope; 202 tree value; 203@}; 204@end smallexample 205 206In this example, the value of BINDING_HAS_LEVEL_P when applied to a 207@code{struct tree_binding *} is presumed to be 0 or 1. If 1, the type 208mechanism will treat the field @code{level} as being present and if 0, 209will treat the field @code{scope} as being present. 210 211@findex param_is 212@findex use_param 213@item param_is (@var{type}) 214@itemx use_param 215 216Sometimes it's convenient to define some data structure to work on 217generic pointers (that is, @code{PTR}) and then use it with a specific 218type. @code{param_is} specifies the real type pointed to, and 219@code{use_param} says where in the generic data structure that type 220should be put. 221 222For instance, to have a @code{htab_t} that points to trees, one would 223write the definition of @code{htab_t} like this: 224@smallexample 225typedef struct GTY(()) @{ 226 @dots{} 227 void ** GTY ((use_param, @dots{})) entries; 228 @dots{} 229@} htab_t; 230@end smallexample 231and then declare variables like this: 232@smallexample 233 static htab_t GTY ((param_is (union tree_node))) ict; 234@end smallexample 235 236@findex param@var{n}_is 237@findex use_param@var{n} 238@item param@var{n}_is (@var{type}) 239@itemx use_param@var{n} 240 241In more complicated cases, the data structure might need to work on 242several different types, which might not necessarily all be pointers. 243For this, @code{param1_is} through @code{param9_is} may be used to 244specify the real type of a field identified by @code{use_param1} through 245@code{use_param9}. 246 247@findex use_params 248@item use_params 249 250When a structure contains another structure that is parameterized, 251there's no need to do anything special, the inner structure inherits the 252parameters of the outer one. When a structure contains a pointer to a 253parameterized structure, the type machinery won't automatically detect 254this (it could, it just doesn't yet), so it's necessary to tell it that 255the pointed-to structure should use the same parameters as the outer 256structure. This is done by marking the pointer with the 257@code{use_params} option. 258 259@findex deletable 260@item deletable 261 262@code{deletable}, when applied to a global variable, indicates that when 263garbage collection runs, there's no need to mark anything pointed to 264by this variable, it can just be set to @code{NULL} instead. This is used 265to keep a list of free structures around for re-use. 266 267@findex if_marked 268@item if_marked ("@var{expression}") 269 270Suppose you want some kinds of object to be unique, and so you put them 271in a hash table. If garbage collection marks the hash table, these 272objects will never be freed, even if the last other reference to them 273goes away. GGC has special handling to deal with this: if you use the 274@code{if_marked} option on a global hash table, GGC will call the 275routine whose name is the parameter to the option on each hash table 276entry. If the routine returns nonzero, the hash table entry will 277be marked as usual. If the routine returns zero, the hash table entry 278will be deleted. 279 280The routine @code{ggc_marked_p} can be used to determine if an element 281has been marked already; in fact, the usual case is to use 282@code{if_marked ("ggc_marked_p")}. 283 284@findex mark_hook 285@item mark_hook ("@var{hook-routine-name}") 286 287If provided for a structure or union type, the given 288@var{hook-routine-name} (between double-quotes) is the name of a 289routine called when the garbage collector has just marked the data as 290reachable. This routine should not change the data, or call any ggc 291routine. Its only argument is a pointer to the just marked (const) 292structure or union. 293 294@findex maybe_undef 295@item maybe_undef 296 297When applied to a field, @code{maybe_undef} indicates that it's OK if 298the structure that this fields points to is never defined, so long as 299this field is always @code{NULL}. This is used to avoid requiring 300backends to define certain optional structures. It doesn't work with 301language frontends. 302 303@findex nested_ptr 304@item nested_ptr (@var{type}, "@var{to expression}", "@var{from expression}") 305 306The type machinery expects all pointers to point to the start of an 307object. Sometimes for abstraction purposes it's convenient to have 308a pointer which points inside an object. So long as it's possible to 309convert the original object to and from the pointer, such pointers 310can still be used. @var{type} is the type of the original object, 311the @var{to expression} returns the pointer given the original object, 312and the @var{from expression} returns the original object given 313the pointer. The pointer will be available using the @code{%h} 314escape. 315 316@findex chain_next 317@findex chain_prev 318@findex chain_circular 319@item chain_next ("@var{expression}") 320@itemx chain_prev ("@var{expression}") 321@itemx chain_circular ("@var{expression}") 322 323It's helpful for the type machinery to know if objects are often 324chained together in long lists; this lets it generate code that uses 325less stack space by iterating along the list instead of recursing down 326it. @code{chain_next} is an expression for the next item in the list, 327@code{chain_prev} is an expression for the previous item. For singly 328linked lists, use only @code{chain_next}; for doubly linked lists, use 329both. The machinery requires that taking the next item of the 330previous item gives the original item. @code{chain_circular} is similar 331to @code{chain_next}, but can be used for circular single linked lists. 332 333@findex reorder 334@item reorder ("@var{function name}") 335 336Some data structures depend on the relative ordering of pointers. If 337the precompiled header machinery needs to change that ordering, it 338will call the function referenced by the @code{reorder} option, before 339changing the pointers in the object that's pointed to by the field the 340option applies to. The function must take four arguments, with the 341signature @samp{@w{void *, void *, gt_pointer_operator, void *}}. 342The first parameter is a pointer to the structure that contains the 343object being updated, or the object itself if there is no containing 344structure. The second parameter is a cookie that should be ignored. 345The third parameter is a routine that, given a pointer, will update it 346to its correct new value. The fourth parameter is a cookie that must 347be passed to the second parameter. 348 349PCH cannot handle data structures that depend on the absolute values 350of pointers. @code{reorder} functions can be expensive. When 351possible, it is better to depend on properties of the data, like an ID 352number or the hash of a string instead. 353 354@findex variable_size 355@item variable_size 356 357The type machinery expects the types to be of constant size. When this 358is not true, for example, with structs that have array fields or unions, 359the type machinery cannot tell how many bytes need to be allocated at 360each allocation. The @code{variable_size} is used to mark such types. 361The type machinery then provides allocators that take a parameter 362indicating an exact size of object being allocated. Note that the size 363must be provided in bytes whereas the @code{length} option works with 364array lengths in number of elements. 365 366For example, 367@smallexample 368struct GTY((variable_size)) sorted_fields_type @{ 369 int len; 370 tree GTY((length ("%h.len"))) elts[1]; 371@}; 372@end smallexample 373 374Then the objects of @code{struct sorted_fields_type} are allocated in GC 375memory as follows: 376@smallexample 377 field_vec = ggc_alloc_sorted_fields_type (size); 378@end smallexample 379 380If @var{field_vec->elts} stores @var{n} elements, then @var{size} 381could be calculated as follows: 382@smallexample 383 size_t size = sizeof (struct sorted_fields_type) + n * sizeof (tree); 384@end smallexample 385 386@findex atomic 387@item atomic 388 389The @code{atomic} option can only be used with pointers. It informs 390the GC machinery that the memory that the pointer points to does not 391contain any pointers, and hence it should be treated by the GC and PCH 392machinery as an ``atomic'' block of memory that does not need to be 393examined when scanning memory for pointers. In particular, the 394machinery will not scan that memory for pointers to mark them as 395reachable (when marking pointers for GC) or to relocate them (when 396writing a PCH file). 397 398The @code{atomic} option differs from the @code{skip} option. 399@code{atomic} keeps the memory under Garbage Collection, but makes the 400GC ignore the contents of the memory. @code{skip} is more drastic in 401that it causes the pointer and the memory to be completely ignored by 402the Garbage Collector. So, memory marked as @code{atomic} is 403automatically freed when no longer reachable, while memory marked as 404@code{skip} is not. 405 406The @code{atomic} option must be used with great care, because all 407sorts of problem can occur if used incorrectly, that is, if the memory 408the pointer points to does actually contain a pointer. 409 410Here is an example of how to use it: 411@smallexample 412struct GTY(()) my_struct @{ 413 int number_of_elements; 414 unsigned int GTY ((atomic)) * elements; 415@}; 416@end smallexample 417In this case, @code{elements} is a pointer under GC, and the memory it 418points to needs to be allocated using the Garbage Collector, and will 419be freed automatically by the Garbage Collector when it is no longer 420referenced. But the memory that the pointer points to is an array of 421@code{unsigned int} elements, and the GC must not try to scan it to 422find pointers to mark or relocate, which is why it is marked with the 423@code{atomic} option. 424 425Note that, currently, global variables can not be marked with 426@code{atomic}; only fields of a struct can. This is a known 427limitation. It would be useful to be able to mark global pointers 428with @code{atomic} to make the PCH machinery aware of them so that 429they are saved and restored correctly to PCH files. 430 431@findex special 432@item special ("@var{name}") 433 434The @code{special} option is used to mark types that have to be dealt 435with by special case machinery. The parameter is the name of the 436special case. See @file{gengtype.c} for further details. Avoid 437adding new special cases unless there is no other alternative. 438@end table 439 440@node GGC Roots 441@section Marking Roots for the Garbage Collector 442@cindex roots, marking 443@cindex marking roots 444 445In addition to keeping track of types, the type machinery also locates 446the global variables (@dfn{roots}) that the garbage collector starts 447at. Roots must be declared using one of the following syntaxes: 448 449@itemize @bullet 450@item 451@code{extern GTY(([@var{options}])) @var{type} @var{name};} 452@item 453@code{static GTY(([@var{options}])) @var{type} @var{name};} 454@end itemize 455@noindent 456The syntax 457@itemize @bullet 458@item 459@code{GTY(([@var{options}])) @var{type} @var{name};} 460@end itemize 461@noindent 462is @emph{not} accepted. There should be an @code{extern} declaration 463of such a variable in a header somewhere---mark that, not the 464definition. Or, if the variable is only used in one file, make it 465@code{static}. 466 467@node Files 468@section Source Files Containing Type Information 469@cindex generated files 470@cindex files, generated 471 472Whenever you add @code{GTY} markers to a source file that previously 473had none, or create a new source file containing @code{GTY} markers, 474there are three things you need to do: 475 476@enumerate 477@item 478You need to add the file to the list of source files the type 479machinery scans. There are four cases: 480 481@enumerate a 482@item 483For a back-end file, this is usually done 484automatically; if not, you should add it to @code{target_gtfiles} in 485the appropriate port's entries in @file{config.gcc}. 486 487@item 488For files shared by all front ends, add the filename to the 489@code{GTFILES} variable in @file{Makefile.in}. 490 491@item 492For files that are part of one front end, add the filename to the 493@code{gtfiles} variable defined in the appropriate 494@file{config-lang.in}. For C, the file is @file{c-config-lang.in}. 495Headers should appear before non-headers in this list. 496 497@item 498For files that are part of some but not all front ends, add the 499filename to the @code{gtfiles} variable of @emph{all} the front ends 500that use it. 501@end enumerate 502 503@item 504If the file was a header file, you'll need to check that it's included 505in the right place to be visible to the generated files. For a back-end 506header file, this should be done automatically. For a front-end header 507file, it needs to be included by the same file that includes 508@file{gtype-@var{lang}.h}. For other header files, it needs to be 509included in @file{gtype-desc.c}, which is a generated file, so add it to 510@code{ifiles} in @code{open_base_file} in @file{gengtype.c}. 511 512For source files that aren't header files, the machinery will generate a 513header file that should be included in the source file you just changed. 514The file will be called @file{gt-@var{path}.h} where @var{path} is the 515pathname relative to the @file{gcc} directory with slashes replaced by 516@verb{|-|}, so for example the header file to be included in 517@file{cp/parser.c} is called @file{gt-cp-parser.c}. The 518generated header file should be included after everything else in the 519source file. Don't forget to mention this file as a dependency in the 520@file{Makefile}! 521 522@end enumerate 523 524For language frontends, there is another file that needs to be included 525somewhere. It will be called @file{gtype-@var{lang}.h}, where 526@var{lang} is the name of the subdirectory the language is contained in. 527 528Plugins can add additional root tables. Run the @code{gengtype} 529utility in plugin mode as @code{gengtype -P pluginout.h @var{source-dir} 530@var{file-list} @var{plugin*.c}} with your plugin files 531@var{plugin*.c} using @code{GTY} to generate the @var{pluginout.h} file. 532The GCC build tree is needed to be present in that mode. 533 534 535@node Invoking the garbage collector 536@section How to invoke the garbage collector 537@cindex garbage collector, invocation 538@findex ggc_collect 539 540The GCC garbage collector GGC is only invoked explicitly. In contrast 541with many other garbage collectors, it is not implicitly invoked by 542allocation routines when a lot of memory has been consumed. So the 543only way to have GGC reclaim storage it to call the @code{ggc_collect} 544function explicitly. This call is an expensive operation, as it may 545have to scan the entire heap. Beware that local variables (on the GCC 546call stack) are not followed by such an invocation (as many other 547garbage collectors do): you should reference all your data from static 548or external @code{GTY}-ed variables, and it is advised to call 549@code{ggc_collect} with a shallow call stack. The GGC is an exact mark 550and sweep garbage collector (so it does not scan the call stack for 551pointers). In practice GCC passes don't often call @code{ggc_collect} 552themselves, because it is called by the pass manager between passes. 553 554At the time of the @code{ggc_collect} call all pointers in the GC-marked 555structures must be valid or @code{NULL}. In practice this means that 556there should not be uninitialized pointer fields in the structures even 557if your code never reads or writes those fields at a particular 558instance. One way to ensure this is to use cleared versions of 559allocators unless all the fields are initialized manually immediately 560after allocation. 561 562@node Troubleshooting 563@section Troubleshooting the garbage collector 564@cindex garbage collector, troubleshooting 565 566With the current garbage collector implementation, most issues should 567show up as GCC compilation errors. Some of the most commonly 568encountered issues are described below. 569 570@itemize @bullet 571@item Gengtype does not produce allocators for a @code{GTY}-marked type. 572Gengtype checks if there is at least one possible path from GC roots to 573at least one instance of each type before outputting allocators. If 574there is no such path, the @code{GTY} markers will be ignored and no 575allocators will be output. Solve this by making sure that there exists 576at least one such path. If creating it is unfeasible or raises a ``code 577smell'', consider if you really must use GC for allocating such type. 578 579@item Link-time errors about undefined @code{gt_ggc_r_foo_bar} and 580similarly-named symbols. Check if your @file{foo_bar} source file has 581@code{#include "gt-foo_bar.h"} as its very last line. 582 583@end itemize 584