1@c markers: BUG TODO 2 3@c Copyright (C) 1988-2014 Free Software Foundation, Inc. 4@c This is part of the GCC manual. 5@c For copying conditions, see the file gcc.texi. 6 7@node Passes 8@chapter Passes and Files of the Compiler 9@cindex passes and files of the compiler 10@cindex files and passes of the compiler 11@cindex compiler passes and files 12@cindex pass dumps 13 14This chapter is dedicated to giving an overview of the optimization and 15code generation passes of the compiler. In the process, it describes 16some of the language front end interface, though this description is no 17where near complete. 18 19@menu 20* Parsing pass:: The language front end turns text into bits. 21* Cilk Plus Transformation:: Transform Cilk Plus Code to equivalent C/C++. 22* Gimplification pass:: The bits are turned into something we can optimize. 23* Pass manager:: Sequencing the optimization passes. 24* Tree SSA passes:: Optimizations on a high-level representation. 25* RTL passes:: Optimizations on a low-level representation. 26* Optimization info:: Dumping optimization information from passes. 27@end menu 28 29@node Parsing pass 30@section Parsing pass 31@cindex GENERIC 32@findex lang_hooks.parse_file 33The language front end is invoked only once, via 34@code{lang_hooks.parse_file}, to parse the entire input. The language 35front end may use any intermediate language representation deemed 36appropriate. The C front end uses GENERIC trees (@pxref{GENERIC}), plus 37a double handful of language specific tree codes defined in 38@file{c-common.def}. The Fortran front end uses a completely different 39private representation. 40 41@cindex GIMPLE 42@cindex gimplification 43@cindex gimplifier 44@cindex language-independent intermediate representation 45@cindex intermediate representation lowering 46@cindex lowering, language-dependent intermediate representation 47At some point the front end must translate the representation used in the 48front end to a representation understood by the language-independent 49portions of the compiler. Current practice takes one of two forms. 50The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function, 51and uses the gimplifier callbacks to convert the language-specific tree 52nodes directly to GIMPLE before passing the function off to be compiled. 53The Fortran front end converts from a private representation to GENERIC, 54which is later lowered to GIMPLE when the function is compiled. Which 55route to choose probably depends on how well GENERIC (plus extensions) 56can be made to match up with the source language and necessary parsing 57data structures. 58 59BUG: Gimplification must occur before nested function lowering, 60and nested function lowering must be done by the front end before 61passing the data off to cgraph. 62 63TODO: Cgraph should control nested function lowering. It would 64only be invoked when it is certain that the outer-most function 65is used. 66 67TODO: Cgraph needs a gimplify_function callback. It should be 68invoked when (1) it is certain that the function is used, (2) 69warning flags specified by the user require some amount of 70compilation in order to honor, (3) the language indicates that 71semantic analysis is not complete until gimplification occurs. 72Hum@dots{} this sounds overly complicated. Perhaps we should just 73have the front end gimplify always; in most cases it's only one 74function call. 75 76The front end needs to pass all function definitions and top level 77declarations off to the middle-end so that they can be compiled and 78emitted to the object file. For a simple procedural language, it is 79usually most convenient to do this as each top level declaration or 80definition is seen. There is also a distinction to be made between 81generating functional code and generating complete debug information. 82The only thing that is absolutely required for functional code is that 83function and data @emph{definitions} be passed to the middle-end. For 84complete debug information, function, data and type declarations 85should all be passed as well. 86 87@findex rest_of_decl_compilation 88@findex rest_of_type_compilation 89@findex cgraph_finalize_function 90In any case, the front end needs each complete top-level function or 91data declaration, and each data definition should be passed to 92@code{rest_of_decl_compilation}. Each complete type definition should 93be passed to @code{rest_of_type_compilation}. Each function definition 94should be passed to @code{cgraph_finalize_function}. 95 96TODO: I know rest_of_compilation currently has all sorts of 97RTL generation semantics. I plan to move all code generation 98bits (both Tree and RTL) to compile_function. Should we hide 99cgraph from the front ends and move back to rest_of_compilation 100as the official interface? Possibly we should rename all three 101interfaces such that the names match in some meaningful way and 102that is more descriptive than "rest_of". 103 104The middle-end will, at its option, emit the function and data 105definitions immediately or queue them for later processing. 106 107@node Cilk Plus Transformation 108@section Cilk Plus Transformation 109@cindex CILK_PLUS 110 111If Cilk Plus generation (flag @option{-fcilkplus}) is enabled, all the Cilk 112Plus code is transformed into equivalent C and C++ functions. Majority of this 113transformation occurs toward the end of the parsing and right before the 114gimplification pass. 115 116These are the major components to the Cilk Plus language extension: 117@itemize @bullet 118@item Array Notations: 119During parsing phase, all the array notation specific information is stored in 120@code{ARRAY_NOTATION_REF} tree using the function 121@code{c_parser_array_notation}. During the end of parsing, we check the entire 122function to see if there are any array notation specific code (using the 123function @code{contains_array_notation_expr}). If this function returns 124true, then we expand them using either @code{expand_array_notation_exprs} or 125@code{build_array_notation_expr}. For the cases where array notations are 126inside conditions, they are transformed using the function 127@code{fix_conditional_array_notations}. The C language-specific routines are 128located in @file{c/c-array-notation.c} and the equivalent C++ routines are in 129the file @file{cp/cp-array-notation.c}. Common routines such as functions to 130initialize built-in functions are stored in @file{array-notation-common.c}. 131 132@item Cilk keywords: 133@itemize @bullet 134@item @code{_Cilk_spawn}: 135The @code{_Cilk_spawn} keyword is parsed and the function it contains is marked 136as a spawning function. The spawning function is called the spawner. At 137the end of the parsing phase, appropriate built-in functions are 138added to the spawner that are defined in the Cilk runtime. The appropriate 139locations of these functions, and the internal structures are detailed in 140@code{cilk_init_builtins} in the file @file{cilk-common.c}. The pointers to 141Cilk functions and fields of internal structures are described 142in @file{cilk.h}. The built-in functions are described in 143@file{cilk-builtins.def}. 144 145During gimplification, a new "spawn-helper" function is created. 146The spawned function is replaced with a spawn helper function in the spawner. 147The spawned function-call is moved into the spawn helper. The main function 148that does these transformations is @code{gimplify_cilk_spawn} in 149@file{c-family/cilk.c}. In the spawn-helper, the gimplification function 150@code{gimplify_call_expr}, inserts a function call @code{__cilkrts_detach}. 151This function is expanded by @code{builtin_expand_cilk_detach} located in 152@file{c-family/cilk.c}. 153 154@item @code{_Cilk_sync}: 155@code{_Cilk_sync} is parsed like a keyword. During gimplification, 156the function @code{gimplify_cilk_sync} in @file{c-family/cilk.c}, will replace 157this keyword with a set of functions that are stored in the Cilk runtime. 158One of the internal functions inserted during gimplification, 159@code{__cilkrts_pop_frame} must be expanded by the compiler and is 160done by @code{builtin_expand_cilk_pop_frame} in @file{cilk-common.c}. 161 162@end itemize 163@end itemize 164 165Documentation about Cilk Plus and language specification is provided under the 166"Learn" section in @w{@uref{http://www.cilkplus.org/}}. It is worth mentioning 167that the current implementation follows ABI 1.1. 168 169@node Gimplification pass 170@section Gimplification pass 171 172@cindex gimplification 173@cindex GIMPLE 174@dfn{Gimplification} is a whimsical term for the process of converting 175the intermediate representation of a function into the GIMPLE language 176(@pxref{GIMPLE}). The term stuck, and so words like ``gimplification'', 177``gimplify'', ``gimplifier'' and the like are sprinkled throughout this 178section of code. 179 180While a front end may certainly choose to generate GIMPLE directly if 181it chooses, this can be a moderately complex process unless the 182intermediate language used by the front end is already fairly simple. 183Usually it is easier to generate GENERIC trees plus extensions 184and let the language-independent gimplifier do most of the work. 185 186@findex gimplify_function_tree 187@findex gimplify_expr 188@findex lang_hooks.gimplify_expr 189The main entry point to this pass is @code{gimplify_function_tree} 190located in @file{gimplify.c}. From here we process the entire 191function gimplifying each statement in turn. The main workhorse 192for this pass is @code{gimplify_expr}. Approximately everything 193passes through here at least once, and it is from here that we 194invoke the @code{lang_hooks.gimplify_expr} callback. 195 196The callback should examine the expression in question and return 197@code{GS_UNHANDLED} if the expression is not a language specific 198construct that requires attention. Otherwise it should alter the 199expression in some way to such that forward progress is made toward 200producing valid GIMPLE@. If the callback is certain that the 201transformation is complete and the expression is valid GIMPLE, it 202should return @code{GS_ALL_DONE}. Otherwise it should return 203@code{GS_OK}, which will cause the expression to be processed again. 204If the callback encounters an error during the transformation (because 205the front end is relying on the gimplification process to finish 206semantic checks), it should return @code{GS_ERROR}. 207 208@node Pass manager 209@section Pass manager 210 211The pass manager is located in @file{passes.c}, @file{tree-optimize.c} 212and @file{tree-pass.h}. 213It processes passes as described in @file{passes.def}. 214Its job is to run all of the individual passes in the correct order, 215and take care of standard bookkeeping that applies to every pass. 216 217The theory of operation is that each pass defines a structure that 218represents everything we need to know about that pass---when it 219should be run, how it should be run, what intermediate language 220form or on-the-side data structures it needs. We register the pass 221to be run in some particular order, and the pass manager arranges 222for everything to happen in the correct order. 223 224The actuality doesn't completely live up to the theory at present. 225Command-line switches and @code{timevar_id_t} enumerations must still 226be defined elsewhere. The pass manager validates constraints but does 227not attempt to (re-)generate data structures or lower intermediate 228language form based on the requirements of the next pass. Nevertheless, 229what is present is useful, and a far sight better than nothing at all. 230 231Each pass should have a unique name. 232Each pass may have its own dump file (for GCC debugging purposes). 233Passes with a name starting with a star do not dump anything. 234Sometimes passes are supposed to share a dump file / option name. 235To still give these unique names, you can use a prefix that is delimited 236by a space from the part that is used for the dump file / option name. 237E.g. When the pass name is "ud dce", the name used for dump file/options 238is "dce". 239 240TODO: describe the global variables set up by the pass manager, 241and a brief description of how a new pass should use it. 242I need to look at what info RTL passes use first@enddots{} 243 244@node Tree SSA passes 245@section Tree SSA passes 246 247The following briefly describes the Tree optimization passes that are 248run after gimplification and what source files they are located in. 249 250@itemize @bullet 251@item Remove useless statements 252 253This pass is an extremely simple sweep across the gimple code in which 254we identify obviously dead code and remove it. Here we do things like 255simplify @code{if} statements with constant conditions, remove 256exception handling constructs surrounding code that obviously cannot 257throw, remove lexical bindings that contain no variables, and other 258assorted simplistic cleanups. The idea is to get rid of the obvious 259stuff quickly rather than wait until later when it's more work to get 260rid of it. This pass is located in @file{tree-cfg.c} and described by 261@code{pass_remove_useless_stmts}. 262 263@item OpenMP lowering 264 265If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers 266OpenMP constructs into GIMPLE. 267 268Lowering of OpenMP constructs involves creating replacement 269expressions for local variables that have been mapped using data 270sharing clauses, exposing the control flow of most synchronization 271directives and adding region markers to facilitate the creation of the 272control flow graph. The pass is located in @file{omp-low.c} and is 273described by @code{pass_lower_omp}. 274 275@item OpenMP expansion 276 277If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands 278parallel regions into their own functions to be invoked by the thread 279library. The pass is located in @file{omp-low.c} and is described by 280@code{pass_expand_omp}. 281 282@item Lower control flow 283 284This pass flattens @code{if} statements (@code{COND_EXPR}) 285and moves lexical bindings (@code{BIND_EXPR}) out of line. After 286this pass, all @code{if} statements will have exactly two @code{goto} 287statements in its @code{then} and @code{else} arms. Lexical binding 288information for each statement will be found in @code{TREE_BLOCK} rather 289than being inferred from its position under a @code{BIND_EXPR}. This 290pass is found in @file{gimple-low.c} and is described by 291@code{pass_lower_cf}. 292 293@item Lower exception handling control flow 294 295This pass decomposes high-level exception handling constructs 296(@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form 297that explicitly represents the control flow involved. After this 298pass, @code{lookup_stmt_eh_region} will return a non-negative 299number for any statement that may have EH control flow semantics; 300examine @code{tree_can_throw_internal} or @code{tree_can_throw_external} 301for exact semantics. Exact control flow may be extracted from 302@code{foreach_reachable_handler}. The EH region nesting tree is defined 303in @file{except.h} and built in @file{except.c}. The lowering pass 304itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}. 305 306@item Build the control flow graph 307 308This pass decomposes a function into basic blocks and creates all of 309the edges that connect them. It is located in @file{tree-cfg.c} and 310is described by @code{pass_build_cfg}. 311 312@item Find all referenced variables 313 314This pass walks the entire function and collects an array of all 315variables referenced in the function, @code{referenced_vars}. The 316index at which a variable is found in the array is used as a UID 317for the variable within this function. This data is needed by the 318SSA rewriting routines. The pass is located in @file{tree-dfa.c} 319and is described by @code{pass_referenced_vars}. 320 321@item Enter static single assignment form 322 323This pass rewrites the function such that it is in SSA form. After 324this pass, all @code{is_gimple_reg} variables will be referenced by 325@code{SSA_NAME}, and all occurrences of other variables will be 326annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have 327been inserted as necessary for each basic block. This pass is 328located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}. 329 330@item Warn for uninitialized variables 331 332This pass scans the function for uses of @code{SSA_NAME}s that 333are fed by default definition. For non-parameter variables, such 334uses are uninitialized. The pass is run twice, before and after 335optimization (if turned on). In the first pass we only warn for uses that are 336positively uninitialized; in the second pass we warn for uses that 337are possibly uninitialized. The pass is located in @file{tree-ssa.c} 338and is defined by @code{pass_early_warn_uninitialized} and 339@code{pass_late_warn_uninitialized}. 340 341@item Dead code elimination 342 343This pass scans the function for statements without side effects whose 344result is unused. It does not do memory life analysis, so any value 345that is stored in memory is considered used. The pass is run multiple 346times throughout the optimization process. It is located in 347@file{tree-ssa-dce.c} and is described by @code{pass_dce}. 348 349@item Dominator optimizations 350 351This pass performs trivial dominator-based copy and constant propagation, 352expression simplification, and jump threading. It is run multiple times 353throughout the optimization process. It is located in @file{tree-ssa-dom.c} 354and is described by @code{pass_dominator}. 355 356@item Forward propagation of single-use variables 357 358This pass attempts to remove redundant computation by substituting 359variables that are used once into the expression that uses them and 360seeing if the result can be simplified. It is located in 361@file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}. 362 363@item Copy Renaming 364 365This pass attempts to change the name of compiler temporaries involved in 366copy operations such that SSA->normal can coalesce the copy away. When compiler 367temporaries are copies of user variables, it also renames the compiler 368temporary to the user variable resulting in better use of user symbols. It is 369located in @file{tree-ssa-copyrename.c} and is described by 370@code{pass_copyrename}. 371 372@item PHI node optimizations 373 374This pass recognizes forms of PHI inputs that can be represented as 375conditional expressions and rewrites them into straight line code. 376It is located in @file{tree-ssa-phiopt.c} and is described by 377@code{pass_phiopt}. 378 379@item May-alias optimization 380 381This pass performs a flow sensitive SSA-based points-to analysis. 382The resulting may-alias, must-alias, and escape analysis information 383is used to promote variables from in-memory addressable objects to 384non-aliased variables that can be renamed into SSA form. We also 385update the @code{VDEF}/@code{VUSE} memory tags for non-renameable 386aggregates so that we get fewer false kills. The pass is located 387in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}. 388 389Interprocedural points-to information is located in 390@file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}. 391 392@item Profiling 393 394This pass rewrites the function in order to collect runtime block 395and value profiling data. Such data may be fed back into the compiler 396on a subsequent run so as to allow optimization based on expected 397execution frequencies. The pass is located in @file{predict.c} and 398is described by @code{pass_profile}. 399 400@item Lower complex arithmetic 401 402This pass rewrites complex arithmetic operations into their component 403scalar arithmetic operations. The pass is located in @file{tree-complex.c} 404and is described by @code{pass_lower_complex}. 405 406@item Scalar replacement of aggregates 407 408This pass rewrites suitable non-aliased local aggregate variables into 409a set of scalar variables. The resulting scalar variables are 410rewritten into SSA form, which allows subsequent optimization passes 411to do a significantly better job with them. The pass is located in 412@file{tree-sra.c} and is described by @code{pass_sra}. 413 414@item Dead store elimination 415 416This pass eliminates stores to memory that are subsequently overwritten 417by another store, without any intervening loads. The pass is located 418in @file{tree-ssa-dse.c} and is described by @code{pass_dse}. 419 420@item Tail recursion elimination 421 422This pass transforms tail recursion into a loop. It is located in 423@file{tree-tailcall.c} and is described by @code{pass_tail_recursion}. 424 425@item Forward store motion 426 427This pass sinks stores and assignments down the flowgraph closer to their 428use point. The pass is located in @file{tree-ssa-sink.c} and is 429described by @code{pass_sink_code}. 430 431@item Partial redundancy elimination 432 433This pass eliminates partially redundant computations, as well as 434performing load motion. The pass is located in @file{tree-ssa-pre.c} 435and is described by @code{pass_pre}. 436 437Just before partial redundancy elimination, if 438@option{-funsafe-math-optimizations} is on, GCC tries to convert 439divisions to multiplications by the reciprocal. The pass is located 440in @file{tree-ssa-math-opts.c} and is described by 441@code{pass_cse_reciprocal}. 442 443@item Full redundancy elimination 444 445This is a simpler form of PRE that only eliminates redundancies that 446occur on all paths. It is located in @file{tree-ssa-pre.c} and 447described by @code{pass_fre}. 448 449@item Loop optimization 450 451The main driver of the pass is placed in @file{tree-ssa-loop.c} 452and described by @code{pass_loop}. 453 454The optimizations performed by this pass are: 455 456Loop invariant motion. This pass moves only invariants that 457would be hard to handle on RTL level (function calls, operations that expand to 458nontrivial sequences of insns). With @option{-funswitch-loops} it also moves 459operands of conditions that are invariant out of the loop, so that we can use 460just trivial invariantness analysis in loop unswitching. The pass also includes 461store motion. The pass is implemented in @file{tree-ssa-loop-im.c}. 462 463Canonical induction variable creation. This pass creates a simple counter 464for number of iterations of the loop and replaces the exit condition of the 465loop using it, in case when a complicated analysis is necessary to determine 466the number of iterations. Later optimizations then may determine the number 467easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}. 468 469Induction variable optimizations. This pass performs standard induction 470variable optimizations, including strength reduction, induction variable 471merging and induction variable elimination. The pass is implemented in 472@file{tree-ssa-loop-ivopts.c}. 473 474Loop unswitching. This pass moves the conditional jumps that are invariant 475out of the loops. To achieve this, a duplicate of the loop is created for 476each possible outcome of conditional jump(s). The pass is implemented in 477@file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the 478RTL level loop unswitching in @file{loop-unswitch.c}, but currently 479the RTL level pass is not completely redundant yet due to deficiencies 480in tree level alias analysis. 481 482The optimizations also use various utility functions contained in 483@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and 484@file{cfgloopmanip.c}. 485 486Vectorization. This pass transforms loops to operate on vector types 487instead of scalar types. Data parallelism across loop iterations is exploited 488to group data elements from consecutive iterations into a vector and operate 489on them in parallel. Depending on available target support the loop is 490conceptually unrolled by a factor @code{VF} (vectorization factor), which is 491the number of elements operated upon in parallel in each iteration, and the 492@code{VF} copies of each scalar operation are fused to form a vector operation. 493Additional loop transformations such as peeling and versioning may take place 494to align the number of iterations, and to align the memory accesses in the 495loop. 496The pass is implemented in @file{tree-vectorizer.c} (the main driver), 497@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts 498and general loop utilities), @file{tree-vect-slp} (loop-aware SLP 499functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}. 500Analysis of data references is in @file{tree-data-ref.c}. 501 502SLP Vectorization. This pass performs vectorization of straight-line code. The 503pass is implemented in @file{tree-vectorizer.c} (the main driver), 504@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and 505@file{tree-vect-data-refs.c}. 506 507Autoparallelization. This pass splits the loop iteration space to run 508into several threads. The pass is implemented in @file{tree-parloops.c}. 509 510Graphite is a loop transformation framework based on the polyhedral 511model. Graphite stands for Gimple Represented as Polyhedra. The 512internals of this infrastructure are documented in 513@w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on 514this representation are implemented in the various @file{graphite-*} 515files. 516 517@item Tree level if-conversion for vectorizer 518 519This pass applies if-conversion to simple loops to help vectorizer. 520We identify if convertible loops, if-convert statements and merge 521basic blocks in one big block. The idea is to present loop in such 522form so that vectorizer can have one to one mapping between statements 523and available vector operations. This pass is located in 524@file{tree-if-conv.c} and is described by @code{pass_if_conversion}. 525 526@item Conditional constant propagation 527 528This pass relaxes a lattice of values in order to identify those 529that must be constant even in the presence of conditional branches. 530The pass is located in @file{tree-ssa-ccp.c} and is described 531by @code{pass_ccp}. 532 533A related pass that works on memory loads and stores, and not just 534register values, is located in @file{tree-ssa-ccp.c} and described by 535@code{pass_store_ccp}. 536 537@item Conditional copy propagation 538 539This is similar to constant propagation but the lattice of values is 540the ``copy-of'' relation. It eliminates redundant copies from the 541code. The pass is located in @file{tree-ssa-copy.c} and described by 542@code{pass_copy_prop}. 543 544A related pass that works on memory copies, and not just register 545copies, is located in @file{tree-ssa-copy.c} and described by 546@code{pass_store_copy_prop}. 547 548@item Value range propagation 549 550This transformation is similar to constant propagation but 551instead of propagating single constant values, it propagates 552known value ranges. The implementation is based on Patterson's 553range propagation algorithm (Accurate Static Branch Prediction by 554Value Range Propagation, J. R. C. Patterson, PLDI '95). In 555contrast to Patterson's algorithm, this implementation does not 556propagate branch probabilities nor it uses more than a single 557range per SSA name. This means that the current implementation 558cannot be used for branch prediction (though adapting it would 559not be difficult). The pass is located in @file{tree-vrp.c} and is 560described by @code{pass_vrp}. 561 562@item Folding built-in functions 563 564This pass simplifies built-in functions, as applicable, with constant 565arguments or with inferable string lengths. It is located in 566@file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}. 567 568@item Split critical edges 569 570This pass identifies critical edges and inserts empty basic blocks 571such that the edge is no longer critical. The pass is located in 572@file{tree-cfg.c} and is described by @code{pass_split_crit_edges}. 573 574@item Control dependence dead code elimination 575 576This pass is a stronger form of dead code elimination that can 577eliminate unnecessary control flow statements. It is located 578in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}. 579 580@item Tail call elimination 581 582This pass identifies function calls that may be rewritten into 583jumps. No code transformation is actually applied here, but the 584data and control flow problem is solved. The code transformation 585requires target support, and so is delayed until RTL@. In the 586meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility. 587The pass is located in @file{tree-tailcall.c} and is described by 588@code{pass_tail_calls}. The RTL transformation is handled by 589@code{fixup_tail_calls} in @file{calls.c}. 590 591@item Warn for function return without value 592 593For non-void functions, this pass locates return statements that do 594not specify a value and issues a warning. Such a statement may have 595been injected by falling off the end of the function. This pass is 596run last so that we have as much time as possible to prove that the 597statement is not reachable. It is located in @file{tree-cfg.c} and 598is described by @code{pass_warn_function_return}. 599 600@item Leave static single assignment form 601 602This pass rewrites the function such that it is in normal form. At 603the same time, we eliminate as many single-use temporaries as possible, 604so the intermediate language is no longer GIMPLE, but GENERIC@. The 605pass is located in @file{tree-outof-ssa.c} and is described by 606@code{pass_del_ssa}. 607 608@item Merge PHI nodes that feed into one another 609 610This is part of the CFG cleanup passes. It attempts to join PHI nodes 611from a forwarder CFG block into another block with PHI nodes. The 612pass is located in @file{tree-cfgcleanup.c} and is described by 613@code{pass_merge_phi}. 614 615@item Return value optimization 616 617If a function always returns the same local variable, and that local 618variable is an aggregate type, then the variable is replaced with the 619return value for the function (i.e., the function's DECL_RESULT). This 620is equivalent to the C++ named return value optimization applied to 621GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by 622@code{pass_nrv}. 623 624@item Return slot optimization 625 626If a function returns a memory object and is called as @code{var = 627foo()}, this pass tries to change the call so that the address of 628@code{var} is sent to the caller to avoid an extra memory copy. This 629pass is located in @code{tree-nrv.c} and is described by 630@code{pass_return_slot}. 631 632@item Optimize calls to @code{__builtin_object_size} 633 634This is a propagation pass similar to CCP that tries to remove calls 635to @code{__builtin_object_size} when the size of the object can be 636computed at compile-time. This pass is located in 637@file{tree-object-size.c} and is described by 638@code{pass_object_sizes}. 639 640@item Loop invariant motion 641 642This pass removes expensive loop-invariant computations out of loops. 643The pass is located in @file{tree-ssa-loop.c} and described by 644@code{pass_lim}. 645 646@item Loop nest optimizations 647 648This is a family of loop transformations that works on loop nests. It 649includes loop interchange, scaling, skewing and reversal and they are 650all geared to the optimization of data locality in array traversals 651and the removal of dependencies that hamper optimizations such as loop 652parallelization and vectorization. The pass is located in 653@file{tree-loop-linear.c} and described by 654@code{pass_linear_transform}. 655 656@item Removal of empty loops 657 658This pass removes loops with no code in them. The pass is located in 659@file{tree-ssa-loop-ivcanon.c} and described by 660@code{pass_empty_loop}. 661 662@item Unrolling of small loops 663 664This pass completely unrolls loops with few iterations. The pass 665is located in @file{tree-ssa-loop-ivcanon.c} and described by 666@code{pass_complete_unroll}. 667 668@item Predictive commoning 669 670This pass makes the code reuse the computations from the previous 671iterations of the loops, especially loads and stores to memory. 672It does so by storing the values of these computations to a bank 673of temporary variables that are rotated at the end of loop. To avoid 674the need for this rotation, the loop is then unrolled and the copies 675of the loop body are rewritten to use the appropriate version of 676the temporary variable. This pass is located in @file{tree-predcom.c} 677and described by @code{pass_predcom}. 678 679@item Array prefetching 680 681This pass issues prefetch instructions for array references inside 682loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and 683described by @code{pass_loop_prefetch}. 684 685@item Reassociation 686 687This pass rewrites arithmetic expressions to enable optimizations that 688operate on them, like redundancy elimination and vectorization. The 689pass is located in @file{tree-ssa-reassoc.c} and described by 690@code{pass_reassoc}. 691 692@item Optimization of @code{stdarg} functions 693 694This pass tries to avoid the saving of register arguments into the 695stack on entry to @code{stdarg} functions. If the function doesn't 696use any @code{va_start} macros, no registers need to be saved. If 697@code{va_start} macros are used, the @code{va_list} variables don't 698escape the function, it is only necessary to save registers that will 699be used in @code{va_arg} macros. For instance, if @code{va_arg} is 700only used with integral types in the function, floating point 701registers don't need to be saved. This pass is located in 702@code{tree-stdarg.c} and described by @code{pass_stdarg}. 703 704@end itemize 705 706@node RTL passes 707@section RTL passes 708 709The following briefly describes the RTL generation and optimization 710passes that are run after the Tree optimization passes. 711 712@itemize @bullet 713@item RTL generation 714 715@c Avoiding overfull is tricky here. 716The source files for RTL generation include 717@file{stmt.c}, 718@file{calls.c}, 719@file{expr.c}, 720@file{explow.c}, 721@file{expmed.c}, 722@file{function.c}, 723@file{optabs.c} 724and @file{emit-rtl.c}. 725Also, the file 726@file{insn-emit.c}, generated from the machine description by the 727program @code{genemit}, is used in this pass. The header file 728@file{expr.h} is used for communication within this pass. 729 730@findex genflags 731@findex gencodes 732The header files @file{insn-flags.h} and @file{insn-codes.h}, 733generated from the machine description by the programs @code{genflags} 734and @code{gencodes}, tell this pass which standard names are available 735for use and which patterns correspond to them. 736 737@item Generation of exception landing pads 738 739This pass generates the glue that handles communication between the 740exception handling library routines and the exception handlers within 741the function. Entry points in the function that are invoked by the 742exception handling library are called @dfn{landing pads}. The code 743for this pass is located in @file{except.c}. 744 745@item Control flow graph cleanup 746 747This pass removes unreachable code, simplifies jumps to next, jumps to 748jump, jumps across jumps, etc. The pass is run multiple times. 749For historical reasons, it is occasionally referred to as the ``jump 750optimization pass''. The bulk of the code for this pass is in 751@file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c} 752and @file{jump.c}. 753 754@item Forward propagation of single-def values 755 756This pass attempts to remove redundant computation by substituting 757variables that come from a single definition, and 758seeing if the result can be simplified. It performs copy propagation 759and addressing mode selection. The pass is run twice, with values 760being propagated into loops only on the second run. The code is 761located in @file{fwprop.c}. 762 763@item Common subexpression elimination 764 765This pass removes redundant computation within basic blocks, and 766optimizes addressing modes based on cost. The pass is run twice. 767The code for this pass is located in @file{cse.c}. 768 769@item Global common subexpression elimination 770 771This pass performs two 772different types of GCSE depending on whether you are optimizing for 773size or not (LCM based GCSE tends to increase code size for a gain in 774speed, while Morel-Renvoise based GCSE does not). 775When optimizing for size, GCSE is done using Morel-Renvoise Partial 776Redundancy Elimination, with the exception that it does not try to move 777invariants out of loops---that is left to the loop optimization pass. 778If MR PRE GCSE is done, code hoisting (aka unification) is also done, as 779well as load motion. 780If you are optimizing for speed, LCM (lazy code motion) based GCSE is 781done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM 782based GCSE also does loop invariant code motion. We also perform load 783and store motion when optimizing for speed. 784Regardless of which type of GCSE is used, the GCSE pass also performs 785global constant and copy propagation. 786The source file for this pass is @file{gcse.c}, and the LCM routines 787are in @file{lcm.c}. 788 789@item Loop optimization 790 791This pass performs several loop related optimizations. 792The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain 793generic loop analysis and manipulation code. Initialization and finalization 794of loop structures is handled by @file{loop-init.c}. 795A loop invariant motion pass is implemented in @file{loop-invariant.c}. 796Basic block level optimizations---unrolling, peeling and unswitching loops--- 797are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}. 798Replacing of the exit condition of loops by special machine-dependent 799instructions is handled by @file{loop-doloop.c}. 800 801@item Jump bypassing 802 803This pass is an aggressive form of GCSE that transforms the control 804flow graph of a function by propagating constants into conditional 805branch instructions. The source file for this pass is @file{gcse.c}. 806 807@item If conversion 808 809This pass attempts to replace conditional branches and surrounding 810assignments with arithmetic, boolean value producing comparison 811instructions, and conditional move instructions. In the very last 812invocation after reload/LRA, it will generate predicated instructions 813when supported by the target. The code is located in @file{ifcvt.c}. 814 815@item Web construction 816 817This pass splits independent uses of each pseudo-register. This can 818improve effect of the other transformation, such as CSE or register 819allocation. The code for this pass is located in @file{web.c}. 820 821@item Instruction combination 822 823This pass attempts to combine groups of two or three instructions that 824are related by data flow into single instructions. It combines the 825RTL expressions for the instructions by substitution, simplifies the 826result using algebra, and then attempts to match the result against 827the machine description. The code is located in @file{combine.c}. 828 829@item Mode switching optimization 830 831This pass looks for instructions that require the processor to be in a 832specific ``mode'' and minimizes the number of mode changes required to 833satisfy all users. What these modes are, and what they apply to are 834completely target-specific. The code for this pass is located in 835@file{mode-switching.c}. 836 837@cindex modulo scheduling 838@cindex sms, swing, software pipelining 839@item Modulo scheduling 840 841This pass looks at innermost loops and reorders their instructions 842by overlapping different iterations. Modulo scheduling is performed 843immediately before instruction scheduling. The code for this pass is 844located in @file{modulo-sched.c}. 845 846@item Instruction scheduling 847 848This pass looks for instructions whose output will not be available by 849the time that it is used in subsequent instructions. Memory loads and 850floating point instructions often have this behavior on RISC machines. 851It re-orders instructions within a basic block to try to separate the 852definition and use of items that otherwise would cause pipeline 853stalls. This pass is performed twice, before and after register 854allocation. The code for this pass is located in @file{haifa-sched.c}, 855@file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and 856@file{sched-vis.c}. 857 858@item Register allocation 859 860These passes make sure that all occurrences of pseudo registers are 861eliminated, either by allocating them to a hard register, replacing 862them by an equivalent expression (e.g.@: a constant) or by placing 863them on the stack. This is done in several subpasses: 864 865@itemize @bullet 866@item 867The integrated register allocator (@acronym{IRA}). It is called 868integrated because coalescing, register live range splitting, and hard 869register preferencing are done on-the-fly during coloring. It also 870has better integration with the reload/LRA pass. Pseudo-registers spilled 871by the allocator or the reload/LRA have still a chance to get 872hard-registers if the reload/LRA evicts some pseudo-registers from 873hard-registers. The allocator helps to choose better pseudos for 874spilling based on their live ranges and to coalesce stack slots 875allocated for the spilled pseudo-registers. IRA is a regional 876register allocator which is transformed into Chaitin-Briggs allocator 877if there is one region. By default, IRA chooses regions using 878register pressure but the user can force it to use one region or 879regions corresponding to all loops. 880 881Source files of the allocator are @file{ira.c}, @file{ira-build.c}, 882@file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c}, 883@file{ira-emit.c}, @file{ira-lives}, plus header files @file{ira.h} 884and @file{ira-int.h} used for the communication between the allocator 885and the rest of the compiler and between the IRA files. 886 887@cindex reloading 888@item 889Reloading. This pass renumbers pseudo registers with the hardware 890registers numbers they were allocated. Pseudo registers that did not 891get hard registers are replaced with stack slots. Then it finds 892instructions that are invalid because a value has failed to end up in 893a register, or has ended up in a register of the wrong kind. It fixes 894up these instructions by reloading the problematical values 895temporarily into registers. Additional instructions are generated to 896do the copying. 897 898The reload pass also optionally eliminates the frame pointer and inserts 899instructions to save and restore call-clobbered registers around calls. 900 901Source files are @file{reload.c} and @file{reload1.c}, plus the header 902@file{reload.h} used for communication between them. 903 904@cindex Local Register Allocator (LRA) 905@item 906This pass is a modern replacement of the reload pass. Source files 907are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c}, 908@file{lra-constraints.c}, @file{lra-eliminations.c}, 909@file{lra-equivs.c}, @file{lra-lives.c}, @file{lra-saves.c}, 910@file{lra-spills.c}, the header @file{lra-int.h} used for 911communication between them, and the header @file{lra.h} used for 912communication between LRA and the rest of compiler. 913 914Unlike the reload pass, intermediate LRA decisions are reflected in 915RTL as much as possible. This reduces the number of target-dependent 916macros and hooks, leaving instruction constraints as the primary 917source of control. 918 919LRA is run on targets for which TARGET_LRA_P returns true. 920@end itemize 921 922@item Basic block reordering 923 924This pass implements profile guided code positioning. If profile 925information is not available, various types of static analysis are 926performed to make the predictions normally coming from the profile 927feedback (IE execution frequency, branch probability, etc). It is 928implemented in the file @file{bb-reorder.c}, and the various 929prediction routines are in @file{predict.c}. 930 931@item Variable tracking 932 933This pass computes where the variables are stored at each 934position in code and generates notes describing the variable locations 935to RTL code. The location lists are then generated according to these 936notes to debug information if the debugging information format supports 937location lists. The code is located in @file{var-tracking.c}. 938 939@item Delayed branch scheduling 940 941This optional pass attempts to find instructions that can go into the 942delay slots of other instructions, usually jumps and calls. The code 943for this pass is located in @file{reorg.c}. 944 945@item Branch shortening 946 947On many RISC machines, branch instructions have a limited range. 948Thus, longer sequences of instructions must be used for long branches. 949In this pass, the compiler figures out what how far each instruction 950will be from each other instruction, and therefore whether the usual 951instructions, or the longer sequences, must be used for each branch. 952The code for this pass is located in @file{final.c}. 953 954@item Register-to-stack conversion 955 956Conversion from usage of some hard registers to usage of a register 957stack may be done at this point. Currently, this is supported only 958for the floating-point registers of the Intel 80387 coprocessor. The 959code for this pass is located in @file{reg-stack.c}. 960 961@item Final 962 963This pass outputs the assembler code for the function. The source files 964are @file{final.c} plus @file{insn-output.c}; the latter is generated 965automatically from the machine description by the tool @file{genoutput}. 966The header file @file{conditions.h} is used for communication between 967these files. 968 969@item Debugging information output 970 971This is run after final because it must output the stack slot offsets 972for pseudo registers that did not get hard registers. Source files 973are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for 974SDB symbol table format, @file{dwarfout.c} for DWARF symbol table 975format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2 976symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table 977format. 978 979@end itemize 980 981@node Optimization info 982@section Optimization info 983@include optinfo.texi 984