1@c markers: CROSSREF BUG TODO 2 3@c Copyright (C) 1988, 1989, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 4@c 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc. 5@c This is part of the GCC manual. 6@c For copying conditions, see the file gcc.texi. 7 8@node Passes 9@chapter Passes and Files of the Compiler 10@cindex passes and files of the compiler 11@cindex files and passes of the compiler 12@cindex compiler passes and files 13 14This chapter is dedicated to giving an overview of the optimization and 15code generation passes of the compiler. In the process, it describes 16some of the language front end interface, though this description is no 17where near complete. 18 19@menu 20* Parsing pass:: The language front end turns text into bits. 21* Gimplification pass:: The bits are turned into something we can optimize. 22* Pass manager:: Sequencing the optimization passes. 23* Tree-SSA passes:: Optimizations on a high-level representation. 24* RTL passes:: Optimizations on a low-level representation. 25@end menu 26 27@node Parsing pass 28@section Parsing pass 29@cindex GENERIC 30@findex lang_hooks.parse_file 31The language front end is invoked only once, via 32@code{lang_hooks.parse_file}, to parse the entire input. The language 33front end may use any intermediate language representation deemed 34appropriate. The C front end uses GENERIC trees (CROSSREF), plus 35a double handful of language specific tree codes defined in 36@file{c-common.def}. The Fortran front end uses a completely different 37private representation. 38 39@cindex GIMPLE 40@cindex gimplification 41@cindex gimplifier 42@cindex language-independent intermediate representation 43@cindex intermediate representation lowering 44@cindex lowering, language-dependent intermediate representation 45At some point the front end must translate the representation used in the 46front end to a representation understood by the language-independent 47portions of the compiler. Current practice takes one of two forms. 48The C front end manually invokes the gimplifier (CROSSREF) on each function, 49and uses the gimplifier callbacks to convert the language-specific tree 50nodes directly to GIMPLE (CROSSREF) before passing the function off to 51be compiled. 52The Fortran front end converts from a private representation to GENERIC, 53which is later lowered to GIMPLE when the function is compiled. Which 54route to choose probably depends on how well GENERIC (plus extensions) 55can be made to match up with the source language and necessary parsing 56data structures. 57 58BUG: Gimplification must occur before nested function lowering, 59and nested function lowering must be done by the front end before 60passing the data off to cgraph. 61 62TODO: Cgraph should control nested function lowering. It would 63only be invoked when it is certain that the outer-most function 64is used. 65 66TODO: Cgraph needs a gimplify_function callback. It should be 67invoked when (1) it is certain that the function is used, (2) 68warning flags specified by the user require some amount of 69compilation in order to honor, (3) the language indicates that 70semantic analysis is not complete until gimplification occurs. 71Hum@dots{} this sounds overly complicated. Perhaps we should just 72have the front end gimplify always; in most cases it's only one 73function call. 74 75The front end needs to pass all function definitions and top level 76declarations off to the middle-end so that they can be compiled and 77emitted to the object file. For a simple procedural language, it is 78usually most convenient to do this as each top level declaration or 79definition is seen. There is also a distinction to be made between 80generating functional code and generating complete debug information. 81The only thing that is absolutely required for functional code is that 82function and data @emph{definitions} be passed to the middle-end. For 83complete debug information, function, data and type declarations 84should all be passed as well. 85 86@findex rest_of_decl_compilation 87@findex rest_of_type_compilation 88@findex cgraph_finalize_function 89In any case, the front end needs each complete top-level function or 90data declaration, and each data definition should be passed to 91@code{rest_of_decl_compilation}. Each complete type definition should 92be passed to @code{rest_of_type_compilation}. Each function definition 93should be passed to @code{cgraph_finalize_function}. 94 95TODO: I know rest_of_compilation currently has all sorts of 96rtl-generation semantics. I plan to move all code generation 97bits (both tree and rtl) to compile_function. Should we hide 98cgraph from the front ends and move back to rest_of_compilation 99as the official interface? Possibly we should rename all three 100interfaces such that the names match in some meaningful way and 101that is more descriptive than "rest_of". 102 103The middle-end will, at its option, emit the function and data 104definitions immediately or queue them for later processing. 105 106@node Gimplification pass 107@section Gimplification pass 108 109@cindex gimplification 110@cindex GIMPLE 111@dfn{Gimplification} is a whimsical term for the process of converting 112the intermediate representation of a function into the GIMPLE language 113(CROSSREF). The term stuck, and so words like ``gimplification'', 114``gimplify'', ``gimplifier'' and the like are sprinkled throughout this 115section of code. 116 117@cindex GENERIC 118While a front end may certainly choose to generate GIMPLE directly if 119it chooses, this can be a moderately complex process unless the 120intermediate language used by the front end is already fairly simple. 121Usually it is easier to generate GENERIC trees plus extensions 122and let the language-independent gimplifier do most of the work. 123 124@findex gimplify_function_tree 125@findex gimplify_expr 126@findex lang_hooks.gimplify_expr 127The main entry point to this pass is @code{gimplify_function_tree} 128located in @file{gimplify.c}. From here we process the entire 129function gimplifying each statement in turn. The main workhorse 130for this pass is @code{gimplify_expr}. Approximately everything 131passes through here at least once, and it is from here that we 132invoke the @code{lang_hooks.gimplify_expr} callback. 133 134The callback should examine the expression in question and return 135@code{GS_UNHANDLED} if the expression is not a language specific 136construct that requires attention. Otherwise it should alter the 137expression in some way to such that forward progress is made toward 138producing valid GIMPLE@. If the callback is certain that the 139transformation is complete and the expression is valid GIMPLE, it 140should return @code{GS_ALL_DONE}. Otherwise it should return 141@code{GS_OK}, which will cause the expression to be processed again. 142If the callback encounters an error during the transformation (because 143the front end is relying on the gimplification process to finish 144semantic checks), it should return @code{GS_ERROR}. 145 146@node Pass manager 147@section Pass manager 148 149The pass manager is located in @file{passes.c}, @file{tree-optimize.c} 150and @file{tree-pass.h}. 151Its job is to run all of the individual passes in the correct order, 152and take care of standard bookkeeping that applies to every pass. 153 154The theory of operation is that each pass defines a structure that 155represents everything we need to know about that pass---when it 156should be run, how it should be run, what intermediate language 157form or on-the-side data structures it needs. We register the pass 158to be run in some particular order, and the pass manager arranges 159for everything to happen in the correct order. 160 161The actuality doesn't completely live up to the theory at present. 162Command-line switches and @code{timevar_id_t} enumerations must still 163be defined elsewhere. The pass manager validates constraints but does 164not attempt to (re-)generate data structures or lower intermediate 165language form based on the requirements of the next pass. Nevertheless, 166what is present is useful, and a far sight better than nothing at all. 167 168TODO: describe the global variables set up by the pass manager, 169and a brief description of how a new pass should use it. 170I need to look at what info rtl passes use first... 171 172@node Tree-SSA passes 173@section Tree-SSA passes 174 175The following briefly describes the tree optimization passes that are 176run after gimplification and what source files they are located in. 177 178@itemize @bullet 179@item Remove useless statements 180 181This pass is an extremely simple sweep across the gimple code in which 182we identify obviously dead code and remove it. Here we do things like 183simplify @code{if} statements with constant conditions, remove 184exception handling constructs surrounding code that obviously cannot 185throw, remove lexical bindings that contain no variables, and other 186assorted simplistic cleanups. The idea is to get rid of the obvious 187stuff quickly rather than wait until later when it's more work to get 188rid of it. This pass is located in @file{tree-cfg.c} and described by 189@code{pass_remove_useless_stmts}. 190 191@item Mudflap declaration registration 192 193If mudflap (@pxref{Optimize Options,,-fmudflap -fmudflapth 194-fmudflapir,gcc,Using the GNU Compiler Collection (GCC)}) is 195enabled, we generate code to register some variable declarations with 196the mudflap runtime. Specifically, the runtime tracks the lifetimes of 197those variable declarations that have their addresses taken, or whose 198bounds are unknown at compile time (@code{extern}). This pass generates 199new exception handling constructs (@code{try}/@code{finally}), and so 200must run before those are lowered. In addition, the pass enqueues 201declarations of static variables whose lifetimes extend to the entire 202program. The pass is located in @file{tree-mudflap.c} and is described 203by @code{pass_mudflap_1}. 204 205@item OpenMP lowering 206 207If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers 208OpenMP constructs into GIMPLE. 209 210Lowering of OpenMP constructs involves creating replacement 211expressions for local variables that have been mapped using data 212sharing clauses, exposing the control flow of most synchronization 213directives and adding region markers to facilitate the creation of the 214control flow graph. The pass is located in @file{omp-low.c} and is 215described by @code{pass_lower_omp}. 216 217@item OpenMP expansion 218 219If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands 220parallel regions into their own functions to be invoked by the thread 221library. The pass is located in @file{omp-low.c} and is described by 222@code{pass_expand_omp}. 223 224@item Lower control flow 225 226This pass flattens @code{if} statements (@code{COND_EXPR}) 227and moves lexical bindings (@code{BIND_EXPR}) out of line. After 228this pass, all @code{if} statements will have exactly two @code{goto} 229statements in its @code{then} and @code{else} arms. Lexical binding 230information for each statement will be found in @code{TREE_BLOCK} rather 231than being inferred from its position under a @code{BIND_EXPR}. This 232pass is found in @file{gimple-low.c} and is described by 233@code{pass_lower_cf}. 234 235@item Lower exception handling control flow 236 237This pass decomposes high-level exception handling constructs 238(@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form 239that explicitly represents the control flow involved. After this 240pass, @code{lookup_stmt_eh_region} will return a non-negative 241number for any statement that may have EH control flow semantics; 242examine @code{tree_can_throw_internal} or @code{tree_can_throw_external} 243for exact semantics. Exact control flow may be extracted from 244@code{foreach_reachable_handler}. The EH region nesting tree is defined 245in @file{except.h} and built in @file{except.c}. The lowering pass 246itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}. 247 248@item Build the control flow graph 249 250This pass decomposes a function into basic blocks and creates all of 251the edges that connect them. It is located in @file{tree-cfg.c} and 252is described by @code{pass_build_cfg}. 253 254@item Find all referenced variables 255 256This pass walks the entire function and collects an array of all 257variables referenced in the function, @code{referenced_vars}. The 258index at which a variable is found in the array is used as a UID 259for the variable within this function. This data is needed by the 260SSA rewriting routines. The pass is located in @file{tree-dfa.c} 261and is described by @code{pass_referenced_vars}. 262 263@item Enter static single assignment form 264 265This pass rewrites the function such that it is in SSA form. After 266this pass, all @code{is_gimple_reg} variables will be referenced by 267@code{SSA_NAME}, and all occurrences of other variables will be 268annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have 269been inserted as necessary for each basic block. This pass is 270located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}. 271 272@item Warn for uninitialized variables 273 274This pass scans the function for uses of @code{SSA_NAME}s that 275are fed by default definition. For non-parameter variables, such 276uses are uninitialized. The pass is run twice, before and after 277optimization. In the first pass we only warn for uses that are 278positively uninitialized; in the second pass we warn for uses that 279are possibly uninitialized. The pass is located in @file{tree-ssa.c} 280and is defined by @code{pass_early_warn_uninitialized} and 281@code{pass_late_warn_uninitialized}. 282 283@item Dead code elimination 284 285This pass scans the function for statements without side effects whose 286result is unused. It does not do memory life analysis, so any value 287that is stored in memory is considered used. The pass is run multiple 288times throughout the optimization process. It is located in 289@file{tree-ssa-dce.c} and is described by @code{pass_dce}. 290 291@item Dominator optimizations 292 293This pass performs trivial dominator-based copy and constant propagation, 294expression simplification, and jump threading. It is run multiple times 295throughout the optimization process. It it located in @file{tree-ssa-dom.c} 296and is described by @code{pass_dominator}. 297 298@item Redundant PHI elimination 299 300This pass removes PHI nodes for which all of the arguments are the same 301value, excluding feedback. Such degenerate forms are typically created 302by removing unreachable code. The pass is run multiple times throughout 303the optimization process. It is located in @file{tree-ssa.c} and is 304described by @code{pass_redundant_phi}.o 305 306@item Forward propagation of single-use variables 307 308This pass attempts to remove redundant computation by substituting 309variables that are used once into the expression that uses them and 310seeing if the result can be simplified. It is located in 311@file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}. 312 313@item Copy Renaming 314 315This pass attempts to change the name of compiler temporaries involved in 316copy operations such that SSA->normal can coalesce the copy away. When compiler 317temporaries are copies of user variables, it also renames the compiler 318temporary to the user variable resulting in better use of user symbols. It is 319located in @file{tree-ssa-copyrename.c} and is described by 320@code{pass_copyrename}. 321 322@item PHI node optimizations 323 324This pass recognizes forms of PHI inputs that can be represented as 325conditional expressions and rewrites them into straight line code. 326It is located in @file{tree-ssa-phiopt.c} and is described by 327@code{pass_phiopt}. 328 329@item May-alias optimization 330 331This pass performs a flow sensitive SSA-based points-to analysis. 332The resulting may-alias, must-alias, and escape analysis information 333is used to promote variables from in-memory addressable objects to 334non-aliased variables that can be renamed into SSA form. We also 335update the @code{VDEF}/@code{VUSE} memory tags for non-renameable 336aggregates so that we get fewer false kills. The pass is located 337in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}. 338 339Interprocedural points-to information is located in 340@file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}. 341 342@item Profiling 343 344This pass rewrites the function in order to collect runtime block 345and value profiling data. Such data may be fed back into the compiler 346on a subsequent run so as to allow optimization based on expected 347execution frequencies. The pass is located in @file{predict.c} and 348is described by @code{pass_profile}. 349 350@item Lower complex arithmetic 351 352This pass rewrites complex arithmetic operations into their component 353scalar arithmetic operations. The pass is located in @file{tree-complex.c} 354and is described by @code{pass_lower_complex}. 355 356@item Scalar replacement of aggregates 357 358This pass rewrites suitable non-aliased local aggregate variables into 359a set of scalar variables. The resulting scalar variables are 360rewritten into SSA form, which allows subsequent optimization passes 361to do a significantly better job with them. The pass is located in 362@file{tree-sra.c} and is described by @code{pass_sra}. 363 364@item Dead store elimination 365 366This pass eliminates stores to memory that are subsequently overwritten 367by another store, without any intervening loads. The pass is located 368in @file{tree-ssa-dse.c} and is described by @code{pass_dse}. 369 370@item Tail recursion elimination 371 372This pass transforms tail recursion into a loop. It is located in 373@file{tree-tailcall.c} and is described by @code{pass_tail_recursion}. 374 375@item Forward store motion 376 377This pass sinks stores and assignments down the flowgraph closer to it's 378use point. The pass is located in @file{tree-ssa-sink.c} and is 379described by @code{pass_sink_code}. 380 381@item Partial redundancy elimination 382 383This pass eliminates partially redundant computations, as well as 384performing load motion. The pass is located in @file{tree-ssa-pre.c} 385and is described by @code{pass_pre}. 386 387Just before partial redundancy elimination, if 388@option{-funsafe-math-optimizations} is on, GCC tries to convert 389divisions to multiplications by the reciprocal. The pass is located 390in @file{tree-ssa-math-opts.c} and is described by 391@code{pass_cse_reciprocal}. 392 393@item Full redundancy elimination 394 395This is a simpler form of PRE that only eliminate redundancies that 396occur an all paths. It is located in @file{tree-ssa-pre.c} and 397described by @code{pass_fre}. 398 399@item Loop optimization 400 401The main driver of the pass is placed in @file{tree-ssa-loop.c} 402and described by @code{pass_loop}. 403 404The optimizations performed by this pass are: 405 406Loop invariant motion. This pass moves only invariants that 407would be hard to handle on rtl level (function calls, operations that expand to 408nontrivial sequences of insns). With @option{-funswitch-loops} it also moves 409operands of conditions that are invariant out of the loop, so that we can use 410just trivial invariantness analysis in loop unswitching. The pass also includes 411store motion. The pass is implemented in @file{tree-ssa-loop-im.c}. 412 413Canonical induction variable creation. This pass creates a simple counter 414for number of iterations of the loop and replaces the exit condition of the 415loop using it, in case when a complicated analysis is necessary to determine 416the number of iterations. Later optimizations then may determine the number 417easily. The pass is implemented in @file{tree-ssa-loop-ivcanon.c}. 418 419Induction variable optimizations. This pass performs standard induction 420variable optimizations, including strength reduction, induction variable 421merging and induction variable elimination. The pass is implemented in 422@file{tree-ssa-loop-ivopts.c}. 423 424Loop unswitching. This pass moves the conditional jumps that are invariant 425out of the loops. To achieve this, a duplicate of the loop is created for 426each possible outcome of conditional jump(s). The pass is implemented in 427@file{tree-ssa-loop-unswitch.c}. This pass should eventually replace the 428rtl-level loop unswitching in @file{loop-unswitch.c}, but currently 429the rtl-level pass is not completely redundant yet due to deficiencies 430in tree level alias analysis. 431 432The optimizations also use various utility functions contained in 433@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and 434@file{cfgloopmanip.c}. 435 436Vectorization. This pass transforms loops to operate on vector types 437instead of scalar types. Data parallelism across loop iterations is exploited 438to group data elements from consecutive iterations into a vector and operate 439on them in parallel. Depending on available target support the loop is 440conceptually unrolled by a factor @code{VF} (vectorization factor), which is 441the number of elements operated upon in parallel in each iteration, and the 442@code{VF} copies of each scalar operation are fused to form a vector operation. 443Additional loop transformations such as peeling and versioning may take place 444to align the number of iterations, and to align the memory accesses in the loop. 445The pass is implemented in @file{tree-vectorizer.c} (the main driver and general 446utilities), @file{tree-vect-analyze.c} and @file{tree-vect-transform.c}. 447Analysis of data references is in @file{tree-data-ref.c}. 448 449@item Tree level if-conversion for vectorizer 450 451This pass applies if-conversion to simple loops to help vectorizer. 452We identify if convertible loops, if-convert statements and merge 453basic blocks in one big block. The idea is to present loop in such 454form so that vectorizer can have one to one mapping between statements 455and available vector operations. This patch re-introduces COND_EXPR 456at GIMPLE level. This pass is located in @file{tree-if-conv.c} and is 457described by @code{pass_if_conversion}. 458 459@item Conditional constant propagation 460 461This pass relaxes a lattice of values in order to identify those 462that must be constant even in the presence of conditional branches. 463The pass is located in @file{tree-ssa-ccp.c} and is described 464by @code{pass_ccp}. 465 466A related pass that works on memory loads and stores, and not just 467register values, is located in @file{tree-ssa-ccp.c} and described by 468@code{pass_store_ccp}. 469 470@item Conditional copy propagation 471 472This is similar to constant propagation but the lattice of values is 473the ``copy-of'' relation. It eliminates redundant copies from the 474code. The pass is located in @file{tree-ssa-copy.c} and described by 475@code{pass_copy_prop}. 476 477A related pass that works on memory copies, and not just register 478copies, is located in @file{tree-ssa-copy.c} and described by 479@code{pass_store_copy_prop}. 480 481@item Value range propagation 482 483This transformation is similar to constant propagation but 484instead of propagating single constant values, it propagates 485known value ranges. The implementation is based on Patterson's 486range propagation algorithm (Accurate Static Branch Prediction by 487Value Range Propagation, J. R. C. Patterson, PLDI '95). In 488contrast to Patterson's algorithm, this implementation does not 489propagate branch probabilities nor it uses more than a single 490range per SSA name. This means that the current implementation 491cannot be used for branch prediction (though adapting it would 492not be difficult). The pass is located in @file{tree-vrp.c} and is 493described by @code{pass_vrp}. 494 495@item Folding built-in functions 496 497This pass simplifies built-in functions, as applicable, with constant 498arguments or with inferrable string lengths. It is located in 499@file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}. 500 501@item Split critical edges 502 503This pass identifies critical edges and inserts empty basic blocks 504such that the edge is no longer critical. The pass is located in 505@file{tree-cfg.c} and is described by @code{pass_split_crit_edges}. 506 507@item Control dependence dead code elimination 508 509This pass is a stronger form of dead code elimination that can 510eliminate unnecessary control flow statements. It is located 511in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}. 512 513@item Tail call elimination 514 515This pass identifies function calls that may be rewritten into 516jumps. No code transformation is actually applied here, but the 517data and control flow problem is solved. The code transformation 518requires target support, and so is delayed until RTL@. In the 519meantime @code{CALL_EXPR_TAILCALL} is set indicating the possibility. 520The pass is located in @file{tree-tailcall.c} and is described by 521@code{pass_tail_calls}. The RTL transformation is handled by 522@code{fixup_tail_calls} in @file{calls.c}. 523 524@item Warn for function return without value 525 526For non-void functions, this pass locates return statements that do 527not specify a value and issues a warning. Such a statement may have 528been injected by falling off the end of the function. This pass is 529run last so that we have as much time as possible to prove that the 530statement is not reachable. It is located in @file{tree-cfg.c} and 531is described by @code{pass_warn_function_return}. 532 533@item Mudflap statement annotation 534 535If mudflap is enabled, we rewrite some memory accesses with code to 536validate that the memory access is correct. In particular, expressions 537involving pointer dereferences (@code{INDIRECT_REF}, @code{ARRAY_REF}, 538etc.) are replaced by code that checks the selected address range 539against the mudflap runtime's database of valid regions. This check 540includes an inline lookup into a direct-mapped cache, based on 541shift/mask operations of the pointer value, with a fallback function 542call into the runtime. The pass is located in @file{tree-mudflap.c} and 543is described by @code{pass_mudflap_2}. 544 545@item Leave static single assignment form 546 547This pass rewrites the function such that it is in normal form. At 548the same time, we eliminate as many single-use temporaries as possible, 549so the intermediate language is no longer GIMPLE, but GENERIC@. The 550pass is located in @file{tree-outof-ssa.c} and is described by 551@code{pass_del_ssa}. 552 553@item Merge PHI nodes that feed into one another 554 555This is part of the CFG cleanup passes. It attempts to join PHI nodes 556from a forwarder CFG block into another block with PHI nodes. The 557pass is located in @file{tree-cfgcleanup.c} and is described by 558@code{pass_merge_phi}. 559 560@item Return value optimization 561 562If a function always returns the same local variable, and that local 563variable is an aggregate type, then the variable is replaced with the 564return value for the function (i.e., the function's DECL_RESULT). This 565is equivalent to the C++ named return value optimization applied to 566GIMPLE. The pass is located in @file{tree-nrv.c} and is described by 567@code{pass_nrv}. 568 569@item Return slot optimization 570 571If a function returns a memory object and is called as @code{var = 572foo()}, this pass tries to change the call so that the address of 573@code{var} is sent to the caller to avoid an extra memory copy. This 574pass is located in @code{tree-nrv.c} and is described by 575@code{pass_return_slot}. 576 577@item Optimize calls to @code{__builtin_object_size} 578 579This is a propagation pass similar to CCP that tries to remove calls 580to @code{__builtin_object_size} when the size of the object can be 581computed at compile-time. This pass is located in 582@file{tree-object-size.c} and is described by 583@code{pass_object_sizes}. 584 585@item Loop invariant motion 586 587This pass removes expensive loop-invariant computations out of loops. 588The pass is located in @file{tree-ssa-loop.c} and described by 589@code{pass_lim}. 590 591@item Loop nest optimizations 592 593This is a family of loop transformations that works on loop nests. It 594includes loop interchange, scaling, skewing and reversal and they are 595all geared to the optimization of data locality in array traversals 596and the removal of dependencies that hamper optimizations such as loop 597parallelization and vectorization. The pass is located in 598@file{tree-loop-linear.c} and described by 599@code{pass_linear_transform}. 600 601@item Removal of empty loops 602 603This pass removes loops with no code in them. The pass is located in 604@file{tree-ssa-loop-ivcanon.c} and described by 605@code{pass_empty_loop}. 606 607@item Unrolling of small loops 608 609This pass completely unrolls loops with few iterations. The pass 610is located in @file{tree-ssa-loop-ivcanon.c} and described by 611@code{pass_complete_unroll}. 612 613@item Array prefetching 614 615This pass issues prefetch instructions for array references inside 616loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and 617described by @code{pass_loop_prefetch}. 618 619@item Reassociation 620 621This pass rewrites arithmetic expressions to enable optimizations that 622operate on them, like redundancy elimination and vectorization. The 623pass is located in @file{tree-ssa-reassoc.c} and described by 624@code{pass_reassoc}. 625 626@item Optimization of @code{stdarg} functions 627 628This pass tries to avoid the saving of register arguments into the 629stack on entry to @code{stdarg} functions. If the function doesn't 630use any @code{va_start} macros, no registers need to be saved. If 631@code{va_start} macros are used, the @code{va_list} variables don't 632escape the function, it is only necessary to save registers that will 633be used in @code{va_arg} macros. For instance, if @code{va_arg} is 634only used with integral types in the function, floating point 635registers don't need to be saved. This pass is located in 636@code{tree-stdarg.c} and described by @code{pass_stdarg}. 637 638@end itemize 639 640@node RTL passes 641@section RTL passes 642 643The following briefly describes the rtl generation and optimization 644passes that are run after tree optimization. 645 646@itemize @bullet 647@item RTL generation 648 649@c Avoiding overfull is tricky here. 650The source files for RTL generation include 651@file{stmt.c}, 652@file{calls.c}, 653@file{expr.c}, 654@file{explow.c}, 655@file{expmed.c}, 656@file{function.c}, 657@file{optabs.c} 658and @file{emit-rtl.c}. 659Also, the file 660@file{insn-emit.c}, generated from the machine description by the 661program @code{genemit}, is used in this pass. The header file 662@file{expr.h} is used for communication within this pass. 663 664@findex genflags 665@findex gencodes 666The header files @file{insn-flags.h} and @file{insn-codes.h}, 667generated from the machine description by the programs @code{genflags} 668and @code{gencodes}, tell this pass which standard names are available 669for use and which patterns correspond to them. 670 671@item Generate exception handling landing pads 672 673This pass generates the glue that handles communication between the 674exception handling library routines and the exception handlers within 675the function. Entry points in the function that are invoked by the 676exception handling library are called @dfn{landing pads}. The code 677for this pass is located within @file{except.c}. 678 679@item Cleanup control flow graph 680 681This pass removes unreachable code, simplifies jumps to next, jumps to 682jump, jumps across jumps, etc. The pass is run multiple times. 683For historical reasons, it is occasionally referred to as the ``jump 684optimization pass''. The bulk of the code for this pass is in 685@file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c} 686and @file{jump.c}. 687 688@item Common subexpression elimination 689 690This pass removes redundant computation within basic blocks, and 691optimizes addressing modes based on cost. The pass is run twice. 692The source is located in @file{cse.c}. 693 694@item Global common subexpression elimination. 695 696This pass performs two 697different types of GCSE depending on whether you are optimizing for 698size or not (LCM based GCSE tends to increase code size for a gain in 699speed, while Morel-Renvoise based GCSE does not). 700When optimizing for size, GCSE is done using Morel-Renvoise Partial 701Redundancy Elimination, with the exception that it does not try to move 702invariants out of loops---that is left to the loop optimization pass. 703If MR PRE GCSE is done, code hoisting (aka unification) is also done, as 704well as load motion. 705If you are optimizing for speed, LCM (lazy code motion) based GCSE is 706done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM 707based GCSE also does loop invariant code motion. We also perform load 708and store motion when optimizing for speed. 709Regardless of which type of GCSE is used, the GCSE pass also performs 710global constant and copy propagation. 711The source file for this pass is @file{gcse.c}, and the LCM routines 712are in @file{lcm.c}. 713 714@item Loop optimization 715 716This pass performs several loop related optimizations. 717The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain 718generic loop analysis and manipulation code. Initialization and finalization 719of loop structures is handled by @file{loop-init.c}. 720A loop invariant motion pass is implemented in @file{loop-invariant.c}. 721Basic block level optimizations---unrolling, peeling and unswitching loops--- 722are implemented in @file{loop-unswitch.c} and @file{loop-unroll.c}. 723Replacing of the exit condition of loops by special machine-dependent 724instructions is handled by @file{loop-doloop.c}. 725 726@item Jump bypassing 727 728This pass is an aggressive form of GCSE that transforms the control 729flow graph of a function by propagating constants into conditional 730branch instructions. The source file for this pass is @file{gcse.c}. 731 732@item If conversion 733 734This pass attempts to replace conditional branches and surrounding 735assignments with arithmetic, boolean value producing comparison 736instructions, and conditional move instructions. In the very last 737invocation after reload, it will generate predicated instructions 738when supported by the target. The pass is located in @file{ifcvt.c}. 739 740@item Web construction 741 742This pass splits independent uses of each pseudo-register. This can 743improve effect of the other transformation, such as CSE or register 744allocation. Its source files are @file{web.c}. 745 746@item Life analysis 747 748This pass computes which pseudo-registers are live at each point in 749the program, and makes the first instruction that uses a value point 750at the instruction that computed the value. It then deletes 751computations whose results are never used, and combines memory 752references with add or subtract instructions to make autoincrement or 753autodecrement addressing. The pass is located in @file{flow.c}. 754 755@item Instruction combination 756 757This pass attempts to combine groups of two or three instructions that 758are related by data flow into single instructions. It combines the 759RTL expressions for the instructions by substitution, simplifies the 760result using algebra, and then attempts to match the result against 761the machine description. The pass is located in @file{combine.c}. 762 763@item Register movement 764 765This pass looks for cases where matching constraints would force an 766instruction to need a reload, and this reload would be a 767register-to-register move. It then attempts to change the registers 768used by the instruction to avoid the move instruction. 769The pass is located in @file{regmove.c}. 770 771@item Optimize mode switching 772 773This pass looks for instructions that require the processor to be in a 774specific ``mode'' and minimizes the number of mode changes required to 775satisfy all users. What these modes are, and what they apply to are 776completely target-specific. 777The source is located in @file{mode-switching.c}. 778 779@cindex modulo scheduling 780@cindex sms, swing, software pipelining 781@item Modulo scheduling 782 783This pass looks at innermost loops and reorders their instructions 784by overlapping different iterations. Modulo scheduling is performed 785immediately before instruction scheduling. 786The pass is located in (@file{modulo-sched.c}). 787 788@item Instruction scheduling 789 790This pass looks for instructions whose output will not be available by 791the time that it is used in subsequent instructions. Memory loads and 792floating point instructions often have this behavior on RISC machines. 793It re-orders instructions within a basic block to try to separate the 794definition and use of items that otherwise would cause pipeline 795stalls. This pass is performed twice, before and after register 796allocation. The pass is located in @file{haifa-sched.c}, 797@file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and 798@file{sched-vis.c}. 799 800@item Register allocation 801 802These passes make sure that all occurrences of pseudo registers are 803eliminated, either by allocating them to a hard register, replacing 804them by an equivalent expression (e.g.@: a constant) or by placing 805them on the stack. This is done in several subpasses: 806 807@itemize @bullet 808@item 809Register class preferencing. The RTL code is scanned to find out 810which register class is best for each pseudo register. The source 811file is @file{regclass.c}. 812 813@item 814Local register allocation. This pass allocates hard registers to 815pseudo registers that are used only within one basic block. Because 816the basic block is linear, it can use fast and powerful techniques to 817do a decent job. The source is located in @file{local-alloc.c}. 818 819@item 820Global register allocation. This pass allocates hard registers for 821the remaining pseudo registers (those whose life spans are not 822contained in one basic block). The pass is located in @file{global.c}. 823 824@cindex reloading 825@item 826Reloading. This pass renumbers pseudo registers with the hardware 827registers numbers they were allocated. Pseudo registers that did not 828get hard registers are replaced with stack slots. Then it finds 829instructions that are invalid because a value has failed to end up in 830a register, or has ended up in a register of the wrong kind. It fixes 831up these instructions by reloading the problematical values 832temporarily into registers. Additional instructions are generated to 833do the copying. 834 835The reload pass also optionally eliminates the frame pointer and inserts 836instructions to save and restore call-clobbered registers around calls. 837 838Source files are @file{reload.c} and @file{reload1.c}, plus the header 839@file{reload.h} used for communication between them. 840@end itemize 841 842@item Basic block reordering 843 844This pass implements profile guided code positioning. If profile 845information is not available, various types of static analysis are 846performed to make the predictions normally coming from the profile 847feedback (IE execution frequency, branch probability, etc). It is 848implemented in the file @file{bb-reorder.c}, and the various 849prediction routines are in @file{predict.c}. 850 851@item Variable tracking 852 853This pass computes where the variables are stored at each 854position in code and generates notes describing the variable locations 855to RTL code. The location lists are then generated according to these 856notes to debug information if the debugging information format supports 857location lists. 858 859@item Delayed branch scheduling 860 861This optional pass attempts to find instructions that can go into the 862delay slots of other instructions, usually jumps and calls. The 863source file name is @file{reorg.c}. 864 865@item Branch shortening 866 867On many RISC machines, branch instructions have a limited range. 868Thus, longer sequences of instructions must be used for long branches. 869In this pass, the compiler figures out what how far each instruction 870will be from each other instruction, and therefore whether the usual 871instructions, or the longer sequences, must be used for each branch. 872 873@item Register-to-stack conversion 874 875Conversion from usage of some hard registers to usage of a register 876stack may be done at this point. Currently, this is supported only 877for the floating-point registers of the Intel 80387 coprocessor. The 878source file name is @file{reg-stack.c}. 879 880@item Final 881 882This pass outputs the assembler code for the function. The source files 883are @file{final.c} plus @file{insn-output.c}; the latter is generated 884automatically from the machine description by the tool @file{genoutput}. 885The header file @file{conditions.h} is used for communication between 886these files. If mudflap is enabled, the queue of deferred declarations 887and any addressed constants (e.g., string literals) is processed by 888@code{mudflap_finish_file} into a synthetic constructor function 889containing calls into the mudflap runtime. 890 891@item Debugging information output 892 893This is run after final because it must output the stack slot offsets 894for pseudo registers that did not get hard registers. Source files 895are @file{dbxout.c} for DBX symbol table format, @file{sdbout.c} for 896SDB symbol table format, @file{dwarfout.c} for DWARF symbol table 897format, files @file{dwarf2out.c} and @file{dwarf2asm.c} for DWARF2 898symbol table format, and @file{vmsdbgout.c} for VMS debug symbol table 899format. 900 901@end itemize 902