@c markers: BUG TODO

@c Copyright (C) 1988-2020 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@node Passes
@chapter Passes and Files of the Compiler
@cindex passes and files of the compiler
@cindex files and passes of the compiler
@cindex compiler passes and files
@cindex pass dumps

This chapter is dedicated to giving an overview of the optimization and
code generation passes of the compiler.  In the process, it describes
some of the language front end interface, though this description is
nowhere near complete.

@menu
* Parsing pass::         The language front end turns text into bits.
* Gimplification pass::  The bits are turned into something we can optimize.
* Pass manager::         Sequencing the optimization passes.
* IPA passes::           Inter-procedural optimizations.
* Tree SSA passes::      Optimizations on a high-level representation.
* RTL passes::           Optimizations on a low-level representation.
* Optimization info::    Dumping optimization information from passes.
@end menu

@node Parsing pass
@section Parsing pass
@cindex GENERIC
@findex lang_hooks.parse_file
The language front end is invoked only once, via
@code{lang_hooks.parse_file}, to parse the entire input.  The language
front end may use any intermediate language representation deemed
appropriate.  The C front end uses GENERIC trees (@pxref{GENERIC}), plus
a double handful of language-specific tree codes defined in
@file{c-common.def}.  The Fortran front end uses a completely different
private representation.

@cindex GIMPLE
@cindex gimplification
@cindex gimplifier
@cindex language-independent intermediate representation
@cindex intermediate representation lowering
@cindex lowering, language-dependent intermediate representation
At some point the front end must translate the representation used in the
front end to a representation understood by the language-independent
portions of the compiler.  Current practice takes one of two forms.
The C front end manually invokes the gimplifier (@pxref{GIMPLE}) on each function,
and uses the gimplifier callbacks to convert the language-specific tree
nodes directly to GIMPLE before passing the function off to be compiled.
The Fortran front end converts from a private representation to GENERIC,
which is later lowered to GIMPLE when the function is compiled.  Which
route to choose probably depends on how well GENERIC (plus extensions)
can be made to match up with the source language and necessary parsing
data structures.

BUG:  Gimplification must occur before nested function lowering,
and nested function lowering must be done by the front end before
passing the data off to cgraph.

TODO:  Cgraph should control nested function lowering.  It would
only be invoked when it is certain that the outer-most function
is used.

TODO:  Cgraph needs a gimplify_function callback.  It should be
invoked when (1) it is certain that the function is used, (2)
warning flags specified by the user require some amount of
compilation in order to honor, (3) the language indicates that
semantic analysis is not complete until gimplification occurs.
Hum@dots{} this sounds overly complicated.  Perhaps we should just
have the front end gimplify always; in most cases it's only one
function call.

The front end needs to pass all function definitions and top level
declarations off to the middle-end so that they can be compiled and
emitted to the object file.
For a simple procedural language, it is
usually most convenient to do this as each top level declaration or
definition is seen.  There is also a distinction to be made between
generating functional code and generating complete debug information.
The only thing that is absolutely required for functional code is that
function and data @emph{definitions} be passed to the middle-end.  For
complete debug information, function, data and type declarations
should all be passed as well.

@findex rest_of_decl_compilation
@findex rest_of_type_compilation
@findex cgraph_finalize_function
In any case, each complete top-level function or data declaration, and
each data definition, should be passed to
@code{rest_of_decl_compilation}.  Each complete type definition should
be passed to @code{rest_of_type_compilation}.  Each function definition
should be passed to @code{cgraph_finalize_function}.

TODO:  I know rest_of_compilation currently has all sorts of
RTL generation semantics.  I plan to move all code generation
bits (both Tree and RTL) to compile_function.  Should we hide
cgraph from the front ends and move back to rest_of_compilation
as the official interface?  Possibly we should rename all three
interfaces such that the names match in some meaningful way and
that is more descriptive than "rest_of".

The middle-end will, at its option, emit the function and data
definitions immediately or queue them for later processing.

@node Gimplification pass
@section Gimplification pass

@cindex gimplification
@cindex GIMPLE
@dfn{Gimplification} is a whimsical term for the process of converting
the intermediate representation of a function into the GIMPLE language
(@pxref{GIMPLE}).  The term stuck, and so words like ``gimplification'',
``gimplify'', ``gimplifier'' and the like are sprinkled throughout this
section of code.
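
As an informal illustration (not taken from the GCC sources; the
temporary name is invented and actual dumps differ in detail), a
statement containing a nested expression is broken into a sequence of
simpler statements, roughly one operation each:

@smallexample
/* Before gimplification: one statement, nested expression.  */
a = b + c * d;

/* After gimplification (approximate).  */
t1 = c * d;
a = b + t1;
@end smallexample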

While a front end may certainly generate GIMPLE directly if
it chooses, this can be a moderately complex process unless the
intermediate language used by the front end is already fairly simple.
Usually it is easier to generate GENERIC trees plus extensions
and let the language-independent gimplifier do most of the work.

@findex gimplify_function_tree
@findex gimplify_expr
@findex lang_hooks.gimplify_expr
The main entry point to this pass is @code{gimplify_function_tree}
located in @file{gimplify.c}.  From here we process the entire
function gimplifying each statement in turn.  The main workhorse
for this pass is @code{gimplify_expr}.  Approximately everything
passes through here at least once, and it is from here that we
invoke the @code{lang_hooks.gimplify_expr} callback.

The callback should examine the expression in question and return
@code{GS_UNHANDLED} if the expression is not a language-specific
construct that requires attention.  Otherwise it should alter the
expression in some way such that forward progress is made toward
producing valid GIMPLE@.  If the callback is certain that the
transformation is complete and the expression is valid GIMPLE, it
should return @code{GS_ALL_DONE}.  Otherwise it should return
@code{GS_OK}, which will cause the expression to be processed again.
If the callback encounters an error during the transformation (because
the front end is relying on the gimplification process to finish
semantic checks), it should return @code{GS_ERROR}.

@node Pass manager
@section Pass manager

The pass manager is located in @file{passes.c}, @file{tree-optimize.c}
and @file{tree-pass.h}.
It processes passes as described in @file{passes.def}.
Its job is to run all of the individual passes in the correct order,
and take care of standard bookkeeping that applies to every pass.
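
The following self-contained C++ sketch models the idea of running
passes in a fixed order with some common bookkeeping.  It is a toy
under stated assumptions: @code{toy_pass}, @code{run_passes} and the
@code{TOY_PROP_*} flags are hypothetical names invented for this
illustration and are not part of GCC's pass manager interface.

@smallexample
// Toy model only: none of these names exist in GCC.
#include <cstdio>
#include <string>
#include <vector>

enum @{ TOY_PROP_GIMPLE = 1, TOY_PROP_CFG = 2, TOY_PROP_SSA = 4 @};

struct toy_pass
@{
  std::string name;     // unique pass name
  unsigned required;    // properties that must hold before running
  unsigned provided;    // properties established by the pass
  void (*execute) ();   // the work itself
@};

// Run PASSES in order, starting with properties PROPS.  Return the
// number of passes executed, or -1 if some pass's required
// properties are not available when its turn comes.
static int
run_passes (const std::vector<toy_pass> &passes, unsigned props)
@{
  int executed = 0;
  for (const toy_pass &p : passes)
    @{
      if ((p.required & props) != p.required)
        @{
          std::fprintf (stderr, "%s: missing properties\n",
                        p.name.c_str ());
          return -1;
        @}
      p.execute ();
      props |= p.provided;
      ++executed;
    @}
  return executed;
@}

static void do_nothing () @{@}

int
main ()
@{
  std::vector<toy_pass> passes = @{
    @{ "cfg", TOY_PROP_GIMPLE, TOY_PROP_CFG, do_nothing @},
    @{ "ssa", TOY_PROP_CFG, TOY_PROP_SSA, do_nothing @},
    @{ "dce", TOY_PROP_SSA, 0, do_nothing @}
  @};
  std::printf ("%d passes run\n",
               run_passes (passes, TOY_PROP_GIMPLE));
  return 0;
@}
@end smallexample

In this toy, swapping the first two entries makes @code{run_passes}
return @minus{}1, because the hypothetical @samp{ssa} pass would be
reached before the property it requires has been provided.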

The theory of operation is that each pass defines a structure that
represents everything we need to know about that pass---when it
should be run, how it should be run, what intermediate language
form or on-the-side data structures it needs.  We register the pass
to be run in some particular order, and the pass manager arranges
for everything to happen in the correct order.

The actuality doesn't completely live up to the theory at present.
Command-line switches and @code{timevar_id_t} enumerations must still
be defined elsewhere.  The pass manager validates constraints but does
not attempt to (re-)generate data structures or lower intermediate
language form based on the requirements of the next pass.  Nevertheless,
what is present is useful, and a far sight better than nothing at all.

Each pass should have a unique name.
Each pass may have its own dump file (for GCC debugging purposes).
Passes with a name starting with a star do not dump anything.
Sometimes passes are supposed to share a dump file / option name.
To still give these unique names, you can use a prefix that is delimited
by a space from the part that is used for the dump file / option name.
For example, when the pass name is "ud dce", the name used for the dump
file / option is "dce".

TODO: describe the global variables set up by the pass manager,
and a brief description of how a new pass should use it.
I need to look at what info RTL passes use first@enddots{}

@node IPA passes
@section Inter-procedural optimization passes
@cindex IPA passes
@cindex inter-procedural optimization passes

The inter-procedural optimization (IPA) passes use call graph
information to perform transformations across function boundaries.
IPA is a critical part of link-time optimization (LTO) and
whole-program (WHOPR) optimization, and these passes are structured
with the needs of LTO and WHOPR in mind by dividing their operations
into stages.  For detailed discussion of the LTO/WHOPR IPA pass stages
and interfaces, see @ref{IPA}.

The following briefly describes the inter-procedural optimization (IPA)
passes, which are split into small IPA passes, regular IPA passes,
and late IPA passes, according to the LTO/WHOPR processing model.

@menu
* Small IPA passes::
* Regular IPA passes::
* Late IPA passes::
@end menu

@node Small IPA passes
@subsection Small IPA passes
@cindex small IPA passes
A small IPA pass is a pass derived from @code{simple_ipa_opt_pass}.
As described in @ref{IPA}, it does everything at once and
defines only the @emph{Execute} stage.  During this
stage it accesses and modifies the function bodies.
No @code{generate_summary}, @code{read_summary}, or @code{write_summary}
hooks are defined.

@itemize @bullet
@item IPA free lang data

This pass frees resources that are used by the front end but are
not needed once it is done.  It is located in @file{tree.c} and is described by
@code{pass_ipa_free_lang_data}.

@item IPA function and variable visibility

This is a local function pass handling visibilities of all symbols.  This
happens before LTO streaming, so @option{-fwhole-program} should be ignored
at this level.  It is located in @file{ipa-visibility.c} and is described by
@code{pass_ipa_function_and_variable_visibility}.

@item IPA remove symbols

This pass performs reachability analysis and reclaims all unreachable nodes.
It is located in @file{passes.c} and is described by
@code{pass_ipa_remove_symbols}.

@item IPA OpenACC

This is a pass group for OpenACC processing.
It is located in
@file{tree-ssa-loop.c} and is described by @code{pass_ipa_oacc}.

@item IPA points-to analysis

This is a tree-based points-to analysis pass.  The idea behind this analyzer
is to generate set constraints from the program, then solve the resulting
constraints in order to generate the points-to sets.  It is located in
@file{tree-ssa-structalias.c} and is described by @code{pass_ipa_pta}.

@item IPA OpenACC kernels

This is a pass group for processing OpenACC kernels regions.  It is a
subpass of the IPA OpenACC pass group that runs on offloaded functions
containing OpenACC kernels loops.  It is located in
@file{tree-ssa-loop.c} and is described by
@code{pass_ipa_oacc_kernels}.

@item Target clone

This is a pass for parsing functions with multiple target attributes.
It is located in @file{multiple_target.c} and is described by
@code{pass_target_clone}.

@item IPA auto profile

This pass uses AutoFDO profiling data to annotate the control flow graph.
It is located in @file{auto-profile.c} and is described by
@code{pass_ipa_auto_profile}.

@item IPA tree profile

This pass does profiling for all functions in the call graph.
It calculates branch
probabilities and basic block execution counts.  It is located
in @file{tree-profile.c} and is described by @code{pass_ipa_tree_profile}.

@item IPA free function summary

This pass is a small IPA pass when argument @code{small_p} is true.
It releases inline function summaries and call summaries.
It is located in @file{ipa-fnsummary.c} and is described by
@code{pass_ipa_free_fn_summary}.

@item IPA increase alignment

This pass increases the alignment of global arrays to improve
vectorization.  It is located in @file{tree-vectorizer.c}
and is described by @code{pass_ipa_increase_alignment}.

@item IPA transactional memory

This pass is for transactional memory support.
It is located in @file{trans-mem.c} and is described by
@code{pass_ipa_tm}.

@item IPA lower emulated TLS

This pass lowers thread-local storage (TLS) operations
to emulation functions provided by libgcc.
It is located in @file{tree-emutls.c} and is described by
@code{pass_ipa_lower_emutls}.

@end itemize

@node Regular IPA passes
@subsection Regular IPA passes
@cindex regular IPA passes

A regular IPA pass is a pass derived from @code{ipa_opt_pass_d} that
is executed in WHOPR compilation.  Regular IPA passes may have summary
hooks implemented in any of the LGEN, WPA or LTRANS stages (@pxref{IPA}).

@itemize @bullet
@item IPA whole program visibility

This pass performs various optimizations involving symbol visibility
with @option{-fwhole-program}, including symbol privatization,
discovering local functions, and dismantling comdat groups.  It is
located in @file{ipa-visibility.c} and is described by
@code{pass_ipa_whole_program_visibility}.

@item IPA profile

The IPA profile pass propagates profiling frequencies across the call
graph.  It is located in @file{ipa-profile.c} and is described by
@code{pass_ipa_profile}.

@item IPA identical code folding

This is the inter-procedural identical code folding pass.
The goal of this transformation is to discover functions
and read-only variables that have exactly the same semantics.  It is
located in @file{ipa-icf.c} and is described by @code{pass_ipa_icf}.

@item IPA devirtualization

This pass performs speculative devirtualization based on the type
inheritance graph.  When a polymorphic call has only one likely target
in the unit, it is turned into a speculative call.  It is located in
@file{ipa-devirt.c} and is described by @code{pass_ipa_devirt}.

@item IPA constant propagation

The goal of this pass is to discover functions that are always invoked
with some arguments with the same known constant values and to modify
the functions accordingly.  It can also do partial specialization and
type-based devirtualization.  It is located in @file{ipa-cp.c} and is
described by @code{pass_ipa_cp}.

@item IPA scalar replacement of aggregates

This pass can replace an aggregate parameter with a set of other parameters
representing part of the original, turning those passed by reference
into new ones which pass the value directly.  It also removes unused
function return values and unused function parameters.  This pass is
located in @file{ipa-sra.c} and is described by @code{pass_ipa_sra}.

@item IPA constructor/destructor merge

This pass merges multiple constructors and destructors for static
objects into single functions.  It's only run at LTO time unless the
target doesn't support constructors and destructors natively.  The
pass is located in @file{ipa.c} and is described by
@code{pass_ipa_cdtor_merge}.

@item IPA HSA

This pass is part of the GCC support for HSA (Heterogeneous System
Architecture) accelerators.  It is responsible for creation of HSA
clones and emitting HSAIL instructions for them.  It is located in
@file{ipa-hsa.c} and is described by @code{pass_ipa_hsa}.

@item IPA function summary

This pass provides function analysis for inter-procedural passes.
It collects estimates of function body size, execution time, and frame
size for each function.  It also estimates information about function
calls: call statement size, time and how often the parameters change
for each call.  It is located in @file{ipa-fnsummary.c} and is
described by @code{pass_ipa_fn_summary}.

@item IPA inline

The IPA inline pass handles function inlining with whole-program
knowledge.
Small functions that are candidates for inlining are
ordered in increasing badness, bounded by unit growth parameters.
Unreachable functions are removed from the call graph.  Functions called
once and not exported from the unit are inlined.  This pass is located in
@file{ipa-inline.c} and is described by @code{pass_ipa_inline}.

@item IPA pure/const analysis

This pass marks functions as being either const (@code{TREE_READONLY}) or
pure (@code{DECL_PURE_P}).  The per-function information is produced
by @code{pure_const_generate_summary}, then the global information is computed
by performing a transitive closure over the call graph.  It is located in
@file{ipa-pure-const.c} and is described by @code{pass_ipa_pure_const}.

@item IPA free function summary

This pass is a regular IPA pass when argument @code{small_p} is false.
It releases inline function summaries and call summaries.
It is located in @file{ipa-fnsummary.c} and is described by
@code{pass_ipa_free_fn_summary}.

@item IPA reference

This pass gathers information about how variables whose scope is
confined to the compilation unit are used.  It is located in
@file{ipa-reference.c} and is described by @code{pass_ipa_reference}.

@item IPA single use

This pass checks whether variables are used by a single function.
It is located in @file{ipa.c} and is described by
@code{pass_ipa_single_use}.

@item IPA comdats

This pass looks for static symbols that are used exclusively
within one comdat group, and moves them into that comdat group.  It is
located in @file{ipa-comdats.c} and is described by
@code{pass_ipa_comdats}.

@end itemize

@node Late IPA passes
@subsection Late IPA passes
@cindex late IPA passes

Late IPA passes are simple IPA passes executed after
the regular passes.
In WHOPR mode the passes are executed after
partitioning and thus see just parts of the compiled unit.

@itemize @bullet
@item Materialize all clones

Once all functions from the compilation unit are in memory, this pass
produces all clones and updates all calls.  It is located in @file{ipa.c}
and is described by @code{pass_materialize_all_clones}.

@item IPA points-to analysis

Points-to analysis; this is the same as the points-to-analysis pass
run with the small IPA passes (@pxref{Small IPA passes}).

@item OpenMP simd clone

This is the OpenMP constructs' SIMD clone pass.  It creates the appropriate
SIMD clones for functions tagged as elemental SIMD functions.
It is located in @file{omp-simd-clone.c} and is described by
@code{pass_omp_simd_clone}.

@end itemize

@node Tree SSA passes
@section Tree SSA passes

The following briefly describes the Tree optimization passes that are
run after gimplification and what source files they are located in.

@itemize @bullet
@item Remove useless statements

This pass is an extremely simple sweep across the gimple code in which
we identify obviously dead code and remove it.  Here we do things like
simplify @code{if} statements with constant conditions, remove
exception handling constructs surrounding code that obviously cannot
throw, remove lexical bindings that contain no variables, and other
assorted simplistic cleanups.  The idea is to get rid of the obvious
stuff quickly rather than wait until later when it's more work to get
rid of it.  This pass is located in @file{tree-cfg.c} and described by
@code{pass_remove_useless_stmts}.

@item OpenMP lowering

If OpenMP generation (@option{-fopenmp}) is enabled, this pass lowers
OpenMP constructs into GIMPLE.

Lowering of OpenMP constructs involves creating replacement
expressions for local variables that have been mapped using data
sharing clauses, exposing the control flow of most synchronization
directives and adding region markers to facilitate the creation of the
control flow graph.  The pass is located in @file{omp-low.c} and is
described by @code{pass_lower_omp}.

@item OpenMP expansion

If OpenMP generation (@option{-fopenmp}) is enabled, this pass expands
parallel regions into their own functions to be invoked by the thread
library.  The pass is located in @file{omp-low.c} and is described by
@code{pass_expand_omp}.

@item Lower control flow

This pass flattens @code{if} statements (@code{COND_EXPR})
and moves lexical bindings (@code{BIND_EXPR}) out of line.  After
this pass, every @code{if} statement will have exactly two @code{goto}
statements, in its @code{then} and @code{else} arms.  Lexical binding
information for each statement will be found in @code{TREE_BLOCK} rather
than being inferred from its position under a @code{BIND_EXPR}.  This
pass is found in @file{gimple-low.c} and is described by
@code{pass_lower_cf}.

@item Lower exception handling control flow

This pass decomposes high-level exception handling constructs
(@code{TRY_FINALLY_EXPR} and @code{TRY_CATCH_EXPR}) into a form
that explicitly represents the control flow involved.  After this
pass, @code{lookup_stmt_eh_region} will return a non-negative
number for any statement that may have EH control flow semantics;
examine @code{tree_can_throw_internal} or @code{tree_can_throw_external}
for exact semantics.  Exact control flow may be extracted from
@code{foreach_reachable_handler}.  The EH region nesting tree is defined
in @file{except.h} and built in @file{except.c}.  The lowering pass
itself is in @file{tree-eh.c} and is described by @code{pass_lower_eh}.

@item Build the control flow graph

This pass decomposes a function into basic blocks and creates all of
the edges that connect them.  It is located in @file{tree-cfg.c} and
is described by @code{pass_build_cfg}.

@item Find all referenced variables

This pass walks the entire function and collects an array of all
variables referenced in the function, @code{referenced_vars}.  The
index at which a variable is found in the array is used as a UID
for the variable within this function.  This data is needed by the
SSA rewriting routines.  The pass is located in @file{tree-dfa.c}
and is described by @code{pass_referenced_vars}.

@item Enter static single assignment form

This pass rewrites the function such that it is in SSA form.  After
this pass, all @code{is_gimple_reg} variables will be referenced by
@code{SSA_NAME}, and all occurrences of other variables will be
annotated with @code{VDEFS} and @code{VUSES}; PHI nodes will have
been inserted as necessary for each basic block.  This pass is
located in @file{tree-ssa.c} and is described by @code{pass_build_ssa}.

@item Warn for uninitialized variables

This pass scans the function for uses of @code{SSA_NAME}s that
are fed by a default definition.  For non-parameter variables, such
uses are uninitialized.  The pass is run twice, before and after
optimization (if turned on).  In the first pass we only warn for uses that are
positively uninitialized; in the second pass we warn for uses that
are possibly uninitialized.  The pass is located in @file{tree-ssa.c}
and is defined by @code{pass_early_warn_uninitialized} and
@code{pass_late_warn_uninitialized}.

@item Dead code elimination

This pass scans the function for statements without side effects whose
result is unused.  It does not do memory life analysis, so any value
that is stored in memory is considered used.
The pass is run multiple
times throughout the optimization process.  It is located in
@file{tree-ssa-dce.c} and is described by @code{pass_dce}.

@item Dominator optimizations

This pass performs trivial dominator-based copy and constant propagation,
expression simplification, and jump threading.  It is run multiple times
throughout the optimization process.  It is located in @file{tree-ssa-dom.c}
and is described by @code{pass_dominator}.

@item Forward propagation of single-use variables

This pass attempts to remove redundant computation by substituting
variables that are used once into the expression that uses them and
seeing if the result can be simplified.  It is located in
@file{tree-ssa-forwprop.c} and is described by @code{pass_forwprop}.

@item Copy Renaming

This pass attempts to change the name of compiler temporaries involved in
copy operations such that SSA->normal can coalesce the copy away.  When compiler
temporaries are copies of user variables, it also renames the compiler
temporary to the user variable resulting in better use of user symbols.  It is
located in @file{tree-ssa-copyrename.c} and is described by
@code{pass_copyrename}.

@item PHI node optimizations

This pass recognizes forms of PHI inputs that can be represented as
conditional expressions and rewrites them into straight line code.
It is located in @file{tree-ssa-phiopt.c} and is described by
@code{pass_phiopt}.

@item May-alias optimization

This pass performs a flow sensitive SSA-based points-to analysis.
The resulting may-alias, must-alias, and escape analysis information
is used to promote variables from in-memory addressable objects to
non-aliased variables that can be renamed into SSA form.  We also
update the @code{VDEF}/@code{VUSE} memory tags for non-renameable
aggregates so that we get fewer false kills.
The pass is located
in @file{tree-ssa-alias.c} and is described by @code{pass_may_alias}.

Interprocedural points-to information is located in
@file{tree-ssa-structalias.c} and described by @code{pass_ipa_pta}.

@item Profiling

This pass instruments the function in order to collect runtime block
and value profiling data.  Such data may be fed back into the compiler
on a subsequent run so as to allow optimization based on expected
execution frequencies.  The pass is located in @file{tree-profile.c} and
is described by @code{pass_ipa_tree_profile}.

@item Static profile estimation

This pass implements a series of heuristics to guess the probabilities
of branches.  The resulting predictions are turned into an edge profile
by propagating branches across the control flow graphs.
The pass is located in @file{tree-profile.c} and is described by
@code{pass_profile}.

@item Lower complex arithmetic

This pass rewrites complex arithmetic operations into their component
scalar arithmetic operations.  The pass is located in @file{tree-complex.c}
and is described by @code{pass_lower_complex}.

@item Scalar replacement of aggregates

This pass rewrites suitable non-aliased local aggregate variables into
a set of scalar variables.  The resulting scalar variables are
rewritten into SSA form, which allows subsequent optimization passes
to do a significantly better job with them.  The pass is located in
@file{tree-sra.c} and is described by @code{pass_sra}.

@item Dead store elimination

This pass eliminates stores to memory that are subsequently overwritten
by another store, without any intervening loads.  The pass is located
in @file{tree-ssa-dse.c} and is described by @code{pass_dse}.

@item Tail recursion elimination

This pass transforms tail recursion into a loop.  It is located in
@file{tree-tailcall.c} and is described by @code{pass_tail_recursion}.

@item Forward store motion

This pass sinks stores and assignments down the flowgraph closer to their
use point.  The pass is located in @file{tree-ssa-sink.c} and is
described by @code{pass_sink_code}.

@item Partial redundancy elimination

This pass eliminates partially redundant computations, as well as
performing load motion.  The pass is located in @file{tree-ssa-pre.c}
and is described by @code{pass_pre}.

Just before partial redundancy elimination, if
@option{-funsafe-math-optimizations} is on, GCC tries to convert
divisions to multiplications by the reciprocal.  The pass is located
in @file{tree-ssa-math-opts.c} and is described by
@code{pass_cse_reciprocal}.

@item Full redundancy elimination

This is a simpler form of PRE that only eliminates redundancies that
occur on all paths.  It is located in @file{tree-ssa-pre.c} and
described by @code{pass_fre}.

@item Loop optimization

The main driver of the pass is placed in @file{tree-ssa-loop.c}
and described by @code{pass_loop}.

The optimizations performed by this pass are:

Loop invariant motion.  This pass moves only invariants that
would be hard to handle at the RTL level (function calls, operations that
expand to nontrivial sequences of insns).  With @option{-funswitch-loops} it
also moves operands of conditions that are invariant out of the loop, so that
we can use just trivial invariantness analysis in loop unswitching.  The pass
also includes store motion.  The pass is implemented in
@file{tree-ssa-loop-im.c}.

Canonical induction variable creation.  This pass creates a simple counter
for the number of iterations of the loop and replaces the exit condition of
the loop using it, in cases where a complicated analysis is necessary to
determine the number of iterations.  Later optimizations then may determine
the number easily.  The pass is implemented in @file{tree-ssa-loop-ivcanon.c}.

Induction variable optimizations.  This pass performs standard induction
variable optimizations, including strength reduction, induction variable
merging and induction variable elimination.  The pass is implemented in
@file{tree-ssa-loop-ivopts.c}.

Loop unswitching.  This pass moves the conditional jumps that are invariant
out of the loops.  To achieve this, a duplicate of the loop is created for
each possible outcome of conditional jump(s).  The pass is implemented in
@file{tree-ssa-loop-unswitch.c}.

Loop splitting.  If a loop contains a conditional statement that is
always true for one part of the iteration space and false for the other,
this pass splits the loop into two, one dealing with one side, the other
only with the other, thereby removing one inner-loop conditional.  The
pass is implemented in @file{tree-ssa-loop-split.c}.

The optimizations also use various utility functions contained in
@file{tree-ssa-loop-manip.c}, @file{cfgloop.c}, @file{cfgloopanal.c} and
@file{cfgloopmanip.c}.

Vectorization.  This pass transforms loops to operate on vector types
instead of scalar types.  Data parallelism across loop iterations is exploited
to group data elements from consecutive iterations into a vector and operate
on them in parallel.  Depending on available target support the loop is
conceptually unrolled by a factor @code{VF} (vectorization factor), which is
the number of elements operated upon in parallel in each iteration, and the
@code{VF} copies of each scalar operation are fused to form a vector operation.
Additional loop transformations such as peeling and versioning may take place
to align the number of iterations, and to align the memory accesses in the
loop.
The pass is implemented in @file{tree-vectorizer.c} (the main driver),
@file{tree-vect-loop.c} and @file{tree-vect-loop-manip.c} (loop specific parts
and general loop utilities), @file{tree-vect-slp.c} (loop-aware SLP
functionality), @file{tree-vect-stmts.c} and @file{tree-vect-data-refs.c}.
Analysis of data references is in @file{tree-data-ref.c}.

SLP Vectorization. This pass performs vectorization of straight-line code. The
pass is implemented in @file{tree-vectorizer.c} (the main driver),
@file{tree-vect-slp.c}, @file{tree-vect-stmts.c} and
@file{tree-vect-data-refs.c}.

Autoparallelization. This pass splits the loop iteration space to run
in several threads. The pass is implemented in @file{tree-parloops.c}.

Graphite is a loop transformation framework based on the polyhedral
model. Graphite stands for Gimple Represented as Polyhedra. The
internals of this infrastructure are documented in
@w{@uref{http://gcc.gnu.org/wiki/Graphite}}. The passes working on
this representation are implemented in the various @file{graphite-*}
files.

@item Tree level if-conversion for vectorizer

This pass applies if-conversion to simple loops to help the vectorizer.
We identify if-convertible loops, if-convert statements and merge
basic blocks into one big block. The idea is to present the loop in such
a form that the vectorizer can have a one-to-one mapping between statements
and available vector operations. This pass is located in
@file{tree-if-conv.c} and is described by @code{pass_if_conversion}.

@item Conditional constant propagation

This pass relaxes a lattice of values in order to identify those
that must be constant even in the presence of conditional branches.
The pass is located in @file{tree-ssa-ccp.c} and is described
by @code{pass_ccp}.
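
A small source-level sketch may help show why the branches matter. The
functions @code{ccp_before} and @code{ccp_after} are hypothetical names
used only for illustration; the second function shows the result one
would hope for, not verified compiler output.

```c
/* Before: y is assigned on two paths, one of them non-constant.  */
int
ccp_before (int flag)
{
  int x = 4;
  int y;
  if (x > 3)     /* Always true once x is known to be 4.  */
    y = 10;
  else
    y = flag;    /* Not executable, so it need not affect y's value.  */
  return x + y;
}

/* After: with the else arm ignored, both x and y are constants.  */
int
ccp_after (int flag)
{
  (void) flag;
  return 14;
}
```

A propagation that ignored the condition would merge @code{10} and
@code{flag} and conclude that @code{y} varies; taking the branch into
account keeps @code{y} constant.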

A related pass that works on memory loads and stores, and not just
register values, is located in @file{tree-ssa-ccp.c} and described by
@code{pass_store_ccp}.

@item Conditional copy propagation

This is similar to constant propagation but the lattice of values is
the ``copy-of'' relation. It eliminates redundant copies from the
code. The pass is located in @file{tree-ssa-copy.c} and described by
@code{pass_copy_prop}.

A related pass that works on memory copies, and not just register
copies, is located in @file{tree-ssa-copy.c} and described by
@code{pass_store_copy_prop}.

@item Value range propagation

This transformation is similar to constant propagation but
instead of propagating single constant values, it propagates
known value ranges. The implementation is based on Patterson's
range propagation algorithm (Accurate Static Branch Prediction by
Value Range Propagation, J. R. C. Patterson, PLDI '95). In
contrast to Patterson's algorithm, this implementation neither
propagates branch probabilities nor uses more than a single
range per SSA name. This means that the current implementation
cannot be used for branch prediction (though adapting it would
not be difficult). The pass is located in @file{tree-vrp.c} and is
described by @code{pass_vrp}.

@item Folding built-in functions

This pass simplifies built-in functions, as applicable, with constant
arguments or with inferable string lengths. It is located in
@file{tree-ssa-ccp.c} and is described by @code{pass_fold_builtins}.

@item Split critical edges

This pass identifies critical edges and inserts empty basic blocks
such that the edge is no longer critical. The pass is located in
@file{tree-cfg.c} and is described by @code{pass_split_crit_edges}.
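
The idea of splitting critical edges can be sketched on a toy
flowgraph. The code below is not GCC's implementation: the types
@code{struct edge} and @code{struct cfg} and all function names are
hypothetical, and an edge is taken to be critical when its source
block has more than one successor and its destination block has more
than one predecessor.

```c
#define MAX_EDGES 32

struct edge { int src, dest; };

struct cfg
{
  int n_blocks;
  int n_edges;
  struct edge edges[MAX_EDGES];
};

/* Number of edges leaving block B.  */
int
count_succs (const struct cfg *g, int b)
{
  int n = 0;
  for (int e = 0; e < g->n_edges; e++)
    n += g->edges[e].src == b;
  return n;
}

/* Number of edges entering block B.  */
int
count_preds (const struct cfg *g, int b)
{
  int n = 0;
  for (int e = 0; e < g->n_edges; e++)
    n += g->edges[e].dest == b;
  return n;
}

int
edge_is_critical (const struct cfg *g, int e)
{
  return count_succs (g, g->edges[e].src) > 1
	 && count_preds (g, g->edges[e].dest) > 1;
}

/* Insert an empty block on every critical edge.  Returns the number
   of edges that were split.  */
int
split_critical_edges (struct cfg *g)
{
  int n_split = 0;
  int n_original = g->n_edges;
  for (int e = 0; e < n_original && g->n_edges < MAX_EDGES; e++)
    if (edge_is_critical (g, e))
      {
	int new_block = g->n_blocks++;
	int old_dest = g->edges[e].dest;
	g->edges[e].dest = new_block;
	g->edges[g->n_edges++] = (struct edge) { new_block, old_dest };
	n_split++;
      }
  return n_split;
}
```

For an @code{if} without an @code{else} (edges 0->1, 0->2 and 1->2),
only the edge 0->2 is critical, and the sketch routes it through a new
empty block.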

@item Control dependence dead code elimination

This pass is a stronger form of dead code elimination that can
eliminate unnecessary control flow statements. It is located
in @file{tree-ssa-dce.c} and is described by @code{pass_cd_dce}.

@item Tail call elimination

This pass identifies function calls that may be rewritten into
jumps. No code transformation is actually applied here, but the
data and control flow problem is solved. The code transformation
requires target support, and so is delayed until RTL@. In the
meantime @code{CALL_EXPR_TAILCALL} is set to indicate the possibility.
The pass is located in @file{tree-tailcall.c} and is described by
@code{pass_tail_calls}. The RTL transformation is handled by
@code{fixup_tail_calls} in @file{calls.c}.

@item Warn for function return without value

For non-void functions, this pass locates return statements that do
not specify a value and issues a warning. Such a statement may have
been injected by falling off the end of the function. This pass is
run last so that we have as much time as possible to prove that the
statement is not reachable. It is located in @file{tree-cfg.c} and
is described by @code{pass_warn_function_return}.

@item Leave static single assignment form

This pass rewrites the function such that it is in normal form. At
the same time, we eliminate as many single-use temporaries as possible,
so the intermediate language is no longer GIMPLE, but GENERIC@. The
pass is located in @file{tree-outof-ssa.c} and is described by
@code{pass_del_ssa}.

@item Merge PHI nodes that feed into one another

This is part of the CFG cleanup passes. It attempts to join PHI nodes
from a forwarder CFG block into another block with PHI nodes. The
pass is located in @file{tree-cfgcleanup.c} and is described by
@code{pass_merge_phi}.
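
As an illustration of the tail call elimination item above, the
following sketch shows a call in tail position and a hand-written
version in which the call has become a jump. The names
@code{gcd_call} and @code{gcd_jump} are hypothetical, and whether the
compiler performs such a rewrite for a given call depends on the
target and the options in use.

```c
/* The recursive call is the last action of the function, so nothing
   in the caller's frame is needed after it returns.  */
unsigned
gcd_call (unsigned a, unsigned b)
{
  if (b == 0)
    return a;
  return gcd_call (b, a % b);
}

/* The same computation with the call replaced by a jump back to the
   start of the function, after updating the arguments.  */
unsigned
gcd_jump (unsigned a, unsigned b)
{
 again:
  if (b == 0)
    return a;
  {
    unsigned t = a % b;
    a = b;
    b = t;
  }
  goto again;
}
```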

@item Return value optimization

If a function always returns the same local variable, and that local
variable is an aggregate type, then the variable is replaced with the
return value for the function (i.e., the function's @code{DECL_RESULT}).
This is equivalent to the C++ named return value optimization applied to
GIMPLE@. The pass is located in @file{tree-nrv.c} and is described by
@code{pass_nrv}.

@item Return slot optimization

If a function returns a memory object and is called as @code{var =
foo()}, this pass tries to change the call so that the address of
@code{var} is sent to the callee to avoid an extra memory copy. This
pass is located in @file{tree-nrv.c} and is described by
@code{pass_return_slot}.

@item Optimize calls to @code{__builtin_object_size}

This is a propagation pass similar to CCP that tries to remove calls
to @code{__builtin_object_size} when the size of the object can be
computed at compile-time. This pass is located in
@file{tree-object-size.c} and is described by
@code{pass_object_sizes}.

@item Loop invariant motion

This pass moves expensive loop-invariant computations out of loops.
The pass is located in @file{tree-ssa-loop.c} and described by
@code{pass_lim}.

@item Loop nest optimizations

This is a family of loop transformations that works on loop nests. It
includes loop interchange, scaling, skewing and reversal, all
geared to the optimization of data locality in array traversals
and the removal of dependencies that hamper optimizations such as loop
parallelization and vectorization. The pass is located in
@file{tree-loop-linear.c} and described by
@code{pass_linear_transform}.

@item Removal of empty loops

This pass removes loops with no code in them. The pass is located in
@file{tree-ssa-loop-ivcanon.c} and described by
@code{pass_empty_loop}.
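
Loop interchange, one of the loop nest optimizations listed above, can
be sketched by hand as follows. The macro @code{DIM} and the function
names are hypothetical and chosen only for this illustration; the
second function is what an interchange would conceptually produce, not
actual compiler output.

```c
#define DIM 4

/* Before: the inner loop steps through a column, so consecutive
   accesses are DIM elements apart in memory.  */
long
sum_before (int m[DIM][DIM])
{
  long sum = 0;
  for (int j = 0; j < DIM; j++)
    for (int i = 0; i < DIM; i++)
      sum += m[i][j];
  return sum;
}

/* After interchange: the inner loop steps through a row, so
   consecutive accesses are adjacent in memory.  */
long
sum_after (int m[DIM][DIM])
{
  long sum = 0;
  for (int i = 0; i < DIM; i++)
    for (int j = 0; j < DIM; j++)
      sum += m[i][j];
  return sum;
}
```

The interchange is valid here because the iterations are independent
apart from the integer sum, whose result does not depend on the order
of the additions.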

@item Unrolling of small loops

This pass completely unrolls loops with few iterations. The pass
is located in @file{tree-ssa-loop-ivcanon.c} and described by
@code{pass_complete_unroll}.

@item Predictive commoning

This pass makes the code reuse the computations from the previous
iterations of the loops, especially loads and stores to memory.
It does so by storing the values of these computations to a bank
of temporary variables that are rotated at the end of the loop. To avoid
the need for this rotation, the loop is then unrolled and the copies
of the loop body are rewritten to use the appropriate version of
the temporary variable. This pass is located in @file{tree-predcom.c}
and described by @code{pass_predcom}.

@item Array prefetching

This pass issues prefetch instructions for array references inside
loops. The pass is located in @file{tree-ssa-loop-prefetch.c} and
described by @code{pass_loop_prefetch}.

@item Reassociation

This pass rewrites arithmetic expressions to enable optimizations that
operate on them, like redundancy elimination and vectorization. The
pass is located in @file{tree-ssa-reassoc.c} and described by
@code{pass_reassoc}.

@item Optimization of @code{stdarg} functions

This pass tries to avoid the saving of register arguments into the
stack on entry to @code{stdarg} functions. If the function doesn't
use any @code{va_start} macros, no registers need to be saved. If
@code{va_start} macros are used and the @code{va_list} variables don't
escape the function, it is only necessary to save registers that will
be used in @code{va_arg} macros. For instance, if @code{va_arg} is
only used with integral types in the function, floating point
registers don't need to be saved. This pass is located in
@file{tree-stdarg.c} and described by @code{pass_stdarg}.
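
A function of the kind this pass is aimed at might look like the
following sketch. The name @code{sum_ints} is hypothetical. The
@code{va_list} variable stays local to the function and
@code{va_arg} is used only with @code{int}, which is the situation
described above in which floating point argument registers would not
need to be saved.

```c
#include <stdarg.h>

/* Add up COUNT int arguments.  The va_list does not escape, and only
   integral values are fetched from it.  */
long
sum_ints (int count, ...)
{
  va_list ap;
  long total = 0;

  va_start (ap, count);
  for (int i = 0; i < count; i++)
    total += va_arg (ap, int);
  va_end (ap);
  return total;
}
```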

@end itemize

@node RTL passes
@section RTL passes

The following briefly describes the RTL generation and optimization
passes that are run after the Tree optimization passes.

@itemize @bullet
@item RTL generation

@c Avoiding overfull is tricky here.
The source files for RTL generation include
@file{stmt.c},
@file{calls.c},
@file{expr.c},
@file{explow.c},
@file{expmed.c},
@file{function.c},
@file{optabs.c}
and @file{emit-rtl.c}.
Also, the file
@file{insn-emit.c}, generated from the machine description by the
program @code{genemit}, is used in this pass. The header file
@file{expr.h} is used for communication within this pass.

@findex genflags
@findex gencodes
The header files @file{insn-flags.h} and @file{insn-codes.h},
generated from the machine description by the programs @code{genflags}
and @code{gencodes}, tell this pass which standard names are available
for use and which patterns correspond to them.

@item Generation of exception landing pads

This pass generates the glue that handles communication between the
exception handling library routines and the exception handlers within
the function. Entry points in the function that are invoked by the
exception handling library are called @dfn{landing pads}. The code
for this pass is located in @file{except.c}.

@item Control flow graph cleanup

This pass removes unreachable code, simplifies jumps to next, jumps to
jump, jumps across jumps, etc. The pass is run multiple times.
For historical reasons, it is occasionally referred to as the ``jump
optimization pass''. The bulk of the code for this pass is in
@file{cfgcleanup.c}, and there are support routines in @file{cfgrtl.c}
and @file{jump.c}.

@item Forward propagation of single-def values

This pass attempts to remove redundant computation by substituting
variables that come from a single definition, and
seeing if the result can be simplified. It performs copy propagation
and addressing mode selection. The pass is run twice, with values
being propagated into loops only on the second run. The code is
located in @file{fwprop.c}.

@item Common subexpression elimination

This pass removes redundant computation within basic blocks, and
optimizes addressing modes based on cost. The pass is run twice.
The code for this pass is located in @file{cse.c}.

@item Global common subexpression elimination

This pass performs two
different types of GCSE depending on whether you are optimizing for
size or not (LCM based GCSE tends to increase code size for a gain in
speed, while Morel-Renvoise based GCSE does not).
When optimizing for size, GCSE is done using Morel-Renvoise Partial
Redundancy Elimination, with the exception that it does not try to move
invariants out of loops---that is left to the loop optimization pass.
If MR PRE GCSE is done, code hoisting (aka unification) is also done, as
well as load motion.
If you are optimizing for speed, LCM (lazy code motion) based GCSE is
done. LCM is based on the work of Knoop, Ruthing, and Steffen. LCM
based GCSE also does loop invariant code motion. We also perform load
and store motion when optimizing for speed.
Regardless of which type of GCSE is used, the GCSE pass also performs
global constant and copy propagation.
The source file for this pass is @file{gcse.c}, and the LCM routines
are in @file{lcm.c}.

@item Loop optimization

This pass performs several loop related optimizations.
The source files @file{cfgloopanal.c} and @file{cfgloopmanip.c} contain
generic loop analysis and manipulation code.
Initialization and finalization
of loop structures is handled by @file{loop-init.c}.
A loop invariant motion pass is implemented in @file{loop-invariant.c}.
Basic block level optimizations---unrolling and peeling loops---
are implemented in @file{loop-unroll.c}.
Replacement of the exit condition of loops by special machine-dependent
instructions is handled by @file{loop-doloop.c}.

@item Jump bypassing

This pass is an aggressive form of GCSE that transforms the control
flow graph of a function by propagating constants into conditional
branch instructions. The source file for this pass is @file{gcse.c}.

@item If conversion

This pass attempts to replace conditional branches and surrounding
assignments with arithmetic, boolean value producing comparison
instructions, and conditional move instructions. In the very last
invocation after reload/LRA, it will generate predicated instructions
when supported by the target. The code is located in @file{ifcvt.c}.

@item Web construction

This pass splits independent uses of each pseudo-register. This can
improve the effect of other transformations, such as CSE or register
allocation. The code for this pass is located in @file{web.c}.

@item Instruction combination

This pass attempts to combine groups of two or three instructions that
are related by data flow into single instructions. It combines the
RTL expressions for the instructions by substitution, simplifies the
result using algebra, and then attempts to match the result against
the machine description. The code is located in @file{combine.c}.

@item Mode switching optimization

This pass looks for instructions that require the processor to be in a
specific ``mode'' and minimizes the number of mode changes required to
satisfy all users.
What these modes are, and what they apply to, are
completely target-specific. The code for this pass is located in
@file{mode-switching.c}.

@cindex modulo scheduling
@cindex sms, swing, software pipelining
@item Modulo scheduling

This pass looks at innermost loops and reorders their instructions
by overlapping different iterations. Modulo scheduling is performed
immediately before instruction scheduling. The code for this pass is
located in @file{modulo-sched.c}.

@item Instruction scheduling

This pass looks for instructions whose output will not be available by
the time that it is used in subsequent instructions. Memory loads and
floating point instructions often have this behavior on RISC machines.
It re-orders instructions within a basic block to try to separate the
definition and use of items that otherwise would cause pipeline
stalls. This pass is performed twice, before and after register
allocation. The code for this pass is located in @file{haifa-sched.c},
@file{sched-deps.c}, @file{sched-ebb.c}, @file{sched-rgn.c} and
@file{sched-vis.c}.

@item Register allocation

These passes make sure that all occurrences of pseudo registers are
eliminated, either by allocating them to a hard register, replacing
them by an equivalent expression (e.g.@: a constant) or by placing
them on the stack. This is done in several subpasses:

@itemize @bullet
@item
The integrated register allocator (@acronym{IRA}). It is called
integrated because coalescing, register live range splitting, and hard
register preferencing are done on-the-fly during coloring. It also
has better integration with the reload/LRA pass. Pseudo-registers spilled
by the allocator or the reload/LRA still have a chance to get
hard-registers if the reload/LRA evicts some pseudo-registers from
hard-registers.
The allocator helps to choose better pseudos for
spilling based on their live ranges and to coalesce stack slots
allocated for the spilled pseudo-registers. IRA is a regional
register allocator which is transformed into a Chaitin-Briggs allocator
if there is one region. By default, IRA chooses regions using
register pressure but the user can force it to use one region or
regions corresponding to all loops.

Source files of the allocator are @file{ira.c}, @file{ira-build.c},
@file{ira-costs.c}, @file{ira-conflicts.c}, @file{ira-color.c},
@file{ira-emit.c}, @file{ira-lives.c}, plus header files @file{ira.h}
and @file{ira-int.h} used for the communication between the allocator
and the rest of the compiler and between the IRA files.

@cindex reloading
@item
Reloading. This pass renumbers pseudo registers with the hardware
register numbers they were allocated. Pseudo registers that did not
get hard registers are replaced with stack slots. Then it finds
instructions that are invalid because a value has failed to end up in
a register, or has ended up in a register of the wrong kind. It fixes
up these instructions by reloading the problematical values
temporarily into registers. Additional instructions are generated to
do the copying.

The reload pass also optionally eliminates the frame pointer and inserts
instructions to save and restore call-clobbered registers around calls.

Source files are @file{reload.c} and @file{reload1.c}, plus the header
@file{reload.h} used for communication between them.

@cindex Local Register Allocator (LRA)
@item
The local register allocator (@acronym{LRA}).
This pass is a modern replacement of the reload pass.
Source files
are @file{lra.c}, @file{lra-assign.c}, @file{lra-coalesce.c},
@file{lra-constraints.c}, @file{lra-eliminations.c},
@file{lra-lives.c}, @file{lra-remat.c}, @file{lra-spills.c}, the
header @file{lra-int.h} used for communication between them, and the
header @file{lra.h} used for communication between LRA and the rest of
the compiler.

Unlike the reload pass, intermediate LRA decisions are reflected in
RTL as much as possible. This reduces the number of target-dependent
macros and hooks, leaving instruction constraints as the primary
source of control.

LRA is run on targets for which @code{TARGET_LRA_P} returns true.
@end itemize

@item Basic block reordering

This pass implements profile guided code positioning. If profile
information is not available, various types of static analysis are
performed to make the predictions normally coming from the profile
feedback (i.e.@: execution frequency, branch probability, etc.). It is
implemented in the file @file{bb-reorder.c}, and the various
prediction routines are in @file{predict.c}.

@item Variable tracking

This pass computes where the variables are stored at each
position in code and generates notes describing the variable locations
in the RTL code. Location lists are then generated from these
notes and emitted to the debug information if the debugging information
format supports location lists. The code is located in
@file{var-tracking.c}.

@item Delayed branch scheduling

This optional pass attempts to find instructions that can go into the
delay slots of other instructions, usually jumps and calls. The code
for this pass is located in @file{reorg.c}.

@item Branch shortening

On many RISC machines, branch instructions have a limited range.
Thus, longer sequences of instructions must be used for long branches.
In this pass, the compiler figures out how far each instruction
will be from each other instruction, and therefore whether the usual
instructions, or the longer sequences, must be used for each branch.
The code for this pass is located in @file{final.c}.

@item Register-to-stack conversion

Conversion from usage of some hard registers to usage of a register
stack may be done at this point. Currently, this is supported only
for the floating-point registers of the Intel 80387 coprocessor. The
code for this pass is located in @file{reg-stack.c}.

@item Final

This pass outputs the assembler code for the function. The source files
are @file{final.c} plus @file{insn-output.c}; the latter is generated
automatically from the machine description by the tool @code{genoutput}.
The header file @file{conditions.h} is used for communication between
these files.

@item Debugging information output

This is run after final because it must output the stack slot offsets
for pseudo registers that did not get hard registers. Source files
are @file{dbxout.c} for DBX symbol table format, @file{dwarfout.c} for
DWARF symbol table format, files @file{dwarf2out.c} and @file{dwarf2asm.c}
for DWARF2 symbol table format, and @file{vmsdbgout.c} for VMS debug
symbol table format.

@end itemize

@node Optimization info
@section Optimization info
@include optinfo.texi