xref: /netbsd/external/gpl3/gcc.old/dist/gcc/doc/loop.texi (revision ec02198a)
@c Copyright (C) 2006-2020 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.

@c ---------------------------------------------------------------------
@c Loop Representation
@c ---------------------------------------------------------------------

@node Loop Analysis and Representation
@chapter Analysis and Representation of Loops

GCC provides extensive infrastructure for working with natural loops,
i.e., strongly connected components of the CFG with only one entry
block.  This chapter describes the representation of loops in GCC, both
on GIMPLE and in RTL, as well as the interfaces to loop-related analyses
(induction variable analysis and number of iterations analysis).

@menu
* Loop representation::         Representation and analysis of loops.
* Loop querying::               Getting information about loops.
* Loop manipulation::           Loop manipulation functions.
* LCSSA::                       Loop-closed SSA form.
* Scalar evolutions::           Induction variables on GIMPLE.
* loop-iv::                     Induction variables on RTL.
* Number of iterations::        Number of iterations analysis.
* Dependency analysis::         Data dependency analysis.
@end menu

@node Loop representation
@section Loop representation
@cindex Loop representation
@cindex Loop analysis

This chapter describes the representation of loops in GCC, and the
functions that can be used to build, modify and analyze this
representation.  Most of the interfaces and data structures are declared
in @file{cfgloop.h}.  Loop structures are analyzed, and the resulting
information is discarded or updated at the discretion of individual
passes.  Still, most of the generic CFG manipulation routines are aware
of loop structures and try to keep them up-to-date.  This way an
increasing part of the compilation pipeline is set up to maintain loop
structure across passes, which allows attaching meta information to
individual loops for consumption by later passes.

In general, a natural loop has one entry block (header) and possibly
several back edges (latches) leading to the header from the inside of
the loop.  Loops with several latches may appear if several loops share
a single header, or if there is a branching in the middle of the loop.
The representation of loops in GCC however allows only loops with a
single latch.  During loop analysis, headers of such loops are split and
forwarder blocks are created in order to disambiguate their structures.
A heuristic based on profile information and the structure of the
induction variables in the loops is used to determine whether the
latches correspond to sub-loops or to control flow in a single loop.
This means that the analysis sometimes changes the CFG, and if you run
it in the middle of an optimization pass, you must be able to deal with
the new blocks.  You may avoid CFG changes by passing the
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} flag to the loop discovery; note
however that most other loop manipulation functions will not work
correctly for loops with multiple latch edges (the functions that only
query membership of blocks to loops and subloop relationships, or
enumerate and test loop exits, can be expected to work).

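For instance, in the following schematic fragment (a made-up
illustration, not taken from GCC sources), two back edges lead to the
same header, so the corresponding natural loop has two latches:

@smallexample
head:
  body_1 ();
  if (cond_1)
    goto head;   /* first latch */
  body_2 ();
  if (cond_2)
    goto head;   /* second latch */
@end smallexample
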
The body of a loop is the set of blocks that are dominated by its
header, and reachable from its latch against the direction of edges in
CFG@.  The loops are organized in a containment hierarchy (tree) such
that all the loops immediately contained inside loop L are the children
of L in the tree.  This tree is represented by the @code{struct loops}
structure.  The root of this tree is a fake loop that contains all
blocks in the function.  Each of the loops is represented in a
@code{struct loop} structure.  Each loop is assigned an index
(@code{num} field of the @code{struct loop} structure), and the pointer
to the loop is stored in the corresponding field of the @code{larray}
vector in the loops structure.  The indices do not have to be
contiguous; there may be empty (@code{NULL}) entries in the
@code{larray} created by deleting loops.  Also, there is no guarantee on
the relative order of a loop and its subloops in the numbering.  The
index of a loop never changes.

The entries of the @code{larray} field should not be accessed directly.
The function @code{get_loop} returns the loop description for a loop
with the given index, and the @code{number_of_loops} function returns
the number of loops in the function.  To traverse all loops, use the
@code{FOR_EACH_LOOP} macro.  The @code{flags} argument of the macro is
used to determine the direction of traversal and the set of loops
visited.  Each loop is guaranteed to be visited exactly once, regardless
of the changes to the loop tree, and the loops may be removed during the
traversal.  Newly created loops are never traversed; if they need to be
visited, this must be done separately after their creation.

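As a sketch of how a pass might use these interfaces (the predicate and
transformation below are hypothetical placeholders, not GCC functions),
all loops can be visited from the innermost outwards like this:

@smallexample
struct loop *loop;

FOR_EACH_LOOP (loop, LI_FROM_INNERMOST)
  @{
    /* loop->num is the stable index of this loop.  */
    if (should_transform_p (loop))   /* hypothetical predicate  */
      transform_loop (loop);         /* hypothetical transform  */
  @}
@end smallexample
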
Each basic block contains the reference to the innermost loop it belongs
to (@code{loop_father}).  For this reason, it is only possible to have
one @code{struct loops} structure initialized at the same time for each
CFG@.  The global variable @code{current_loops} contains the
@code{struct loops} structure.  Many of the loop manipulation functions
assume that dominance information is up-to-date.

The loops are analyzed through the @code{loop_optimizer_init} function.
The argument of this function is a set of flags represented in an
integer bitmask.  These flags specify what other properties of the loop
structures should be calculated/enforced and preserved later:

@itemize
@item @code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES}: If this flag is set, no
changes to the CFG will be performed in the loop analysis; in
particular, loops with multiple latch edges will not be disambiguated.
If a loop has multiple latches, its latch block is set to @code{NULL}.
Most of the loop manipulation functions will not work for loops in this
shape.  No other flags that require CFG changes can be passed to
@code{loop_optimizer_init}.
@item @code{LOOPS_HAVE_PREHEADERS}: Forwarder blocks are created in such
a way that each loop has only one entry edge, and additionally, the
source block of this entry edge has only one successor.  This creates a
natural place where the code can be moved out of the loop, and ensures
that the entry edge of the loop leads from its immediate super-loop.
@item @code{LOOPS_HAVE_SIMPLE_LATCHES}: Forwarder blocks are created to
force the latch block of each loop to have only one successor.  This
ensures that the latch of the loop does not belong to any of its
sub-loops, and makes manipulation of the loops significantly easier.
Most of the loop manipulation functions assume that the loops are in
this shape.  Note that with this flag, the ``normal'' loop without any
control flow inside and with one exit consists of two basic blocks.
@item @code{LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS}: Basic blocks and
edges in the strongly connected components that are not natural loops
(have more than one entry block) are marked with
@code{BB_IRREDUCIBLE_LOOP} and @code{EDGE_IRREDUCIBLE_LOOP} flags.  The
flag is not set for blocks and edges that belong to natural loops that
are in such an irreducible region (but it is set for the entry and exit
edges of such a loop, if they lead to/from this region).
@item @code{LOOPS_HAVE_RECORDED_EXITS}: The lists of exits are recorded
and updated for each loop.  This makes some functions (e.g.,
@code{get_loop_exit_edges}) more efficient.  Some functions (e.g.,
@code{single_exit}) can be used only if the lists of exits are
recorded.
@end itemize

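The flags combine with bitwise or.  As a sketch of typical usage (not a
prescription), a pass that wants preheaders, simple latches and recorded
exits would bracket its work like this:

@smallexample
loop_optimizer_init (LOOPS_HAVE_PREHEADERS
                     | LOOPS_HAVE_SIMPLE_LATCHES
                     | LOOPS_HAVE_RECORDED_EXITS);

/* @dots{} analyze and transform loops @dots{}  */

loop_optimizer_finalize ();
@end smallexample
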
These properties may also be computed/enforced later, using the
functions @code{create_preheaders}, @code{force_single_succ_latches},
@code{mark_irreducible_loops} and @code{record_loop_exits}.
The properties can be queried using @code{loops_state_satisfies_p}.

The memory occupied by the loops structures should be freed with the
@code{loop_optimizer_finalize} function.  When loop structures are set
up to be preserved across passes, this function reduces the information
to be kept up-to-date to a minimum (only
@code{LOOPS_MAY_HAVE_MULTIPLE_LATCHES} set).

The CFG manipulation functions in general do not update loop structures.
Specialized versions that additionally do so are provided for the most
common tasks.  On GIMPLE, the @code{cleanup_tree_cfg_loop} function can
be used to clean up the CFG while updating the loops structures if
@code{current_loops} is set.

At the moment, loop structure is preserved from the start of GIMPLE
loop optimizations until the end of RTL loop optimizations.  During
this time a loop can be tracked by its @code{struct loop} and number.

@node Loop querying
@section Loop querying
@cindex Loop querying

The functions to query the information about loops are declared in
@file{cfgloop.h}.  Some of the information can be taken directly from
the structures.  The @code{loop_father} field of each basic block
contains the innermost loop to which the block belongs.  The most useful
fields of the loop structure (that are kept up-to-date at all times)
are:

@itemize
@item @code{header}, @code{latch}: Header and latch basic blocks of the
loop.
@item @code{num_nodes}: Number of basic blocks in the loop (including
the basic blocks of the sub-loops).
@item @code{outer}, @code{inner}, @code{next}: The super-loop, the first
sub-loop, and the sibling of the loop in the loops tree.
@end itemize

There are other fields in the loop structures, many of them used only by
some of the passes, or not updated during CFG changes; in general, they
should not be accessed directly.

The most important functions to query loop structures are:

@itemize
@item @code{loop_depth}: The depth of the loop in the loops tree, i.e.,
the number of super-loops of the loop.
@item @code{flow_loops_dump}: Dumps the information about loops to a
file.
@item @code{verify_loop_structure}: Checks consistency of the loop
structures.
@item @code{loop_latch_edge}: Returns the latch edge of a loop.
@item @code{loop_preheader_edge}: If loops have preheaders, returns
the preheader edge of a loop.
@item @code{flow_loop_nested_p}: Tests whether a loop is a sub-loop of
another loop.
@item @code{flow_bb_inside_loop_p}: Tests whether a basic block belongs
to a loop (including its sub-loops).
@item @code{find_common_loop}: Finds the common super-loop of two loops.
@item @code{superloop_at_depth}: Returns the super-loop of a loop with
the given depth.
@item @code{tree_num_loop_insns}, @code{num_loop_insns}: Estimates the
number of insns in the loop, on GIMPLE and on RTL.
@item @code{loop_exit_edge_p}: Tests whether an edge is an exit from a
loop.
@item @code{mark_loop_exit_edges}: Marks all exit edges of all loops
with the @code{EDGE_LOOP_EXIT} flag.
@item @code{get_loop_body}, @code{get_loop_body_in_dom_order},
@code{get_loop_body_in_bfs_order}: Enumerates the basic blocks in the
loop in depth-first search order in the reversed CFG, ordered by the
dominance relation, and in breadth-first search order, respectively.
@item @code{single_exit}: Returns the single exit edge of the loop, or
@code{NULL} if the loop has more than one exit.  You can only use this
function if the @code{LOOPS_HAVE_RECORDED_EXITS} property is used.
@item @code{get_loop_exit_edges}: Enumerates the exit edges of a loop.
@item @code{just_once_each_iteration_p}: Returns true if the basic block
is executed exactly once during each iteration of a loop (that is, it
does not belong to a sub-loop, and it dominates the latch of the loop).
@end itemize

@node Loop manipulation
@section Loop manipulation
@cindex Loop manipulation

The loops tree can be manipulated using the following functions:

@itemize
@item @code{flow_loop_tree_node_add}: Adds a node to the tree.
@item @code{flow_loop_tree_node_remove}: Removes a node from the tree.
@item @code{add_bb_to_loop}: Adds a basic block to a loop.
@item @code{remove_bb_from_loops}: Removes a basic block from loops.
@end itemize

Most low-level CFG functions update loops automatically.  The following
functions handle some more complicated cases of CFG manipulations:

@itemize
@item @code{remove_path}: Removes an edge and all blocks it dominates.
@item @code{split_loop_exit_edge}: Splits an exit edge of the loop,
ensuring that PHI node arguments remain in the loop (this ensures that
loop-closed SSA form is preserved).  Only useful on GIMPLE.
@end itemize

Finally, there are some higher-level loop transformations implemented.
While some of them are written so that they should work on non-innermost
loops, they are mostly untested in that case, and at the moment, they
are only reliable for the innermost loops:

@itemize
@item @code{create_iv}: Creates a new induction variable.  Only works on
GIMPLE@.  @code{standard_iv_increment_position} can be used to find a
suitable place for the iv increment.
@item @code{duplicate_loop_to_header_edge},
@code{tree_duplicate_loop_to_header_edge}: These functions (on RTL and
on GIMPLE) duplicate the body of the loop a prescribed number of times
on one of the edges entering the loop header, thus performing either
loop unrolling or loop peeling.  @code{can_duplicate_loop_p}
(@code{can_unroll_loop_p} on GIMPLE) must be true for the duplicated
loop.
@item @code{loop_version}: This function creates a copy of a loop, and
a branch before them that selects one of them depending on the
prescribed condition.  This is useful for optimizations that need to
verify some assumptions at run time (one of the copies of the loop is
usually left unchanged, while the other one is transformed in some way).
@item @code{tree_unroll_loop}: Unrolls the loop, including peeling the
extra iterations to make the number of iterations divisible by the
unroll factor, updating the exit condition, and removing the exits that
now cannot be taken.  Works only on GIMPLE.
@end itemize

@node LCSSA
@section Loop-closed SSA form
@cindex LCSSA
@cindex Loop-closed SSA form

Throughout the loop optimizations on tree level, one extra condition is
enforced on the SSA form:  no SSA name is used outside of the loop in
which it is defined.  The SSA form satisfying this condition is called
``loop-closed SSA form'' -- LCSSA@.  To enforce LCSSA, PHI nodes must be
created at the exits of the loops for the SSA names that are used
outside of them.  Only the real operands (not virtual SSA names) are
held in LCSSA, in order to save memory.

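For example, in the following schematic GIMPLE fragment (with
hypothetical SSA names), the value @code{x_1} computed in the loop is
used after it, so LCSSA requires a PHI node in the exit block, and the
later use refers to that PHI result instead of @code{x_1}:

@smallexample
loop:
  x_1 = @dots{};
  if (cond)
    goto loop;

  # loop-closed PHI node in the exit block
  x_3 = PHI <x_1>;
  @dots{} = x_3;   /* instead of using x_1 directly  */
@end smallexample
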
There are various benefits of LCSSA:

@itemize
@item Many optimizations (value range analysis, final value
replacement) are interested in the values that are defined in the loop
and used outside of it, i.e., exactly those for which we create new PHI
nodes.
@item In induction variable analysis, it is not necessary to specify the
loop in which the analysis should be performed -- the scalar evolution
analysis always returns the results with respect to the loop in which
the SSA name is defined.
@item It makes updating of SSA form during loop transformations simpler.
Without LCSSA, operations like loop unrolling may force creation of PHI
nodes arbitrarily far from the loop, while in LCSSA, the SSA form can be
updated locally.  However, since we only keep real operands in LCSSA, we
cannot use this advantage (we could have local updating of real
operands, but it is not much more efficient than to use generic SSA form
updating for it as well; the amount of changes to SSA is the same).
@end itemize

However, it also means LCSSA must be updated.  This is usually
straightforward, unless you create a new value in a loop and use it
outside, or unless you manipulate loop exit edges (functions are
provided to make these manipulations simple).
@code{rewrite_into_loop_closed_ssa} is used to rewrite the SSA form to
LCSSA, and @code{verify_loop_closed_ssa} to check that the invariant of
LCSSA is preserved.

@node Scalar evolutions
@section Scalar evolutions
@cindex Scalar evolutions
@cindex IV analysis on GIMPLE

Scalar evolutions (SCEV) are used to represent results of induction
variable analysis on GIMPLE@.  They enable us to represent variables
with complicated behavior in a simple and consistent way (we only use it
to express values of polynomial induction variables, but it is possible
to extend it).  The interfaces to SCEV analysis are declared in
@file{tree-scalar-evolution.h}.  To use the scalar evolutions analysis,
@code{scev_initialize} must be called first; to stop using SCEV,
@code{scev_finalize} should be called.  SCEV analysis caches results in
order to save time and memory.  However, this cache is invalidated by
most of the loop transformations, including removal of code.  If such a
transformation is performed, @code{scev_reset} must be called to clean
the caches.

Given an SSA name, its behavior in loops can be analyzed using the
@code{analyze_scalar_evolution} function.  The returned SCEV however
does not have to be fully analyzed and it may contain references to
other SSA names defined in the loop.  To resolve these (potentially
recursive) references, the @code{instantiate_parameters} or
@code{resolve_mixers} functions must be used.
@code{instantiate_parameters} is useful when you use the results of SCEV
only for some analysis, and when you work with a whole nest of loops at
once.  It will try replacing all SSA names by their SCEV in all loops,
including the super-loops of the current loop, thus providing complete
information about the behavior of the variable in the loop nest.
@code{resolve_mixers} is useful if you work with only one loop at a
time, and if you possibly need to create code based on the value of the
induction variable.  It will only resolve the SSA names defined in the
current loop, leaving the SSA names defined outside unchanged, even if
their evolution in the outer loops is known.

The SCEV is a normal tree expression, except for the fact that it may
contain several special tree nodes.  One of them is
@code{SCEV_NOT_KNOWN}, used for SSA names whose value cannot be
expressed.  The other one is @code{POLYNOMIAL_CHREC}.  A polynomial
chrec has three arguments -- base, step and loop (both base and step may
contain further polynomial chrecs).  The type of the expression and of
the base and step must be the same.  A variable has evolution
@code{POLYNOMIAL_CHREC(base, step, loop)} if it is (in the specified
loop) equivalent to @code{x_1} in the following example:

@smallexample
while (@dots{})
  @{
    x_1 = phi (base, x_2);
    x_2 = x_1 + step;
  @}
@end smallexample

Note that this includes the language restrictions on the operations.
For example, if we compile C code and @code{x} has a signed type, then
the overflow in the addition would cause undefined behavior, and we may
assume that this does not happen.  Hence, the value with this SCEV
cannot overflow (which restricts the number of iterations of such a
loop).

In many cases, one wants to restrict the attention just to affine
induction variables.  In this case, the extra expressive power of SCEV
is not useful, and may complicate the optimizations.  The
@code{simple_iv} function may then be used to analyze a value -- the
result is a loop-invariant base and step.

@node loop-iv
@section IV analysis on RTL
@cindex IV analysis on RTL

The induction variable analysis on RTL is simpler and only allows
analysis of affine induction variables, in one loop at a time.  The
interface is declared in @file{cfgloop.h}.  Before analyzing induction
variables in a loop L, the @code{iv_analysis_loop_init} function must be
called on L.  After the analysis (possibly calling
@code{iv_analysis_loop_init} for several loops) is finished,
@code{iv_analysis_done} should be called.  The following functions can
be used to access the results of the analysis:

@itemize
@item @code{iv_analyze}: Analyzes a single register used in the given
insn.  If no use of the register in this insn is found, the following
insns are scanned, so that this function can be called on the insn
returned by @code{get_condition}.
@item @code{iv_analyze_result}: Analyzes the result of the assignment in
the given insn.
@item @code{iv_analyze_expr}: Analyzes a more complicated expression.
All its operands are analyzed by @code{iv_analyze}, and hence they must
be used in the specified insn or one of the following insns.
@end itemize

The description of the induction variable is provided in @code{struct
rtx_iv}.  In order to handle subregs, the representation is a bit
complicated; if the value of the @code{extend} field is not
@code{UNKNOWN}, the value of the induction variable in the i-th
iteration is

@smallexample
delta + mult * extend_@{extend_mode@} (subreg_@{mode@} (base + i * step)),
@end smallexample

with the following exception:  if @code{first_special} is true, then the
value in the first iteration (when @code{i} is zero) is @code{delta +
mult * base}.  However, if @code{extend} is equal to @code{UNKNOWN},
then @code{first_special} must be false, @code{delta} must be 0,
@code{mult} must be 1, and the value in the i-th iteration is

@smallexample
subreg_@{mode@} (base + i * step)
@end smallexample

The function @code{get_iv_value} can be used to perform these
calculations.

@node Number of iterations
@section Number of iterations analysis
@cindex Number of iterations analysis

Both on GIMPLE and on RTL, there are functions available to determine
the number of iterations of a loop, with a similar interface.  The
number of iterations of a loop in GCC is defined as the number of
executions of the loop latch.  In many cases, it is not possible to
determine the number of iterations unconditionally -- the determined
number is correct only if some assumptions are satisfied.  The analysis
tries to verify these conditions using the information contained in the
program; if it fails, the conditions are returned together with the
result.  The following information and conditions are provided by the
analysis:

@itemize
@item @code{assumptions}: If this condition is false, the rest of
the information is invalid.
@item @code{noloop_assumptions} on RTL, @code{may_be_zero} on GIMPLE: If
this condition is true, the loop exits in the first iteration.
@item @code{infinite}: If this condition is true, the loop is infinite.
This condition is only available on RTL@.  On GIMPLE, conditions for
finiteness of the loop are included in @code{assumptions}.
@item @code{niter_expr} on RTL, @code{niter} on GIMPLE: The expression
that gives the number of iterations.  The number of iterations is
defined as the number of executions of the loop latch.
@end itemize

Both on GIMPLE and on RTL, it is necessary for the induction variable
analysis framework to be initialized (SCEV on GIMPLE, loop-iv on RTL).
On GIMPLE, the results are stored in the @code{struct tree_niter_desc}
structure.  The number of iterations before the loop is exited through a
given exit can be determined using the @code{number_of_iterations_exit}
function.  On RTL, the results are returned in the @code{struct
niter_desc} structure.  The corresponding function is named
@code{check_simple_exit}.  There are also functions that pass through
all the exits of a loop and try to find one with an easy-to-determine
number of iterations -- @code{find_loop_niter} on GIMPLE and
@code{find_simple_exit} on RTL@.  Finally, there are functions that
provide the same information, but additionally cache it, so that
repeated queries for the number of iterations are not so costly --
@code{number_of_latch_executions} on GIMPLE and
@code{get_simple_loop_desc} on RTL.

Note that some of these functions may behave slightly differently than
others -- some of them return only the expression for the number of
iterations, and fail if there are some assumptions.  The function
@code{number_of_latch_executions} works only for single-exit loops.
The function @code{number_of_cond_exit_executions} can be used to
determine the number of executions of the exit condition of a
single-exit loop (i.e., the @code{number_of_latch_executions} increased
by one).

On GIMPLE, the following constraint flags affect the semantics of some
interfaces of the number of iterations analyzer:

@itemize
@item @code{LOOP_C_INFINITE}: If this constraint flag is set, the loop
is known to be infinite.  APIs like @code{number_of_iterations_exit} can
return false directly without doing any analysis.
@item @code{LOOP_C_FINITE}: If this constraint flag is set, the loop is
known to be finite; in other words, the loop's number of iterations can
be computed with @code{assumptions} being true.
@end itemize

Generally, the constraint flags are set/cleared by consumers, which are
loop optimizers.  It is also the consumers' responsibility to set/clear
constraints correctly.  Failing to do that might result in
hard-to-track-down bugs in scev/niter consumers.  One typical use case
is the vectorizer: it drives the number of iterations analyzer by
setting @code{LOOP_C_FINITE}, and vectorizes a possibly infinite loop by
versioning the loop with the analysis result.  In return, constraints
set by consumers can also help the number of iterations analyzer in
subsequent optimizers.  For example, @code{niter} of a loop versioned
under @code{assumptions} is valid unconditionally.

Other constraints may be added in the future; for example, a constraint
indicating that the loop's latch must roll, so that @code{may_be_zero}
would be false unconditionally.

@node Dependency analysis
@section Data Dependency Analysis
@cindex Data Dependency Analysis

The code for the data dependence analysis can be found in
@file{tree-data-ref.c} and its interface and data structures are
described in @file{tree-data-ref.h}.  The function that computes the
data dependences for all the array and pointer references for a given
loop is @code{compute_data_dependences_for_loop}.  This function is
currently used by the linear loop transform and the vectorization
passes.  Before calling this function, one has to allocate two vectors:
the first vector will contain the set of data references that are
contained in the analyzed loop body, and the second vector will contain
the dependence relations between the data references.  Thus if the
vector of data references is of size @code{n}, the vector containing the
dependence relations will contain @code{n*n} elements.  However, if the
analyzed loop contains side effects, such as calls that potentially can
interfere with the data references in the current analyzed loop, the
analysis stops while scanning the loop body for data references, and
inserts a single @code{chrec_dont_know} in the dependence relation
array.

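The calling convention described above can be sketched as follows.  This
is a non-compilable sketch against GCC's internal interfaces; the exact
parameter list of @code{compute_data_dependences_for_loop} is an
approximation here and may differ between GCC versions:

@smallexample
/* Sketch only: assumes the tree-data-ref.h interfaces; not standalone.  */
auto_vec<loop_p, 3> loop_nest;
auto_vec<data_reference_p> datarefs;
auto_vec<ddr_p> dependence_relations;

if (!compute_data_dependences_for_loop (loop, true /* self and RR deps */,
                                        &loop_nest, &datarefs,
                                        &dependence_relations))
  {
    /* Analysis failed; a single chrec_dont_know was recorded.  */
  }

/* ... inspect dependence_relations ...  */

free_dependence_relations (dependence_relations);
free_data_refs (datarefs);
@end smallexample
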
The data references are discovered in a particular order during the
scanning of the loop body: the loop body is analyzed in execution order,
and the data references of each statement are pushed at the end of the
data reference array.  Two data references thus occur in the array in
the same order in which they syntactically occur in the program.  This
syntactic order is important in some classical data dependence tests,
and mapping this order to the elements of this array avoids costly
queries to the loop body representation.

Three types of data references are currently handled: @code{ARRAY_REF},
@code{INDIRECT_REF} and @code{COMPONENT_REF}.  The data structure for the
data reference is @code{data_reference}, where @code{data_reference_p}
is the name of a pointer to the data reference structure.  The structure
contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object
of the data reference and its access functions.  These access functions
represent the evolution of the data reference in the loop relative to
its base, in keeping with the classical meaning of the data reference
access function for the support of arrays.  For example, for a reference
@code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
one for each array subscript, are:
@code{@{i_init, +, i_step@}_1, @{j_init, +, j_step@}_2}.

@item @code{first_location_in_loop}: Provides information about the first
location accessed by the data reference in the loop and about the access
function used to represent evolution relative to this location.  This data
is used to support pointers, and is not used for arrays (for which we
have base objects).  Pointer accesses are represented as a one-dimensional
access that starts from the first location accessed in the loop.  For
example:

@smallexample
      for1 i
         for2 j
          *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @code{@{0, +, 4B@}_for2}
relative to @code{p + i}.  The access functions of the array are
@code{@{i_init, +, i_step@}_for1} and @code{@{j_init, +, j_step@}_for2}
relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we cannot
prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows:
If both @code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object};
if they are equal, apply dependence tests (use access functions based on
base_objects).
Else if both @code{a} and @code{b} are represented as pointers, compare
@code{a.first_location} and @code{b.first_location};
if they are equal, apply dependence tests (use access functions based on
first location).
However, if @code{a} and @code{b} are represented differently, only try
to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize
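
To make the access-function representation concrete, the following plain C
sketch (not GCC-internal code) checks that the byte offset of
@code{a[i][j]} follows the affine evolutions described above, with the
innermost step equal to the element size:

@smallexample
#include <assert.h>
#include <stddef.h>

/* Byte offset of a[i][j] from the base of a 4x5 int array.  */
static ptrdiff_t
subscript_offset (int a[4][5], int i, int j)
{
  return (char *) &a[i][j] - (char *) a;
}

int
main (void)
{
  int a[4][5];
  for (int i = 0; i < 4; i++)
    for (int j = 0; j < 5; j++)
      /* The affine evolutions {0, +, 1}_1 and {0, +, 1}_2, scaled by
         the row stride (5 ints) and the element size.  */
      assert (subscript_offset (a, i, j)
              == (ptrdiff_t) ((i * 5 + j) * sizeof (int)));
  return 0;
}
@end smallexample
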

The structure describing the relation between two data references is
@code{data_dependence_relation} and the shorter name for a pointer to
such a structure is @code{ddr_p}.  This structure contains:

@itemize
@item a pointer to each data reference,
@item a tree node @code{are_dependent} that is set to @code{chrec_known}
if the analysis has proved that there is no dependence between these two
data references, to @code{chrec_dont_know} if the analysis was not able
to determine any useful result and there could potentially exist a
dependence between these data references, and to @code{NULL_TREE} if
there exists a dependence relation between the data references; in the
latter case the description of this dependence relation is given in the
@code{subscripts}, @code{dir_vects}, and @code{dist_vects} arrays,
@item a boolean that determines whether the dependence relation can be
represented by a classical distance vector,
@item an array @code{subscripts} that contains a description of each
subscript of the data references.  Given two array accesses, a
subscript is the tuple composed of the access functions for a given
dimension.  For example, given @code{A[f1][f2][f3]} and
@code{B[g1][g2][g3]}, there are three subscripts: @code{(f1, g1), (f2,
g2), (f3, g3)}.
@item two arrays @code{dir_vects} and @code{dist_vects} that contain
classical representations of the data dependences in the form of
direction and distance dependence vectors,
@item an array of loops @code{loop_nest} that contains the loops to
which the distance and direction vectors refer.
@end itemize
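
As a concrete illustration of subscripts and distance vectors (plain C,
not GCC-internal code), the loop below writes @code{a[i]} and reads
@code{a[i - 1]}; the single subscript pair @code{(i, i - 1)} yields the
distance vector @code{(1)} and the direction vector @code{(<)}:

@smallexample
#include <assert.h>

/* A loop with the write a[i] and the read a[i - 1]: the subscript pair
   is (i, i - 1), so the dependence distance is 1 and the direction is
   '<' (each read depends on the previous iteration's write).  */
static int
last_after_chain (int first, int n)
{
  int a[16];
  a[0] = first;
  for (int i = 1; i < n; i++)
    a[i] = a[i - 1] * 2;
  return a[n - 1];
}

int
main (void)
{
  /* The loop-carried dependence forces sequential evaluation:
     starting from 1 and doubling 5 times yields 32.  */
  assert (last_after_chain (1, 6) == 32);
  return 0;
}
@end smallexample
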

Several functions for pretty printing the information extracted by the
data dependence analysis are available: @code{dump_ddrs} prints the
details of a data dependence relations array with maximum verbosity,
@code{dump_dist_dir_vectors} prints only the classical distance and
direction vectors for a data dependence relations array, and
@code{dump_data_references} prints the details of the data references
contained in a data reference array.