1========================== 2Convergence And Uniformity 3========================== 4 5.. contents:: 6 :local: 7 8Introduction 9============ 10 11Some parallel environments execute threads in groups that allow 12communication within the group using special primitives called 13*convergent* operations. The outcome of a convergent operation is 14sensitive to the set of threads that executes it "together", i.e., 15convergently. 16 17A value is said to be *uniform* across a set of threads if it is the 18same across those threads, and *divergent* otherwise. Correspondingly, 19a branch is said to be a uniform branch if its condition is uniform, 20and it is a divergent branch otherwise. 21 22Whether threads are *converged* or not depends on the paths they take 23through the control flow graph. Threads take different outgoing edges 24at a *divergent branch*. Divergent branches constrain 25program transforms such as changing the CFG or moving a convergent 26operation to a different point of the CFG. Performing these 27transformations across a divergent branch can change the sets of 28threads that execute convergent operations convergently. While these 29constraints are out of scope for this document, the described 30*uniformity analysis* allows these transformations to identify 31uniform branches where these constraints do not hold. 32 33Convergence and 34uniformity are inter-dependent: When threads diverge at a divergent 35branch, they may later *reconverge* at a common program point. 36Subsequent operations are performed convergently, but the inputs may 37be non-uniform, thus producing divergent outputs. 38 39Uniformity is also useful by itself on targets that execute threads in 40groups with shared execution resources (e.g. waves, warps, or 41subgroups): 42 43- Uniform outputs can potentially be computed or stored on shared 44 resources. 45- These targets must "linearize" a divergent branch to ensure that 46 each side of the branch is followed by the corresponding threads in 47 the same group. But linearization is unnecessary at uniform 48 branches, since the whole group of threads follows either one side 49 of the branch or the other. 50 51This document presents a definition of convergence that is reasonable 52for real targets and is compatible with the currently implicit 53semantics of convergent operations in LLVM IR. This is accompanied by 54a *uniformity analysis* that extends the existing divergence analysis 55[DivergenceSPMD]_ to cover irreducible control-flow. 56 57.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian 58 Hack. 2021. An Abstract Interpretation for SPMD Divergence on 59 Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL, 60 Article 31 (January 2021), 35 pages. 61 https://doi.org/10.1145/3434312 62 63Terminology 64=========== 65 66Cycles 67 Described in :ref:`cycle-terminology`. 68 69Closed path 70 Described in :ref:`cycle-closed-path`. 71 72Disjoint paths 73 Two paths in a CFG are said to be disjoint if the only nodes common 74 to both are the start node or the end node, or both. 75 76Join node 77 A join node of a branch is a node reachable along disjoint paths 78 starting from that branch. 79 80Diverged path 81 A diverged path is a path that starts from a divergent branch and 82 either reaches a join node of the branch or reaches the end of the 83 function without passing through any join node of the branch. 84 85Threads and Dynamic Instances 86============================= 87 88Each occurrence of an instruction in the program source is called a 89*static instance*. When a thread executes a program, each execution of 90a static instance produces a distinct *dynamic instance* of that 91instruction. 92 93Each thread produces a unique sequence of dynamic instances: 94 95- The sequence is generated along branch decisions and loop 96 traversals. 97- Starts with a dynamic instance of a "first" instruction. 98- Continues with dynamic instances of successive "next" 99 instructions. 100 101Threads are independent; some targets may choose to execute them in 102groups in order to share resources when possible. 103 104.. figure:: convergence-natural-loop.png 105 :name: convergence-natural-loop 106 107.. table:: 108 :name: convergence-thread-example 109 :align: left 110 111 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 112 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 113 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 114 | Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit | 115 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 116 | Thread 2 | Entry1 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit | 117 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 118 119In the above table, each row is a different thread, listing the 120dynamic instances produced by that thread from left to right. Each 121thread executes the same program that starts with an ``Entry`` node 122and ends with an ``Exit`` node, but different threads may take 123different paths through the control flow of the program. The columns 124are numbered merely for convenience, and empty cells have no special 125meaning. Dynamic instances listed in the same column are converged. 126 127.. _convergence-definition: 128 129Convergence 130=========== 131 132*Converged-with* is a transitive symmetric relation over dynamic 133instances produced by *different threads* for the *same static 134instance*. Informally, two threads that produce converged dynamic 135instances are said to be *converged*, and they are said to execute 136that static instance *convergently*, at that point in the execution. 137 138*Convergence order* is a strict partial order over dynamic instances 139that is defined as the transitive closure of: 140 1411. If dynamic instance ``P`` is executed strictly before ``Q`` in the 142 same thread, then ``P`` is *convergence-before* ``Q``. 1432. If dynamic instance ``P`` is executed strictly before ``Q1`` in the 144 same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is 145 *convergence-before* ``Q2``. 1463. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2`` 147 is executed strictly before ``Q`` in the same thread, then ``P1`` 148 is *convergence-before* ``Q``. 149 150.. table:: 151 :name: convergence-order-example 152 :align: left 153 154 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 155 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 156 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 157 | Thread 1 | Entry | ... | | | | S2 | T | ... | Exit | 158 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 159 | Thread 2 | Entry | ... | | Q2 | R | S1 | | ... | Exit | 160 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 161 | Thread 3 | Entry | ... | P | Q1 | | | | ... | | 162 +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 163 164The above table shows partial sequences of dynamic instances from 165different threads. Dynamic instances in the same column are assumed 166to be converged (i.e., related to each other in the converged-with 167relation). The resulting convergence order includes the edges ``P -> 168Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc. 169 170The fact that *convergence-before* is a strict partial order is a 171constraint on the *converged-with* relation. It is trivially satisfied 172if different dynamic instances are never converged. It is also 173trivially satisfied for all known implementations for which 174convergence plays some role. Aside from the strict partial convergence 175order, there are currently no additional constraints on the 176*converged-with* relation imposed in LLVM IR. 177 178.. _convergence-note-convergence: 179 180.. note:: 181 182 1. The ``convergent`` attribute on convergent operations does 183 constrain changes to ``converged-with``, but it is expressed in 184 terms of control flow and does not explicitly deal with thread 185 convergence. 186 187 2. The convergence-before relation is not 188 directly observable. Program transforms are in general free to 189 change the order of instructions, even though that obviously 190 changes the convergence-before relation. 191 192 3. Converged dynamic instances need not be executed at the same 193 time or even on the same resource. Converged dynamic instances 194 of a convergent operation may appear to do so but that is an 195 implementation detail. The fact that ``P`` is convergence-before 196 ``Q`` does not automatically imply that ``P`` happens-before 197 ``Q`` in a memory model sense. 198 199 4. **Future work:** Providing convergence-related guarantees to 200 compiler frontends enables some powerful optimization techniques 201 that can be used by programmers or by high-level program 202 transforms. Constraints on the ``converged-with`` relation may 203 be added eventually as part of the definition of LLVM 204 IR, so that guarantees can be made that frontends can rely on. 205 For a proposal on how this might work, see `D85603 206 <https://reviews.llvm.org/D85603>`_. 207 208.. _convergence-maximal: 209 210Maximal Convergence 211------------------- 212 213This section defines a constraint that may be used to 214produce a *maximal converged-with* relation without violating the 215strict *convergence-before* order. This maximal converged-with 216relation is reasonable for real targets and is compatible with 217convergent operations. 218 219The maximal converged-with relation is defined in terms of cycle 220headers, which are not unique to a given CFG. Each cycle hierarchy for 221the same CFG results in a different maximal converged-with relation. 222 223 **Maximal converged-with:** 224 225 Dynamic instances ``X1`` and ``X2`` produced by different threads 226 for the same static instance ``X`` are converged in the maximal 227 converged-with relation if and only if for every cycle ``C`` with 228 header ``H`` that contains ``X``: 229 230 - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in 231 the respective thread is convergence-before ``X2``, and, 232 - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in 233 the respective thread is convergence-before ``X1``, 234 - without assuming that ``X1`` is converged with ``X2``. 235 236.. note:: 237 238 For brevity, the rest of the document restricts the term 239 *converged* to mean "related under the maximal converged-with 240 relation for the given cycle hierarchy". 241 242Maximal convergence can now be demonstrated in the earlier example as follows: 243 244.. table:: 245 :align: left 246 247 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 248 | | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | | 249 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 250 | Thread 1 | Entry1 | H1 | B1 | L1 | H3 | | L3 | | | | Exit | 251 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 252 | Thread 2 | Entry2 | H2 | | L2 | H4 | B2 | L4 | H5 | B3 | L5 | Exit | 253 +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 254 255- ``Entry1`` and ``Entry2`` are converged. 256- ``H1`` and ``H2`` are converged. 257- ``B1`` and ``B2`` are not converged due to ``H4`` which is not 258 convergence-before ``B1``. 259- ``H3`` and ``H4`` are converged. 260- ``H3`` is not converged with ``H5`` due to ``H4`` which is not 261 convergence-before ``H3``. 262- ``L1`` and ``L2`` are converged. 263- ``L3`` and ``L4`` are converged. 264- ``L3`` is not converged with ``L5`` due to ``H5`` which is not 265 convergence-before ``L3``. 266 267.. _convergence-cycle-headers: 268 269Dependence on Cycles Headers 270---------------------------- 271 272Contradictions in convergence order are possible only between two 273nodes that are inside some cycle. The dynamic instances of such nodes 274may be interleaved in the same thread, and this interleaving may be 275different for different threads. 276 277When a thread executes a node ``X`` once and then executes it again, 278it must have followed a closed path in the CFG that includes ``X``. 279Such a path must pass through the header of at least one cycle --- the 280smallest cycle that includes the entire closed path. In a given 281thread, two dynamic instances of ``X`` are either separated by the 282execution of at least one cycle header, or ``X`` itself is a cycle 283header. 284 285In reducible cycles (natural loops), each execution of the header is 286equivalent to the start of a new iteration of the cycle. But this 287analogy breaks down in the presence of explicit constraints on the 288converged-with relation, such as those described in :ref:`future 289work<convergence-note-convergence>`. Instead, cycle headers should be 290treated as implicit *points of convergence* in a maximal 291converged-with relation. 292 293Consider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such 294that ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle, 295with headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread 296enters the cycle ``Ck``, any of the following is possible: 297 2981. The thread directly entered cycle ``Ck`` without having executed 299 any of the headers ``H1`` to ``Hk``. 300 3012. The thread executed some or all of the nested headers one or more 302 times. 303 304The maximal converged-with relation captures the following intuition 305about cycles: 306 3071. When two threads enter a top-level cycle ``C1``, they execute 308 converged dynamic instances of every node that is a :ref:`child 309 <cycle-parent-block>` of ``C1``. 310 3112. When two threads enter a nested cycle ``Ck``, they execute 312 converged dynamic instances of every node that is a child of 313 ``Ck``, until either thread exits ``Ck``, if and only if they 314 executed converged dynamic instances of the last nested header that 315 either thread encountered. 316 317 Note that when a thread exits a nested cycle ``Ck``, it must follow 318 a closed path outside ``Ck`` to reenter it. This requires executing 319 the header of some outer cycle, as described earlier. 320 321Consider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1`` 322and ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``. 323Maximal convergence relates ``X1`` and ``X2`` as follows: 324 3251. If neither thread executed any header from ``H1`` to ``Hk``, then 326 ``X1`` and ``X2`` are converged. 327 3282. Otherwise, if there are no converged dynamic instances ``Q1`` and 329 ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is 330 possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and 331 ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and 332 ``X2`` are not converged. 333 3343. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic 335 instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most 336 recently before ``X1`` and ``X2`` in the respective threads. Then 337 ``X1`` and ``X2`` are converged if and only if there is no dynamic 338 instance of any header from ``H1`` to ``Hk`` that occurs between 339 ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in 340 thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last 341 point of convergence, with no other header being executed before 342 executing ``X``. 343 344**Example:** 345 346.. figure:: convergence-both-diverged-nested.png 347 :name: convergence-both-diverged-nested 348 349The above figure shows two nested irreducible cycles with headers 350``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent 351branches. The table below shows the convergence between three threads 352taking different paths through the CFG. Dynamic instances listed in 353the same column are converged. 354 355 .. table:: 356 :align: left 357 358 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 359 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 10 | 360 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 361 | Thread1 | Entry | P1 | Q1 | S1 | P3 | Q3 | R1 | S2 | Exit | 362 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 363 | Thread2 | Entry | P2 | Q2 | | | | R2 | S3 | Exit | 364 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 365 | Thread3 | Entry | | | | | | R3 | S4 | Exit | 366 +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+ 367 368- ``P2`` and ``P3`` are not converged due to ``S1`` 369- ``Q2`` and ``Q3`` are not converged due to ``S1`` 370- ``S1`` and ``S3`` are not converged due to ``R2`` 371- ``S1`` and ``S4`` are not converged due to ``R3`` 372 373Informally, ``T1`` and ``T2`` execute the inner cycle a different 374number of times, without executing the header of the outer cycle. All 375threads converge in the outer cycle when they first execute the header 376of the outer cycle. 377 378.. _convergence-uniformity: 379 380Uniformity 381========== 382 3831. The output of two converged dynamic instances is uniform if and 384 only if it compares equal for those two dynamic instances. 3852. The output of a static instance ``X`` is uniform *for a given set 386 of threads* if and only if it is uniform for every pair of 387 converged dynamic instances of ``X`` produced by those threads. 388 389A non-uniform value is said to be *divergent*. 390 391For a set ``S`` of threads, the uniformity of each output of a static 392instance is determined as follows: 393 3941. The semantics of the instruction may specify the output to be 395 uniform. 3962. Otherwise, if it is a PHI node, its output is uniform if and only 397 if for every pair of converged dynamic instances produced by all 398 threads in ``S``: 399 400 a. Both instances choose the same output from converged 401 dynamic instances, and, 402 b. That output is uniform for all threads in ``S``. 4033. Otherwise, the output is uniform if and only if the input 404 operands are uniform for all threads in ``S``. 405 406Divergent Cycle Exits 407--------------------- 408 409When a divergent branch occurs inside a cycle, it is possible that a 410diverged path continues to an exit of the cycle. This is called a 411divergent cycle exit. If the cycle is irreducible, the diverged path 412may re-enter and eventually reach a join within the cycle. Such a join 413should be examined for the :ref:`diverged entry 414<convergence-diverged-entry>` criterion. 415 416Nodes along the diverged path that lie outside the cycle experience 417*temporal divergence*, when two threads executing convergently inside 418the cycle produce uniform values, but exit the cycle along the same 419divergent path after executing the header a different number of times 420(informally, on different iterations of the cycle). For a node ``N`` 421inside the cycle the outputs may be uniform for the two threads, but 422any use ``U`` outside the cycle receives a value from non-converged 423dynamic instances of ``N``. An output of ``U`` may be divergent, 424depending on the semantics of the instruction. 425 426Static Uniformity Analysis 427========================== 428 429Irreducible control flow results in different cycle hierarchies 430depending on the choice of headers during depth-first traversal. As a 431result, a static analysis cannot always determine the convergence of 432nodes in irreducible cycles, and any uniformity analysis is limited to 433those static instances whose convergence is independent of the cycle 434hierarchy: 435 436 **m-converged static instances:** 437 438 A static instance ``X`` is *m-converged* for a given CFG if and only 439 if the maximal converged-with relation for its dynamic instances is 440 the same in every cycle hierarchy that can be constructed for that CFG. 441 442 .. note:: 443 444 In other words, two dynamic instances ``X1`` and ``X2`` of an 445 m-converged static instance ``X`` are converged in some cycle 446 hierarchy if and only if they are also converged in every other 447 cycle hierarchy for the same CFG. 448 449 As noted earlier, for brevity, we restrict the term *converged* to 450 mean "related under the maximal converged-with relation for a given 451 cycle hierarchy". 452 453 454Each node ``X`` in a given CFG is reported to be m-converged if and 455only if: 456 4571. ``X`` is a :ref:`top-level<cycle-toplevel-block>` node, in which 458 case, there are no cycle headers to influence the convergence of 459 ``X``. 460 4612. Otherwise, if ``X`` is inside a cycle, then every cycle that 462 contains ``X`` satisfies the following necessary conditions: 463 464 a. Every divergent branch inside the cycle satisfies the 465 :ref:`diverged entry criterion<convergence-diverged-entry>`, and, 466 b. There are no :ref:`diverged paths reaching the 467 cycle<convergence-diverged-outside>` from a divergent branch 468 outside it. 469 470.. note:: 471 472 A reducible cycle :ref:`trivially satisfies 473 <convergence-reducible-cycle>` the above conditions. In particular, 474 if the whole CFG is reducible, then all nodes in the CFG are 475 m-converged. 476 477If a static instance is not m-converged, then every output is assumed 478to be divergent. Otherwise, for an m-converged static instance, the 479uniformity of each output is determined using the criteria 480:ref:`described earlier <convergence-uniformity>`. The discovery of 481divergent outputs may cause their uses (including branches) to also 482become divergent. The analysis propagates this divergence until a 483fixed point is reached. 484 485The convergence inferred using these criteria is a safe subset of the 486maximal converged-with relation for any cycle hierarchy. In 487particular, it is sufficient to determine if a static instance is 488m-converged for a given cycle hierarchy ``T``, even if that fact is 489not detected when examining some other cycle hierarchy ``T'``. 490 491This property allows compiler transforms to use the uniformity 492analysis without being affected by DFS choices made in the underlying 493cycle analysis. When two transforms use different instances of the 494uniformity analysis for the same CFG, a "divergent value" result in 495one analysis instance cannot contradict a "uniform value" result in 496the other. 497 498Generic transforms such as SimplifyCFG, CSE, and loop transforms 499commonly change the program in ways that change the maximal 500converged-with relations. This also means that a value that was 501previously uniform can become divergent after such a transform. 502Uniformity has to be recomputed after such transforms. 503 504Divergent Branch inside a Cycle 505------------------------------- 506 507.. figure:: convergence-divergent-inside.png 508 :name: convergence-divergent-inside 509 510The above figure shows a divergent branch ``Q`` inside an irreducible 511cyclic region. When two threads diverge at ``Q``, the convergence of 512dynamic instances within the cyclic region depends on the cycle 513hierarchy chosen: 514 5151. In an implementation that detects a single cycle ``C`` with header 516 ``P``, convergence inside the cycle is determined by ``P``. 517 5182. In an implementation that detects two nested cycles with headers 519 ``R`` and ``S``, convergence inside those cycles is determined by 520 their respective headers. 521 522.. _convergence-diverged-entry: 523 524A conservative approach would be to simply report all nodes inside 525irreducible cycles as having divergent outputs. But it is desirable to 526recognize m-converged nodes in the CFG in order to maximize 527uniformity. This section describes one such pattern of nodes derived 528from *closed paths*, which are a property of the CFG and do not depend 529on the cycle hierarchy. 530 531 **Diverged Entry Criterion:** 532 533 The dynamic instances of all the nodes in a closed path ``P`` are 534 m-converged only if for every divergent branch ``B`` and its 535 join node ``J`` that lie on ``P``, there is no entry to ``P`` which 536 lies on a diverged path from ``B`` to ``J``. 537 538.. figure:: convergence-closed-path.png 539 :name: convergence-closed-path 540 541Consider the closed path ``P -> Q -> R -> S`` in the above figure. 542``P`` and ``R`` are :ref:`entries to the closed 543path<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a 544join for that branch, with diverged paths ``Q -> R -> S`` and ``Q -> 545S``. 546 547- If a diverged entry ``R`` exists, then in some cycle hierarchy, 548 ``R`` is the header of the smallest cycle ``C`` containing the 549 closed path and a :ref:`child cycle<cycle-definition>` ``C'`` 550 exists in the set ``C - R``, containing both branch ``Q`` and join 551 ``S``. When threads diverge at ``Q``, one subset ``M`` continues 552 inside cycle ``C'``, while the complement ``N`` exits ``C'`` and 553 reaches ``R``. Dynamic instances of ``S`` executed by threads in set 554 ``M`` are not converged with those executed in set ``N`` due to the 555 presence of ``R``. Informally, threads that diverge at ``Q`` 556 reconverge in the same iteration of the outer cycle ``C``, but they 557 may have executed the inner cycle ``C'`` differently. 558 559 .. table:: 560 :align: left 561 562 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 563 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 564 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 565 | Thread1 | Entry | P1 | Q1 | | | | R1 | S1 | P3 | ... | Exit | 566 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 567 | Thread2 | Entry | P2 | Q2 | S2 | P4 | Q4 | R2 | S4 | | | Exit | 568 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 569 570 In the table above, ``S2`` is not converged with ``S1`` due to ``R1``. 571 572| 573 574- If ``R`` does not exist, or if any node other than ``R`` is the 575 header of ``C``, then no such child cycle ``C'`` is detected. 576 Threads that diverge at ``Q`` execute converged dynamic instances of 577 ``S`` since they do not encounter the cycle header on any path from 578 ``Q`` to ``S``. Informally, threads that diverge at ``Q`` 579 reconverge at ``S`` in the same iteration of ``C``. 580 581 .. table:: 582 :align: left 583 584 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 585 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 586 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 587 | Thread1 | Entry | P1 | Q1 | R1 | S1 | P3 | Q3 | R3 | S3 | Exit | 588 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 589 | Thread2 | Entry | P2 | Q2 | | S2 | P4 | Q4 | R2 | S4 | Exit | 590 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+ 591 592| 593 594 .. note:: 595 596 In general, the cycle ``C`` in the above statements is not 597 expected to be the same cycle for different headers. Cycles and 598 their headers are tightly coupled; for different headers in the 599 same outermost cycle, the child cycles detected may be different. 600 The property relevant to the above examples is that for every 601 closed path, there is a cycle ``C`` that contains the path and 602 whose header is on that path. 603 604The diverged entry criterion must be checked for every closed path 605passing through a divergent branch ``B`` and its join ``J``. Since 606:ref:`every closed path passes through the header of some 607cycle<cycle-closed-path-header>`, this amounts to checking every cycle 608``C`` that contains ``B`` and ``J``. When the header of ``C`` 609dominates the join ``J``, there can be no entry to any path from the 610header to ``J``, which includes any diverged path from ``B`` to ``J``. 611This is also true for any closed paths passing through the header of 612an outer cycle that contains ``C``. 613 614Thus, the diverged entry criterion can be conservatively simplified 615as follows: 616 617 For a divergent branch ``B`` and its join node ``J``, the nodes in a 618 cycle ``C`` that contains both ``B`` and ``J`` are m-converged only 619 if: 620 621 - ``B`` strictly dominates ``J``, or, 622 - The header ``H`` of ``C`` strictly dominates ``J``, or, 623 - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the 624 same condition. 625 626When ``J`` is the same as ``H`` or ``B``, the trivial dominance is 627insufficient to make any statement about entries to diverged paths. 628 629.. _convergence-diverged-outside: 630 631Diverged Paths reaching a Cycle 632------------------------------- 633 634.. figure:: convergence-divergent-outside.png 635 :name: convergence-divergent-outside 636 637The figure shows two cycle hierarchies with a divergent branch in 638``Entry`` instead of ``Q``. For two threads that enter the closed path 639``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence 640of dynamic instances generated along the path depends on whether ``P`` 641or ``R`` is the header. 642 643- Convergence when ``P`` is the header. 644 645 .. table:: 646 :align: left 647 648 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 649 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 650 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 651 | Thread1 | Entry | | | | P1 | Q1 | R1 | S1 | P3 | Q3 | | S3 | Exit | 652 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 653 | Thread2 | Entry | | R2 | S2 | P2 | Q2 | | S2 | P4 | Q4 | R3 | S4 | Exit | 654 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 655 656 | 657 658- Convergence when ``R`` is the header. 659 660 .. table:: 661 :align: left 662 663 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 664 | | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 665 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 666 | Thread1 | Entry | | P1 | Q1 | R1 | S1 | P3 | Q3 | S3 | | | Exit | 667 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 668 | Thread2 | Entry | | | | R2 | S2 | P2 | Q2 | S2 | P4 | ... | Exit | 669 +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+ 670 671 | 672 673Thus, when diverged paths reach different entries of an irreducible 674cycle from outside the cycle, the static analysis conservatively 675reports every node in the cycle as not m-converged. 676 677.. _convergence-reducible-cycle: 678 679Reducible Cycle 680--------------- 681 682If ``C`` is a reducible cycle with header ``H``, then in any DFS, 683``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>` 684``C'`` that contains ``C``. Independent of the DFS, there is no entry 685to the subgraph ``C`` other than ``H`` itself. Thus, we have the 686following: 687 6881. The diverged entry criterion is trivially satisfied for a divergent 689 branch and its join, where both are inside subgraph ``C``. 6902. When diverged paths reach the subgraph ``C`` from outside, their 691 convergence is always determined by the same header ``H``. 692 693Clearly, this can be determined only in a cycle hierarchy ``T`` where 694``C`` is detected as a reducible cycle. No such conclusion can be made 695in a different cycle hierarchy ``T'`` where ``C`` is part of a larger 696cycle ``C'`` with the same header, but this does not contradict the 697conclusion in ``T``. 698