1==========================
2Convergence And Uniformity
3==========================
4
5.. contents::
6   :local:
7
8Introduction
9============
10
11Some parallel environments execute threads in groups that allow
12communication within the group using special primitives called
13*convergent* operations. The outcome of a convergent operation is
14sensitive to the set of threads that executes it "together", i.e.,
15convergently.
16
17A value is said to be *uniform* across a set of threads if it is the
18same across those threads, and *divergent* otherwise. Correspondingly,
19a branch is said to be a uniform branch if its condition is uniform,
20and it is a divergent branch otherwise.
21
22Whether threads are *converged* or not depends on the paths they take
23through the control flow graph. Threads take different outgoing edges
24at a *divergent branch*. Divergent branches constrain
25program transforms such as changing the CFG or moving a convergent
26operation to a different point of the CFG. Performing these
27transformations across a divergent branch can change the sets of
28threads that execute convergent operations convergently. While these
29constraints are out of scope for this document, the described
30*uniformity analysis* allows these transformations to identify
31uniform branches where these constraints do not hold.
32
33Convergence and
34uniformity are inter-dependent: When threads diverge at a divergent
35branch, they may later *reconverge* at a common program point.
36Subsequent operations are performed convergently, but the inputs may
37be non-uniform, thus producing divergent outputs.
38
39Uniformity is also useful by itself on targets that execute threads in
40groups with shared execution resources (e.g. waves, warps, or
41subgroups):
42
43- Uniform outputs can potentially be computed or stored on shared
44  resources.
45- These targets must "linearize" a divergent branch to ensure that
46  each side of the branch is followed by the corresponding threads in
47  the same group. But linearization is unnecessary at uniform
48  branches, since the whole group of threads follows either one side
49  of the branch or the other.
50
51This document presents a definition of convergence that is reasonable
52for real targets and is compatible with the currently implicit
53semantics of convergent operations in LLVM IR. This is accompanied by
54a *uniformity analysis* that extends the existing divergence analysis
55[DivergenceSPMD]_ to cover irreducible control-flow.
56
57.. [DivergenceSPMD] Julian Rosemann, Simon Moll, and Sebastian
58   Hack. 2021. An Abstract Interpretation for SPMD Divergence on
59   Reducible Control Flow Graphs. Proc. ACM Program. Lang. 5, POPL,
60   Article 31 (January 2021), 35 pages.
61   https://doi.org/10.1145/3434312
62
63Terminology
64===========
65
66Cycles
67   Described in :ref:`cycle-terminology`.
68
69Closed path
70   Described in :ref:`cycle-closed-path`.
71
72Disjoint paths
73   Two paths in a CFG are said to be disjoint if the only nodes common
74   to both are the start node or the end node, or both.
75
76Join node
77   A join node of a branch is a node reachable along disjoint paths
78   starting from that branch.
79
80Diverged path
81   A diverged path is a path that starts from a divergent branch and
82   either reaches a join node of the branch or reaches the end of the
83   function without passing through any join node of the branch.
84
85Threads and Dynamic Instances
86=============================
87
88Each occurrence of an instruction in the program source is called a
89*static instance*. When a thread executes a program, each execution of
90a static instance produces a distinct *dynamic instance* of that
91instruction.
92
93Each thread produces a unique sequence of dynamic instances:
94
95- The sequence is generated along branch decisions and loop
96  traversals.
97- Starts with a dynamic instance of a "first" instruction.
98- Continues with dynamic instances of successive "next"
99  instructions.
100
101Threads are independent; some targets may choose to execute them in
102groups in order to share resources when possible.
103
104.. figure:: convergence-natural-loop.png
105   :name: convergence-natural-loop
106
107.. table::
108   :name: convergence-thread-example
109   :align: left
110
111   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
112   |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      |
113   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
114   | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit |
115   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
116   | Thread 2 | Entry1 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit |
117   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
118
119In the above table, each row is a different thread, listing the
120dynamic instances produced by that thread from left to right. Each
121thread executes the same program that starts with an ``Entry`` node
122and ends with an ``Exit`` node, but different threads may take
123different paths through the control flow of the program. The columns
124are numbered merely for convenience, and empty cells have no special
125meaning. Dynamic instances listed in the same column are converged.
126
127.. _convergence-definition:
128
129Convergence
130===========
131
132*Converged-with* is a transitive symmetric relation over dynamic
133instances produced by *different threads* for the *same static
134instance*. Informally, two threads that produce converged dynamic
135instances are said to be *converged*, and they are said to execute
136that static instance *convergently*, at that point in the execution.
137
138*Convergence order* is a strict partial order over dynamic instances
139that is defined as the transitive closure of:
140
1411. If dynamic instance ``P`` is executed strictly before ``Q`` in the
142   same thread, then ``P`` is *convergence-before* ``Q``.
1432. If dynamic instance ``P`` is executed strictly before ``Q1`` in the
144   same thread, and ``Q1`` is *converged-with* ``Q2``, then ``P`` is
145   *convergence-before* ``Q2``.
1463. If dynamic instance ``P1`` is *converged-with* ``P2``, and ``P2``
147   is executed strictly before ``Q`` in the same thread, then ``P1``
148   is *convergence-before* ``Q``.
149
150.. table::
151   :name: convergence-order-example
152   :align: left
153
154   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
155   |          | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9    |
156   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
157   | Thread 1 | Entry | ... |     |     |     | S2  | T   | ... | Exit |
158   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
159   | Thread 2 | Entry | ... |     | Q2  | R   | S1  |     | ... | Exit |
160   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
161   | Thread 3 | Entry | ... | P   | Q1  |     |     |     | ... |      |
162   +----------+-------+-----+-----+-----+-----+-----+-----+-----+------+
163
164The above table shows partial sequences of dynamic instances from
165different threads. Dynamic instances in the same column are assumed
166to be converged (i.e., related to each other in the converged-with
167relation). The resulting convergence order includes the edges ``P ->
168Q2``, ``Q1 -> R``, ``P -> R``, ``P -> T``, etc.
169
170The fact that *convergence-before* is a strict partial order is a
171constraint on the *converged-with* relation. It is trivially satisfied
172if different dynamic instances are never converged. It is also
173trivially satisfied for all known implementations for which
174convergence plays some role. Aside from the strict partial convergence
175order, there are currently no additional constraints on the
176*converged-with* relation imposed in LLVM IR.
177
178.. _convergence-note-convergence:
179
180.. note::
181
182   1. The ``convergent`` attribute on convergent operations does
183      constrain changes to ``converged-with``, but it is expressed in
184      terms of control flow and does not explicitly deal with thread
185      convergence.
186
187   2. The convergence-before relation is not
188      directly observable. Program transforms are in general free to
189      change the order of instructions, even though that obviously
190      changes the convergence-before relation.
191
192   3. Converged dynamic instances need not be executed at the same
193      time or even on the same resource. Converged dynamic instances
194      of a convergent operation may appear to do so but that is an
195      implementation detail. The fact that ``P`` is convergence-before
196      ``Q`` does not automatically imply that ``P`` happens-before
197      ``Q`` in a memory model sense.
198
199   4. **Future work:** Providing convergence-related guarantees to
200      compiler frontends enables some powerful optimization techniques
201      that can be used by programmers or by high-level program
202      transforms. Constraints on the ``converged-with`` relation may
203      be added eventually as part of the definition of LLVM
204      IR, so that guarantees can be made that frontends can rely on.
205      For a proposal on how this might work, see `D85603
206      <https://reviews.llvm.org/D85603>`_.
207
208.. _convergence-maximal:
209
210Maximal Convergence
211-------------------
212
213This section defines a constraint that may be used to
214produce a *maximal converged-with* relation without violating the
215strict *convergence-before* order. This maximal converged-with
216relation is reasonable for real targets and is compatible with
217convergent operations.
218
219The maximal converged-with relation is defined in terms of cycle
220headers, which are not unique to a given CFG. Each cycle hierarchy for
221the same CFG results in a different maximal converged-with relation.
222
223   **Maximal converged-with:**
224
225   Dynamic instances ``X1`` and ``X2`` produced by different threads
226   for the same static instance ``X`` are converged in the maximal
227   converged-with relation if and only if for every cycle ``C`` with
228   header ``H`` that contains ``X``:
229
230   - every dynamic instance ``H1`` of ``H`` that precedes ``X1`` in
231     the respective thread is convergence-before ``X2``, and,
232   - every dynamic instance ``H2`` of ``H`` that precedes ``X2`` in
233     the respective thread is convergence-before ``X1``,
234   - without assuming that ``X1`` is converged with ``X2``.
235
236.. note::
237
238   For brevity, the rest of the document restricts the term
239   *converged* to mean "related under the maximal converged-with
240   relation for the given cycle hierarchy".
241
242Maximal convergence can now be demonstrated in the earlier example as follows:
243
244.. table::
245   :align: left
246
247   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
248   |          |        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   |      |
249   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
250   | Thread 1 | Entry1 | H1  | B1  | L1  | H3  |     | L3  |     |     |     | Exit |
251   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
252   | Thread 2 | Entry2 | H2  |     | L2  | H4  | B2  | L4  | H5  | B3  | L5  | Exit |
253   +----------+--------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
254
255- ``Entry1`` and ``Entry2`` are converged.
256- ``H1`` and ``H2`` are converged.
257- ``B1`` and ``B2`` are not converged due to ``H4`` which is not
258  convergence-before ``B1``.
259- ``H3`` and ``H4`` are converged.
260- ``H3`` is not converged with ``H5`` due to ``H4`` which is not
261  convergence-before ``H3``.
262- ``L1`` and ``L2`` are converged.
263- ``L3`` and ``L4`` are converged.
264- ``L3`` is not converged with ``L5`` due to ``H5`` which is not
265  convergence-before ``L3``.
266
267.. _convergence-cycle-headers:
268
269Dependence on Cycles Headers
270----------------------------
271
272Contradictions in convergence order are possible only between two
273nodes that are inside some cycle. The dynamic instances of such nodes
274may be interleaved in the same thread, and this interleaving may be
275different for different threads.
276
277When a thread executes a node ``X`` once and then executes it again,
278it must have followed a closed path in the CFG that includes ``X``.
279Such a path must pass through the header of at least one cycle --- the
280smallest cycle that includes the entire closed path. In a given
281thread, two dynamic instances of ``X`` are either separated by the
282execution of at least one cycle header, or ``X`` itself is a cycle
283header.
284
285In reducible cycles (natural loops), each execution of the header is
286equivalent to the start of a new iteration of the cycle. But this
287analogy breaks down in the presence of explicit constraints on the
288converged-with relation, such as those described in :ref:`future
289work<convergence-note-convergence>`. Instead, cycle headers should be
290treated as implicit *points of convergence* in a maximal
291converged-with relation.
292
293Consider a sequence of nested cycles ``C1``, ``C2``, ..., ``Ck`` such
294that ``C1`` is the outermost cycle and ``Ck`` is the innermost cycle,
295with headers ``H1``, ``H2``, ..., ``Hk`` respectively. When a thread
296enters the cycle ``Ck``, any of the following is possible:
297
2981. The thread directly entered cycle ``Ck`` without having executed
299   any of the headers ``H1`` to ``Hk``.
300
3012. The thread executed some or all of the nested headers one or more
302   times.
303
304The maximal converged-with relation captures the following intuition
305about cycles:
306
3071. When two threads enter a top-level cycle ``C1``, they execute
308   converged dynamic instances of every node that is a :ref:`child
309   <cycle-parent-block>` of ``C1``.
310
3112. When two threads enter a nested cycle ``Ck``, they execute
312   converged dynamic instances of every node that is a child of
313   ``Ck``, until either thread exits ``Ck``, if and only if they
314   executed converged dynamic instances of the last nested header that
315   either thread encountered.
316
317   Note that when a thread exits a nested cycle ``Ck``, it must follow
318   a closed path outside ``Ck`` to reenter it. This requires executing
319   the header of some outer cycle, as described earlier.
320
321Consider two dynamic instances ``X1`` and ``X2`` produced by threads ``T1``
322and ``T2`` for a node ``X`` that is a child of nested cycle ``Ck``.
323Maximal convergence relates ``X1`` and ``X2`` as follows:
324
3251. If neither thread executed any header from ``H1`` to ``Hk``, then
326   ``X1`` and ``X2`` are converged.
327
3282. Otherwise, if there are no converged dynamic instances ``Q1`` and
329   ``Q2`` of any header ``Q`` from ``H1`` to ``Hk`` (where ``Q`` is
330   possibly the same as ``X``), such that ``Q1`` precedes ``X1`` and
331   ``Q2`` precedes ``X2`` in the respective threads, then ``X1`` and
332   ``X2`` are not converged.
333
3343. Otherwise, consider the pair ``Q1`` and ``Q2`` of converged dynamic
335   instances of a header ``Q`` from ``H1`` to ``Hk`` that occur most
336   recently before ``X1`` and ``X2`` in the respective threads. Then
337   ``X1`` and ``X2`` are converged if and only if there is no dynamic
338   instance of any header from ``H1`` to ``Hk`` that occurs between
339   ``Q1`` and ``X1`` in thread ``T1``, or between ``Q2`` and ``X2`` in
340   thread ``T2``. In other words, ``Q1`` and ``Q2`` represent the last
341   point of convergence, with no other header being executed before
342   executing ``X``.
343
344**Example:**
345
346.. figure:: convergence-both-diverged-nested.png
347   :name: convergence-both-diverged-nested
348
349The above figure shows two nested irreducible cycles with headers
350``R`` and ``S``. The nodes ``Entry`` and ``Q`` have divergent
351branches. The table below shows the convergence between three threads
352taking different paths through the CFG. Dynamic instances listed in
353the same column are converged.
354
355   .. table::
356      :align: left
357
358      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
359      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 10   |
360      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
361      | Thread1 | Entry | P1  | Q1  | S1  | P3  | Q3  | R1  | S2  | Exit |
362      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
363      | Thread2 | Entry | P2  | Q2  |     |     |     | R2  | S3  | Exit |
364      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
365      | Thread3 | Entry |     |     |     |     |     | R3  | S4  | Exit |
366      +---------+-------+-----+-----+-----+-----+-----+-----+-----+------+
367
368- ``P2`` and ``P3`` are not converged due to ``S1``
369- ``Q2`` and ``Q3`` are not converged due to ``S1``
370- ``S1`` and ``S3`` are not converged due to ``R2``
371- ``S1`` and ``S4`` are not converged due to ``R3``
372
373Informally, ``T1`` and ``T2`` execute the inner cycle a different
374number of times, without executing the header of the outer cycle. All
375threads converge in the outer cycle when they first execute the header
376of the outer cycle.
377
378.. _convergence-uniformity:
379
380Uniformity
381==========
382
3831. The output of two converged dynamic instances is uniform if and
384   only if it compares equal for those two dynamic instances.
3852. The output of a static instance ``X`` is uniform *for a given set
386   of threads* if and only if it is uniform for every pair of
387   converged dynamic instances of ``X`` produced by those threads.
388
389A non-uniform value is said to be *divergent*.
390
391For a set ``S`` of threads, the uniformity of each output of a static
392instance is determined as follows:
393
3941. The semantics of the instruction may specify the output to be
395   uniform.
3962. Otherwise, if it is a PHI node, its output is uniform if and only
397   if for every pair of converged dynamic instances produced by all
398   threads in ``S``:
399
400   a. Both instances choose the same output from converged
401      dynamic instances, and,
402   b. That output is uniform for all threads in ``S``.
4033. Otherwise, the output is uniform if and only if the input
404   operands are uniform for all threads in ``S``.
405
406Divergent Cycle Exits
407---------------------
408
409When a divergent branch occurs inside a cycle, it is possible that a
410diverged path continues to an exit of the cycle. This is called a
411divergent cycle exit. If the cycle is irreducible, the diverged path
412may re-enter and eventually reach a join within the cycle. Such a join
413should be examined for the :ref:`diverged entry
414<convergence-diverged-entry>` criterion.
415
416Nodes along the diverged path that lie outside the cycle experience
417*temporal divergence*, when two threads executing convergently inside
418the cycle produce uniform values, but exit the cycle along the same
419divergent path after executing the header a different number of times
420(informally, on different iterations of the cycle). For a node ``N``
421inside the cycle the outputs may be uniform for the two threads, but
422any use ``U`` outside the cycle receives a value from non-converged
423dynamic instances of ``N``. An output of ``U`` may be divergent,
424depending on the semantics of the instruction.
425
426Static Uniformity Analysis
427==========================
428
429Irreducible control flow results in different cycle hierarchies
430depending on the choice of headers during depth-first traversal. As a
431result, a static analysis cannot always determine the convergence of
432nodes in irreducible cycles, and any uniformity analysis is limited to
433those static instances whose convergence is independent of the cycle
434hierarchy:
435
436  **m-converged static instances:**
437
438  A static instance ``X`` is *m-converged* for a given CFG if and only
439  if the maximal converged-with relation for its dynamic instances is
440  the same in every cycle hierarchy that can be constructed for that CFG.
441
442  .. note::
443
444   In other words, two dynamic instances ``X1`` and ``X2`` of an
445   m-converged static instance ``X`` are converged in some cycle
446   hierarchy if and only if they are also converged in every other
447   cycle hierarchy for the same CFG.
448
449   As noted earlier, for brevity, we restrict the term *converged* to
450   mean "related under the maximal converged-with relation for a given
451   cycle hierarchy".
452
453
454Each node ``X`` in a given CFG is reported to be m-converged if and
455only if:
456
4571. ``X`` is a :ref:`top-level<cycle-toplevel-block>` node, in which
458   case, there are no cycle headers to influence the convergence of
459   ``X``.
460
4612. Otherwise, if ``X`` is inside a cycle, then every cycle that
462   contains ``X`` satisfies the following necessary conditions:
463
464   a. Every divergent branch inside the cycle satisfies the
465      :ref:`diverged entry criterion<convergence-diverged-entry>`, and,
466   b. There are no :ref:`diverged paths reaching the
467      cycle<convergence-diverged-outside>` from a divergent branch
468      outside it.
469
470.. note::
471
472   A reducible cycle :ref:`trivially satisfies
473   <convergence-reducible-cycle>` the above conditions. In particular,
474   if the whole CFG is reducible, then all nodes in the CFG are
475   m-converged.
476
477If a static instance is not m-converged, then every output is assumed
478to be divergent. Otherwise, for an m-converged static instance, the
479uniformity of each output is determined using the criteria
480:ref:`described earlier <convergence-uniformity>`. The discovery of
481divergent outputs may cause their uses (including branches) to also
482become divergent. The analysis propagates this divergence until a
483fixed point is reached.
484
485The convergence inferred using these criteria is a safe subset of the
486maximal converged-with relation for any cycle hierarchy. In
487particular, it is sufficient to determine if a static instance is
488m-converged for a given cycle hierarchy ``T``, even if that fact is
489not detected when examining some other cycle hierarchy ``T'``.
490
491This property allows compiler transforms to use the uniformity
492analysis without being affected by DFS choices made in the underlying
493cycle analysis. When two transforms use different instances of the
494uniformity analysis for the same CFG, a "divergent value" result in
495one analysis instance cannot contradict a "uniform value" result in
496the other.
497
498Generic transforms such as SimplifyCFG, CSE, and loop transforms
499commonly change the program in ways that change the maximal
500converged-with relations. This also means that a value that was
501previously uniform can become divergent after such a transform.
502Uniformity has to be recomputed after such transforms.
503
504Divergent Branch inside a Cycle
505-------------------------------
506
507.. figure:: convergence-divergent-inside.png
508   :name: convergence-divergent-inside
509
510The above figure shows a divergent branch ``Q`` inside an irreducible
511cyclic region. When two threads diverge at ``Q``, the convergence of
512dynamic instances within the cyclic region depends on the cycle
513hierarchy chosen:
514
5151. In an implementation that detects a single cycle ``C`` with header
516   ``P``, convergence inside the cycle is determined by ``P``.
517
5182. In an implementation that detects two nested cycles with headers
519   ``R`` and ``S``, convergence inside those cycles is determined by
520   their respective headers.
521
522.. _convergence-diverged-entry:
523
524A conservative approach would be to simply report all nodes inside
525irreducible cycles as having divergent outputs. But it is desirable to
526recognize m-converged nodes in the CFG in order to maximize
527uniformity. This section describes one such pattern of nodes derived
528from *closed paths*, which are a property of the CFG and do not depend
529on the cycle hierarchy.
530
531  **Diverged Entry Criterion:**
532
533  The dynamic instances of all the nodes in a closed path ``P`` are
534  m-converged only if for every divergent branch ``B`` and its
535  join node ``J`` that lie on ``P``, there is no entry to ``P`` which
536  lies on a diverged path from ``B`` to ``J``.
537
538.. figure:: convergence-closed-path.png
539   :name: convergence-closed-path
540
541Consider the closed path ``P -> Q -> R -> S`` in the above figure.
542``P`` and ``R`` are :ref:`entries to the closed
543path<cycle-closed-path>`. ``Q`` is a divergent branch and ``S`` is a
544join for that branch, with diverged paths ``Q -> R -> S`` and ``Q ->
545S``.
546
547- If a diverged entry ``R`` exists, then in some cycle hierarchy,
548  ``R`` is the header of the smallest cycle ``C`` containing the
549  closed path and a :ref:`child cycle<cycle-definition>` ``C'``
550  exists in the set ``C - R``, containing both branch ``Q`` and join
551  ``S``. When threads diverge at ``Q``, one subset ``M`` continues
552  inside cycle ``C'``, while the complement ``N`` exits ``C'`` and
553  reaches ``R``. Dynamic instances of ``S`` executed by threads in set
554  ``M`` are not converged with those executed in set ``N`` due to the
555  presence of ``R``. Informally, threads that diverge at ``Q``
556  reconverge in the same iteration of the outer cycle ``C``, but they
557  may have executed the inner cycle ``C'`` differently.
558
559  .. table::
560     :align: left
561
562     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
563     |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11   |
564     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
565     | Thread1 | Entry | P1  | Q1  |     |     |     | R1  | S1  | P3  | ... | Exit |
566     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
567     | Thread2 | Entry | P2  | Q2  | S2  | P4  | Q4  | R2  | S4  |     |     | Exit |
568     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
569
570  In the table above, ``S2`` is not converged with ``S1`` due to ``R1``.
571
572|
573
574- If ``R`` does not exist, or if any node other than ``R`` is the
575  header of ``C``, then no such child cycle ``C'`` is detected.
576  Threads that diverge at ``Q`` execute converged dynamic instances of
577  ``S`` since they do not encounter the cycle header on any path from
578  ``Q`` to ``S``. Informally, threads that diverge at ``Q``
579  reconverge at ``S`` in the same iteration of ``C``.
580
581  .. table::
582     :align: left
583
584     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
585     |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10   |
586     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
587     | Thread1 | Entry | P1  | Q1  | R1  | S1  | P3  | Q3  | R3  | S3  | Exit |
588     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
589     | Thread2 | Entry | P2  | Q2  |     | S2  | P4  | Q4  | R2  | S4  | Exit |
590     +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+------+
591
592|
593
594  .. note::
595
596     In general, the cycle ``C`` in the above statements is not
597     expected to be the same cycle for different headers. Cycles and
598     their headers are tightly coupled; for different headers in the
599     same outermost cycle, the child cycles detected may be different.
600     The property relevant to the above examples is that for every
601     closed path, there is a cycle ``C`` that contains the path and
602     whose header is on that path.
603
604The diverged entry criterion must be checked for every closed path
605passing through a divergent branch ``B`` and its join ``J``. Since
606:ref:`every closed path passes through the header of some
607cycle<cycle-closed-path-header>`, this amounts to checking every cycle
608``C`` that contains ``B`` and ``J``. When the header of ``C``
609dominates the join ``J``, there can be no entry to any path from the
610header to ``J``, which includes any diverged path from ``B`` to ``J``.
611This is also true for any closed paths passing through the header of
612an outer cycle that contains ``C``.
613
614Thus, the diverged entry criterion can be conservatively simplified
615as follows:
616
617  For a divergent branch ``B`` and its join node ``J``, the nodes in a
618  cycle ``C`` that contains both ``B`` and ``J`` are m-converged only
619  if:
620
621  - ``B`` strictly dominates ``J``, or,
622  - The header ``H`` of ``C`` strictly dominates ``J``, or,
623  - Recursively, there is cycle ``C'`` inside ``C`` that satisfies the
624    same condition.
625
626When ``J`` is the same as ``H`` or ``B``, the trivial dominance is
627insufficient to make any statement about entries to diverged paths.
628
629.. _convergence-diverged-outside:
630
631Diverged Paths reaching a Cycle
632-------------------------------
633
634.. figure:: convergence-divergent-outside.png
635   :name: convergence-divergent-outside
636
637The figure shows two cycle hierarchies with a divergent branch in
638``Entry`` instead of ``Q``. For two threads that enter the closed path
639``P -> Q -> R -> S`` at ``P`` and ``R`` respectively, the convergence
640of dynamic instances generated along the path depends on whether ``P``
641or ``R`` is the header.
642
643-  Convergence when ``P`` is the header.
644
645   .. table::
646      :align: left
647
648      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
649      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13   |
650      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
651      | Thread1 | Entry |     |     |     | P1  | Q1  | R1  | S1  | P3  | Q3  |     | S3  | Exit |
652      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
653      | Thread2 | Entry |     | R2  | S2  | P2  | Q2  |     | S2  | P4  | Q4  | R3  | S4  | Exit |
654      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
655
656   |
657
658-  Convergence when ``R`` is the header.
659
660   .. table::
661      :align: left
662
663      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
664      |         | 1     | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12   |
665      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
666      | Thread1 | Entry |     | P1  | Q1  | R1  | S1  | P3  | Q3  | S3  |     |     | Exit |
667      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
668      | Thread2 | Entry |     |     |     | R2  | S2  | P2  | Q2  | S2  | P4  | ... | Exit |
669      +---------+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+-----+------+
670
671   |
672
673Thus, when diverged paths reach different entries of an irreducible
674cycle from outside the cycle, the static analysis conservatively
675reports every node in the cycle as not m-converged.
676
677.. _convergence-reducible-cycle:
678
679Reducible Cycle
680---------------
681
682If ``C`` is a reducible cycle with header ``H``, then in any DFS,
683``H`` :ref:`must be the header of some cycle<cycle-reducible-headers>`
684``C'`` that contains ``C``. Independent of the DFS, there is no entry
685to the subgraph ``C`` other than ``H`` itself. Thus, we have the
686following:
687
6881. The diverged entry criterion is trivially satisfied for a divergent
689   branch and its join, where both are inside subgraph ``C``.
6902. When diverged paths reach the subgraph ``C`` from outside, their
691   convergence is always determined by the same header ``H``.
692
693Clearly, this can be determined only in a cycle hierarchy ``T`` where
694``C`` is detected as a reducible cycle. No such conclusion can be made
695in a different cycle hierarchy ``T'`` where ``C`` is part of a larger
696cycle ``C'`` with the same header, but this does not contradict the
697conclusion in ``T``.
698