1.. _transformation-metadata:
2
3============================
4Code Transformation Metadata
5============================
6
7.. contents::
8   :local:
9
10Overview
11========
12
13LLVM transformation passes can be controlled by attaching metadata to
14the code to transform. By default, transformation passes use heuristics
15to determine whether or not to perform transformations, and when doing
16so, other details of how the transformations are applied (e.g., which
17vectorization factor to select).
18Unless the optimizer is otherwise directed, transformations are applied
19conservatively. This conservatism generally allows the optimizer to
20avoid unprofitable transformations, but in practice, this results in the
21optimizer not applying transformations that would be highly profitable.
22
23Frontends can give additional hints to LLVM passes on which
24transformations they should apply. This can be additional knowledge that
25cannot be derived from the emitted IR, or directives passed from the
26user/programmer. OpenMP pragmas are an example of the latter.
27
28If any such metadata is dropped from the program, the code's semantics
29must not change.
30
31Metadata on Loops
32=================
33
34Attributes can be attached to loops as described in :ref:`llvm.loop`.
35Attributes can describe properties of the loop, disable transformations,
36force specific transformations and set transformation options.
37
38Because metadata nodes are immutable (with the exception of
39``MDNode::replaceOperandWith`` which is dangerous to use on uniqued
40metadata), in order to add or remove a loop attributes, a new ``MDNode``
41must be created and assigned as the new ``llvm.loop`` metadata. Any
42connection between the old ``MDNode`` and the loop is lost. The
43``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e.
44the loop effectively gets a new identifier. For instance,
45``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if
46the parallel access property is to be preserved after adding/removing
47loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be
48updated to the new LoopID.
49
50Transformation Metadata Structure
51=================================
52
53Some attributes describe code transformations (unrolling, vectorizing,
54loop distribution, etc.). They can either be a hint to the optimizer
55that a transformation might be beneficial, instruction to use a specific
56option, , or convey a specific request from the user (such as
57``#pragma clang loop`` or ``#pragma omp simd``).
58
59If a transformation is forced but cannot be carried-out for any reason,
60an optimization-missed warning must be emitted. Semantic information
61such as a transformation being safe (e.g.
62``llvm.mem.parallel_loop_access``) can be unused by the optimizer
63without generating a warning.
64
65Unless explicitly disabled, any optimization pass may heuristically
66determine whether a transformation is beneficial and apply it. If
67metadata for another transformation was specified, applying a different
68transformation before it might be inadvertent due to being applied on a
69different loop or the loop not existing anymore. To avoid having to
70explicitly disable an unknown number of passes, the attribute
71``llvm.loop.disable_nonforced`` disables all optional, high-level,
72restructuring transformations.
73
74The following example avoids the loop being altered before being
75vectorized, for instance being unrolled.
76
77.. code-block:: llvm
78
79      br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0
80    ...
81    !0 = distinct !{!0, !1, !2}
82    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
83    !2 = !{!"llvm.loop.disable_nonforced"}
84
85After a transformation is applied, follow-up attributes are set on the
86transformed and/or new loop(s). This allows additional attributes
87including followup-transformations to be specified. Specifying multiple
88transformations in the same metadata node is possible for compatibility
89reasons, but their execution order is undefined. For instance, when
90``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are
91specified at the same time, unrolling may occur either before or after
92vectorization.
93
94As an example, the following instructs a loop to be vectorized and only
95then unrolled.
96
97.. code-block:: llvm
98
99    !0 = distinct !{!0, !1, !2, !3}
100    !1 = !{!"llvm.loop.vectorize.enable", i1 true}
101    !2 = !{!"llvm.loop.disable_nonforced"}
102    !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{"llvm.loop.unroll.enable"}}
103
104If, and only if, no followup is specified, the pass may add attributes itself.
105For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and
106all attributes from the original loop excluding its loop vectorizer
107attributes. To avoid this, an empty followup attribute can be used, e.g.
108
109.. code-block:: llvm
110
111    !3 = !{!"llvm.loop.vectorize.followup_vectorized"}
112
113The followup attributes of a transformation that cannot be applied will
114never be added to a loop and are therefore effectively ignored. This means
115that any followup-transformation in such attributes requires that its
116prior transformations are applied before the followup-transformation.
117The user should receive a warning about the first transformation in the
118transformation chain that could not be applied if it a forced
119transformation. All following transformations are skipped.
120
121Pass-Specific Transformation Metadata
122=====================================
123
124Transformation options are specific to each transformation. In the
125following, we present the model for each LLVM loop optimization pass and
126the metadata to influence them.
127
128Loop Vectorization and Interleaving
129-----------------------------------
130
131Loop vectorization and interleaving is interpreted as a single
132transformation. It is interpreted as forced if
133``!{"llvm.loop.vectorize.enable", i1 true}`` is set.
134
135Assuming the pre-vectorization loop is
136
137.. code-block:: c
138
139    for (int i = 0; i < n; i+=1) // original loop
140      Stmt(i);
141
142then the code after vectorization will be approximately (assuming an
143SIMD width of 4):
144
145.. code-block:: c
146
147    int i = 0;
148    if (rtc) {
149      for (; i + 3 < n; i+=4) // vectorized/interleaved loop
150        Stmt(i:i+3);
151    }
152    for (; i < n; i+=1) // epilogue loop
153      Stmt(i);
154
155where ``rtc`` is a generated runtime check.
156
157``llvm.loop.vectorize.followup_vectorized`` will set the attributes for
158the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is
159combined with the original loop's attributes to avoid it being
160vectorized multiple times.
161
162``llvm.loop.vectorize.followup_epilogue`` will set the attributes for
163the remainder loop. If not specified, it will have the original loop's
164attributes combined with ``llvm.loop.isvectorized`` and
165``llvm.loop.unroll.runtime.disable`` (unless the original loop already
166has unroll metadata).
167
168The attributes specified by ``llvm.loop.vectorize.followup_all`` are
169added to both loops.
170
171When using a follow-up attribute, it replaces any automatically deduced
172attributes for the generated loop in question. Therefore it is
173recommended to add ``llvm.loop.isvectorized`` to
174``llvm.loop.vectorize.followup_all`` which avoids that the loop
175vectorizer tries to optimize the loops again.
176
177Loop Unrolling
178--------------
179
180Unrolling is interpreted as forced any ``!{!"llvm.loop.unroll.enable"}``
181metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``)
182is present. Unrolling can be full unrolling, partial unrolling of a loop
183with constant trip count or runtime unrolling of a loop with a trip
184count unknown at compile-time.
185
186If the loop has been unrolled fully, there is no followup-loop. For
187partial/runtime unrolling, the original loop of
188
189.. code-block:: c
190
191    for (int i = 0; i < n; i+=1) // original loop
192      Stmt(i);
193
194is transformed into (using an unroll factor of 4):
195
196.. code-block:: c
197
198    int i = 0;
199    for (; i + 3 < n; i+=4) { // unrolled loop
200      Stmt(i);
201      Stmt(i+1);
202      Stmt(i+2);
203      Stmt(i+3);
204    }
205    for (; i < n; i+=1) // remainder loop
206      Stmt(i);
207
208``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of
209the unrolled loop. If not specified, the attributes of the original loop
210without the ``llvm.loop.unroll.*`` attributes are copied and
211``llvm.loop.unroll.disable`` added to it.
212
213``llvm.loop.unroll.followup_remainder`` defines the attributes of the
214remainder loop. If not specified the remainder loop will have no
215attributes. The remainder loop might not be present due to being fully
216unrolled in which case this attribute has no effect.
217
218Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the
219unrolled and remainder loops.
220
221To avoid that the partially unrolled loop is unrolled again, it is
222recommended to add ``llvm.loop.unroll.disable`` to
223``llvm.loop.unroll.followup_all``. If no follow-up attribute specified
224for a generated loop, it is added automatically.
225
226Unroll-And-Jam
227--------------
228
229Unroll-and-jam uses the following transformation model (here with an
230unroll factor if 2). Currently, it does not support a fallback version
231when the transformation is unsafe.
232
233.. code-block:: c
234
235    for (int i = 0; i < n; i+=1) { // original outer loop
236      Fore(i);
237      for (int j = 0; j < m; j+=1) // original inner loop
238        SubLoop(i, j);
239      Aft(i);
240    }
241
242.. code-block:: c
243
244    int i = 0;
245    for (; i + 1 < n; i+=2) { // unrolled outer loop
246      Fore(i);
247      Fore(i+1);
248      for (int j = 0; j < m; j+=1) { // unrolled inner loop
249        SubLoop(i, j);
250        SubLoop(i+1, j);
251      }
252      Aft(i);
253      Aft(i+1);
254    }
255    for (; i < n; i+=1) { // remainder outer loop
256      Fore(i);
257      for (int j = 0; j < m; j+=1) // remainder inner loop
258        SubLoop(i, j);
259      Aft(i);
260    }
261
262``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes
263of the unrolled outer loop. If not specified, the attributes of the
264original outer loop without the ``llvm.loop.unroll.*`` attributes are
265copied and ``llvm.loop.unroll.disable`` added to it.
266
267``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes
268of the unrolled inner loop. If not specified, the attributes of the
269original inner loop are used unchanged.
270
271``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop
272attributes of the outer remainder loop. If not specified it will not
273have any attributes. The remainder loop might not be present due to
274being fully unrolled.
275
276``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop
277attributes of the inner remainder loop. If not specified it will have
278the attributes of the original inner loop. It the outer remainder loop
279is unrolled, the inner remainder loop might be present multiple times.
280
281Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are
282added to all of the aforementioned output loops.
283
284To avoid that the unrolled loop is unrolled again, it is
285recommended to add ``llvm.loop.unroll.disable`` to
286``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam
287as well as an additional inner loop unrolling. If no follow-up
288attribute specified for a generated loop, it is added automatically.
289
290Loop Distribution
291-----------------
292
293The LoopDistribution pass tries to separate vectorizable parts of a loop
294from the non-vectorizable part (which otherwise would make the entire
295loop non-vectorizable). Conceptually, it transforms a loop such as
296
297.. code-block:: c
298
299    for (int i = 1; i < n; i+=1) { // original loop
300      A[i] = i;
301      B[i] = 2 + B[i];
302      C[i] = 3 + C[i - 1];
303    }
304
305into the following code:
306
307.. code-block:: c
308
309    if (rtc) {
310      for (int i = 1; i < n; i+=1) // coincident loop
311        A[i] = i;
312      for (int i = 1; i < n; i+=1) // coincident loop
313        B[i] = 2 + B[i];
314      for (int i = 1; i < n; i+=1) // sequential loop
315        C[i] = 3 + C[i - 1];
316    } else {
317      for (int i = 1; i < n; i+=1) { // fallback loop
318        A[i] = i;
319        B[i] = 2 + B[i];
320        C[i] = 3 + C[i - 1];
321      }
322    }
323
324where ``rtc`` is a generated runtime check.
325
326``llvm.loop.distribute.followup_coincident`` sets the loop attributes of
327all loops without loop-carried dependencies (i.e. vectorizable loops).
328There might be more than one such loops. If not defined, the loops will
329inherit the original loop's attributes.
330
331``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the
332loop with potentially unsafe dependencies. There should be at most one
333such loop. If not defined, the loop will inherit the original loop's
334attributes.
335
336``llvm.loop.distribute.followup_fallback`` defines the loop attributes
337for the fallback loop, which is a copy of the original loop for when
338loop versioning is required. If undefined, the fallback loop inherits
339all attributes from the original loop.
340
341Attributes defined in ``llvm.loop.distribute.followup_all`` are added to
342all of the aforementioned output loops.
343
344It is recommended to add ``llvm.loop.disable_nonforced`` to
345``llvm.loop.distribute.followup_fallback``. This avoids that the
346fallback version (which is likely never executed) is further optimized
347which would increase the code size.
348
349Versioning LICM
350---------------
351
352The pass hoists code out of loops that are only loop-invariant when
353dynamic conditions apply. For instance, it transforms the loop
354
355.. code-block:: c
356
357    for (int i = 0; i < n; i+=1) // original loop
358      A[i] = B[0];
359
360into:
361
362.. code-block:: c
363
364    if (rtc) {
365      auto b = B[0];
366      for (int i = 0; i < n; i+=1) // versioned loop
367        A[i] = b;
368    } else {
369      for (int i = 0; i < n; i+=1) // unversioned loop
370        A[i] = B[0];
371    }
372
373The runtime condition (``rtc``) checks that the array ``A`` and the
374element `B[0]` do not alias.
375
376Currently, this transformation does not support followup-attributes.
377
378Loop Interchange
379----------------
380
381Currently, the ``LoopInterchange`` pass does not use any metadata.
382
383Ambiguous Transformation Order
384==============================
385
386If there multiple transformations defined, the order in which they are
387executed depends on the order in LLVM's pass pipeline, which is subject
388to change. The default optimization pipeline (anything higher than
389``-O0``) has the following order.
390
391When using the legacy pass manager:
392
393 - LoopInterchange (if enabled)
394 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
395 - VersioningLICM (if enabled)
396 - LoopDistribute
397 - LoopVectorizer
398 - LoopUnrollAndJam (if enabled)
399 - LoopUnroll (partial and runtime unrolling)
400
401When using the legacy pass manager with LTO:
402
403 - LoopInterchange (if enabled)
404 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
405 - LoopVectorizer
406 - LoopUnroll (partial and runtime unrolling)
407
408When using the new pass manager:
409
410 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling)
411 - LoopDistribute
412 - LoopVectorizer
413 - LoopUnrollAndJam (if enabled)
414 - LoopUnroll (partial and runtime unrolling)
415
416Leftover Transformations
417========================
418
419Forced transformations that have not been applied after the last
420transformation pass should be reported to the user. The transformation
421passes themselves cannot be responsible for this reporting because they
422might not be in the pipeline, there might be multiple passes able to
423apply a transformation (e.g. ``LoopInterchange`` and Polly) or a
424transformation attribute may be 'hidden' inside another passes' followup
425attribute.
426
427The pass ``-transform-warning`` (``WarnMissedTransformationsPass``)
428emits such warnings. It should be placed after the last transformation
429pass.
430
431The current pass pipeline has a fixed order in which transformations
432passes are executed. A transformation can be in the followup of a pass
433that is executed later and thus leftover. For instance, a loop nest
434cannot be distributed and then interchanged with the current pass
435pipeline. The loop distribution will execute, but there is no loop
436interchange pass following such that any loop interchange metadata will
437be ignored. The ``-transform-warning`` should emit a warning in this
438case.
439
440Future versions of LLVM may fix this by executing transformations using
441a dynamic ordering.
442