1.. _transformation-metadata: 2 3============================ 4Code Transformation Metadata 5============================ 6 7.. contents:: 8 :local: 9 10Overview 11======== 12 13LLVM transformation passes can be controlled by attaching metadata to 14the code to transform. By default, transformation passes use heuristics 15to determine whether or not to perform transformations, and when doing 16so, other details of how the transformations are applied (e.g., which 17vectorization factor to select). 18Unless the optimizer is otherwise directed, transformations are applied 19conservatively. This conservatism generally allows the optimizer to 20avoid unprofitable transformations, but in practice, this results in the 21optimizer not applying transformations that would be highly profitable. 22 23Frontends can give additional hints to LLVM passes on which 24transformations they should apply. This can be additional knowledge that 25cannot be derived from the emitted IR, or directives passed from the 26user/programmer. OpenMP pragmas are an example of the latter. 27 28If any such metadata is dropped from the program, the code's semantics 29must not change. 30 31Metadata on Loops 32================= 33 34Attributes can be attached to loops as described in :ref:`llvm.loop`. 35Attributes can describe properties of the loop, disable transformations, 36force specific transformations and set transformation options. 37 38Because metadata nodes are immutable (with the exception of 39``MDNode::replaceOperandWith`` which is dangerous to use on uniqued 40metadata), in order to add or remove a loop attributes, a new ``MDNode`` 41must be created and assigned as the new ``llvm.loop`` metadata. Any 42connection between the old ``MDNode`` and the loop is lost. The 43``llvm.loop`` node is also used as LoopID (``Loop::getLoopID()``), i.e. 44the loop effectively gets a new identifier. For instance, 45``llvm.mem.parallel_loop_access`` references the LoopID. Therefore, if 46the parallel access property is to be preserved after adding/removing 47loop attributes, any ``llvm.mem.parallel_loop_access`` reference must be 48updated to the new LoopID. 49 50Transformation Metadata Structure 51================================= 52 53Some attributes describe code transformations (unrolling, vectorizing, 54loop distribution, etc.). They can either be a hint to the optimizer 55that a transformation might be beneficial, instruction to use a specific 56option, , or convey a specific request from the user (such as 57``#pragma clang loop`` or ``#pragma omp simd``). 58 59If a transformation is forced but cannot be carried-out for any reason, 60an optimization-missed warning must be emitted. Semantic information 61such as a transformation being safe (e.g. 62``llvm.mem.parallel_loop_access``) can be unused by the optimizer 63without generating a warning. 64 65Unless explicitly disabled, any optimization pass may heuristically 66determine whether a transformation is beneficial and apply it. If 67metadata for another transformation was specified, applying a different 68transformation before it might be inadvertent due to being applied on a 69different loop or the loop not existing anymore. To avoid having to 70explicitly disable an unknown number of passes, the attribute 71``llvm.loop.disable_nonforced`` disables all optional, high-level, 72restructuring transformations. 73 74The following example avoids the loop being altered before being 75vectorized, for instance being unrolled. 76 77.. code-block:: llvm 78 79 br i1 %exitcond, label %for.exit, label %for.header, !llvm.loop !0 80 ... 81 !0 = distinct !{!0, !1, !2} 82 !1 = !{!"llvm.loop.vectorize.enable", i1 true} 83 !2 = !{!"llvm.loop.disable_nonforced"} 84 85After a transformation is applied, follow-up attributes are set on the 86transformed and/or new loop(s). This allows additional attributes 87including followup-transformations to be specified. Specifying multiple 88transformations in the same metadata node is possible for compatibility 89reasons, but their execution order is undefined. For instance, when 90``llvm.loop.vectorize.enable`` and ``llvm.loop.unroll.enable`` are 91specified at the same time, unrolling may occur either before or after 92vectorization. 93 94As an example, the following instructs a loop to be vectorized and only 95then unrolled. 96 97.. code-block:: llvm 98 99 !0 = distinct !{!0, !1, !2, !3} 100 !1 = !{!"llvm.loop.vectorize.enable", i1 true} 101 !2 = !{!"llvm.loop.disable_nonforced"} 102 !3 = !{!"llvm.loop.vectorize.followup_vectorized", !{"llvm.loop.unroll.enable"}} 103 104If, and only if, no followup is specified, the pass may add attributes itself. 105For instance, the vectorizer adds a ``llvm.loop.isvectorized`` attribute and 106all attributes from the original loop excluding its loop vectorizer 107attributes. To avoid this, an empty followup attribute can be used, e.g. 108 109.. code-block:: llvm 110 111 !3 = !{!"llvm.loop.vectorize.followup_vectorized"} 112 113The followup attributes of a transformation that cannot be applied will 114never be added to a loop and are therefore effectively ignored. This means 115that any followup-transformation in such attributes requires that its 116prior transformations are applied before the followup-transformation. 117The user should receive a warning about the first transformation in the 118transformation chain that could not be applied if it a forced 119transformation. All following transformations are skipped. 120 121Pass-Specific Transformation Metadata 122===================================== 123 124Transformation options are specific to each transformation. In the 125following, we present the model for each LLVM loop optimization pass and 126the metadata to influence them. 127 128Loop Vectorization and Interleaving 129----------------------------------- 130 131Loop vectorization and interleaving is interpreted as a single 132transformation. It is interpreted as forced if 133``!{"llvm.loop.vectorize.enable", i1 true}`` is set. 134 135Assuming the pre-vectorization loop is 136 137.. code-block:: c 138 139 for (int i = 0; i < n; i+=1) // original loop 140 Stmt(i); 141 142then the code after vectorization will be approximately (assuming an 143SIMD width of 4): 144 145.. code-block:: c 146 147 int i = 0; 148 if (rtc) { 149 for (; i + 3 < n; i+=4) // vectorized/interleaved loop 150 Stmt(i:i+3); 151 } 152 for (; i < n; i+=1) // epilogue loop 153 Stmt(i); 154 155where ``rtc`` is a generated runtime check. 156 157``llvm.loop.vectorize.followup_vectorized`` will set the attributes for 158the vectorized loop. If not specified, ``llvm.loop.isvectorized`` is 159combined with the original loop's attributes to avoid it being 160vectorized multiple times. 161 162``llvm.loop.vectorize.followup_epilogue`` will set the attributes for 163the remainder loop. If not specified, it will have the original loop's 164attributes combined with ``llvm.loop.isvectorized`` and 165``llvm.loop.unroll.runtime.disable`` (unless the original loop already 166has unroll metadata). 167 168The attributes specified by ``llvm.loop.vectorize.followup_all`` are 169added to both loops. 170 171When using a follow-up attribute, it replaces any automatically deduced 172attributes for the generated loop in question. Therefore it is 173recommended to add ``llvm.loop.isvectorized`` to 174``llvm.loop.vectorize.followup_all`` which avoids that the loop 175vectorizer tries to optimize the loops again. 176 177Loop Unrolling 178-------------- 179 180Unrolling is interpreted as forced any ``!{!"llvm.loop.unroll.enable"}`` 181metadata or option (``llvm.loop.unroll.count``, ``llvm.loop.unroll.full``) 182is present. Unrolling can be full unrolling, partial unrolling of a loop 183with constant trip count or runtime unrolling of a loop with a trip 184count unknown at compile-time. 185 186If the loop has been unrolled fully, there is no followup-loop. For 187partial/runtime unrolling, the original loop of 188 189.. code-block:: c 190 191 for (int i = 0; i < n; i+=1) // original loop 192 Stmt(i); 193 194is transformed into (using an unroll factor of 4): 195 196.. code-block:: c 197 198 int i = 0; 199 for (; i + 3 < n; i+=4) { // unrolled loop 200 Stmt(i); 201 Stmt(i+1); 202 Stmt(i+2); 203 Stmt(i+3); 204 } 205 for (; i < n; i+=1) // remainder loop 206 Stmt(i); 207 208``llvm.loop.unroll.followup_unrolled`` will set the loop attributes of 209the unrolled loop. If not specified, the attributes of the original loop 210without the ``llvm.loop.unroll.*`` attributes are copied and 211``llvm.loop.unroll.disable`` added to it. 212 213``llvm.loop.unroll.followup_remainder`` defines the attributes of the 214remainder loop. If not specified the remainder loop will have no 215attributes. The remainder loop might not be present due to being fully 216unrolled in which case this attribute has no effect. 217 218Attributes defined in ``llvm.loop.unroll.followup_all`` are added to the 219unrolled and remainder loops. 220 221To avoid that the partially unrolled loop is unrolled again, it is 222recommended to add ``llvm.loop.unroll.disable`` to 223``llvm.loop.unroll.followup_all``. If no follow-up attribute specified 224for a generated loop, it is added automatically. 225 226Unroll-And-Jam 227-------------- 228 229Unroll-and-jam uses the following transformation model (here with an 230unroll factor if 2). Currently, it does not support a fallback version 231when the transformation is unsafe. 232 233.. code-block:: c 234 235 for (int i = 0; i < n; i+=1) { // original outer loop 236 Fore(i); 237 for (int j = 0; j < m; j+=1) // original inner loop 238 SubLoop(i, j); 239 Aft(i); 240 } 241 242.. code-block:: c 243 244 int i = 0; 245 for (; i + 1 < n; i+=2) { // unrolled outer loop 246 Fore(i); 247 Fore(i+1); 248 for (int j = 0; j < m; j+=1) { // unrolled inner loop 249 SubLoop(i, j); 250 SubLoop(i+1, j); 251 } 252 Aft(i); 253 Aft(i+1); 254 } 255 for (; i < n; i+=1) { // remainder outer loop 256 Fore(i); 257 for (int j = 0; j < m; j+=1) // remainder inner loop 258 SubLoop(i, j); 259 Aft(i); 260 } 261 262``llvm.loop.unroll_and_jam.followup_outer`` will set the loop attributes 263of the unrolled outer loop. If not specified, the attributes of the 264original outer loop without the ``llvm.loop.unroll.*`` attributes are 265copied and ``llvm.loop.unroll.disable`` added to it. 266 267``llvm.loop.unroll_and_jam.followup_inner`` will set the loop attributes 268of the unrolled inner loop. If not specified, the attributes of the 269original inner loop are used unchanged. 270 271``llvm.loop.unroll_and_jam.followup_remainder_outer`` sets the loop 272attributes of the outer remainder loop. If not specified it will not 273have any attributes. The remainder loop might not be present due to 274being fully unrolled. 275 276``llvm.loop.unroll_and_jam.followup_remainder_inner`` sets the loop 277attributes of the inner remainder loop. If not specified it will have 278the attributes of the original inner loop. It the outer remainder loop 279is unrolled, the inner remainder loop might be present multiple times. 280 281Attributes defined in ``llvm.loop.unroll_and_jam.followup_all`` are 282added to all of the aforementioned output loops. 283 284To avoid that the unrolled loop is unrolled again, it is 285recommended to add ``llvm.loop.unroll.disable`` to 286``llvm.loop.unroll_and_jam.followup_all``. It suppresses unroll-and-jam 287as well as an additional inner loop unrolling. If no follow-up 288attribute specified for a generated loop, it is added automatically. 289 290Loop Distribution 291----------------- 292 293The LoopDistribution pass tries to separate vectorizable parts of a loop 294from the non-vectorizable part (which otherwise would make the entire 295loop non-vectorizable). Conceptually, it transforms a loop such as 296 297.. code-block:: c 298 299 for (int i = 1; i < n; i+=1) { // original loop 300 A[i] = i; 301 B[i] = 2 + B[i]; 302 C[i] = 3 + C[i - 1]; 303 } 304 305into the following code: 306 307.. code-block:: c 308 309 if (rtc) { 310 for (int i = 1; i < n; i+=1) // coincident loop 311 A[i] = i; 312 for (int i = 1; i < n; i+=1) // coincident loop 313 B[i] = 2 + B[i]; 314 for (int i = 1; i < n; i+=1) // sequential loop 315 C[i] = 3 + C[i - 1]; 316 } else { 317 for (int i = 1; i < n; i+=1) { // fallback loop 318 A[i] = i; 319 B[i] = 2 + B[i]; 320 C[i] = 3 + C[i - 1]; 321 } 322 } 323 324where ``rtc`` is a generated runtime check. 325 326``llvm.loop.distribute.followup_coincident`` sets the loop attributes of 327all loops without loop-carried dependencies (i.e. vectorizable loops). 328There might be more than one such loops. If not defined, the loops will 329inherit the original loop's attributes. 330 331``llvm.loop.distribute.followup_sequential`` sets the loop attributes of the 332loop with potentially unsafe dependencies. There should be at most one 333such loop. If not defined, the loop will inherit the original loop's 334attributes. 335 336``llvm.loop.distribute.followup_fallback`` defines the loop attributes 337for the fallback loop, which is a copy of the original loop for when 338loop versioning is required. If undefined, the fallback loop inherits 339all attributes from the original loop. 340 341Attributes defined in ``llvm.loop.distribute.followup_all`` are added to 342all of the aforementioned output loops. 343 344It is recommended to add ``llvm.loop.disable_nonforced`` to 345``llvm.loop.distribute.followup_fallback``. This avoids that the 346fallback version (which is likely never executed) is further optimized 347which would increase the code size. 348 349Versioning LICM 350--------------- 351 352The pass hoists code out of loops that are only loop-invariant when 353dynamic conditions apply. For instance, it transforms the loop 354 355.. code-block:: c 356 357 for (int i = 0; i < n; i+=1) // original loop 358 A[i] = B[0]; 359 360into: 361 362.. code-block:: c 363 364 if (rtc) { 365 auto b = B[0]; 366 for (int i = 0; i < n; i+=1) // versioned loop 367 A[i] = b; 368 } else { 369 for (int i = 0; i < n; i+=1) // unversioned loop 370 A[i] = B[0]; 371 } 372 373The runtime condition (``rtc``) checks that the array ``A`` and the 374element `B[0]` do not alias. 375 376Currently, this transformation does not support followup-attributes. 377 378Loop Interchange 379---------------- 380 381Currently, the ``LoopInterchange`` pass does not use any metadata. 382 383Ambiguous Transformation Order 384============================== 385 386If there multiple transformations defined, the order in which they are 387executed depends on the order in LLVM's pass pipeline, which is subject 388to change. The default optimization pipeline (anything higher than 389``-O0``) has the following order. 390 391When using the legacy pass manager: 392 393 - LoopInterchange (if enabled) 394 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) 395 - VersioningLICM (if enabled) 396 - LoopDistribute 397 - LoopVectorizer 398 - LoopUnrollAndJam (if enabled) 399 - LoopUnroll (partial and runtime unrolling) 400 401When using the legacy pass manager with LTO: 402 403 - LoopInterchange (if enabled) 404 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) 405 - LoopVectorizer 406 - LoopUnroll (partial and runtime unrolling) 407 408When using the new pass manager: 409 410 - SimpleLoopUnroll/LoopFullUnroll (only performs full unrolling) 411 - LoopDistribute 412 - LoopVectorizer 413 - LoopUnrollAndJam (if enabled) 414 - LoopUnroll (partial and runtime unrolling) 415 416Leftover Transformations 417======================== 418 419Forced transformations that have not been applied after the last 420transformation pass should be reported to the user. The transformation 421passes themselves cannot be responsible for this reporting because they 422might not be in the pipeline, there might be multiple passes able to 423apply a transformation (e.g. ``LoopInterchange`` and Polly) or a 424transformation attribute may be 'hidden' inside another passes' followup 425attribute. 426 427The pass ``-transform-warning`` (``WarnMissedTransformationsPass``) 428emits such warnings. It should be placed after the last transformation 429pass. 430 431The current pass pipeline has a fixed order in which transformations 432passes are executed. A transformation can be in the followup of a pass 433that is executed later and thus leftover. For instance, a loop nest 434cannot be distributed and then interchanged with the current pass 435pipeline. The loop distribution will execute, but there is no loop 436interchange pass following such that any loop interchange metadata will 437be ignored. The ``-transform-warning`` should emit a warning in this 438case. 439 440Future versions of LLVM may fix this by executing transformations using 441a dynamic ordering. 442