# 'llvm' Dialect

This dialect maps [LLVM IR](https://llvm.org/docs/LangRef.html) into MLIR by
defining the corresponding operations and types. LLVM IR metadata is usually
represented as MLIR attributes, which offer additional structure verification.

We use "LLVM IR" to designate the
[intermediate representation of LLVM](https://llvm.org/docs/LangRef.html) and
"LLVM _dialect_" or "LLVM IR _dialect_" to refer to this MLIR dialect.

Unless explicitly stated otherwise, the semantics of the LLVM dialect operations
must correspond to the semantics of LLVM IR instructions and any divergence is
considered a bug. The dialect also contains auxiliary operations that smooth
over the differences in the IR structure, e.g., MLIR does not have `phi`
operations and LLVM IR does not have a `constant` operation. These auxiliary
operations are systematically prefixed with `mlir`, e.g. `llvm.mlir.constant`
where `llvm.` is the dialect namespace prefix.

[TOC]

## Dependency on LLVM IR

LLVM dialect is not expected to depend on any object that requires an
`LLVMContext`, such as an LLVM IR instruction or type. Instead, MLIR provides
thread-safe alternatives compatible with the rest of the infrastructure. The
dialect is allowed to depend on the LLVM IR objects that don't require a
context, such as data layout and triple description.

## Module Structure

IR modules use the built-in MLIR `ModuleOp` and support all its features. In
particular, modules can be named, nested and are subject to symbol visibility.
Modules can contain any operations, including LLVM functions and globals.

### Data Layout and Triple

An IR module may have an optional data layout and triple information attached
using MLIR attributes `llvm.data_layout` and `llvm.target_triple`, respectively.
Both are string attributes with the
[same syntax](https://llvm.org/docs/LangRef.html#data-layout) as in LLVM IR and
are verified to be correct.
They can be defined as follows.

```mlir
module attributes {llvm.data_layout = "e",
                   llvm.target_triple = "aarch64-linux-android"} {
  // module contents
}
```

### Functions

LLVM functions are represented by a special operation, `llvm.func`, that has
syntax similar to that of the built-in function operation but supports
LLVM-related features such as linkage and variadic argument lists. See detailed
description in the operation list [below](#llvmfunc-mlirllvmllvmfuncop).

### PHI Nodes and Block Arguments

MLIR uses block arguments instead of PHI nodes to communicate values between
blocks. Therefore, the LLVM dialect has no operation directly equivalent to
`phi` in LLVM IR. Instead, all terminators can pass values as successor operands
as these values will be forwarded as block arguments when the control flow is
transferred.

For example:

```mlir
^bb1:
  %0 = llvm.add %arg0, %cst : i32
  llvm.br ^bb2[%0: i32]

// If the control flow comes from ^bb1, %arg1 == %0.
^bb2(%arg1: i32):
  // ...
```

is equivalent to LLVM IR

```llvm
bb1:
  %0 = add i32 %arg0, %cst
  br label %bb2

bb2:
  ; If the control flow comes from %bb1, %arg1 == %0.
  %arg1 = phi i32 [ %0, %bb1 ] ; ... further incoming values elided
```

Since there is no need to use the block identifier to differentiate the source
of different values, the LLVM dialect supports terminators that transfer the
control flow to the same block with different arguments. For example:

```mlir
^bb1:
  llvm.cond_br %cond, ^bb2[%0: i32], ^bb2[%1: i32]

^bb2(%arg0: i32):
  // ...
```

### Context-Level Values

Some value kinds in LLVM IR, such as constants and undefs, are uniqued in
context and used directly in relevant operations. MLIR does not support such
values for thread-safety and concept parsimony reasons.
Instead, regular values
are produced by dedicated operations that have the corresponding semantics:
[`llvm.mlir.constant`](#llvmmlirconstant-mlirllvmconstantop),
[`llvm.mlir.undef`](#llvmmlirundef-mlirllvmundefop),
[`llvm.mlir.null`](#llvmmlirnull-mlirllvmnullop). Note how these operations are
prefixed with `mlir.` to indicate that they don't belong to LLVM IR but are only
necessary to model it in MLIR. The values produced by these operations are
usable just like any other value.

Examples:

```mlir
// Create an undefined value of structure type with a 32-bit integer followed
// by a float.
%0 = llvm.mlir.undef : !llvm.struct<(i32, f32)>

// Null pointer to i8.
%1 = llvm.mlir.null : !llvm.ptr<i8>

// Null pointer to a function with signature void().
%2 = llvm.mlir.null : !llvm.ptr<func<void ()>>

// Constant 42 as i32.
%3 = llvm.mlir.constant(42 : i32) : i32

// Splat dense vector constant.
%4 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>
```

Note that constants list the type twice. This is an artifact of the LLVM dialect
not using built-in types, which are used for typed MLIR attributes. The syntax
will be reevaluated after considering composite constants.

### Globals

Global variables are also defined using a special operation,
[`llvm.mlir.global`](#llvmmlirglobal-mlirllvmglobalop), located at the module
level. Globals are MLIR symbols and are identified by their name.

Since functions need to be isolated-from-above, i.e. values defined outside the
function cannot be directly used inside the function, an additional operation,
[`llvm.mlir.addressof`](#llvmmliraddressof-mlirllvmaddressofop), is provided to
locally define a value containing the _address_ of a global. The actual value
can then be loaded from that pointer, or a new value can be stored into it if
the global is not declared constant.
This is similar to LLVM IR where globals
are accessed through name and have a pointer type.

### Linkage

Module-level named objects in the LLVM dialect, namely functions and globals,
have an optional _linkage_ attribute derived from LLVM IR
[linkage types](https://llvm.org/docs/LangRef.html#linkage-types). Linkage is
specified by the same keyword as in LLVM IR and is located between the operation
name (`llvm.func` or `llvm.mlir.global`) and the symbol name. If no linkage
keyword is present, `external` linkage is assumed by default. Linkage is
_distinct_ from MLIR symbol visibility.

### Attribute Pass-Through

The LLVM dialect provides a mechanism to forward function-level attributes to
LLVM IR using the `passthrough` attribute. This is an array attribute containing
either string attributes or array attributes. In the former case, the value of
the string is interpreted as the name of an LLVM IR function attribute. In the
latter case, the array is expected to contain exactly two string attributes, the
first corresponding to the name of an LLVM IR function attribute, and the second
corresponding to its value. Note that even integer LLVM IR function attributes
have their value represented in the string form.

Example:

```mlir
llvm.func @func() attributes {
  passthrough = ["noinline",           // value-less attribute
                 ["alignstack", "4"],  // integer attribute with value
                 ["other", "attr"]]    // attribute unknown to LLVM
} {
  llvm.return
}
```

If the attribute is not known to LLVM IR, it will be attached as a string
attribute.

## Types

LLVM dialect uses built-in types whenever possible and defines a set of
complementary types, which correspond to the LLVM IR types that cannot be
directly represented with built-in types. Similarly to other MLIR context-owned
objects, the creation and manipulation of LLVM dialect types is thread-safe.
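
As a brief illustration of this split, the following sketch reuses operations
that appear elsewhere in this document; the value names are arbitrary. Where an
LLVM IR type has a built-in equivalent, the built-in type is written directly,
and `!llvm.`-prefixed types appear only for what built-in types cannot express.

```mlir
// Built-in integer type: LLVM IR `i32` is written as the built-in `i32`.
%0 = llvm.mlir.constant(42 : i32) : i32

// Built-in vector of a built-in element type.
%1 = llvm.mlir.constant(dense<1.0> : vector<4xf32>) : vector<4xf32>

// Structures and pointers have no built-in counterpart, so they use LLVM
// dialect types.
%2 = llvm.mlir.undef : !llvm.struct<(i32, f32)>
%3 = llvm.mlir.null : !llvm.ptr<i8>
```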

MLIR does not support module-scoped named type declarations, e.g.
`%s = type {i32, i32}` in LLVM IR. Instead, types must be fully specified at
each use, except for recursive types where only the first reference to a named
type needs to be fully specified. MLIR
[type aliases](../LangRef.md/#type-aliases) can be used to achieve more compact
syntax.

The general syntax of LLVM dialect types is `!llvm.`, followed by a type kind
identifier (e.g., `ptr` for pointer or `struct` for structure) and by an
optional list of type parameters in angle brackets. The dialect follows MLIR
style for types with nested angle brackets and keyword specifiers rather than
using different bracket styles to differentiate types. Types inside the angle
brackets may omit the `!llvm.` prefix for brevity: the parser first attempts to
find a type (starting with `!` or a built-in type) and falls back to accepting a
keyword. For example, `!llvm.ptr<!llvm.ptr<i32>>` and `!llvm.ptr<ptr<i32>>` are
equivalent, with the latter being the canonical form, and denote a pointer to a
pointer to a 32-bit integer.

### Built-in Type Compatibility

LLVM dialect accepts a subset of built-in types that are referred to as _LLVM
dialect-compatible types_. The following types are compatible:

- Signless integers - `iN` (`IntegerType`).
- Floating point types - `bf16`, `f16`, `f32`, `f64`, `f80`, `f128`
  (`FloatType`).
- 1D vectors of signless integers or floating point types - `vector<NxT>`
  (`VectorType`).

Note that only a subset of types that can be represented by a given class is
compatible. For example, signed and unsigned integers are not compatible. The
LLVM dialect provides a function, `bool LLVM::isCompatibleType(Type)`, that can
be used as a compatibility check.

Each LLVM IR type corresponds to *exactly one* MLIR type, either built-in or
LLVM dialect type.
For example, because `i32` is LLVM-compatible, there is no
`!llvm.i32` type. However, `!llvm.ptr<T>` is defined in the LLVM dialect as
there is no corresponding built-in type.

### Additional Simple Types

The following non-parametric types derived from the LLVM IR are available in the
LLVM dialect:

- `!llvm.x86_mmx` (`LLVMX86MMXType`) - value held in an MMX register on x86
  machine.
- `!llvm.ppc_fp128` (`LLVMPPCFP128Type`) - 128-bit floating-point value (two
  64-bit values).
- `!llvm.token` (`LLVMTokenType`) - a non-inspectable value associated with an
  operation.
- `!llvm.metadata` (`LLVMMetadataType`) - LLVM IR metadata, to be used only if
  the metadata cannot be represented as structured MLIR attributes.
- `!llvm.void` (`LLVMVoidType`) - does not represent any value; can only
  appear in function results.

These types represent a single value (or an absence thereof in case of `void`)
and correspond to their LLVM IR counterparts.

### Additional Parametric Types

These types are parameterized by the types they contain, e.g., the pointee or
the element type, which can be either compatible built-in or LLVM dialect types.

#### Pointer Types

Pointer types specify an address in memory.

Pointer types are parametric types parameterized by the element type and the
address space. The address space is an integer, but this choice may be
reconsidered if MLIR implements named address spaces. Their syntax is as
follows:

```
  llvm-ptr-type ::= `!llvm.ptr<` type (`,` integer-literal)? `>`
```

where the optional integer literal corresponds to the address space. Both cases
are represented by `LLVMPointerType` internally.

#### Array Types

Array types represent sequences of elements in memory. Array elements can be
addressed with a value unknown at compile time. Array types can be nested, but
each array type is itself one-dimensional.
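
As an illustration of nesting, written in the `!llvm.array` syntax defined in
this section, a table with two rows of four elements is an array whose element
type is itself an array. This is only a sketch, and the sizes are arbitrary.

```mlir
!llvm.array<4 x i32>             // Array of 4 32-bit integers.
!llvm.array<2 x array<4 x i32>>  // Array of 2 arrays of 4 32-bit integers.
```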

Array types are parameterized by the fixed size and the element type.
Syntactically, their representation is the following:

```
  llvm-array-type ::= `!llvm.array<` integer-literal `x` type `>`
```

and they are internally represented as `LLVMArrayType`.

#### Function Types

Function types represent the type of a function, i.e. its signature.

Function types are parameterized by the result type, the list of argument types
and by an optional "variadic" flag. Unlike built-in `FunctionType`, LLVM dialect
functions (`LLVMFunctionType`) always have a single result, which may be
`!llvm.void` if the function does not return anything. The syntax is as follows:

```
  llvm-func-type ::= `!llvm.func<` type `(` type-list (`,` `...`)? `)` `>`
```

For example,

```mlir
!llvm.func<void ()>           // a function with no arguments;
!llvm.func<i32 (f32, i32)>    // a function with two arguments and a result;
!llvm.func<void (i32, ...)>   // a variadic function with at least one argument.
```

In the LLVM dialect, functions are not first-class objects and one cannot have a
value of function type. Instead, one can take the address of a function and
operate on pointers to functions.

### Vector Types

Vector types represent sequences of elements, typically when multiple data
elements are processed by a single instruction (SIMD). Vectors are thought of as
stored in registers and therefore vector elements can only be addressed through
constant indices.

Vector types are parameterized by the size, which may be either _fixed_ or a
multiple of some fixed size in case of _scalable_ vectors, and the element type.
Vectors cannot be nested and only 1D vectors are supported. Scalable vectors are
still considered 1D.

LLVM dialect uses built-in vector types for _fixed_-size vectors of built-in
types, and provides additional types for fixed-size vectors of LLVM dialect
types (`LLVMFixedVectorType`) and scalable vectors of any types
(`LLVMScalableVectorType`). These two additional types share the following
syntax:

```
  llvm-vec-type ::= `!llvm.vec<` (`?` `x`)? integer-literal `x` type `>`
```

Note that the sets of element types supported by built-in and LLVM dialect
vector types are mutually exclusive, e.g., the built-in vector type does not
accept `!llvm.ptr<i32>` and the LLVM dialect fixed-width vector type does not
accept `i32`.

The following functions are provided to operate on any kind of the vector types
compatible with the LLVM dialect:

- `bool LLVM::isCompatibleVectorType(Type)` - checks whether a type is a
  vector type compatible with the LLVM dialect;
- `Type LLVM::getVectorElementType(Type)` - returns the element type of any
  vector type compatible with the LLVM dialect;
- `llvm::ElementCount LLVM::getVectorNumElements(Type)` - returns the number
  of elements in any vector type compatible with the LLVM dialect;
- `Type LLVM::getFixedVectorType(Type, unsigned)` - gets a fixed vector type
  with the given element type and size; the resulting type is either a
  built-in or an LLVM dialect vector type depending on which one supports the
  given element type.

#### Examples of Compatible Vector Types

```mlir
vector<42 x i32>                   // Vector of 42 32-bit integers.
!llvm.vec<42 x ptr<i32>>           // Vector of 42 pointers to 32-bit integers.
!llvm.vec<? x 4 x i32>             // Scalable vector of 32-bit integers with
                                   // size divisible by 4.
!llvm.array<2 x vector<2 x i32>>   // Array of 2 vectors of 2 32-bit integers.
!llvm.array<2 x vec<2 x ptr<i32>>> // Array of 2 vectors of 2 pointers to 32-bit
                                   // integers.
```

### Structure Types

The structure type is used to represent a collection of data members together in
memory. The elements of a structure may be any type that has a size.

Structure types are represented in a single dedicated class
`mlir::LLVM::LLVMStructType`. Internally, the struct type stores a (potentially
empty) name, a (potentially empty) list of contained types and a bitmask
indicating whether the struct is named, opaque, packed or uninitialized.
Structure types that don't have a name are referred to as _literal_ structs.
Such structures are uniquely identified by their contents. _Identified_ structs
on the other hand are uniquely identified by the name.

#### Identified Structure Types

Identified structure types are uniqued using their name in a given context.
Attempting to construct an identified structure with the same name as a
structure that already exists in the context *will result in the existing
structure being returned*. **MLIR does not auto-rename identified structs in
case of name conflicts** because there is no naming scope equivalent to a module
in LLVM IR since MLIR modules can be arbitrarily nested.

Programmatically, identified structures can be constructed in an _uninitialized_
state. In this case, they are given a name but the body must be set up by a
later call, using MLIR's type mutation mechanism. Such uninitialized types can
be used in type construction, but must be eventually initialized for IR to be
valid. This mechanism allows for constructing _recursive_ or mutually referring
structure types: an uninitialized type can be used in its own initialization.

Once the type is initialized, its body cannot be changed anymore. Any further
attempts to modify the body will fail and return failure to the caller _unless
the type is initialized with the exact same body_.
Type initialization is
thread-safe; however, if a concurrent thread initializes the type before the
current thread, the initialization may return failure.

The syntax for identified structure types is as follows.

```
llvm-ident-struct-type ::= `!llvm.struct<` string-literal `,` `opaque` `>`
                         | `!llvm.struct<` string-literal `,` `packed`?
                            `(` type-or-ref-list `)` `>`
type-or-ref-list ::= <maybe empty comma-separated list of type-or-ref>
type-or-ref ::= <any compatible type with optional !llvm.>
              | `!llvm.`? `struct<` string-literal `>`
```

The body of the identified struct is printed in full unless it is
transitively contained in the same struct. In the latter case, only the
identifier is printed. For example, the structure containing the pointer to
itself is represented as `!llvm.struct<"A", (ptr<struct<"A">>)>`, and the
structure `A` containing two pointers to the structure `B` containing a pointer
to the structure `A` is represented as
`!llvm.struct<"A", (ptr<struct<"B", (ptr<struct<"A">>)>>, ptr<struct<"B", (ptr<struct<"A">>)>>)>`.
Note that the structure `B` is "unrolled" for both elements. _A structure with
the same name but different body is a syntax error._
**The user must ensure structure name uniqueness across all modules processed in
a given MLIR context.** Structure names are arbitrary string literals and may
include, e.g., spaces and keywords.

Identified structs may be _opaque_. In this case, the body is unknown but the
structure type is considered _initialized_ and is valid in the IR.

#### Literal Structure Types

Literal structures are uniqued according to the list of elements they contain,
and can optionally be packed. The syntax for such structs is as follows.

```
llvm-literal-struct-type ::= `!llvm.struct<` `packed`? `(` type-list `)` `>`
type-list ::= <maybe empty comma-separated list of types with optional !llvm.>
```

Literal structs cannot be recursive, but can contain other structs. Therefore,
they must be constructed in a single step with the entire list of contained
elements provided.

#### Examples of Structure Types

```mlir
!llvm.struct<>                         // NOT allowed
!llvm.struct<()>                       // empty, literal
!llvm.struct<(i32)>                    // literal
!llvm.struct<(struct<(i32)>)>          // struct containing a struct
!llvm.struct<packed (i8, i32)>         // packed struct
!llvm.struct<"a">                      // recursive reference, only allowed within
                                       // another struct, NOT allowed at top level
!llvm.struct<"a", (ptr<struct<"a">>)>  // supported example of recursive reference
!llvm.struct<"a", ()>                  // empty, named (necessary to differentiate
                                       // from recursive reference)
!llvm.struct<"a", opaque>              // opaque, named
!llvm.struct<"a", (i32)>               // named
!llvm.struct<"a", packed (i8, i32)>    // named, packed
```

### Unsupported Types

LLVM IR `label` type does not have a counterpart in the LLVM dialect since, in
MLIR, blocks are not values and don't need a type.

## Operations

All operations in the LLVM IR dialect have a custom form in MLIR. The mnemonic
of an operation is that used in LLVM IR prefixed with "`llvm.`".

[include "Dialects/LLVMOps.md"]