1# Register Allocation in Cranelift 2 3Cranelift uses a *decoupled, SSA-based* register allocator. Decoupled means that 4register allocation is split into two primary phases: *spilling* and 5*coloring*. SSA-based means that the code stays in SSA form throughout the 6register allocator, and in fact is still in SSA form after register allocation. 7 8Before the register allocator is run, all instructions in the function must be 9*legalized*, which means that every instruction has an entry in the 10`encodings` table. The encoding entries also provide register class 11constraints on the instruction's operands that the register allocator must 12satisfy. 13 14After the register allocator has run, the `locations` table provides a 15register or stack slot location for all SSA values used by the function. The 16register allocator may have inserted `spill`, `fill`, and 17`copy` instructions to make that possible. 18 19## SSA-based register allocation 20 21The phases of the SSA-based register allocator are: 22 23Liveness analysis 24 For each SSA value, determine exactly where it is live. 25 26Coalescing 27 Form *virtual registers* which are sets of SSA values that should be 28 assigned to the same location. Split live ranges such that values that 29 belong to the same virtual register don't have interfering live ranges. 30 31Spilling 32 The process of deciding which SSA values go in a stack slot and which 33 values go in a register. The spilling phase can also split live ranges by 34 inserting `copy` instructions, or transform the code in other ways to 35 reduce the number of values kept in registers. 36 37 After spilling, the number of live register values never exceeds the number 38 of available registers. 39 40Reload 41 Insert `spill` and `fill` instructions as necessary such that 42 instructions that expect their operands in registers won't see values that 43 live on the stack and vice versa. 44 45 Reuse registers containing values loaded from the stack as much as possible 46 without exceeding the maximum allowed register pressure. 47 48Coloring 49 The process of assigning specific registers to the live values. It's a 50 property of SSA form that this can be done in a linear scan of the 51 dominator tree without causing any additional spills. 52 53 Make sure that specific register operand constraints are satisfied. 54 55The contract between the spilling and coloring phases is that the number of 56values in registers never exceeds the number of available registers. This 57sounds simple enough in theory, but in practice there are some complications. 58 59### Real-world complications to SSA coloring 60 61In practice, instruction set architectures don't have "K interchangeable 62registers", and register pressure can't be measured with a single number. There 63are complications: 64 65Different register banks 66 Most ISAs separate integer registers from floating point registers, and 67 instructions require their operands to come from a specific bank. This is a 68 fairly simple problem to deal with since the register banks are completely 69 disjoint. We simply count the number of integer and floating-point values 70 that are live independently, and make sure that each number does not exceed 71 the size of their respective register banks. 72 73Instructions with fixed operands 74 Some instructions use a fixed register for an operand. This happens on the 75 x86 ISAs: 76 77 - Dynamic shift and rotate instructions take the shift amount in CL. 78 - Division instructions use RAX and RDX for both input and output operands. 79 - Wide multiply instructions use fixed RAX and RDX registers for input and 80 output operands. 81 - A few SSE variable blend instructions use a hardwired XMM0 input operand. 82 83Operands constrained to register subclasses 84 Some instructions can only use a subset of the registers for some operands. 85 For example, the ARM NEON vmla (scalar) instruction requires the scalar 86 operand to be located in D0-15 or even D0-7, depending on the data type. 87 The other operands can be from the full D0-31 register set. 88 89ABI boundaries 90 Before making a function call, arguments must be placed in specific 91 registers and stack locations determined by the ABI, and return values 92 appear in fixed registers. 93 94 Some registers can be clobbered by the call and some are saved by the 95 callee. In some cases, only the low bits of a register are saved by the 96 callee. For example, ARM64 callees save only the low 64 bits of v8-15, and 97 Win64 callees only save the low 128 bits of AVX registers. 98 99 ABI boundaries also affect the location of arguments to the entry block and 100 return values passed to the `return` instruction. 101 102Aliasing registers 103 Different registers sometimes share the same bits in the register bank. 104 This can make it difficult to measure register pressure. For example, the 105 x86 registers RAX, EAX, AX, AL, and AH overlap. 106 107 If only one of the aliasing registers can be used at a time, the aliasing 108 doesn't cause problems since the registers can simply be counted as one 109 unit. 110 111Early clobbers 112 Sometimes an instruction requires that the register used for an output 113 operand does not alias any of the input operands. This happens for inline 114 assembly and in some other special cases. 115 116 117## Liveness Analysis 118 119All the register allocator passes need to know exactly where SSA values are 120live. The liveness analysis computes this information. 121 122The data structure representing the live range of a value uses the linear 123layout of the function. All instructions and EBB headers are assigned a 124*program position*. A starting point for a live range can be one of the 125following: 126 127- The instruction where the value is defined. 128- The EBB header where the value is an EBB parameter. 129- An EBB header where the value is live-in because it was defined in a 130 dominating block. 131 132The ending point of a live range can be: 133 134- The last instruction to use the value. 135- A branch or jump to an EBB where the value is live-in. 136 137When all the EBBs in a function are laid out linearly, the live range of a 138value doesn't have to be a contiguous interval, although it will be in a 139majority of cases. There can be holes in the linear live range. 140 141The part of a value's live range that falls inside a single EBB will always be 142an interval without any holes. This follows from the dominance requirements of 143SSA. A live range is represented as: 144 145- The interval inside the EBB where the value is defined. 146- A set of intervals for EBBs where the value is live-in. 147 148Any value that is only used inside a single EBB will have an empty set of 149live-in intervals. Some values are live across large parts of the function, and 150this can often be represented with coalesced live-in intervals covering many 151EBBs. It is important that the live range data structure doesn't have to grow 152linearly with the number of EBBs covered by a live range. 153 154This representation is very similar to LLVM's `LiveInterval` data structure 155with a few important differences: 156 157- The Cranelift `LiveRange` only covers a single SSA value, while LLVM's 158 `LiveInterval` represents the union of multiple related SSA values in a 159 virtual register. This makes Cranelift's representation smaller because 160 individual segments don't have to annotated with a value number. 161- Cranelift stores the def-interval separately from a list of coalesced live-in 162 intervals, while LLVM stores an array of segments. The two representations 163 are equivalent, but Cranelift optimizes for the common case of a value that is 164 only used locally. 165- It is simpler to check if two live ranges are overlapping. The dominance 166 properties of SSA form means that it is only necessary to check the 167 def-interval of each live range against the intervals of the other range. It 168 is not necessary to check for overlap between the two sets of live-in 169 intervals. This makes the overlap check logarithmic in the number of live-in 170 intervals instead of linear. 171- LLVM represents a program point as `SlotIndex` which holds a pointer to a 172 32-byte `IndexListEntry` struct. The entries are organized in a double 173 linked list that mirrors the ordering of instructions in a basic block. This 174 allows 'tombstone' program points corresponding to instructions that have 175 been deleted. 176 177 Cranelift uses a 32-bit program point representation that encodes an 178 instruction or EBB number directly. There are no 'tombstones' for deleted 179 instructions, and no mirrored linked list of instructions. Live ranges must 180 be updated when instructions are deleted. 181 182A consequence of Cranelift's more compact representation is that two program 183points can't be compared without the context of a function layout. 184 185## Coalescing algorithm 186 187Unconstrained SSA form is not well suited to register allocation because of the problems 188that can arise around EBB parameters and arguments. Consider this simple example: 189 190``` 191 function %interference(i32, i32) -> i32 { 192 ebb0(v0: i32, v1: i32): 193 brz v0, ebb1(v1) 194 jump ebb1(v0) 195 196 ebb1(v2: i32): 197 v3 = iadd v1, v2 198 return v3 199 } 200``` 201 202Here, the value `v1` is both passed as an argument to `ebb1` *and* it is 203live in to the EBB because it is used by the `iadd` instruction. Since 204EBB arguments on the `brz` instruction need to be in the same register as 205the corresponding EBB parameter `v2`, there is going to be interference 206between `v1` and `v2` in the `ebb1` block. 207 208The interference can be resolved by isolating the SSA values passed as EBB arguments: 209 210``` 211 function %coalesced(i32, i32) -> i32 { 212 ebb0(v0: i32, v1: i32): 213 v5 = copy v1 214 brz v0, ebb1(v5) 215 v6 = copy v0 216 jump ebb1(v6) 217 218 ebb1(v2: i32): 219 v3 = iadd.i32 v1, v2 220 return v3 221 } 222``` 223 224Now the EBB argument is `v5` which is *not* itself live into `ebb1`, 225resolving the interference. 226 227The coalescing pass groups the SSA values into sets called *virtual registers* 228and inserts copies such that: 229 2301. Whenever a value is passed as an EBB argument, the corresponding EBB 231 parameter value belongs to the same virtual register as the passed argument 232 value. 2332. The live ranges of values belonging to the same virtual register do not 234 interfere, i.e. they don't overlap anywhere. 235 236Most virtual registers contains only a single isolated SSA value because most 237SSA values are never passed as EBB arguments. The `VirtRegs` data structure 238doesn't store any information about these singleton virtual registers, it only 239tracks larger virtual registers and assumes that any value it doesn't know about 240is its own singleton virtual register 241 242Once the values have been partitioned into interference-free virtual registers, 243the code is said to be in `conventional SSA form (CSSA) 244<http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.107.7249>`_. A program 245in CSSA form can be register allocated correctly by assigning all the values in 246a virtual register to the same stack or register location. 247 248Conventional SSA form and the virtual registers are maintained through all the 249register allocator passes. 250 251 252## Spilling algorithm 253 254The spilling pass is responsible for lowering the register pressure enough that 255the coloring pass is guaranteed to be able to find a coloring solution. It does 256this by assigning whole virtual registers to stack slots. 257 258Besides just counting registers, the spiller also has to look at the 259instruction's operand constraints because sometimes the constraints can require 260extra registers to solve, raising the register pressure: 261 262- If a single value is used more than once by an instruction, and the operands 263 have conflicting constraints, two registers must be used. The most common case is 264 when a single value is passed as two separate arguments to a function call. 265- If an instruction has a *tied operand constraint* where one of the input operands 266 must use the same register as the output operand, the spiller makes sure that 267 the tied input value doesn't interfere with the output value by inserting a copy 268 if needed. 269 270The spilling heuristic used by Cranelift is very simple. Whenever the spiller 271determines that the register pressure is too high at some instruction, it picks 272the live SSA value whose definition is farthest away as the spill candidate. 273Then it spills all values in the corresponding virtual register to the same 274spill slot. It is important that all values in a virtual register get the same 275spill slot, otherwise we could need memory-to-memory copies when passing spilled 276arguments to a spilled EBB parameter. 277 278This simple heuristic tends to spill values with long live ranges, and it 279depends on the reload pass to do a good job of reusing registers reloaded from 280spill slots if the spilled value gets used a lot. The idea is to minimize stack 281*write* traffic with the spilling heuristic and to minimize stack *read* traffic 282with the reload pass. 283 284## Coloring algorithm 285 286The SSA coloring algorithm is based on a single observation: If two SSA values 287interfere, one of the values must be live where the other value is defined. 288 289We visit the EBBs in a topological order such that all dominating EBBs are 290visited before the current EBB. The instructions in an EBB are visited in a 291top-down order, and each value define by the instruction is assigned an 292available register. With this iteration order, every value that is live at an 293instruction has already been assigned to a register. 294 295This coloring algorithm works if the following condition holds: 296 297 At every instruction, consider the values live through the instruction. No 298 matter how the live values have been assigned to registers, there must be 299 available registers of the right register classes available for the values 300 defined by the instruction. 301 302We'll need to modify this condition in order to deal with the real-world 303complications. 304 305The coloring algorithm needs to keep track of the set of live values at each 306instruction. At the top of an EBB, this set can be computed as the union of: 307 308- The set of live values before the immediately dominating branch or jump 309 instruction. The topological iteration order guarantees that this set is 310 available. Values whose live range indicate that they are not live-in to the 311 current EBB should be filtered out. 312- The set of parameters the EBB. These values should all be live-in, although 313 it is possible that some are dead and never used anywhere. 314 315For each live value, we also track its kill point in the current EBB. This is 316the last instruction to use the value in the EBB. Values that are live-out 317through the EBB terminator don't have a kill point. Note that the kill point 318can be a branch to another EBB that uses the value, so the kill instruction 319doesn't have to be a use of the value. 320 321When advancing past an instruction, the live set is updated: 322 323- Any values whose kill point is the current instruction are removed. 324- Any values defined by the instruction are added, unless their kill point is 325 the current instruction. This corresponds to a dead def which has no uses. 326