1========= 2MemorySSA 3========= 4 5.. contents:: 6 :local: 7 8Introduction 9============ 10 11``MemorySSA`` is an analysis that allows us to cheaply reason about the 12interactions between various memory operations. Its goal is to replace 13``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because, 14unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily 15result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't 16have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get 17better results, too. One common use of ``MemorySSA`` is to quickly find out 18that something definitely cannot happen (for example, reason that a hoist 19out of a loop can't happen). 20 21At a high level, one of the goals of ``MemorySSA`` is to provide an SSA based 22form for memory, complete with def-use and use-def chains, which 23enables users to quickly find may-def and may-uses of memory operations. 24It can also be thought of as a way to cheaply give versions to the complete 25state of memory, and associate memory operations with those versions. 26 27This document goes over how ``MemorySSA`` is structured, and some basic 28intuition on how ``MemorySSA`` works. 29 30A paper on MemorySSA (with notes about how it's implemented in GCC) `can be 31found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. Though, it's 32relatively out-of-date; the paper references multiple memory partitions, but GCC 33eventually swapped to just using one, like we now have in LLVM. Like 34GCC's, LLVM's MemorySSA is intraprocedural. 35 36 37MemorySSA Structure 38=================== 39 40MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a 41structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are 42``MemorySSA``'s parallel to LLVM ``Instruction``\ s. 43 44Each ``MemoryAccess`` can be one of three types: 45 46- ``MemoryDef`` 47- ``MemoryPhi`` 48- ``MemoryUse`` 49 50``MemoryDef``\ s are operations which may either modify memory, or which 51introduce some kind of ordering constraints. Examples of ``MemoryDef``\ s 52include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher) 53ordering, volatile operations, memory fences, etc. A ``MemoryDef`` 54always introduces a new version of the entire memory and is linked with a single 55``MemoryDef/MemoryPhi`` which is the version of memory that the new 56version is based on. This implies that there is a *single* 57``Def`` chain that connects all the ``Def``\ s, either directly 58or indirectly. For example in: 59 60.. code-block:: llvm 61 62 b = MemoryDef(a) 63 c = MemoryDef(b) 64 d = MemoryDef(c) 65 66``d`` is connected directly with ``c`` and indirectly with ``b``. 67This means that ``d`` potentially clobbers (see below) ``c`` *or* 68``b`` *or* both. This in turn implies that without the use of `The walker`_, 69initially every ``MemoryDef`` clobbers every other ``MemoryDef``. 70 71``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations. If at any 72point we have two (or more) ``MemoryDef``\ s that could flow into a 73``BasicBlock``, the block's top ``MemoryAccess`` will be a 74``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any 75concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s 76inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s 77and ``MemoryDef``\ s. 78 79Note also that in SSA, Phi nodes merge must-reach definitions (that is, 80definitions that *must* be new versions of variables). In MemorySSA, PHI nodes 81merge may-reach definitions (that is, until disambiguated, the versions that 82reach a phi node may or may not clobber a given variable). 83 84``MemoryUse``\ s are operations which use but don't modify memory. An example of 85a ``MemoryUse`` is a ``load``, or a ``readonly`` function call. 86 87Every function that exists has a special ``MemoryDef`` called ``liveOnEntry``. 88It dominates every ``MemoryAccess`` in the function that ``MemorySSA`` is being 89run on, and implies that we've hit the top of the function. It's the only 90``MemoryDef`` that maps to no ``Instruction`` in LLVM IR. Use of 91``liveOnEntry`` implies that the memory being used is either undefined or 92defined before the function begins. 93 94An example of all of this overlaid on LLVM IR (obtained by running ``opt 95-passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When 96viewing this example, it may be helpful to view it in terms of clobbers. 97The operands of a given ``MemoryAccess`` are all (potential) clobbers of said 98``MemoryAccess``, and the value produced by a ``MemoryAccess`` can act as a clobber 99for other ``MemoryAccess``\ es. 100 101If a ``MemoryAccess`` is a *clobber* of another, it means that these two 102``MemoryAccess``\ es may access the same memory. For example, ``x = MemoryDef(y)`` 103means that ``x`` potentially modifies memory that ``y`` modifies/constrains 104(or has modified / constrained). 105In the same manner, ``a = MemoryPhi({BB1,b},{BB2,c})`` means that 106anyone that uses ``a`` is accessing memory potentially modified / constrained 107by either ``b`` or ``c`` (or both). And finally, ``MemoryUse(x)`` means 108that this use accesses memory that ``x`` has modified / constrained 109(as an example, think that if ``x = MemoryDef(...)`` 110and ``MemoryUse(x)`` are in the same loop, the use can't 111be hoisted outside alone). 112 113Another useful way of looking at it is in terms of memory versions. 114In that view, operands of a given ``MemoryAccess`` are the version 115of the entire memory before the operation, and if the access produces 116a value (i.e. ``MemoryDef/MemoryPhi``), 117the value is the new version of the memory after the operation. 118 119.. code-block:: llvm 120 121 define void @foo() { 122 entry: 123 %p1 = alloca i8 124 %p2 = alloca i8 125 %p3 = alloca i8 126 ; 1 = MemoryDef(liveOnEntry) 127 store i8 0, i8* %p3 128 br label %while.cond 129 130 while.cond: 131 ; 6 = MemoryPhi({entry,1},{if.end,4}) 132 br i1 undef, label %if.then, label %if.else 133 134 if.then: 135 ; 2 = MemoryDef(6) 136 store i8 0, i8* %p1 137 br label %if.end 138 139 if.else: 140 ; 3 = MemoryDef(6) 141 store i8 1, i8* %p2 142 br label %if.end 143 144 if.end: 145 ; 5 = MemoryPhi({if.then,2},{if.else,3}) 146 ; MemoryUse(5) 147 %1 = load i8, i8* %p1 148 ; 4 = MemoryDef(5) 149 store i8 2, i8* %p2 150 ; MemoryUse(1) 151 %2 = load i8, i8* %p3 152 br label %while.cond 153 } 154 155The ``MemorySSA`` IR is shown in comments that precede the instructions they map 156to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)`` 157is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM 158instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this 159particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8* 160%p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM 161Instruction, so the line directly below a ``MemoryPhi`` isn't special. 162 163Going from the top down: 164 165- ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering 166 ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This 167 ``MemoryPhi`` is referred to in the textual IR by the number ``6``. 168- ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition, 169 and its reaching definition before it is ``6``, or the ``MemoryPhi`` after 170 ``while.cond``. (See the `Build-time use optimization`_ and `Precision`_ 171 sections below for why this ``MemoryDef`` isn't linked to a separate, 172 disambiguated ``MemoryPhi``.) 173- ``3 = MemoryDef(6)`` notes that ``store i8 0, i8* %p2`` is a definition; its 174 reaching definition is also ``6``. 175- ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before 176 this block could either be ``2`` or ``3``. 177- ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that 178 it's clobbered by ``5``. 179- ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; its 180 reaching definition is ``5``. 181- ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory, 182 and the last thing that could clobber this use is above ``while.cond`` (e.g. 183 the store to ``%p3``). In memory versioning parlance, it really only depends on 184 the memory version 1, and is unaffected by the new memory versions generated since 185 then. 186 187As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not 188meant to interact with LLVM IR. 189 190Design of MemorySSA 191=================== 192 193``MemorySSA`` is an analysis that can be built for any arbitrary function. When 194it's built, it does a pass over the function's IR in order to build up its 195mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things 196like the dominance relation between ``MemoryAccess``\ es, and get the 197``MemoryAccess`` for any given ``Instruction`` . 198 199When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker`` 200that you can use (see below). 201 202 203The walker 204---------- 205 206A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or 207the walker, for short. The goal of the walker is to provide answers to clobber 208queries beyond what's represented directly by ``MemoryAccess``\ es. For example, 209given: 210 211.. code-block:: llvm 212 213 define void @foo() { 214 %a = alloca i8 215 %b = alloca i8 216 217 ; 1 = MemoryDef(liveOnEntry) 218 store i8 0, i8* %a 219 ; 2 = MemoryDef(1) 220 store i8 0, i8* %b 221 } 222 223The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would 224be the walker's goal to figure this out, and return ``liveOnEntry`` when queried 225for the clobber of ``MemoryAccess`` ``2``. 226 227By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s 228and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to 229be using. Walkers were built to be flexible, though, so it's entirely reasonable 230(and expected) to create more specialized walkers (e.g. one that specifically 231queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc). 232 233Default walker APIs 234^^^^^^^^^^^^^^^^^^^ 235 236There are two main APIs used to retrieve the clobbering access using the walker: 237 238- ``MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA);`` return the 239 clobbering memory access for ``MA``, caching all intermediate results 240 computed along the way as part of each access queried. 241 242- ``MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA, const MemoryLocation &Loc);`` 243 returns the access clobbering memory location ``Loc``, starting at ``MA``. 244 Because this API does not request the clobbering access of a specific memory 245 access, there are no results that can be cached. 246 247Locating clobbers yourself 248^^^^^^^^^^^^^^^^^^^^^^^^^^ 249 250If you choose to make your own walker, you can find the clobber for a 251``MemoryAccess`` by walking every ``MemoryDef`` that dominates said 252``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively simple; 253they ultimately form a linked list of every clobber that dominates the 254``MemoryAccess`` that you're trying to optimize. In other words, the 255``definingAccess`` of a ``MemoryDef`` is always the nearest dominating 256``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``. 257 258 259Build-time use optimization 260--------------------------- 261 262``MemorySSA`` will optimize some ``MemoryAccess``\ es at build-time. 263Specifically, we optimize the operand of every ``MemoryUse`` to point to the 264actual clobber of said ``MemoryUse``. This can be seen in the above example; the 265second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a 266``MemoryDef`` from the entry block. This is done to make walking, 267value numbering, etc, faster and easier. 268 269It is not possible to optimize ``MemoryDef`` in the same way, as we 270restrict ``MemorySSA`` to one memory variable and, thus, one Phi node 271per block. 272 273 274Invalidation and updating 275------------------------- 276 277Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever 278the IR is updated. "Update", in this case, includes the addition, deletion, and 279motion of ``Instructions``. The update API is being made on an as-needed basis. 280If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``\ s update API. 281 282 283Phi placement 284^^^^^^^^^^^^^ 285 286``MemorySSA`` only places ``MemoryPhi``\ s where they're actually 287needed. That is, it is a pruned SSA form, like LLVM's SSA form. For 288example, consider: 289 290.. code-block:: llvm 291 292 define void @foo() { 293 entry: 294 %p1 = alloca i8 295 %p2 = alloca i8 296 %p3 = alloca i8 297 ; 1 = MemoryDef(liveOnEntry) 298 store i8 0, i8* %p3 299 br label %while.cond 300 301 while.cond: 302 ; 3 = MemoryPhi({%0,1},{if.end,2}) 303 br i1 undef, label %if.then, label %if.else 304 305 if.then: 306 br label %if.end 307 308 if.else: 309 br label %if.end 310 311 if.end: 312 ; MemoryUse(1) 313 %1 = load i8, i8* %p1 314 ; 2 = MemoryDef(3) 315 store i8 2, i8* %p2 316 ; MemoryUse(1) 317 %2 = load i8, i8* %p3 318 br label %while.cond 319 } 320 321Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi`` 322for ``if.end`` would be pointless, so we don't place one. So, if you need to 323place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll need to also create 324a ``MemoryPhi`` for ``if.end``. 325 326If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s 327everywhere. Because we have Walkers that are capable of optimizing above said 328phis, doing so shouldn't prohibit optimizations. 329 330 331Non-Goals 332--------- 333 334``MemorySSA`` is meant to reason about the relation between memory 335operations, and enable quicker querying. 336It isn't meant to be the single source of truth for all potential memory-related 337optimizations. Specifically, care must be taken when trying to use ``MemorySSA`` 338to reason about atomic or volatile operations, as in: 339 340.. code-block:: llvm 341 342 define i8 @foo(i8* %a) { 343 entry: 344 br i1 undef, label %if.then, label %if.end 345 346 if.then: 347 ; 1 = MemoryDef(liveOnEntry) 348 %0 = load volatile i8, i8* %a 349 br label %if.end 350 351 if.end: 352 %av = phi i8 [0, %entry], [%0, %if.then] 353 ret i8 %av 354 } 355 356Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may 357seem legal. Because it's a volatile load, though, it's not. 358 359 360Design tradeoffs 361---------------- 362 363Precision 364^^^^^^^^^ 365 366``MemorySSA`` in LLVM deliberately trades off precision for speed. 367Let us think about memory variables as if they were disjoint partitions of the 368memory (that is, if you have one variable, as above, it represents the entire 369memory, and if you have multiple variables, each one represents some 370disjoint portion of the memory) 371 372First, because alias analysis results conflict with each other, and 373each result may be what an analysis wants (IE 374TBAA may say no-alias, and something else may say must-alias), it is 375not possible to partition the memory the way every optimization wants. 376Second, some alias analysis results are not transitive (IE A noalias B, 377and B noalias C, does not mean A noalias C), so it is not possible to 378come up with a precise partitioning in all cases without variables to 379represent every pair of possible aliases. Thus, partitioning 380precisely may require introducing at least N^2 new virtual variables, 381phi nodes, etc. 382 383Each of these variables may be clobbered at multiple def sites. 384 385To give an example, if you were to split up struct fields into 386individual variables, all aliasing operations that may-def multiple struct 387fields, will may-def more than one of them. This is pretty common (calls, 388copies, field stores, etc). 389 390Experience with SSA forms for memory in other compilers has shown that 391it is simply not possible to do this precisely, and in fact, doing it 392precisely is not worth it, because now all the optimizations have to 393walk tons and tons of virtual variables and phi nodes. 394 395So we partition. At the point at which you partition, again, 396experience has shown us there is no point in partitioning to more than 397one variable. It simply generates more IR, and optimizations still 398have to query something to disambiguate further anyway. 399 400As a result, LLVM partitions to one variable. 401 402Use Optimization 403^^^^^^^^^^^^^^^^ 404 405Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one 406useful guarantee - all loads are optimized to point at the thing that 407actually clobbers them. This gives some nice properties. For example, 408for a given store, you can find all loads actually clobbered by that 409store by walking the immediate uses of the store. 410 411LLVM Developers Meeting presentations 412------------------------------------- 413 414- `2016 LLVM Developers' Meeting: G. Burgess - MemorySSA in Five Minutes <https://www.youtube.com/watch?v=bdxWmryoHak>`_. 415- `2020 LLVM Developers' Meeting: S. Baziotis & S. Moll - Finding Your Way Around the LLVM Dependence Analysis Zoo <https://www.youtube.com/watch?v=1e5y6WDbXCQ>`_ 416