=========
MemorySSA
=========

.. contents::
   :local:
Introduction
============

``MemorySSA`` is an analysis that allows us to cheaply reason about the
interactions between various memory operations. Its goal is to replace
``MemoryDependenceAnalysis`` for most (if not all) use-cases. This is because,
unless you're very careful, use of ``MemoryDependenceAnalysis`` can easily
result in quadratic-time algorithms in LLVM. Additionally, ``MemorySSA`` doesn't
have as many arbitrary limits as ``MemoryDependenceAnalysis``, so you should get
better results, too. One common use of ``MemorySSA`` is to quickly prove that
something definitely cannot happen (for example, to show that a hoist
out of a loop is impossible).

At a high level, one of the goals of ``MemorySSA`` is to provide an SSA-based
form for memory, complete with def-use and use-def chains, which
enables users to quickly find may-defs and may-uses of memory operations.
It can also be thought of as a way to cheaply give versions to the complete
state of memory, and associate memory operations with those versions.

This document goes over how ``MemorySSA`` is structured, and some basic
intuition on how ``MemorySSA`` works.

A paper on MemorySSA (with notes about how it's implemented in GCC) `can be
found here <http://www.airs.com/dnovillo/Papers/mem-ssa.pdf>`_. Note that the
paper is relatively out-of-date; it references multiple memory partitions, but
GCC eventually switched to using just one, like we now have in LLVM. Like
GCC's, LLVM's MemorySSA is intraprocedural.


MemorySSA Structure
===================

MemorySSA is a virtual IR. After it's built, ``MemorySSA`` will contain a
structure that maps ``Instruction``\ s to ``MemoryAccess``\ es, which are
``MemorySSA``'s parallel to LLVM ``Instruction``\ s.

Each ``MemoryAccess`` can be one of three types:

- ``MemoryDef``
- ``MemoryPhi``
- ``MemoryUse``

``MemoryDef``\ s are operations which may either modify memory, or which
introduce some kind of ordering constraint. Examples of ``MemoryDef``\ s
include ``store``\ s, function calls, ``load``\ s with ``acquire`` (or higher)
ordering, volatile operations, memory fences, etc. A ``MemoryDef``
always introduces a new version of the entire memory and is linked with a single
``MemoryDef/MemoryPhi`` which is the version of memory that the new
version is based on. This implies that there is a *single*
``Def`` chain that connects all the ``Def``\ s, either directly
or indirectly. For example, in:

.. code-block:: llvm

  b = MemoryDef(a)
  c = MemoryDef(b)
  d = MemoryDef(c)

``d`` is connected directly with ``c`` and indirectly with ``b``.
This means that ``d`` potentially clobbers (see below) ``c`` *or*
``b`` *or* both. This in turn implies that, without the use of `The walker`_,
initially every ``MemoryDef`` clobbers every other ``MemoryDef``.
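
This chain structure can be sketched with a toy model (plain Python for
illustration only; the real ``MemoryAccess`` classes live in LLVM's C++ API):
each ``MemoryDef`` stores the single access it is based on, so following those
links from ``d`` visits every ``Def`` that ``d`` might clobber.

```python
# Toy model of the Def chain above (illustrative sketch, not LLVM's C++ API).
class MemoryDef:
    def __init__(self, name, defining_access=None):
        self.name = name
        # The single MemoryDef this new memory version is based on.
        self.defining_access = defining_access

def def_chain(access):
    """Yield the chain of defining accesses, nearest first."""
    while access is not None:
        yield access
        access = access.defining_access

a = MemoryDef("a")        # initial memory version
b = MemoryDef("b", a)     # b = MemoryDef(a)
c = MemoryDef("c", b)     # c = MemoryDef(b)
d = MemoryDef("d", c)     # d = MemoryDef(c)

# Walking up from d visits c, then b, then a: every Def that d may clobber.
print([x.name for x in def_chain(d.defining_access)])  # ['c', 'b', 'a']
```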

``MemoryPhi``\ s are ``PhiNode``\ s, but for memory operations. If at any
point we have two (or more) ``MemoryDef``\ s that could flow into a
``BasicBlock``, the block's top ``MemoryAccess`` will be a
``MemoryPhi``. As in LLVM IR, ``MemoryPhi``\ s don't correspond to any
concrete operation. As such, ``BasicBlock``\ s are mapped to ``MemoryPhi``\ s
inside ``MemorySSA``, whereas ``Instruction``\ s are mapped to ``MemoryUse``\ s
and ``MemoryDef``\ s.

Note also that in SSA, Phi nodes merge must-reach definitions (that is,
definitions that *must* be new versions of variables). In MemorySSA, Phi nodes
merge may-reach definitions (that is, until disambiguated, the versions that
reach a phi node may or may not clobber a given variable).

``MemoryUse``\ s are operations which use but don't modify memory. An example of
a ``MemoryUse`` is a ``load``, or a ``readonly`` function call.

Every function has a special ``MemoryDef`` called ``liveOnEntry``.
It dominates every ``MemoryAccess`` in the function that ``MemorySSA`` is being
run on, and implies that we've hit the top of the function. It's the only
``MemoryDef`` that maps to no ``Instruction`` in LLVM IR. Use of
``liveOnEntry`` implies that the memory being used is either undefined or
defined before the function begins.

An example of all of this overlaid on LLVM IR (obtained by running ``opt
-passes='print<memoryssa>' -disable-output`` on an ``.ll`` file) is below. When
viewing this example, it may be helpful to view it in terms of clobbers.
The operands of a given ``MemoryAccess`` are all (potential) clobbers of said
``MemoryAccess``, and the value produced by a ``MemoryAccess`` can act as a clobber
for other ``MemoryAccess``\ es.

If a ``MemoryAccess`` is a *clobber* of another, it means that these two
``MemoryAccess``\ es may access the same memory. For example, ``x = MemoryDef(y)``
means that ``x`` potentially modifies memory that ``y`` modifies/constrains
(or has modified / constrained).
In the same manner, ``a = MemoryPhi({BB1,b},{BB2,c})`` means that
anyone that uses ``a`` is accessing memory potentially modified / constrained
by either ``b`` or ``c`` (or both). And finally, ``MemoryUse(x)`` means
that this use accesses memory that ``x`` has modified / constrained
(as an example, note that if ``x = MemoryDef(...)``
and ``MemoryUse(x)`` are in the same loop, the use can't
be hoisted outside on its own).

Another useful way of looking at it is in terms of memory versions.
In that view, the operands of a given ``MemoryAccess`` are the version
of the entire memory before the operation, and if the access produces
a value (i.e. is a ``MemoryDef/MemoryPhi``),
the value is the new version of the memory after the operation.

.. code-block:: llvm

  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 6 = MemoryPhi({entry,1},{if.end,4})
    br i1 undef, label %if.then, label %if.else

  if.then:
    ; 2 = MemoryDef(6)
    store i8 0, i8* %p1
    br label %if.end

  if.else:
    ; 3 = MemoryDef(6)
    store i8 1, i8* %p2
    br label %if.end

  if.end:
    ; 5 = MemoryPhi({if.then,2},{if.else,3})
    ; MemoryUse(5)
    %1 = load i8, i8* %p1
    ; 4 = MemoryDef(5)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }

The ``MemorySSA`` IR is shown in comments that precede the instructions they map
to (if such an instruction exists). For example, ``1 = MemoryDef(liveOnEntry)``
is a ``MemoryAccess`` (specifically, a ``MemoryDef``), and it describes the LLVM
instruction ``store i8 0, i8* %p3``. Other places in ``MemorySSA`` refer to this
particular ``MemoryDef`` as ``1`` (much like how one can refer to ``load i8, i8*
%p1`` in LLVM with ``%1``). Again, ``MemoryPhi``\ s don't correspond to any LLVM
Instruction, so the line directly below a ``MemoryPhi`` isn't special.

Going from the top down:

- ``6 = MemoryPhi({entry,1},{if.end,4})`` notes that, when entering
  ``while.cond``, the reaching definition for it is either ``1`` or ``4``. This
  ``MemoryPhi`` is referred to in the textual IR by the number ``6``.
- ``2 = MemoryDef(6)`` notes that ``store i8 0, i8* %p1`` is a definition,
  and its reaching definition before it is ``6``, or the ``MemoryPhi`` after
  ``while.cond``. (See the `Build-time use optimization`_ and `Precision`_
  sections below for why this ``MemoryDef`` isn't linked to a separate,
  disambiguated ``MemoryPhi``.)
- ``3 = MemoryDef(6)`` notes that ``store i8 1, i8* %p2`` is a definition; its
  reaching definition is also ``6``.
- ``5 = MemoryPhi({if.then,2},{if.else,3})`` notes that the clobber before
  this block could either be ``2`` or ``3``.
- ``MemoryUse(5)`` notes that ``load i8, i8* %p1`` is a use of memory, and that
  it's clobbered by ``5``.
- ``4 = MemoryDef(5)`` notes that ``store i8 2, i8* %p2`` is a definition; its
  reaching definition is ``5``.
- ``MemoryUse(1)`` notes that ``load i8, i8* %p3`` is just a user of memory,
  and the last thing that could clobber this use is above ``while.cond`` (i.e.
  the store to ``%p3``). In memory-versioning parlance, it really only depends on
  memory version 1, and is unaffected by the new memory versions generated since
  then.

As an aside, ``MemoryAccess`` is a ``Value`` mostly for convenience; it's not
meant to interact with LLVM IR.

Design of MemorySSA
===================

``MemorySSA`` is an analysis that can be built for any arbitrary function. When
it's built, it does a pass over the function's IR in order to build up its
mapping of ``MemoryAccess``\ es. You can then query ``MemorySSA`` for things
like the dominance relation between ``MemoryAccess``\ es, and get the
``MemoryAccess`` for any given ``Instruction``.

When ``MemorySSA`` is done building, it also hands you a ``MemorySSAWalker``
that you can use (see below).


The walker
----------

A structure that helps ``MemorySSA`` do its job is the ``MemorySSAWalker``, or
the walker, for short. The goal of the walker is to provide answers to clobber
queries beyond what's represented directly by ``MemoryAccess``\ es. For example,
given:

.. code-block:: llvm

  define void @foo() {
    %a = alloca i8
    %b = alloca i8

    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %a
    ; 2 = MemoryDef(1)
    store i8 0, i8* %b
  }

The store to ``%a`` is clearly not a clobber for the store to ``%b``. It would
be the walker's goal to figure this out, and return ``liveOnEntry`` when queried
for the clobber of ``MemoryAccess`` ``2``.

By default, ``MemorySSA`` provides a walker that can optimize ``MemoryDef``\ s
and ``MemoryUse``\ s by consulting whatever alias analysis stack you happen to
be using. Walkers were built to be flexible, though, so it's entirely reasonable
(and expected) to create more specialized walkers (e.g. one that specifically
queries ``GlobalsAA``, one that always stops at ``MemoryPhi`` nodes, etc.).

Default walker APIs
^^^^^^^^^^^^^^^^^^^

There are two main APIs used to retrieve the clobbering access using the walker:

-  ``MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA);`` returns the
   clobbering memory access for ``MA``, caching all intermediate results
   computed along the way as part of each access queried.

-  ``MemoryAccess *getClobberingMemoryAccess(MemoryAccess *MA, const MemoryLocation &Loc);``
   returns the access clobbering memory location ``Loc``, starting at ``MA``.
   Because this API does not request the clobbering access of a specific memory
   access, there are no results that can be cached.

Locating clobbers yourself
^^^^^^^^^^^^^^^^^^^^^^^^^^

If you choose to make your own walker, you can find the clobber for a
``MemoryAccess`` by walking every ``MemoryDef`` that dominates said
``MemoryAccess``. The structure of ``MemoryDef``\ s makes this relatively simple;
they ultimately form a linked list of every clobber that dominates the
``MemoryAccess`` that you're trying to optimize. In other words, the
``definingAccess`` of a ``MemoryDef`` is always the nearest dominating
``MemoryDef`` or ``MemoryPhi`` of said ``MemoryDef``.
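
That walk can be sketched as follows (a toy Python model with a stand-in alias
oracle; it is not LLVM's actual walker, which consults the real alias-analysis
stack and handles ``MemoryPhi``\ s): start from the access's defining access and
follow the chain until some ``Def`` may-aliases the queried location, or the
chain ends at ``liveOnEntry``.

```python
# Toy clobber walk (illustrative sketch, not LLVM's implementation).
class MemoryDef:
    def __init__(self, name, loc, defining_access):
        self.name = name
        self.loc = loc                      # the location this Def writes
        self.defining_access = defining_access

def may_alias(loc_a, loc_b):
    # Stand-in for a real alias-analysis query.
    return loc_a == loc_b

def get_clobbering_access(access, loc):
    """Walk the chain of defining accesses; return the first Def that may
    clobber `loc`, or liveOnEntry if none does."""
    cur = access.defining_access
    while cur.defining_access is not None:  # liveOnEntry ends the chain
        if may_alias(cur.loc, loc):
            return cur
        cur = cur.defining_access
    return cur

live_on_entry = MemoryDef("liveOnEntry", None, None)
store_a = MemoryDef("1", "%a", live_on_entry)  # store i8 0, i8* %a
store_b = MemoryDef("2", "%b", store_a)        # store i8 0, i8* %b

# The store to %a doesn't alias %b, so the clobber of 2 is liveOnEntry,
# matching the walker example above.
print(get_clobbering_access(store_b, "%b").name)  # liveOnEntry
```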


Build-time use optimization
---------------------------

``MemorySSA`` will optimize some ``MemoryAccess``\ es at build time.
Specifically, we optimize the operand of every ``MemoryUse`` to point to the
actual clobber of said ``MemoryUse``. This can be seen in the above example; the
second ``MemoryUse`` in ``if.end`` has an operand of ``1``, which is a
``MemoryDef`` from the entry block. This is done to make walking,
value numbering, etc., faster and easier.

It is not possible to optimize ``MemoryDef``\ s in the same way, as we
restrict ``MemorySSA`` to one memory variable and, thus, one Phi node
per block.


Invalidation and updating
-------------------------

Because ``MemorySSA`` keeps track of LLVM IR, it needs to be updated whenever
the IR is updated. "Update", in this case, includes the addition, deletion, and
motion of ``Instruction``\ s. The update API is being made on an as-needed basis.
If you'd like examples, ``GVNHoist`` is a user of ``MemorySSA``'s update API.


Phi placement
^^^^^^^^^^^^^

``MemorySSA`` only places ``MemoryPhi``\ s where they're actually
needed. That is, it is a pruned SSA form, like LLVM's SSA form. For
example, consider:

.. code-block:: llvm

  define void @foo() {
  entry:
    %p1 = alloca i8
    %p2 = alloca i8
    %p3 = alloca i8
    ; 1 = MemoryDef(liveOnEntry)
    store i8 0, i8* %p3
    br label %while.cond

  while.cond:
    ; 3 = MemoryPhi({entry,1},{if.end,2})
    br i1 undef, label %if.then, label %if.else

  if.then:
    br label %if.end

  if.else:
    br label %if.end

  if.end:
    ; MemoryUse(1)
    %1 = load i8, i8* %p1
    ; 2 = MemoryDef(3)
    store i8 2, i8* %p2
    ; MemoryUse(1)
    %2 = load i8, i8* %p3
    br label %while.cond
  }

Because we removed the stores from ``if.then`` and ``if.else``, a ``MemoryPhi``
for ``if.end`` would be pointless, so we don't place one. So, if you need to
place a ``MemoryDef`` in ``if.then`` or ``if.else``, you'll also need to create
a ``MemoryPhi`` for ``if.end``.

If it turns out that this is a large burden, we can just place ``MemoryPhi``\ s
everywhere. Because we have walkers that are capable of optimizing above said
phis, doing so shouldn't prohibit optimizations.


Non-Goals
---------

``MemorySSA`` is meant to reason about the relation between memory
operations, and enable quicker querying.
It isn't meant to be the single source of truth for all potential memory-related
optimizations. Specifically, care must be taken when trying to use ``MemorySSA``
to reason about atomic or volatile operations, as in:

.. code-block:: llvm

  define i8 @foo(i8* %a) {
  entry:
    br i1 undef, label %if.then, label %if.end

  if.then:
    ; 1 = MemoryDef(liveOnEntry)
    %0 = load volatile i8, i8* %a
    br label %if.end

  if.end:
    %av = phi i8 [0, %entry], [%0, %if.then]
    ret i8 %av
  }

Going solely by ``MemorySSA``'s analysis, hoisting the ``load`` to ``entry`` may
seem legal. Because it's a volatile load, though, it's not.


Design tradeoffs
----------------

Precision
^^^^^^^^^

``MemorySSA`` in LLVM deliberately trades off precision for speed.
Let us think about memory variables as if they were disjoint partitions of the
memory (that is, if you have one variable, as above, it represents the entire
memory, and if you have multiple variables, each one represents some
disjoint portion of the memory).

First, because alias analysis results conflict with each other, and
each result may be what an analysis wants (i.e.,
TBAA may say no-alias, and something else may say must-alias), it is
not possible to partition the memory the way every optimization wants.
Second, some alias analysis results are not transitive (i.e., ``A`` noalias
``B`` and ``B`` noalias ``C`` do not imply ``A`` noalias ``C``), so it is not
possible to
come up with a precise partitioning in all cases without variables to
represent every pair of possible aliases. Thus, partitioning
precisely may require introducing at least N^2 new virtual variables,
phi nodes, etc.

Each of these variables may be clobbered at multiple def sites.

To give an example, if you were to split up struct fields into
individual variables, all aliasing operations that may-def multiple struct
fields will may-def more than one of them. This is pretty common (calls,
copies, field stores, etc.).

Experience with SSA forms for memory in other compilers has shown that
it is simply not possible to do this precisely, and in fact, doing it
precisely is not worth it, because now all the optimizations have to
walk tons and tons of virtual variables and phi nodes.

So we partition. At the point at which you partition, again,
experience has shown us there is no point in partitioning to more than
one variable. It simply generates more IR, and optimizations still
have to query something to disambiguate further anyway.

As a result, LLVM partitions to one variable.

Use Optimization
^^^^^^^^^^^^^^^^

Unlike other partitioned forms, LLVM's ``MemorySSA`` does make one
useful guarantee: all loads are optimized to point at the thing that
actually clobbers them. This gives some nice properties. For example,
for a given store, you can find all loads actually clobbered by that
store by walking the immediate uses of the store.
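
A toy model of that query (illustrative Python, not the LLVM C++ API): because
every ``MemoryUse`` is optimized to point at its actual clobber, the loads
clobbered by a store are exactly the ``MemoryUse``\ s among that store's
immediate users.

```python
# Toy model of use optimization (illustrative sketch, not LLVM's C++ API).
class MemoryAccess:
    def __init__(self, name):
        self.name = name
        self.users = []     # accesses whose operand is this access

class MemoryDef(MemoryAccess):
    pass

class MemoryUse(MemoryAccess):
    def __init__(self, name, clobber):
        super().__init__(name)
        # After build-time optimization, the operand is the actual clobber.
        self.clobber = clobber
        clobber.users.append(self)

store = MemoryDef("1")              # e.g. store i8 0, i8* %p
load_a = MemoryUse("load_a", store) # loads actually clobbered by the store
load_b = MemoryUse("load_b", store)

# All loads clobbered by the store: just its immediate MemoryUse users.
clobbered_loads = [u.name for u in store.users if isinstance(u, MemoryUse)]
print(clobbered_loads)  # ['load_a', 'load_b']
```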

LLVM Developers Meeting presentations
-------------------------------------

- `2016 LLVM Developers' Meeting: G. Burgess - MemorySSA in Five Minutes <https://www.youtube.com/watch?v=bdxWmryoHak>`_.
- `2020 LLVM Developers' Meeting: S. Baziotis & S. Moll - Finding Your Way Around the LLVM Dependence Analysis Zoo <https://www.youtube.com/watch?v=1e5y6WDbXCQ>`_