# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting the `tensor` type to the
`memref` type. MLIR provides a composable system that allows dialects to
systematically bufferize a program. This system is a simple application of
MLIR's [dialect conversion](DialectConversion.md) infrastructure. The bulk of
the code related to bufferization is a set of ordinary `ConversionPattern`s
that dialect authors write for converting ops that operate on `tensor`s to ops
that operate on `memref`s. These patterns follow a set of conventions and best
practices that allow them to be run across multiple independent passes (rather
than requiring a single huge atomic conversion pass), which makes the
compilation pipelines scalable, robust, and easy to debug.

This document is targeted at people looking to utilize MLIR's bufferization
functionality, along with people who want to extend it to cover their own ops.

<a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
Infrastructure"
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
That talk gives a high-level overview of the bufferization infrastructure and
important conceptual details related to using the MLIR dialect conversion
infrastructure.

## Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated,
nor does it do anything particularly intelligent with the placement of buffers
w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
of:

1. Bufferization
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
   `promote-buffers-to-stack`, which perform optimizations that are only
   exposed after bufferization.
1. Finally, running the [buffer deallocation](BufferDeallocation.md) pass.

After buffer deallocation has been completed, the program will be quite
difficult to transform due to the presence of the deallocation ops. Thus, other
optimizations such as linalg fusion on memrefs should be done before that stage.
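
A minimal sketch of such a pipeline ordering is shown below. It is an
illustration only: it assumes an `MLIRContext *context` in scope and that the
buffer optimization and deallocation pass creation entry points
(`createBufferHoistingPass`, `createBufferLoopHoistingPass`,
`createPromoteBuffersToStackPass`, `createBufferDeallocationPass`) match the
upstream names; the bufferization passes are the ones used in the example later
in this document.

```c++
// Hypothetical pipeline setup illustrating the ordering described above.
PassManager pm(context);

// 1. Bufferization: partial bufferization passes followed by a finalizing
//    bufferization pass (see the rest of this document).
pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
pm.addNestedPass<FuncOp>(createStdBufferizePass());
pm.addPass(createFuncBufferizePass());
pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());

// 2. Buffer optimizations that are only exposed after bufferization.
pm.addNestedPass<FuncOp>(createBufferHoistingPass());
pm.addNestedPass<FuncOp>(createBufferLoopHoistingPass());
pm.addNestedPass<FuncOp>(createPromoteBuffersToStackPass());

// 3. Buffer deallocation last, since the inserted dealloc ops make later
//    transformations difficult.
pm.addNestedPass<FuncOp>(createBufferDeallocationPass());
```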

## General structure of the bufferization process

Bufferization consists of running multiple _partial_ bufferization passes,
followed by one _finalizing_ bufferization pass.

There is typically one partial bufferization pass per dialect (though other
subdivisions are possible). For example, for a dialect `X` there will typically
be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
By running pass `X-bufferize` for each dialect `X` in the program, all the ops
in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been
bufferized. These passes insert _materializations_ (also sometimes called
"casts") that convert between the `tensor` and `memref` types -- in this
infrastructure, the `tensor_load` and `tensor_to_memref` ops -- which bridge
between ops that have been bufferized and ops that have not yet been
bufferized.

A finalizing bufferization completes the bufferization process and guarantees
that there are no tensors remaining in the program. This involves eliminating
the materializations. The `finalizing-bufferize` pass provides a minimal
implementation that only eliminates materializations and issues an error if any
unbufferized ops remain in the program.

However, it is possible for a finalizing bufferization to do more than just
eliminate materializations. By adding patterns (just as a partial bufferization
would), it is possible for a finalizing bufferization pass to simultaneously
bufferize ops and eliminate materializations. This has a number of disadvantages
discussed in the talk and should generally be avoided.

### Example

As a concrete example, we will look at the bufferization pipeline from the
`mlir-npcomp` reference backend
([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
The code, slightly simplified and annotated, is reproduced here:
```c++
  // Partial bufferization passes.
  pm.addPass(createTensorConstantBufferizePass());
  pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
  pm.addNestedPass<FuncOp>(createSCFBufferizePass());
  pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
  pm.addNestedPass<FuncOp>(createStdBufferizePass());
  pm.addNestedPass<FuncOp>(createTensorBufferizePass());
  pm.addPass(createFuncBufferizePass());

  // Finalizing bufferization pass.
  pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());
```

Looking first at the partial bufferization passes, we see that there is a
sequence of `FuncOp` passes (which run in parallel on functions). These function
passes are bracketed by `tensor-constant-bufferize` and `func-bufferize`, which
are module passes (and thus serialize the parallel compilation process). These
two passes must be module passes because they make changes to the top-level
module.

The bulk of the bufferization work is done by the function passes. Most of these
passes are provided as part of the upstream MLIR distribution and bufferize
their respective dialects (e.g. `scf-bufferize` bufferizes the `scf` dialect).
The `tcp-bufferize` pass is an exception -- it is a partial bufferization pass
used to bufferize the downstream `tcp` dialect, and fits in perfectly with all
the other passes provided upstream.

The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
backend has arranged that all ops are bufferized by partial bufferizations, so
that the upstream `finalizing-bufferize` pass can be used as the finalizing
bufferization pass. This gives excellent diagnostics when something goes wrong
with the bufferization process, such as due to an op that wasn't handled by any
pattern.

## How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds
of ops, customizable by a `ConversionTarget`) gets bufferized.

A partial bufferization pass is just a pass that uses the
[dialect conversion](DialectConversion.md) framework to apply
`ConversionPattern`s with a `tensor` to `memref` type conversion.

To describe how to write such a pass, we will walk through an example, the
`tensor-bufferize` pass
([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
[test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
that bufferizes the `tensor` dialect.

The bulk of the code in the pass will be a set of conversion patterns, with a
simple example being
[BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).

```c++
class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const override {
    auto resultType = getTypeConverter()->convertType(op.getType());
    rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, operands[0]);
    return success();
  }
};
```

See [the talk](#the-talk) for more details on how to write these patterns.
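
As another illustration, the companion pattern for `tensor.extract` (the
`BufferizeExtractOp` pattern registered in the populate function below) can be
sketched as follows. This is a simplified sketch and may differ slightly from
the exact upstream code; it assumes the converted operands are accessed through
the op's generated adaptor and that the load is done with the standard
dialect's `LoadOp`:

```c++
class BufferizeExtractOp : public OpConversionPattern<tensor::ExtractOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::ExtractOp op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const override {
    // The adaptor exposes the already-converted (memref) operands.
    tensor::ExtractOp::Adaptor adaptor(operands);
    // An extraction from a tensor becomes an ordinary load from the
    // corresponding memref.
    rewriter.replaceOpWithNewOp<LoadOp>(op, adaptor.tensor(),
                                        adaptor.indices());
    return success();
  }
};
```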

The
[pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
is very small, and follows the basic pattern of any dialect conversion pass.

```c++
void mlir::populateTensorBufferizePatterns(
    MLIRContext *context, BufferizeTypeConverter &typeConverter,
    OwningRewritePatternList &patterns) {
  patterns.insert<BufferizeCastOp, BufferizeExtractOp>(typeConverter, context);
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    OwningRewritePatternList patterns;
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(context, typeConverter, patterns);
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    target.addLegalDialect<StandardOpsDialect>();

    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```

The pass has all the hallmarks of a dialect conversion pass that does type
conversions: a `TypeConverter`, an `OwningRewritePatternList`, a
`ConversionTarget`, and a call to `applyPartialConversion`. Note that the
patterns are exposed via a separate `populateTensorBufferizePatterns` function,
so that power users can use them independently if necessary (such as to combine
multiple sets of conversion patterns into a single conversion call, for
performance).
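
For example, a downstream pass could combine several
`populate*BufferizePatterns` functions into one conversion. The sketch below is
hypothetical (the combined populate function is made up for illustration, and
the availability of an analogous `populateStdBufferizePatterns` entry point is
an assumption):

```c++
// Hypothetical combined pattern population, for a pass that bufferizes
// `tensor` and `std` ops in a single applyPartialConversion call.
void populateMyCombinedBufferizePatterns(MLIRContext *context,
                                         BufferizeTypeConverter &typeConverter,
                                         OwningRewritePatternList &patterns) {
  populateTensorBufferizePatterns(context, typeConverter, patterns);
  populateStdBufferizePatterns(context, typeConverter, patterns);
}
```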

One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

In this case, the `StandardOpsDialect` is marked as legal, so the `tensor_load`
and `tensor_to_memref` ops, which are inserted automatically by the dialect
conversion framework as materializations, are legal. There is a helper
`populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
which helps with this in general.
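
A sketch of using that helper instead of marking the whole `StandardOpsDialect`
legal (assuming the helper takes the `ConversionTarget`, as in the linked
header) might look like:

```c++
ConversionTarget target(*context);
target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
// Mark only the tensor_load / tensor_to_memref materialization ops as legal,
// rather than the entire standard dialect.
populateBufferizeMaterializationLegality(target);
```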

### Other partial bufferization examples

- `linalg-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Linalg/bufferize.mlir#L1))

  - Bufferizes the `linalg` dialect.
  - This is an example of how to simultaneously bufferize all the ops that
    satisfy a certain `OpInterface` with a single pattern. Specifically,
    `BufferizeAnyLinalgOp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L170))
    bufferizes any op that implements the `LinalgOp` interface.

- `scf-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/SCF/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/SCF/bufferize.mlir#L1))

  - Bufferizes ops from the `scf` dialect.
  - This is an example of how to bufferize ops that implement
    `RegionBranchOpInterface` (that is, they use regions to represent control
    flow).
  - The bulk of the work is done by
    `lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L1)),
    which is well-commented and covers how to correctly convert ops that contain
    regions.

- `func-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/func-bufferize.mlir#L1))

  - Bufferizes `func`, `call`, and `BranchOpInterface` ops.
  - This is an example of how to bufferize ops that have multi-block regions.
  - This is an example of a pass that is not split along dialect subdivisions.

- `tensor-constant-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir#L1))
  - Bufferizes only `std.constant` ops of `tensor` type.
  - This is an example of setting up the legality so that only a subset of
    `std.constant` ops get bufferized.
  - This is an example of a pass that is not split along dialect subdivisions.

## How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone
from the program.

The easiest way to write a finalizing bufferize pass is to not write one at all!
MLIR provides a pass `finalizing-bufferize` which eliminates the `tensor_load` /
`tensor_to_memref` materialization ops inserted by partial bufferization passes
and emits an error if that is not sufficient to remove all tensors from the
program.

This pass is sufficient when partial bufferization passes have bufferized all
the ops in the program, leaving behind only the materializations. When possible,
it is recommended to structure your pass pipeline this way, as this has the
significant advantage that if an op does not get bufferized (due to a missing
pattern, a bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
error, and the IR seen by `finalizing-bufferize` will contain only one
unbufferized op.

However, before the current bufferization infrastructure was put in place,
bufferization could only be done as a single finalizing bufferization
mega-pass that used the `populate*BufferizePatterns` functions from multiple
dialects to simultaneously bufferize everything at once. Thus, one might see
code in downstream projects structured this way. This structure is not
recommended in new code. A helper,
`populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58)),
is available for such passes to provide patterns that eliminate `tensor_load`
and `tensor_to_memref`.
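
For reference, a minimal sketch of a hand-written finalizing pass built on that
helper is shown below. It is an illustration only; the helper's signature and
the legality check are assumptions modeled on the other `populate*` functions
and on the upstream `finalizing-bufferize` pass:

```c++
struct MyFinalizingBufferizePass
    : public PassWrapper<MyFinalizingBufferizePass, FunctionPass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    OwningRewritePatternList patterns;
    ConversionTarget target(*context);

    populateEliminateBufferizeMaterializationsPatterns(context, typeConverter,
                                                       patterns);
    // Only ops whose operand and result types are already bufferized are
    // considered legal; everything else (including leftover materializations)
    // must be rewritten away, or the conversion fails with a diagnostic.
    target.markUnknownOpDynamicallyLegal(
        [&](Operation *op) { return typeConverter.isLegal(op); });

    if (failed(
            applyFullConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```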

## Changes since [the talk](#the-talk)

- `func-bufferize` was changed to be a partial conversion pass, and there is a
  new `finalizing-bufferize` which serves as a general finalizing bufferization
  pass.