# Bufferization

[TOC]

## Overview

Bufferization in MLIR is the process of converting the `tensor` type to the
`memref` type. MLIR provides a composable system that allows dialects to
systematically bufferize a program. This system is a simple application
of MLIR's [dialect conversion](DialectConversion.md) infrastructure. The bulk of
the code related to bufferization is a set of ordinary `ConversionPattern`s
that dialect authors write for converting ops that operate on `tensor`s to ops
that operate on `memref`s. A set of conventions and best practices allows these
patterns to be run across multiple independent passes (rather than requiring a
single huge atomic conversion pass), which makes the compilation pipelines
scalable, robust, and easy to debug.

This document is targeted at people looking to utilize MLIR's bufferization
functionality, along with people who want to extend it to cover their own ops.

<a name="the-talk">**NOTE:**</a> Before reading this document, please watch the
talk "Type Conversions the Not-So-Hard-Way: MLIR's New Bufferization
Infrastructure"
([slides](https://drive.google.com/file/d/1FVbzCXxZzS9LBLuvpPNLWJD-XDkt54ky/view?usp=sharing),
[recording](https://drive.google.com/file/d/1VfVajitgf8ZPnd-HRkJvaJiFLhBsluXN/view?usp=sharing)).
That talk gives a high-level overview of the bufferization infrastructure and
important conceptual details related to using the MLIR dialect conversion
infrastructure.

## Bufferization's place in a compilation pipeline

Bufferization itself does not free any of the buffers that have been allocated,
nor does it do anything particularly intelligent with the placement of buffers
w.r.t. control flow. Thus, a realistic compilation pipeline will usually consist
of:

1. Bufferization
1. Buffer optimizations such as `buffer-hoisting`, `buffer-loop-hoisting`, and
   `promote-buffers-to-stack`, which perform optimizations that are only
   exposed after bufferization.
1. Finally, running the [buffer deallocation](BufferDeallocation.md) pass.

After buffer deallocation has been completed, the program will be quite
difficult to transform due to the presence of the deallocation ops. Thus, other
optimizations such as linalg fusion on memrefs should be done before that stage.

## General structure of the bufferization process

Bufferization consists of running multiple _partial_ bufferization passes,
followed by one _finalizing_ bufferization pass.

There is typically one partial bufferization pass per dialect (though other
subdivisions are possible). For example, for a dialect `X` there will typically
be a pass `X-bufferize` that knows how to bufferize all the ops in that dialect.
By running pass `X-bufferize` for each dialect `X` in the program, all the ops
in the program are incrementally bufferized.

Partial bufferization passes create programs where only some ops have been
bufferized. These passes will create _materializations_ (also sometimes called
"casts") that convert between the `tensor` and `memref` types, which allows
bridging between ops that have been bufferized and ops that have not yet been
bufferized.

Finalizing bufferizations complete the bufferization process and guarantee that
there are no tensors remaining in the program. This involves eliminating the
materializations. The pass `finalizing-bufferize` provides a minimal pass that
only eliminates materializations and issues an error if any unbufferized ops
exist in the program.

However, it is possible for a finalizing bufferization to do more than just
eliminate materializations.
By adding patterns (just as a partial bufferization
would), it is possible for a finalizing bufferization pass to simultaneously
bufferize ops and eliminate materializations. This has a number of disadvantages
discussed in the talk and should generally be avoided.

### Example

As a concrete example, we will look at the bufferization pipeline from the
`mlir-npcomp` reference backend
([code](https://github.com/llvm/mlir-npcomp/blob/97d6d04d41216e73d40b89ffd79620973fc14ce3/lib/RefBackend/RefBackend.cpp#L232)).
The code, slightly simplified and annotated, is reproduced here:

```c++
// Partial bufferization passes.
pm.addPass(createTensorConstantBufferizePass());
pm.addNestedPass<FuncOp>(createTCPBufferizePass()); // Bufferizes the downstream `tcp` dialect.
pm.addNestedPass<FuncOp>(createSCFBufferizePass());
pm.addNestedPass<FuncOp>(createLinalgBufferizePass());
pm.addNestedPass<FuncOp>(createStdBufferizePass());
pm.addNestedPass<FuncOp>(createTensorBufferizePass());
pm.addPass(createFuncBufferizePass());

// Finalizing bufferization pass.
pm.addNestedPass<FuncOp>(createFinalizingBufferizePass());
```

Looking first at the partial bufferization passes, we see that there is a
sequence of `FuncOp` passes (which run in parallel on functions). These function
passes are bracketed by `tensor-constant-bufferize` and `func-bufferize`, which
are module passes (and thus serialize the parallel compilation process). These
two passes must be module passes because they make changes to the top-level
module.

The bulk of the bufferization work is done by the function passes. Most of these
passes are provided as part of the upstream MLIR distribution and bufferize
their respective dialects (e.g. `scf-bufferize` bufferizes the `scf` dialect).
The `tcp-bufferize` pass is an exception: it is a partial bufferization pass
used to bufferize the downstream `tcp` dialect, and it composes with the
upstream passes without any special handling.

The last pass is the finalizing bufferization pass. The `mlir-npcomp` reference
backend has arranged that all ops are bufferized by partial bufferizations, so
that the upstream `finalizing-bufferize` pass can be used as the finalizing
bufferization pass. This gives excellent diagnostics when something goes wrong
with the bufferization process, for example when an op was not handled by any
pattern.

## How to write a partial bufferization pass

The contract of a partial bufferization pass is that a subset of ops (or kinds
of ops, customizable by a `ConversionTarget`) get bufferized.

A partial bufferization pass is just a pass that uses the
[dialect conversion](DialectConversion.md) framework to apply
`ConversionPattern`s with a `tensor` to `memref` type conversion.

To describe how to write such a pass, we will walk through an example, the
`tensor-bufferize` pass
([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23),
[test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Tensor/bufferize.mlir#L1))
that bufferizes the `tensor` dialect.

The bulk of the code in the pass will be a set of conversion patterns, with a
simple example being
[BufferizeCastOp](https://github.com/llvm/llvm-project/blob/2bf6e443e54604c7818c4d1a1837f3d091023270/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L23).

```c++
class BufferizeCastOp : public OpConversionPattern<tensor::CastOp> {
public:
  using OpConversionPattern::OpConversionPattern;
  LogicalResult
  matchAndRewrite(tensor::CastOp op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const override {
    // Convert the tensor result type to the corresponding memref type.
    auto resultType = getTypeConverter()->convertType(op.getType());
    // `operands` holds the already-converted (memref) operands.
    rewriter.replaceOpWithNewOp<MemRefCastOp>(op, resultType, operands[0]);
    return success();
  }
};
```

See [the talk](#the-talk) for more details on how to write these patterns.

The
[pass itself](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Tensor/Transforms/Bufferize.cpp#L57)
is very small, and follows the basic pattern of any dialect conversion pass.

```c++
void mlir::populateTensorBufferizePatterns(
    MLIRContext *context, BufferizeTypeConverter &typeConverter,
    OwningRewritePatternList &patterns) {
  patterns.insert<BufferizeCastOp, BufferizeExtractOp>(typeConverter, context);
}

struct TensorBufferizePass : public TensorBufferizeBase<TensorBufferizePass> {
  void runOnFunction() override {
    auto *context = &getContext();
    BufferizeTypeConverter typeConverter;
    OwningRewritePatternList patterns;
    ConversionTarget target(*context);

    populateTensorBufferizePatterns(context, typeConverter, patterns);
    // The ops this pass is responsible for converting.
    target.addIllegalOp<tensor::CastOp, tensor::ExtractOp>();
    // Keeps the materialization ops (which live in the standard dialect) legal.
    target.addLegalDialect<StandardOpsDialect>();

    if (failed(
            applyPartialConversion(getFunction(), target, std::move(patterns))))
      signalPassFailure();
  }
};
```

The pass has all the hallmarks of a dialect conversion pass that does type
conversions: a `TypeConverter`, an `OwningRewritePatternList`, a
`ConversionTarget`, and a call to `applyPartialConversion`.
Note that the function
`populateTensorBufferizePatterns` is kept separate, so that power users can use
the patterns independently if necessary (such as to combine multiple sets of
conversion patterns into a single conversion call, for performance).

One convenient utility provided by the MLIR bufferization infrastructure is the
`BufferizeTypeConverter`, which comes pre-loaded with the necessary conversions
and materializations between `tensor` and `memref`.

In this case, the `StandardOpsDialect` is marked as legal, so the `tensor_load`
and `tensor_to_memref` ops, which are inserted automatically by the dialect
conversion framework as materializations, are legal. There is a helper
`populateBufferizeMaterializationLegality`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L53))
which helps with this in general.

### Other partial bufferization examples

- `linalg-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Linalg/bufferize.mlir#L1))

  - Bufferizes the `linalg` dialect.
  - This is an example of how to simultaneously bufferize all the ops that
    satisfy a certain OpInterface with a single pattern. Specifically,
    `BufferizeAnyLinalgOp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/Linalg/Transforms/Bufferize.cpp#L170))
    bufferizes any op that implements the `LinalgOp` interface.

- `scf-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/SCF/Transforms/Bufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/SCF/bufferize.mlir#L1))

  - Bufferizes ops from the `scf` dialect.
  - This is an example of how to bufferize ops that implement
    `RegionBranchOpInterface` (that is, they use regions to represent control
    flow).
  - The bulk of the work is done by
    `lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp`
    ([code](https://github.com/llvm/llvm-project/blob/daaaed6bb89044ac58a23f1bb1ccdd12342a5a58/mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp#L1)),
    which is well-commented and covers how to correctly convert ops that contain
    regions.

- `func-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/FuncBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/func-bufferize.mlir#L1))

  - Bufferizes `func`, `call`, and `BranchOpInterface` ops.
  - This is an example of how to bufferize ops that have multi-block regions.
  - This is an example of a pass that is not split along dialect subdivisions.

- `tensor-constant-bufferize`
  ([code](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/lib/Dialect/StandardOps/Transforms/TensorConstantBufferize.cpp#L1),
  [test](https://github.com/llvm/llvm-project/blob/bc8acf2ce8ad6e8c9b1d97b2e02d3f4ad26e1d9d/mlir/test/Dialect/Standard/tensor-constant-bufferize.mlir#L1))

  - Bufferizes only `std.constant` ops of `tensor` type.
  - This is an example of setting up the legality so that only a subset of
    `std.constant` ops get bufferized.
  - This is an example of a pass that is not split along dialect subdivisions.

## How to write a finalizing bufferization pass

The contract of a finalizing bufferization pass is that all tensors are gone
from the program.

The easiest way to write a finalizing bufferization pass is to not write one at
all! MLIR provides a pass `finalizing-bufferize` which eliminates the
`tensor_load` / `tensor_to_memref` materialization ops inserted by partial
bufferization passes and emits an error if that is not sufficient to remove all
tensors from the program.

This pass is sufficient when partial bufferization passes have bufferized all
the ops in the program, leaving behind only the materializations. When possible,
it is recommended to structure your pass pipeline this way, as this has the
significant advantage that if an op does not get bufferized (due to a missing
pattern, bug in the code, etc.), `finalizing-bufferize` will emit a nice clean
error, and the IR seen by `finalizing-bufferize` will contain only one
unbufferized op.

However, before the current bufferization infrastructure was put in place,
bufferization could only be done as a single finalizing bufferization
mega-pass that used the `populate*BufferizePatterns` functions from multiple
dialects to bufferize everything at once. Thus, one might see code in
downstream projects structured this way. This structure is not recommended in
new code. A helper,
`populateEliminateBufferizeMaterializationsPatterns`
([code](https://github.com/llvm/llvm-project/blob/a0b65a7bcd6065688189b3d678c42ed6af9603db/mlir/include/mlir/Transforms/Bufferize.h#L58))
is available for such passes to provide patterns that eliminate `tensor_load`
and `tensor_to_memref`.
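
To make the contract of a finalizing pass more concrete, here is a minimal
sketch of the idea using only the C++ standard library. It is a toy model, not
the MLIR API: `ToyOp`, `ToyResult`, and `finalizeToy` are hypothetical names
invented for this illustration, every op is assumed to have exactly one operand
and one result, and any value that is not defined by an op (such as a function
argument) is assumed to already be a `memref`. The sketch forwards values
through `tensor_load` / `tensor_to_memref` and reports any other op that still
produces or consumes a tensor.

```cpp
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an op with one operand and one result (NOT the MLIR API).
struct ToyOp {
  std::string name;     // e.g. "tensor_load", "memref_cast"
  std::string operand;  // SSA name of the single operand
  std::string result;   // SSA name of the single result
  bool resultIsTensor;  // true if the result has tensor type
};

struct ToyResult {
  std::vector<ToyOp> ops;                 // ops left once materializations are erased
  std::vector<std::string> unbufferized;  // names of ops that still touch tensors
};

// Walks the ops in program order. Materializations are erased and their
// result is forwarded to their (already forwarded) operand. Every other op
// is kept, and is reported if it produces or consumes a tensor. When
// `unbufferized` is non-empty, the real pass would fail, so `ops` should
// then be ignored.
ToyResult finalizeToy(const std::vector<ToyOp> &ops) {
  ToyResult out;
  std::map<std::string, bool> isTensor;        // SSA name -> has tensor type
  std::map<std::string, std::string> forward;  // erased result -> replacement
  for (const ToyOp &op : ops) {
    bool isCast = op.name == "tensor_load" || op.name == "tensor_to_memref";
    // Values never defined by an op are assumed to be memrefs (assumption).
    bool touchesTensor = op.resultIsTensor || isTensor[op.operand];
    isTensor[op.result] = op.resultIsTensor;
    // Look through materializations that were erased earlier.
    auto it = forward.find(op.operand);
    std::string operand = it == forward.end() ? op.operand : it->second;
    if (isCast) {
      forward[op.result] = operand;
    } else {
      if (touchesTensor)
        out.unbufferized.push_back(op.name);
      out.ops.push_back({op.name, operand, op.result, op.resultIsTensor});
    }
  }
  return out;
}
```

In this model, a fully bufferized sequence such as `tensor_load`,
`tensor_to_memref`, `memref_cast` collapses to the single `memref_cast` reading
directly from the original buffer. If a `tensor.cast` is left between the two
materializations, it is the only op reported, which mirrors the "only one
unbufferized op" diagnostic property described above.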

## Changes since [the talk](#the-talk)

- `func-bufferize` was changed to be a partial conversion pass, and there is a
  new `finalizing-bufferize` which serves as a general finalizing bufferization
  pass.