# MLIR: Incremental Application to Graph Algorithms in ML Frameworks

The existing documentation about MLIR focuses on long term vision, how its
pieces fit together, and the benefits of modular and composable infrastructure
in the vast and distant future. While this viewpoint appeals to some, it causes
concern for others who care more about the "here and now" - why does it make
sense to make a "revolutionary" change when any individual problem can be fixed
in place?

This document explains that adoption of MLIR to solve graph-based problems
_isn't_ a revolutionary change: it is an incremental series of steps which build
on each other, each of which delivers local value. This document also addresses
some points of confusion that keep coming up.

One note: even though a major advantage of MLIR is that it can span the full
spectrum from graph algorithms down to low-level code generation, this document
focuses on the use of MLIR for **graph-level algorithms**. MLIR will also unlock
exciting code generation opportunities (particularly given its novel approach to
integrating state-of-the-art polyhedral techniques), but issues that touch on
MLIR's relationship to XLA, Eigen, etc, are out of scope for this particular
doc.

This document uses TensorFlow as the example given that it is the focus of our
immediate work, but we believe that the same viewpoint could be useful for
people working in the context of other ML frameworks that may consider adopting
MLIR in the future.

## How is MLIR relevant?

MLIR is an overloaded acronym which unpacks as "Multi-Level Intermediate
Representation". Its high-level purpose is to provide mechanics for describing
and transforming programs and computations in a flexible way. It provides common
compiler infrastructure for things like constant folding, dead code elimination,
graph rewriting, and more - all of which are independent of the
representational choices picked by a given dialect (e.g. its concurrency
semantics). It was built with a specific focus on compile time and memory
efficiency and accurate propagation of source location information (important
for reporting high quality errors and warnings), and it is designed for
testability.

TensorFlow has numerous subsystems (some of which are proprietary, e.g.
Tensor-RT, nGraph, CoreML, etc) as well as translation layers between these
different subsystems, and these translation layers face similar challenges. (As
an aside, the internals of each of these subsystems could often benefit from
MLIR infrastructure, but that isn't a focus of this doc.)

A key observation that MLIR makes is that these subsystems often have two things
going on at once: each is both a particular data structure and encoding (e.g.
HLO graphs, TF-Lite's flat buffer format, TensorFlow's Graph format, the ONNX
abstraction, etc) and an abstraction of computation (a specific way of modeling
a convolution, a set of supported operations, etc).

MLIR uses a standard IR (i.e., a set of data structures) for representing these
computations - this allows a huge amount of shared infrastructure across these
problem domains. MLIR then allows the definition of domain-specific "dialects"
that describe the set of operations that are legal and supported for a given
application. This means that the actual translations between data structures are
kept as simple as possible - and are thus relatively easy to make "correct".
This allows the common compiler infrastructure to handle the mapping problems
and the other issues within the domain.
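
For illustration, here is a sketch of what a tiny computation looks like when
hosted in MLIR. Two dialects share the same IR structure, types, and tooling;
only the operation names differ (the op spellings here are illustrative rather
than a normative listing of either dialect):

```mlir
// A function using a TensorFlow-dialect op, in MLIR's generic op syntax.
func @tf_version(%arg0: tensor<8 x f32>) -> tensor<8 x f32> {
  %0 = "tf.Relu"(%arg0) : (tensor<8 x f32>) -> tensor<8 x f32>
  return %0 : tensor<8 x f32>
}

// The same computation after translation into a (hypothetically spelled)
// TF Lite dialect: the data structures and infrastructure are unchanged.
func @tfl_version(%arg0: tensor<8 x f32>) -> tensor<8 x f32> {
  %0 = "tfl.relu"(%arg0) : (tensor<8 x f32>) -> tensor<8 x f32>
  return %0 : tensor<8 x f32>
}
```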

MLIR's design is directly informed by the experience of building (and then
living with) intermediate representations like the LLVM IR, LLVM SelectionDAG,
the LLVM machine instruction representation, and Swift SIL IR; it learns new
lessons from TensorFlow and XLA HLO, as well as from building countless research
and production systems on top of them. Our goal is to drag the state of the art
in compilers forward, not to merely apply a few well-known techniques to the
machine learning domain.

## What does adoption mean?

The point of this document is not to advocate for rewriting any particular
subsystem in TensorFlow - indeed, the burden required to justify a rewrite is
high, and often very specific to that subsystem. That said, there are several
subsystems that are about to get rewritten or substantially revised anyway, so
we use those as examples to concretely describe the benefits that MLIR provides
in these cases and what it will take. The subsystems discussed are:

1.  the TF Lite TOCO translator, which we need to improve in terms of error
    reporting and reliability, and generalize to support more ops;
1.  the TF/XLA bridge, which needs to improve usability by merging some of its
    usage models, support dynamic shapes, and generalize guest subsystem support
    to Tensor-RT and nGraph; and
1.  Grappler, another subsystem that is likely to get substantial revisions in
    the future and would definitely benefit from the MLIR framework, but there
    are no known plans to do that work at this point, so we don't discuss it
    further.

Adopting MLIR for each of these works the same way - and, in fact, the work to
support TF Lite is mostly a subset of the larger work to support the
functionality of the TF/XLA bridge. TF Lite and the TF/XLA bridge include
several compiler passes (things like encapsulate, functionalize control flow,
lowering of ops, fusion, constant folding, shape inference, etc).

MLIR supports converting from TensorFlow Graphs to MLIR and back, which means
that we can start by putting a no-op translation to MLIR and back into the
pipeline, and verify that nothing breaks. Then we can replace the compiler
transformations one by one by reimplementing them (with the improved algorithms
that we're planning).

This is a development plan: we wouldn't actually ship a TensorFlow that just
uses MLIR for a single pass. In practice, we'll keep the MLIR path gated behind
an option, build out a replacement for an entire subsystem (e.g. the TOCO
translator), and when the time is right, we'll do A/B comparisons, eventually
make the switch, and phase out the old code over time.

## What benefit does MLIR provide?

The adoption plan above might sound like it only makes things worse in the
immediate term - we have two implementations of the same functionality, we are
dividing our efforts, etc. In order for this to be worth it, we should have a
good sense that we are building towards an improved future that will make
customers and TensorFlow engineers happier when it lands. Here we describe a few
of the benefits that MLIR provides, in no particular order:

### A Lossless Human Editable Textual Representation

The MLIR in-memory data structure has a human readable and writable format, as
well as [a specification](LangRef.md) for that format - built just like any
other programming language. Important properties of this format are that it is
compact, easy to read, and lossless. You can dump an MLIR program out to disk
and munge around with it, then send it through a few more passes.

If you haven't worked with a system that works this way, it is hard to overstate
how big of a deal this is in practice: it means that you can call `foo->dump()`
on an IR object to see its full contents, it means you can diff the IR before
and after a change, delta reduce IR files, and many other things.
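
As a small illustration, here is what a dumped function looks like - plain text
that can be saved, hand edited, and fed back into tools like `mlir-opt` (the
function and op names here are illustrative):

```mlir
// The full IR round-trips through this textual form with no information loss.
func @squared_difference(%arg0: tensor<? x f32>,
                         %arg1: tensor<? x f32>) -> tensor<? x f32> {
  %0 = "tf.Sub"(%arg0, %arg1)
      : (tensor<? x f32>, tensor<? x f32>) -> tensor<? x f32>
  %1 = "tf.Mul"(%0, %0)
      : (tensor<? x f32>, tensor<? x f32>) -> tensor<? x f32>
  return %1 : tensor<? x f32>
}
```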

### A Graph Verification Pass

Like many other popular compiler infrastructures, MLIR provides infrastructure
and implementation for a "verifier" which checks that the IR is well formed. The
MLIR verifier is a simple framework that makes it easy to provide a single
source of truth for those correctness properties, and is general across all
dialects (e.g. TF Graph, TF Lite flat buffer, XLA HLO, etc).

A verifier pass is sort of like a 'super assertion' that catches mistakes in
program transformations early, making you as an engineer more productive, making
the product more reliable, and making it easier to track down bugs when they
appear - because the verifier can be run at any time, either as a compiler pass
or with a single function call.

While MLIR provides a well-considered infrastructure for IR verification, and
has simple checks for existing TensorFlow operations, there is a lot that should
be added here and lots of opportunity to get involved!
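
As a sketch of what the verifier catches, consider a function whose returned
value doesn't match its declared result type (the exact diagnostic wording here
is illustrative):

```mlir
func @bad_return(%arg0: tensor<4 x f32>) -> tensor<8 x f32> {
  // The verifier rejects this function with an error along the lines of:
  //   error: type of return operand 0 doesn't match function result type
  return %arg0 : tensor<4 x f32>
}
```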

### Designed for Testability

There are many aspects of this in MLIR, but we'll focus on compiler
transformations since they are the easiest to understand. Compiler
transformations are modeled as subclasses of the `Pass` C++ class, which are
driven by an `mlir-opt` tool. When combined with a lossless textual
representation, it becomes really easy to write unit tests for compiler
transformations. For example, here is a simple test showing that `x-x` is
turned into zero:

```mlir
  // RUN: mlir-opt %s -canonicalize | FileCheck %s
  func @test_subi_zero_cfg(%arg0: i32) -> i32 {
    %y = subi %arg0, %arg0 : i32
    return %y: i32
  }
  // CHECK-LABEL: func @test_subi_zero_cfg(%arg0: i32)
  // CHECK-NEXT: %c0_i32 = constant 0 : i32
  // CHECK-NEXT: return %c0
```

The "CHECK" comments are interpreted by the
[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
is sort of like a really advanced grep. This test is fully self-contained: it
feeds the input into the [canonicalize pass](Canonicalization.md), and checks
that the output matches the CHECK lines. See the `test/Transforms` directory for
more examples. In contrast, standard unit testing exposes the API of the
underlying framework to lots and lots of tests (making it harder to refactor and
move the API), typically requires a lot more code, and exacerbates issues with
link time. For examples, see
[the TEST_F functions in TensorFlow's testsuite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/arithmetic_optimizer_test.cc).

MLIR has been pervasively designed around this style of testability, allowing us
to put in place a culture that expects every behavior-changing commit to include
a test case, and for these test cases to be stable and reliable over time, since
they test exactly what they are supposed to. End to end integration tests are
still super useful for some things of course!

### Infrastructure for Warnings and Error Diagnostics and Location Tracking

MLIR benefits from the lessons learned from building other compilers - including
Clang, which
[set the standard](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html)
for quality of implementation in C/C++ compiler diagnostics. Drawing from this
experience (and fixing mistakes in LLVM), MLIR requires that operations and
functions carry abstract location information and that transformations propagate
this information, and it provides standardized mechanisms to emit errors and
warnings, as well as hooks for clients to capture and report them in custom
ways.

Why is this important? In practice, many graph-to-graph translators can fail
(e.g. TF Lite when an unsupported op is used) and it is important to be able to
report the error up to the user in the most precise way possible, in order for
it to be actionable. This includes tracking rewrites through fusions and
fissions of ops, mapping back into language / API specific domains, etc.
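
Concretely, MLIR's textual form can attach a source location to each operation
with a trailing `loc(...)` clause, so a translator that rejects an op can point
at the offending line of user code rather than at an anonymous graph node (the
op name, file name, and position below are made up for illustration):

```mlir
func @inference(%arg0: tensor<? x f32>) -> tensor<? x f32> {
  // If a backend doesn't support this op, the diagnostic can be reported
  // against model.py line 42 instead of an opaque node name.
  %0 = "tf.SomeUnsupportedOp"(%arg0)
      : (tensor<? x f32>) -> tensor<? x f32> loc("model.py":42:10)
  return %0 : tensor<? x f32>
}
```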

More selfishly for infrastructure hackers, this is a huge boon because it means
that it is easy to write good tests for this: the testing tools for MLIR capture
the diagnostics produced by passes (using the standard diagnostic hooks) and
check that they match the expected diagnostics in the testcase. For example, to
test the dependence analysis infra in the code generator, Andy Davis wrote a
simple pass that checks dependencies and emits them as "notes", allowing him to
write tests like this:

```mlir
  // RUN: mlir-opt %s -memref-dependence-check -verify-diagnostics
  func @different_memrefs() {
    %m.a = alloc() : memref<100xf32>
    %m.b = alloc() : memref<100xf32>
    %c0 = constant 0 : index
    %c1 = constant 1.0 : f32
    store %c1, %m.a[%c0] : memref<100xf32>
    // expected-note@-1 {{dependence from memref access 0 to access 1 = false}}
    %v0 = load %m.b[%c0] : memref<100xf32>
    return
  }
```

Note that a major limitation of this is that MLIR suffers from a problem of
"garbage in, garbage out": if the input locations to MLIR are imprecise, then
there is nothing that it can do to recover them. There is work underway in
TensorFlow/Python to improve the situation, and Swift for TensorFlow already has
perfect location tracking due to its design.

### Shape Information Captured in the IR

In TensorFlow Graphs, each op takes and returns values using a very simple type
system (TF_DataType) in which each value is a tensor of unknown rank and
dimensions. At the same time, many graphs have static shapes easily knowable for
wide swaths of the computation, and even dynamically shaped operations often
have statically knowable dimensions. Many analyses and transformations benefit
from and use this information when available, but because TensorFlow graphs
don't capture it (e.g. serialize it to the proto), passes have to recompute it
on demand with ShapeRefiner.

The [MLIR Tensor Type](LangRef.md#tensor-type) directly captures shape
information, so you can have things like:

```mlir
  %z = tf.Add %x, %y : tensor<128 x 8 x ? x f32>
```

Capturing this in the IR is expected to speed up transformations (avoiding
recomputing the same info over and over again), which in turn makes it practical
to apply stronger shape analysis algorithms. It also makes the IR easier to work
with: on-the-side representations can get out of date, and carrying the shapes
in the types is more ergonomic than a side API.
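
The type system can express the full range from "nothing known" to fully static
shapes, so analyses can record exactly what they have proven. A small sketch
(with made-up op names):

```mlir
// Progressively more shape information, captured directly in the type:
%a = "tf.Foo"(%x) : (tensor<* x f32>) -> tensor<* x f32>          // unknown rank
%b = "tf.Bar"(%y) : (tensor<? x ? x f32>) -> tensor<? x ? x f32>  // rank 2, unknown dims
%c = "tf.Baz"(%z) : (tensor<128 x 8 x f32>) -> tensor<128 x 8 x f32>  // fully static
```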

### Unified Graph Rewriting Infrastructure

This is still a work in progress, but we have sightlines towards a
[general rewriting infrastructure](GenericDAGRewriter.md) for transforming DAG
tiles into other DAG tiles, using a declarative pattern format. DAG to DAG
rewriting is a generalized solution for many common compiler optimizations,
lowerings, and other rewrites, and having a common IR enables us to invest in
building a single high-quality implementation.

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.
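
As a sketch of what a DAG-to-DAG rewrite does, consider a pattern that matches a
multiply feeding an add and replaces the pair with one fused operation (the
fused op name here is purely illustrative):

```mlir
// Before the rewrite: a two-op DAG tile.
%0 = "tf.Mul"(%a, %b) : (tensor<? x f32>, tensor<? x f32>) -> tensor<? x f32>
%1 = "tf.Add"(%0, %c) : (tensor<? x f32>, tensor<? x f32>) -> tensor<? x f32>

// After the rewrite: a single op produced by a declarative pattern.
%1 = "tf.FusedMulAdd"(%a, %b, %c)
    : (tensor<? x f32>, tensor<? x f32>, tensor<? x f32>) -> tensor<? x f32>
```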

### Clarified Semantics for TensorFlow Operations

One of the challenging things about working with TensorFlow is that there are
many invariants and behaviors that need to be preserved and known about when
working with Graphs, and these can be difficult to reason about and can lead to
bugs. Things like 'dead values', Switch and Merge nodes, concurrency semantics,
nodes that execute even when passed a dead value, multiple device program
representation - etc... all add complexities that can make it challenging to
reason about whether a transformation or analysis is correct in general. Even
something as simple as constant folding or transforming integer `x-x` into `0`
is non-trivial, because you need to consider control dependence edges.

One of our major goals for the TensorFlow dialect of MLIR is to sort out these
situations and upgrade existing TensorFlow graphs to semantics that are easier
to reason about. The solutions to these problems are all still being debated,
but those discussions have already yielded a lot of potential answers:
introducing a `tf_dead_or<x>` type for switch/merge, modeling TF operations
using futures/async semantics, etc. None of these particular battles is critical
or important for MLIR to succeed (because of its "meta" nature, the abstraction
decisions of any given dialect are left to that dialect to decide), but each one
that works out will make it easier to work with and transform TensorFlow
operations. We expect these issues to get nailed down in the next couple of
months when the MLIR effort moves beyond TF Lite / TOCO support. The discussions
happening now are super valuable and making progress.

### Ergonomics

A minor-in-theory, but important-in-practice point is that MLIR is designed to
make it easy, memory efficient, and less error prone to transform code than
other systems. `TensorFlow::Graph` has implementation issues where the same
information is stored redundantly in different places (which must be manually
kept up to date), has a somewhat unusual representation of certain constructs
(e.g. the function library, which makes it very difficult to add or remove
functions, e.g. during interprocedural transformations), and stores information
in the graph that is used by the executor but isn't necessary for program
transformation.

TensorFlow has made a lot of progress in this area over the years, and there are
lots of ideas about further improvements in the future. We are happy that MLIR
addresses these needs (making it much easier to implement correct program
transformations) today, and we are committed to pushing hard to make it better.

### Compile Time Performance and Memory Use

MLIR has been designed to be memory and compile-time efficient in its algorithms
and data structures, using immutable and uniqued structures, low level
bit-packing, and other well-known techniques to avoid unnecessary heap
allocations, and to allow simple and safe multithreaded optimization of MLIR
programs. There are other reasons to believe that the MLIR implementations of
common transformations will be more efficient than the Python and C++
`TensorFlow::Graph` implementations of the same things, given the current
implementation details of TensorFlow.

That said, this is very much a theory at this point. When the new
implementations of various subsystems are available, we will see what happens in
practice: there will be no reason to speculate - we can measure.

## Common Questions and Concerns

Here we address some frequently asked questions and concerns.

### Isn't MLIR a big dependency to take on?

We've heard that at least some people are concerned that MLIR is a "big"
dependency to take on, and could result in large code size. Here are some key
points about MLIR:

1.  The entire MLIR codebase is a pretty small C++ code base in absolute terms
    compared to what goes into a modern ML framework.
1.  Like LLVM, MLIR is designed as a set of libraries that clients can link in
    or ignore as they wish. For example, the transformations in MLIR are kept
    separate from the core IR abstractions, and dialect-specific code (e.g.
    TensorFlow, TF-Lite, XLA, etc) is all independently selectable by the build
    system. Clients that don't care about XLA don't link in that code, whether
    they are a TF-Lite system or a client that is completely unrelated to
    TensorFlow.
1.  MLIR's only third party dependency is on LLVM, but it doesn't depend on LLVM
    IR or any other heavy dependency - it just depends on LLVM's support
    library, which provides efficient hash tables and other
    [memory efficient data structures that the STL does not](http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task).
    There have been discussions about splitting this set of libraries out to its
    own subproject in LLVM that the LLVM IR project depends on. This would be
    great for MLIR as well as other LLVM subprojects.
1.  TensorFlow and many other frameworks already use LLVM - if so, MLIR would
    not be pulling in an additional dependency at all.

### How does MLIR represent {control flow, concurrency, …} semantics in TensorFlow?

MLIR provides a dialect that is an isomorphic 1-1 mapping between TensorFlow
graphs and MLIR, as well as a pretty complete translator back and forth (the
only known gap is that a few TF_DataType enums aren't handled yet). MLIR is a
"Multi-Level IR", which allows it to represent code with different abstraction
levels, so the ability to faithfully represent TensorFlow code in a completely
backwards compatible way (even if there are some historical warts!) is critical.

In *addition* to the isomorphic mapping, we are actively working on efforts to
raise the abstraction level for working with TensorFlow graphs in MLIR. Doing so
would make it even easier to write TensorFlow transformations than it is today,
and would provide a path to migrating TF 1.x graphs forward into the TF 2.x
world. For example, because MLIR has an extensible type system, we can directly
model whether it is impossible for a Tensor value to be a "dead" value - similar
to the use of optional types in modern programming languages.
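
As a sketch of what that could look like, the `tf_dead_or<x>` idea mentioned
earlier might be expressed as a dialect type on the results of a Switch op (the
type name and syntax here are hypothetical, since this design is still being
debated):

```mlir
// Hypothetical: each Switch result is typed as "possibly dead", so ordinary
// ops cannot consume it until a Merge (or similar) proves it is live.
%0:2 = "tf.Switch"(%data, %pred)
    : (tensor<f32>, tensor<i1>) -> (!tf.dead_or<tensor<f32>>, !tf.dead_or<tensor<f32>>)
```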

These discussions occasionally cause confusion because there are several issues
being mixed up into one:

*   What are the current semantics of TensorFlow graphs, and what invariants can
    we rely on?
*   What should the semantics be in TensorFlow 2.0?
*   What do programs rely on in practice, and if it is unfriendly, can we
    migrate it?
*   Can we find a way to make it so transforms don't have to worry about the
    complexities of Switch/Merge, by using higher level control flow
    representations? (tentative answer: yes)
*   How should MLIR represent async vs sync operations, what invariants are
    provided, and how does this dovetail with control flow?
*   When is it safe and beneficial to perform optimizations that might reduce
    parallelism?

All of these questions have a "conservative/safe fallback": we can continue
providing exactly the same abstractions that TensorFlow always has. That said,
we are trying hard to level-up the representation (taking advantage of the
"Multi-Level" part of MLIR) because doing so will make it much, much easier to
write analyses and transformations than it currently is in TensorFlow.

### Non-Goals

It is important to point out things that MLIR does not aim to do. For example,
there is no runtime component to MLIR: the TensorFlow executor, the TF Lite
FlatBuffer interpreter, and other existing runtimes should be used as-is.

Another non-goal is a stable binary encoding: MLIR doesn't currently support
one. We will certainly add this at some point, but existing formats should be
used for serialization and distribution in the meantime.