# MLIR: Incremental Application to Graph Algorithms in ML Frameworks

The existing documentation about MLIR focuses on long term vision, how its
pieces fit together, and the benefits of modular and composable infrastructure
in the vast and distant future. While this viewpoint appeals to some, it causes
concern for others who are focused on the "here and now" - why does it
make sense to make a "revolutionary" change when any individual problem can be
fixed in place?

This document explains that adoption of MLIR to solve graph-based problems
_isn't_ a revolutionary change: it is an incremental series of steps which build
on each other, each of which delivers local value. This document also addresses
some points of confusion that keep coming up.

One note: even though a major advantage of MLIR is that it can span the full
spectrum from graph algorithms down to low-level code generation, this document
focuses on the use of MLIR for **graph-level algorithms**. MLIR will also unlock
exciting code generation opportunities (particularly given its novel approach to
integrating state-of-the-art polyhedral techniques), but issues that touch on
MLIR's relationship to XLA, Eigen, etc, are out of scope for this particular
doc.

This document uses TensorFlow as the example given that it is the focus of our
immediate work, but we believe that the same viewpoint could be useful for
people working in the context of other ML frameworks that may consider adopting
MLIR in the future.

### How is MLIR relevant?

MLIR is an overloaded acronym which unpacks as "Multi-Level Intermediate
Representation". Its high-level purpose is to provide mechanics for describing
and transforming programs and computations in a flexible way.
It provides common
compiler infrastructure for things like constant folding, dead code elimination,
graph rewriting, and others - which are independent of the representational
choices picked by a given dialect (e.g. its concurrency semantics). It was built
with a specific focus on compile time and memory efficiency, accurate
propagation of source location information (important for reporting high quality
errors and warnings), and is designed for testability.

TensorFlow has numerous subsystems (some of which are proprietary, e.g.
Tensor-RT, nGraph, CoreML, etc) as well as translation layers between these
different subsystems, and these translation layers face similar challenges. (As
an aside, the internals of each of these subsystems could often benefit from
MLIR infrastructure, but that isn't a focus of this doc.)

A key observation that MLIR makes is that these subsystems often have two things
going on: they are both particular data structures and encodings (e.g. HLO
graphs, TF-Lite's flat buffer format, TensorFlow's Graph format, the ONNX
abstraction, etc) as well as an abstraction of computation (a specific way of
modeling a convolution, a set of supported operations, etc).

MLIR uses a standard IR (i.e., a set of data structures) for representing these
computations - this allows a huge amount of shared infrastructure across these
problem domains. MLIR then allows the definition of domain-specific "dialects"
that describe the set of operations that are legal and supported for a given
application. This means that the actual translations between data structures are
kept as simple as possible - and are thus relatively easy to make "correct".
This allows the common compiler infrastructure to handle the mapping problems
and the other issues within the domain.
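As a loose illustration of this split - one shared IR data structure, plus
per-dialect sets of legal operations - here is a hypothetical Python sketch.
The `Operation`, `Dialect`, and `eliminate_dead_code` names are invented for
this example and are not MLIR's actual (C++) API; the point is only that a
generic pass can work across dialects because it touches just the shared
structure:

```python
class Operation:
    """Toy shared IR node: a dialect-qualified name, operands, one result."""
    _counter = 0

    def __init__(self, name, operands=()):
        self.name = name                  # e.g. "tf.Add" or "xla.Add"
        self.operands = list(operands)    # SSA values this op consumes
        Operation._counter += 1
        self.result = f"%{Operation._counter}"


class Dialect:
    """Toy dialect: a namespace plus the set of op names it considers legal."""

    def __init__(self, namespace, legal_ops):
        self.namespace = namespace
        self.legal_ops = set(legal_ops)

    def verify(self, op):
        # Ops from other namespaces are not this dialect's concern.
        prefix, _, opname = op.name.partition(".")
        return prefix != self.namespace or opname in self.legal_ops


def eliminate_dead_code(ops, live_roots):
    """Generic pass: keep only ops whose results feed (transitively) into the
    live roots. Dialect-independent: it inspects only the shared structure."""
    live = set(live_roots)
    kept = []
    for op in reversed(ops):              # assumes topological order
        if op.result in live:
            kept.append(op)
            live.update(op.operands)
    return list(reversed(kept))


# Hypothetical usage: two constants feed an add; a third constant is unused.
tf = Dialect("tf", {"Const", "Add"})
a = Operation("tf.Const")
b = Operation("tf.Const")
unused = Operation("tf.Const")
add = Operation("tf.Add", operands=[a.result, b.result])
kept = eliminate_dead_code([a, b, unused, add], live_roots=[add.result])
```

In this sketch the dead-code pass never needs to know whether it is looking at
a TensorFlow graph or an HLO-like program; only the per-dialect `verify` does.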
MLIR's design is directly informed by the experience of building (and then
living with) intermediate representations like the LLVM IR, LLVM SelectionDAG,
the LLVM machine instruction representation, and Swift SIL IR, and learns new
lessons from TensorFlow and XLA HLO, as well as from building countless
research and production systems on top of them. Our goal is to drag the state of
the art in compilers forward, not to merely apply a few well-known techniques to
the machine learning domain.

### What does adoption mean?

The point of this document is not to advocate for rewriting any particular
subsystem in TensorFlow - indeed, the burden required to justify a rewrite is
high, and often very specific to that subsystem. That said, there are several
subsystems that are about to get rewritten or substantially revised anyway, so
we use those as examples to concretely describe the benefits that MLIR provides
in these cases and what it will take. The subsystems discussed are:

1.  the TF Lite TOCO translator, which needs improved error
    reporting/reliability and generalization to support more ops, and
1.  the TF/XLA bridge, which needs to improve usability by merging some of its
    usage models, support dynamic shapes, and generalize guest subsystem support
    to Tensor-RT and nGraph.
1.  Grappler is another subsystem that is likely to get substantial revisions in
    the future, and would definitely benefit from the MLIR framework, but there
    are no known plans to do that work at this point, so we don't discuss it
    further.

Adopting MLIR for these works the same way - and, in fact, the work to support
TF Lite is mostly a subset of the larger work to support the functionality of
the TF/XLA bridge. TF Lite and the TF/XLA bridge include several compiler passes
(things like encapsulate, functionalize control flow, lowering of ops, fusion,
constant folding, shape inference, etc).
MLIR supports converting from TensorFlow Graphs to MLIR and back, which means
that we can start by putting a no-op translation to MLIR and back into the
pipeline, and verify that nothing breaks. Then we can work on replacing the
compiler transformations one by one by reimplementing them (with the improved
algorithms that we're planning).

This is a development plan: we wouldn't actually ship a TensorFlow that just
uses MLIR for a single pass. In practice, we'll have the MLIR path gated under
an option, build out a replacement for an entire subsystem (e.g. the TOCO
translator), and when the time is right, we'll do A/B comparisons and eventually
make a switch and phase out the old code over time.

## What benefit does MLIR provide?

The adoption plan above might sound like it only makes things worse in the
immediate term - we have two implementations of the same functionality, we are
dividing our efforts, etc. In order for this to be worth it, we should have a
good sense that we are building towards an improved future that will make
customers and TensorFlow engineers happier when it lands. Here we describe a few
of the benefits that MLIR provides, in no particular order:

### A Lossless Human Editable Textual Representation

The MLIR in-memory data structure has a human readable and writable format, as
well as [a specification](../LangRef.md) for that format - built just like any
other programming language. Important properties of this format are that it is
compact, easy to read, and lossless. You can dump an MLIR program out to disk
and munge around with it, then send it through a few more passes.
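To make the "lossless" property concrete, here is a tiny Python sketch - a toy
notation, not MLIR's real grammar or parser - of what round-tripping an
in-memory program through text means: printing and then re-parsing reconstructs
exactly the same program, so a file edited by hand is a first-class input to
later passes.

```python
def print_ir(ops):
    """Serialize a list of (result, opname, operands) tuples to text."""
    return "\n".join(
        f"{res} = {name} {', '.join(args)}".rstrip() for res, name, args in ops
    )


def parse_ir(text):
    """Parse the toy textual form back into the in-memory list."""
    ops = []
    for line in text.splitlines():
        lhs, rhs = line.split(" = ", 1)
        parts = rhs.split(" ", 1)
        name = parts[0]
        args = parts[1].split(", ") if len(parts) > 1 else []
        ops.append((lhs, name, args))
    return ops


program = [
    ("%0", "tf.Const", []),
    ("%1", "tf.Add", ["%0", "%0"]),
]

# Lossless round trip: nothing is dropped or approximated by serialization.
assert parse_ir(print_ir(program)) == program
```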
If you haven't worked with a system that works this way, it is hard to overstate
how big of a deal this is in practice: it means that you can call `foo->dump()`
on an IR object to see its full contents, it means you can diff the IR before
and after a change, delta reduce IR files, and many other things.

### A Graph Verification Pass

Like many other popular compiler infrastructures, MLIR provides infrastructure
and implementation for a "verifier" which checks that the IR is well formed. The
MLIR verifier is a simple framework that makes it easy to provide a single
source of truth for those correctness properties and is general across all
Dialects (e.g. TF Graph, TF Lite flat buffer, XLA HLO, etc).

A verifier pass is sort of like a 'super assertion' that catches mistakes in
program transformations early, making you as an engineer more productive, making
the product more reliable, and making it easier to track down bugs when they
appear - because the verifier can be run at any time, either as a compiler pass
or with a single function call.

While MLIR provides a well-considered infrastructure for IR verification, and
has simple checks for existing TensorFlow operations, there is a lot that should
be added here and lots of opportunity to get involved!

### Designed for Testability

There are many aspects of this in MLIR, but we'll focus on compiler
transformations since they are the easiest to understand. Compiler
transformations are modeled as subclasses of the `Pass` C++ class, which are
driven by an `mlir-opt` tool.
When combined with a lossless textual
representation, it becomes really easy to write unit tests for compiler
transformations; for example, this is a simple test that shows "x-x" is being
turned into zero:

```mlir
  // RUN: mlir-opt %s -canonicalize | FileCheck %s
  func @test_subi_zero_cfg(%arg0: i32) -> i32 {
    %y = subi %arg0, %arg0 : i32
    return %y: i32
  }
  // CHECK-LABEL: func @test_subi_zero_cfg(%arg0: i32)
  // CHECK-NEXT:  %c0_i32 = constant 0 : i32
  // CHECK-NEXT:  return %c0_i32
```

The "CHECK" comments are interpreted by the
[LLVM FileCheck tool](https://llvm.org/docs/CommandGuide/FileCheck.html), which
is sort of like a really advanced grep. This test is fully self-contained: it
feeds the input into the [canonicalize pass](../Canonicalization.md), and checks
that the output matches the CHECK lines. See the `test/Transforms` directory for
more examples. In contrast, standard unit testing exposes the API of the
underlying framework to lots and lots of tests (making it harder to refactor and
move the API), typically requires a lot more code, and exacerbates issues with
link time. For examples, see
[the TEST_F functions in TensorFlow's testsuite](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/grappler/optimizers/arithmetic_optimizer_test.cc).

MLIR has been pervasively designed around this sort of testability,
allowing us to put in place a culture that expects every behavior-changing
commit to include a test case, and for these test cases to be stable and
reliable over time, since they are testing exactly what they are supposed to.
End to end integration tests are still super useful for some things of course!
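To illustrate the "advanced grep" idea, here is a deliberately minimal Python
sketch of the two most common directives. This is a simplification for
exposition only: the real FileCheck tool additionally supports regexes,
variable captures, `CHECK-LABEL` scoping, and much more.

```python
def filecheck(output, directives):
    """Toy FileCheck: directives is a list of ("CHECK"|"CHECK-NEXT", text).
    CHECK scans forward for a line containing the text; CHECK-NEXT must
    match on the very next line after the previous match."""
    lines = output.splitlines()
    pos = -1  # index of the last matched line
    for kind, pattern in directives:
        if kind == "CHECK-NEXT":
            pos += 1
            if pos >= len(lines) or pattern not in lines[pos]:
                return False
        else:  # plain CHECK: scan forward from the last match
            for i in range(pos + 1, len(lines)):
                if pattern in lines[i]:
                    pos = i
                    break
            else:
                return False
    return True


# Hypothetical canonicalized output for the x-x test above.
canonicalized = """func @test_subi_zero_cfg(%arg0: i32) -> i32 {
  %c0_i32 = constant 0 : i32
  return %c0_i32 : i32
}"""

ok = filecheck(canonicalized, [
    ("CHECK", "func @test_subi_zero_cfg(%arg0: i32)"),
    ("CHECK-NEXT", "%c0_i32 = constant 0 : i32"),
    ("CHECK-NEXT", "return %c0_i32"),
])
```

Because matches must occur in order, such tests pin down the structure of the
output without being brittle about incidental details like value numbering in
unrelated parts of the file.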
### Infrastructure for Warnings and Error Diagnostics and Location Tracking

MLIR benefits from the lessons learned from building other compilers - including
Clang, which
[set the standard](http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html)
for quality of implementation in C/C++ compiler diagnostics. Drawing from this
experience (and fixing mistakes in LLVM), MLIR requires that operations and
functions carry abstract location information, that transformations propagate
this information, and it provides standardized mechanisms to emit errors and
warnings, as well as for clients to hook into them to capture and report them in
custom ways.

Why is this important? In practice, many graph-to-graph translators can fail
(e.g. TF Lite when an unsupported op is used) and it is important to be able to
report the error up through to the user in the most precise way possible, in
order for it to be actionable. This includes tracking rewrites through fusions
and fissions of ops, mapping back into language / API specific domains, etc.

More selfishly for infrastructure hackers, this is a huge boon because it means
that it is easy to write good tests for this: the testing tools for MLIR capture
the diagnostics produced by passes (using the standard diagnostic hooks) and
check that they match the expected diagnostics in the testcase.
For example, to
test the dependence analysis infra in the code generator, Andy Davis wrote a
simple pass that checks dependencies and emits them as "notes", allowing him to
write tests like this:

```mlir
  // RUN: mlir-opt %s -memref-dependence-check -verify-diagnostics
  func @different_memrefs() {
    %m.a = alloc() : memref<100xf32>
    %m.b = alloc() : memref<100xf32>
    %c0 = constant 0 : index
    %c1 = constant 1.0 : f32
    store %c1, %m.a[%c0] : memref<100xf32>
    // expected-note@-1 {{dependence from memref access 0 to access 1 = false}}
    %v0 = load %m.b[%c0] : memref<100xf32>
    return
  }
```

Note that a major limitation of this is that MLIR suffers from a problem of
"garbage in, garbage out": if the input locations to MLIR are imprecise, then
there is nothing that it can do to recover them. There is work underway in
TensorFlow/Python to improve the situation, and Swift for TensorFlow already has
perfect location tracking due to its design.

### Shape Information Captured in the IR

In TensorFlow Graphs, each op takes and returns values using a very simple type
system (TF_DataType) in which each value is a tensor of unknown rank and
dimensions. At the same time, many graphs have static shapes easily knowable for
wide swaths of the computation, and even dynamically shaped operations often
have statically knowable dimensions. Many analyses and transformations benefit
from and use this information when available, but because TensorFlow graphs don't
capture this (e.g. serialize it to proto), passes have to recompute it on demand
with ShapeRefiner.

The [MLIR Tensor Type](../Dialects/Builtin.md/#rankedtensortype) directly captures shape
information, so you can have things like:

```mlir
  %x = tf.Add %x, %y : tensor<128 x 8 x ? x f32>
```

Capturing this in the IR is expected to speed up transformations (avoiding
recomputing the same info over and over again), which therefore makes it
practical to apply stronger shape analysis algorithms. It also makes it easier
to work with the IR, because on-the-side representations can get out of date,
and the API is easier to work with from an ergonomics perspective.

### Unified Graph Rewriting Infrastructure

This is still a work in progress, but we have sightlines towards a
[general rewriting infrastructure](RationaleGenericDAGRewriter.md) for transforming DAG
tiles into other DAG tiles, using a declarative pattern format. DAG to DAG
rewriting is a generalized solution for many common compiler optimizations,
lowerings, and other rewrites, and having an IR enables us to invest in building
a single high-quality implementation.

Declarative pattern rules are preferable to imperative C++ code for a number of
reasons: they are more compact, easier to reason about, can have checkers
written against them, and new tools can be built that inspect and manipulate the
declarative patterns in interesting ways - e.g. applying theorem provers to
them. It will be exciting to see this ecosystem develop as the infrastructure
matures.

### Clarified Semantics for TensorFlow Operations

One of the challenging things about working with TensorFlow is that there are
many invariants and behaviors that need to be preserved and known about when
working with Graphs, and these can be difficult to reason about and can lead to
bugs. Things like 'dead values', Switch and Merge nodes, concurrency semantics,
nodes that execute even when passed a dead value, multiple device program
representation, etc. all add complexities that can make it challenging to
reason about whether a transformation or analysis is correct in general.
Even
something as simple as constant folding or transforming integer `x-x` into `0`
is non-trivial, because you need to consider control dependence edges.

One of our major goals for the TensorFlow dialect of MLIR is to sort out these
situations and upgrade existing TensorFlow graphs to semantics that are easier
to reason about. The solutions to these problems are all still being debated,
but those discussions have already yielded a lot of potential answers:
introducing a `tf_dead_or<x>` type for switch/merge, modeling of TF operations
using futures/async semantics, etc. None of these particular battles are
critical or important for MLIR to succeed (because of its "meta" nature, the
abstraction decisions are left up to each dialect to decide), but each one that
works out will make it easier to work with and transform TensorFlow operations.
We expect these issues to get nailed down in the next couple of months when the
MLIR effort moves beyond TF Lite / TOCO support. The discussions that are
happening now are super valuable and making progress.

### Ergonomics

A minor-in-theory, but important-in-practice point is that MLIR is designed to
make it easy, memory efficient, and less error prone to transform code than
other systems. `TensorFlow::Graph` has implementation issues where the same
information is stored redundantly in different places (which must be manually
kept up to date), has a somewhat unusual representation of certain constructs
(e.g. the function library, which makes it very difficult to add or remove
functions, e.g. during interprocedural transformations), and stores information
in the graph that is used by the executor, but isn't necessary for program
transformation.
TensorFlow has made a lot of progress in this area over the years, and there are
lots of ideas about further improvements in the future; we are happy that MLIR
addresses these needs (making it much easier to implement correct program
transformations) today, and are committed to pushing hard to make it better.

### Compile Time Performance and Memory Use

MLIR has been designed to be memory and compile-time efficient in its algorithms
and data structures, using immutable and uniqued structures, low level
bit-packing, and other well-known techniques to avoid unnecessary heap
allocations, and to allow simple and safe multithreaded optimization of MLIR
programs. There are other reasons to believe that the MLIR implementations of
common transformations will be more efficient than the Python and C++
TensorFlow::Graph implementations of the same things, given the current
implementation details of TensorFlow.

That said, this is very much a theory at this point. When the new
implementations of various subsystems are available, we will see what happens in
practice: there will be no reason to speculate - we can measure.

## Common Questions and Concerns

Here we address some frequently asked questions and concerns.

### Isn't MLIR a big dependency to take on?

We've heard that at least some people are concerned that MLIR is a "big"
dependency to take on, and could result in large code size. Here are some key
points about MLIR:

1.  The entire MLIR codebase is a pretty small C++ code base in absolute terms
    compared to what goes into a modern ML framework.
1.  Like LLVM, MLIR is designed as a set of libraries that clients can link in
    or ignore as they wish. For example, the transformations in MLIR are kept
    separate from the core IR abstractions, and dialect specific code (e.g.
    TensorFlow, TF-Lite, XLA, etc) is all independently selectable by the build
    system.
    Clients that don't care about XLA don't link in that code, whether
    they are a TF-Lite system or a client that is completely unrelated to
    TensorFlow.
1.  MLIR's only third party dependency is on LLVM, but it doesn't depend on LLVM
    IR or any other heavy dependency - it just depends on LLVM's support
    library, which provides efficient hash tables and other
    [memory efficient data structures that the STL does not](http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task).
    There have been discussions about splitting this set of libraries out to its
    own subproject in LLVM that the LLVM IR project depends on. This would be
    great for MLIR as well as other LLVM subprojects.
1.  TensorFlow and many other frameworks already use LLVM - if so, MLIR would
    not be pulling in an additional dependency at all.

### How does MLIR represent {control flow, concurrency, …} semantics in TensorFlow?

MLIR provides a dialect that is an isomorphic 1-1 mapping between TensorFlow
graphs and MLIR, as well as a pretty complete translator back and forth (the
only known gap is that a few TF_DataType enums aren't handled yet). MLIR is a
"Multi-Level IR", which allows it to represent code with different abstraction
levels, so the ability to faithfully represent TensorFlow code in a completely
backwards compatible way (even if there are some historical warts!) is critical.

In *addition* to the isomorphic mapping, we are actively working on efforts to
raise the abstraction level for working with TensorFlow graphs in MLIR. Doing so
would make it even easier to write TensorFlow transformations than it is today,
and would provide a path to migrating TF 1.x graphs forward into the TF 2.x
world.
For example, because MLIR has an extensible type system, we can directly
model whether it is impossible for a Tensor value to be a "dead" value - similar
to the use of optional types in modern programming languages.

These discussions occasionally cause confusion because there are several issues
being mixed up into one:

*   What are the current semantics of TensorFlow graphs, and what invariants can
    we rely on?
*   What should the semantics be in TensorFlow 2.0?
*   What do programs rely on in practice, and if it is unfriendly, can we
    migrate it?
*   Can we find a way to make it so transforms don't have to worry about the
    complexities of Switch/Merge, by using higher level control flow
    representations? (tentative answer: yes)
*   How should MLIR represent async vs sync operations, what invariants are
    provided, and how does this dovetail with control flow?
*   When is it safe and beneficial to perform optimizations that might reduce
    parallelism?

All of these questions have a "conservative/safe fallback": we can continue
providing exactly the same abstractions that TensorFlow always has. That said,
we are trying hard to level-up the representation (taking advantage of the
"Multi-Level" part of MLIR) because doing so will make it much much easier to
write analyses and transformations than it currently is in TensorFlow.

### Non Goals

It is important to point out things that MLIR does not aim to do. For example,
there is no runtime component to MLIR: the TensorFlow executor, the TF Lite
FlatBuffer interpreter, and other existing runtimes should be used as-is.

Another non-goal is that MLIR currently doesn't support a stable binary
encoding. We will certainly add this at some point, but existing formats should
be used for serialization and distribution in the meantime.