# Background topics

This section covers a number of common compiler terms that arise in
this guide. We try to give the general definition while providing some
Rust-specific context.

<a name="cfg"></a>

## What is a control-flow graph?

A control-flow graph (CFG) is a common term from compilers. If you've ever
used a flow-chart, then the concept of a control-flow graph will be
pretty familiar to you. It's a representation of your program that
clearly exposes the underlying control flow.

A control-flow graph is structured as a set of **basic blocks**
connected by edges. The key idea of a basic block is that it is a set
of statements that execute "together" – that is, whenever you branch
to a basic block, you start at the first statement and then execute
all the remainder. Only at the end of the block is there the
possibility of branching to more than one place (in MIR, we call that
final statement the **terminator**):

```mir
bb0: {
    statement0;
    statement1;
    statement2;
    ...
    terminator;
}
```

Many expressions that you are used to in Rust compile down to multiple
basic blocks. For example, consider an if statement:

```rust,ignore
a = 1;
if some_variable {
    b = 1;
} else {
    c = 1;
}
d = 1;
```

This would compile into four basic blocks in MIR. In textual form, it looks like
this:

```mir
BB0: {
    a = 1;
    if some_variable {
        goto BB1;
    } else {
        goto BB2;
    }
}

BB1: {
    b = 1;
    goto BB3;
}

BB2: {
    c = 1;
    goto BB3;
}

BB3: {
    d = 1;
    ...
}
```

In graphical form, it looks like this:

```
                BB0
       +--------------------+
       | a = 1;             |
       +--------------------+
             /       \
  if some_variable   else
           /           \
     BB1  /             \  BB2
+-----------+       +-----------+
| b = 1;    |       | c = 1;    |
+-----------+       +-----------+
          \            /
           \          /
            \  BB3   /
          +----------+
          | d = 1;   |
          | ...      |
          +----------+
```

When using a control-flow graph, a loop simply appears as a cycle in
the graph, and the `break` keyword translates into a path out of that
cycle.

<a name="dataflow"></a>

## What is a dataflow analysis?

[*Static Program Analysis*](https://cs.au.dk/~amoeller/spa/) by Anders Møller
and Michael I. Schwartzbach is an incredible resource!

_Dataflow analysis_ is a type of static analysis that is common in many
compilers. It describes a general technique, rather than a particular analysis.

The basic idea is that we can walk over a [control-flow graph (CFG)](#cfg) and
keep track of what some value could be. At the end of the walk, we might have
shown that some claim is true or not necessarily true (e.g. "this variable must
be initialized"). `rustc` tends to do dataflow analyses over the MIR, since MIR
is already a CFG.

For example, suppose we want to check that `x` is initialized before it is used
in this snippet:

```rust,ignore
fn foo() {
    let mut x;

    if some_cond {
        x = 1;
    }

    dbg!(x);
}
```

A CFG for this code might look like this:

```txt
 +------+
 | Init | (A)
 +------+
    |   \
    |   if some_cond
  else   \ +-------+
    |     \| x = 1 | (B)
    |      +-------+
    |      /
 +---------+
 | dbg!(x) | (C)
 +---------+
```

We can do the dataflow analysis as follows: we will start off with a flag `init`
which indicates if we know `x` is initialized. As we walk the CFG, we will
update the flag. At the end, we can check its value.

So first, in block (A), the variable `x` is declared but not initialized, so
`init = false`. In block (B), we initialize the value, so we know that `x` is
initialized. So at the end of (B), `init = true`.

Block (C) is where things get interesting.
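
This kind of walk is easy to sketch in ordinary Rust. The following is a toy
illustration only, not rustc's real dataflow framework; the `Block` type and
the `definitely_init` function are made up for this example. Each block has a
tiny "transfer function" (does it assign `x`?), and a block with several
predecessors combines the results flowing in along each edge:

```rust
// Toy forward dataflow analysis: is `x` definitely initialized
// at the end of each basic block? (Illustration only — not how
// rustc actually implements this.)
#[derive(Clone, Copy)]
struct Block {
    // Does this block assign to `x`?
    assigns_x: bool,
    // Indices of predecessor blocks (empty for the entry block).
    preds: &'static [usize],
}

/// For each block, compute whether `x` is definitely initialized
/// at the *end* of that block.
fn definitely_init(blocks: &[Block]) -> Vec<bool> {
    let mut init = vec![false; blocks.len()];
    // We assume blocks are listed so predecessors come first, so a
    // single pass suffices for this acyclic example. (A real analysis
    // iterates to a fixed point so that loops are handled too.)
    for (i, b) in blocks.iter().enumerate() {
        // Merge: `x` is initialized on entry only if it was
        // initialized at the end of *every* predecessor.
        let on_entry = !b.preds.is_empty() && b.preds.iter().all(|&p| init[p]);
        init[i] = on_entry || b.assigns_x;
    }
    init
}

fn main() {
    // The CFG from the example: (A) declares `x`, (B) assigns it,
    // and (C) uses it, with edges A→B, A→C, and B→C.
    let cfg = [
        Block { assigns_x: false, preds: &[] },     // (A)
        Block { assigns_x: true,  preds: &[0] },    // (B)
        Block { assigns_x: false, preds: &[0, 1] }, // (C)
    ];
    assert_eq!(definitely_init(&cfg), vec![false, true, false]);
}
```

Running it on the three-block CFG from the example yields
`[false, true, false]`: `x` is definitely initialized at the end of (B), but
not at the end of (A) or (C).
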
Notice that there are two incoming
edges, one from (A) and one from (B), corresponding to whether `some_cond` is true or not.
But we cannot know that! It could be the case that `some_cond` is always true,
so that `x` is actually always initialized. It could also be the case that
`some_cond` depends on something random (e.g. the time), so `x` may not be
initialized. In general, we cannot know statically (due to [Rice's
Theorem][rice]). So what should the value of `init` be in block (C)?

[rice]: https://en.wikipedia.org/wiki/Rice%27s_theorem

Generally, in dataflow analyses, if a block has multiple parents (like (C) in
our example), its dataflow value will be some function of all its parents (and
of course, what happens in (C)). Which function we use depends on the analysis
we are doing.

In this case, we want to be able to prove definitively that `x` must be
initialized before use. This forces us to be conservative and assume that
`some_cond` might be false sometimes. So our "merging function" is "and". That
is, `init = true` in (C) if `init = true` in (A) _and_ in (B) (or if `x` is
initialized in (C)). But this is not the case; in particular, `init = false` in
(A), and `x` is not initialized in (C). Thus, `init = false` in (C); we can
report an error that "`x` may not be initialized before use".

There is a lot more that can be said about dataflow analyses. There is an
extensive body of research literature on the topic, including a lot of theory.
We only discussed a forwards analysis, but backwards dataflow analysis is also
useful. For example, rather than starting from block (A) and moving forwards,
we might have started with the usage of `x` and moved backwards to try to find
its initialization.

<a name="quantified"></a>

## What is "universally quantified"? What about "existentially quantified"?

In math, a predicate may be _universally quantified_ or _existentially
quantified_:

- _Universal_ quantification:
  - the predicate holds if it is true for all possible inputs.
  - Traditional notation: ∀x: P(x). Read as "for all x, P(x) holds".
- _Existential_ quantification:
  - the predicate holds if there is any input where it is true, i.e., there
    only has to be a single input.
  - Traditional notation: ∃x: P(x). Read as "there exists x such that P(x) holds".

In Rust, they come up in type checking and trait solving. For example,

```rust,ignore
fn foo<T>()
```

This signature claims that the function is well-typed for all types `T`:
`∀ T: well_typed(foo)`.

Another example:

```rust,ignore
fn foo<'a>(_: &'a usize)
```

This signature claims that for any lifetime `'a` (determined by the
caller), the function is well-typed: `∀ 'a: well_typed(foo)`.

Another example:

```rust,ignore
fn foo<F>()
where for<'a> F: Fn(&'a u8)
```

This signature claims that the function is well-typed for all types `F` such
that for all lifetimes `'a`, `F: Fn(&'a u8)`:
`∀ F: ∀ 'a: (F: Fn(&'a u8)) => well_typed(foo)`.

One more example:

```rust,ignore
fn foo(_: dyn Debug)
```

This signature claims that there exists some type `T` that implements `Debug`
such that the function is well-typed: `∃ T: (T: Debug) and well_typed(foo)`.

<a name="variance"></a>

## What is a de Bruijn Index?

[De Bruijn indices][wikideb] are a way of representing, using only integers,
which variables are bound in which binders. They were originally invented for
use in lambda calculus evaluation (see [this Wikipedia article][wikideb] for
more). In `rustc`, we use de Bruijn indices to [represent generic types][sub].
[wikideb]: https://en.wikipedia.org/wiki/De_Bruijn_index
[sub]: ../generics.md

Here is a basic example of how de Bruijn indices might be used for closures (we
don't actually do this in `rustc` though!):

```rust,ignore
|x| {
    f(x) // de Bruijn index of `x` is 1 because `x` is bound 1 level up

    |y| {
        g(x, y) // index of `x` is 2 because it is bound 2 levels up
                // index of `y` is 1 because it is bound 1 level up
    }
}
```

## What are co- and contra-variance?

Check out the subtyping chapter from the
[Rust Nomicon](https://doc.rust-lang.org/nomicon/subtyping.html).

See the [variance](../variance.html) chapter of this guide for more info on how
the type checker handles variance.

<a name="free-vs-bound"></a>

## What is a "free region" or a "free variable"? What about "bound region"?

Let's describe the concepts of free vs bound in terms of program
variables, since that's the thing we're most familiar with.

- Consider this expression, which creates a closure: `|a, b| a + b`.
  Here, the `a` and `b` in `a + b` refer to the arguments that the closure will
  be given when it is called. We say that the `a` and `b` there are **bound** to
  the closure, and that the closure signature `|a, b|` is a **binder** for the
  names `a` and `b` (because any references to `a` or `b` within refer to the
  variables that it introduces).
- Consider this expression: `a + b`. In this expression, `a` and `b` refer to
  local variables that are defined *outside* of the expression. We say that
  those variables **appear free** in the expression (i.e., they are **free**,
  not **bound** (tied up)).

So there you have it: a variable "appears free" in some
expression/statement/whatever if it refers to something defined
outside of that expression/statement/whatever.
Equivalently, we can
then refer to the "free variables" of an expression – which is just
the set of variables that "appear free".

So what does this have to do with regions? Well, we can apply the
analogous concept to types and regions. For example, in the type `&'a
u32`, `'a` appears free. But in the type `for<'a> fn(&'a u32)`, it
does not.

# Further Reading About Compilers

> Thanks to `mem`, `scottmcm`, and `Levi` on the official Discord for the
> recommendations, and to `tinaun` for posting a link to a [twitter thread from
> Graydon Hoare](https://twitter.com/graydon_pub/status/1039615569132118016)
> which had some more recommendations!
>
> Other sources: https://gcc.gnu.org/wiki/ListOfCompilerBooks
>
> If you have other suggestions, please feel free to open an issue or PR.

## Books
- [Types and Programming Languages](https://www.cis.upenn.edu/~bcpierce/tapl/)
- [Programming Language Pragmatics](https://www.cs.rochester.edu/~scott/pragmatics/)
- [Practical Foundations for Programming Languages](https://www.cs.cmu.edu/~rwh/pfpl/2nded.pdf)
- [Compilers: Principles, Techniques, and Tools, 2nd Edition](https://www.pearson.com/us/higher-education/program/Aho-Compilers-Principles-Techniques-and-Tools-2nd-Edition/PGM167067.html)
- [Garbage Collection: Algorithms for Automatic Dynamic Memory Management](https://www.cs.kent.ac.uk/people/staff/rej/gcbook/)
- [Linkers and Loaders](https://www.amazon.com/Linkers-Kaufmann-Software-Engineering-Programming/dp/1558604960) (There are also free versions of this, but the version we had linked seems to be offline at the moment.)
- [Advanced Compiler Design and Implementation](https://www.goodreads.com/book/show/887908.Advanced_Compiler_Design_and_Implementation)
- [Building an Optimizing Compiler](https://www.goodreads.com/book/show/2063103.Building_an_Optimizing_Compiler)
- [Crafting Interpreters](http://www.craftinginterpreters.com/)

## Courses
- [University of Oregon Programming Languages Summer School archive](https://www.cs.uoregon.edu/research/summerschool/archives.html)

## Wikis
- [Wikipedia](https://en.wikipedia.org/wiki/List_of_programming_languages_by_type)
- [Esoteric Programming Languages](https://esolangs.org/wiki/Main_Page)
- [Stanford Encyclopedia of Philosophy](https://plato.stanford.edu/index.html)
- [nLab](https://ncatlab.org/nlab/show/HomePage)

## Misc Papers and Blog Posts
- [Programming in Martin-Löf's Type Theory](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.118.6683&rep=rep1&type=pdf)
- [Polymorphism, Subtyping, and Type Inference in MLsub](https://dl.acm.org/doi/10.1145/3093333.3009882)