1% vim: tw=100 2 3\documentclass[twocolumn]{article} 4\usepackage{blindtext} 5\usepackage[hypcap]{caption} 6\usepackage{fontspec} 7\usepackage[colorlinks, urlcolor={blue!80!black}]{hyperref} 8\usepackage[outputdir=out]{minted} 9\usepackage{relsize} 10\usepackage{xcolor} 11 12\setmonofont{Source Code Pro}[ 13 BoldFont={* Medium}, 14 BoldItalicFont={* Medium Italic}, 15 Scale=MatchLowercase, 16] 17 18\newcommand{\rust}[1]{\mintinline{rust}{#1}} 19 20\begin{document} 21 22\title{Miri: \\ \smaller{An interpreter for Rust's mid-level intermediate representation}} 23\author{Scott Olson\footnote{\href{mailto:scott@solson.me}{scott@solson.me}} \\ 24 \smaller{Supervised by Christopher Dutchyn}} 25\date{April 12th, 2016} 26\maketitle 27 28%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 29 30\section{Abstract} 31 32The increasing need for safe low-level code in contexts like operating systems and browsers is 33driving the development of Rust\footnote{\url{https://www.rust-lang.org}}, a programming language 34promising high performance without the risk of memory unsafety. To make programming more convenient, 35it's often desirable to be able to generate code or perform some computation at compile-time. The 36former is mostly covered by Rust's existing macro feature or build-time code generation, but the 37latter is currently restricted to a limited form of constant evaluation capable of little beyond 38simple math. 39 40The architecture of the compiler at the time the existing constant evaluator was built limited its 41potential for future extension. However, a new intermediate representation was recently 42added\footnote{\href{https://github.com/rust-lang/rfcs/blob/master/text/1211-mir.md}{Rust RFC \#1211: Mid-level IR (MIR)}} 43to the Rust compiler between the abstract syntax tree and the back-end LLVM IR, called mid-level 44intermediate representation, or MIR for short. This report will demonstrate that writing an 45interpreter for MIR is a surprisingly effective approach for supporting a large proportion of Rust's 46features in compile-time execution. 47 48%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 49 50\section{Background} 51 52The Rust compiler generates an instance of \rust{Mir} for each function [\autoref{fig:mir}]. Each 53\rust{Mir} structure represents a control-flow graph for a given function, and contains a list of 54``basic blocks'' which in turn contain a list of statements followed by a single terminator. Each 55statement is of the form \rust{place = rvalue}. An \rust{Place} is used for referencing variables 56and calculating addresses such as when dereferencing pointers, accessing fields, or indexing arrays. 57An \rust{Rvalue} represents the core set of operations possible in MIR, including reading a value 58from an place, performing math operations, creating new pointers, structures, and arrays, and so 59on. Finally, a terminator decides where control will flow next, optionally based on the value of a 60boolean or integer. 61 62\begin{figure}[ht] 63 \begin{minted}[autogobble]{rust} 64 struct Mir { 65 basic_blocks: Vec<BasicBlockData>, 66 // ... 67 } 68 69 struct BasicBlockData { 70 statements: Vec<Statement>, 71 terminator: Terminator, 72 // ... 73 } 74 75 struct Statement { 76 place: Place, 77 rvalue: Rvalue 78 } 79 80 enum Terminator { 81 Goto { target: BasicBlock }, 82 If { 83 cond: Operand, 84 targets: [BasicBlock; 2] 85 }, 86 // ... 87 } 88 \end{minted} 89 \caption{MIR (simplified)} 90 \label{fig:mir} 91\end{figure} 92 93%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 94 95\section{First implementation} 96 97\subsection{Basic operation} 98 99To investigate the possibility of executing Rust at compile-time I wrote an interpreter for MIR 100called Miri\footnote{\url{https://github.com/solson/miri}}. The structure of the interpreter closely 101mirrors the structure of MIR itself. It starts executing a function by iterating the statement list 102in the starting basic block, translating the place into a pointer and using the rvalue to decide 103what to write into that pointer. Evaluating the rvalue may involve reads (such as for the two sides 104of a binary operation) or construction of new values. When the terminator is reached, it is used to 105decide which basic block to jump to next. Finally, Miri repeats this entire process, reading 106statements from the new block. 107 108\subsection{Function calls} 109 110To handle function call terminators\footnote{Calls occur only as terminators, never as rvalues.}, 111Miri is required to store some information in a virtual call stack so that it may pick up where it 112left off when the callee returns. Each stack frame stores a reference to the \rust{Mir} for the 113function being executed, its local variables, its return value location\footnote{Return value 114pointers are passed in by callers.}, and the basic block where execution should resume. When Miri 115encounters a \rust{Return} terminator in the MIR, it pops one frame off the stack and resumes the 116previous function. Miri's execution ends when the function it was initially invoked with returns, 117leaving the call stack empty. 118 119It should be noted that Miri does not itself recurse when a function is called; it merely pushes a 120virtual stack frame and jumps to the top of the interpreter loop. Consequently, Miri can interpret 121deeply recursive programs without overflowing its native call stack. This approach would allow Miri 122to set a virtual stack depth limit and report an error when a program exceeds it. 123 124\subsection{Flaws} 125 126This version of Miri supported quite a bit of the Rust language, including booleans, integers, 127if-conditions, while-loops, structures, enums, arrays, tuples, pointers, and function calls, 128requiring approximately 400 lines of Rust code. However, it had a particularly naive value 129representation with a number of downsides. It resembled the data layout of a dynamic language like 130Ruby or Python, where every value has the same size\footnote{An \rust{enum} is a discriminated union 131with a tag and space to fit the largest variant, regardless of which variant it contains.} in the 132interpreter: 133 134\begin{minted}[autogobble]{rust} 135 enum Value { 136 Uninitialized, 137 Bool(bool), 138 Int(i64), 139 Pointer(Pointer), // index into stack 140 Aggregate { 141 variant: usize, 142 data: Pointer, 143 }, 144 } 145\end{minted} 146 147This representation did not work well for aggregate types\footnote{That is, structures, enums, 148arrays, tuples, and closures.} and required strange hacks to support them. Their contained values 149were allocated elsewhere on the stack and pointed to by the aggregate value, which made it more 150complicated to implement copying aggregate values from place to place. 151 152Moreover, while the aggregate issues could be worked around, this value representation made common 153unsafe programming tricks (which make assumptions about the low-level value layout) fundamentally 154impossible. 155 156%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 157 158\section{Current implementation} 159 160Roughly halfway through my time working on Miri, Eduard 161Burtescu\footnote{\href{https://github.com/eddyb}{eddyb on GitHub}} from the Rust compiler 162team\footnote{\url{https://www.rust-lang.org/team.html\#Compiler}} made a post on Rust's internal 163forums about a ``Rust Abstract Machine'' 164specification\footnote{\href{https://internals.rust-lang.org/t/mir-constant-evaluation/3143/31}{Burtescu's 165reply on ``MIR constant evaluation''}} which could be used to implement more powerful compile-time 166function execution, similar to what is supported by C++14's \mintinline{cpp}{constexpr} feature. 167After clarifying some of the details of the data layout with Burtescu via IRC, I started 168implementing it in Miri. 169 170\subsection{Raw value representation} 171 172The main difference in the new value representation was to represent values by ``abstract 173allocations'' containing arrays of raw bytes with different sizes depending on their types. This 174mimics how Rust values are represented when compiled for physical machines. In addition to the raw 175bytes, allocations carry information about pointers and undefined bytes. 176 177\begin{minted}[autogobble]{rust} 178 struct Memory { 179 map: HashMap<AllocId, Allocation>, 180 next_id: AllocId, 181 } 182 183 struct Allocation { 184 bytes: Vec<u8>, 185 relocations: BTreeMap<usize, AllocId>, 186 undef_mask: UndefMask, 187 } 188\end{minted} 189 190\subsubsection{Relocations} 191 192The abstract machine represents pointers through ``relocations'', which are analogous to relocations 193in linkers\footnote{\href{https://en.wikipedia.org/wiki/Relocation_(computing)}{Relocation 194(computing) - Wikipedia}}. Instead of storing a global memory address in the raw byte representation 195like on a physical machine, we store an offset from the start of the target allocation and add an 196entry to the relocation table which maps the index of the offset bytes to the target allocation. 197 198In \autoref{fig:reloc}, the relocation stored at offset 0 in \rust{y} points to offset 2 in \rust{x} 199(the 2nd 16-bit integer). Thus, the relocation table for \rust{y} is \texttt{\{0 => 200x\}}, meaning the next $N$ bytes after offset 0 denote an offset into allocation \rust{x} where $N$ 201is the size of a pointer (4 in this example). The example shows this as a labelled line beneath the 202offset bytes. 203 204In effect, the abstract machine represents pointers as \rust{(allocation_id, offset)} pairs. This 205makes it easy to detect when pointer accesses go out of bounds. 206 207\begin{figure}[hb] 208 \begin{minted}[autogobble]{rust} 209 let x: [i16; 3] = [0xAABB, 0xCCDD, 0xEEFF]; 210 let y = &x[1]; 211 // x: BB AA DD CC FF EE (6 bytes) 212 // y: 02 00 00 00 (4 bytes) 213 // └───(x)───┘ 214 \end{minted} 215 \caption{Example relocation on 32-bit little-endian} 216 \label{fig:reloc} 217\end{figure} 218 219\subsubsection{Undefined byte mask} 220 221The final piece of an abstract allocation is the undefined byte mask. Logically, we store a boolean 222for the definedness of every byte in the allocation, but there are multiple ways to make the storage 223more compact. I tried two implementations: one based on the endpoints of alternating ranges of 224defined and undefined bytes and the other based on a bitmask. The former is more compact but I found 225it surprisingly difficult to update cleanly. I currently use the much simpler bitmask system. 226 227See \autoref{fig:undef} for an example of an undefined byte in a value, represented by underscores. 228Note that there is a value for the second byte in the byte array, but it doesn't matter what it is. 229The bitmask would be $10_2$, i.e.\ \rust{[true, false]}. 230 231\begin{figure}[hb] 232 \begin{minted}[autogobble]{rust} 233 let x: [u8; 2] = unsafe { 234 [1, std::mem::uninitialized()] 235 }; 236 // x: 01 __ (2 bytes) 237 \end{minted} 238 \caption{Example undefined byte} 239 \label{fig:undef} 240\end{figure} 241 242\subsection{Computing data layout} 243 244Currently, the Rust compiler's data layouts for types are hidden from Miri, so it does its own data 245layout computation which will not always match what the compiler does, since Miri doesn't take 246target type alignments into account. In the future, the Rust compiler may be modified so that Miri 247can use the exact same data layout. 248 249Miri's data layout calculation is a relatively simple transformation from Rust types to a structure 250with constant size values for primitives and sets of fields with offsets for aggregate types. These 251layouts are cached for performance. 252 253%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 254 255\section{Deterministic execution} 256\label{sec:deterministic} 257 258In order to be effective as a compile-time evaluator, Miri must have \emph{deterministic execution}, 259as explained by Burtescu in the ``Rust Abstract Machine'' post. That is, given a function and 260arguments to that function, Miri should always produce identical results. This is important for 261coherence in the type checker when constant evaluations are involved in types, such as for sizes of 262array types: 263 264\begin{minted}[autogobble,mathescape]{rust} 265 const fn get_size() -> usize { /* $\ldots$ */ } 266 let array: [i32; get_size()]; 267\end{minted} 268 269Since Miri allows execution of unsafe code\footnote{In fact, the distinction between safe and unsafe 270doesn't exist at the MIR level.}, it is specifically designed to remain safe while interpreting 271potentially unsafe code. When Miri encounters an unrecoverable error, it reports it via the Rust 272compiler's usual error reporting mechanism, pointing to the part of the original code where the 273error occurred. Below is an example from Miri's 274repository.\footnote{\href{https://github.com/solson/miri/blob/master/test/errors.rs}{miri/test/errors.rs}} 275 276\begin{minted}[autogobble]{rust} 277 let b = Box::new(42); 278 let p: *const i32 = &*b; 279 drop(b); 280 unsafe { *p } 281 // ~~ error: dangling pointer 282 // was dereferenced 283\end{minted} 284\label{dangling-pointer} 285 286%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 287 288\section{Language support} 289 290In its current state, Miri supports a large proportion of the Rust language, detailed below. The 291major exception is a lack of support for FFI\footnote{Foreign Function Interface, e.g.\ calling 292functions defined in Assembly, C, or C++.}, which eliminates possibilities like reading and writing 293files, user input, graphics, and more. However, for compile-time evaluation in Rust, this limitation 294is desired. 295 296\subsection{Primitives} 297 298Miri supports booleans, integers of various sizes and signed-ness (i.e.\ \rust{i8}, \rust{i16}, 299\rust{i32}, \rust{i64}, \rust{isize}, \rust{u8}, \rust{u16}, \rust{u32}, \rust{u64}, \rust{usize}), 300and unary and binary operations over these types. The \rust{isize} and \rust{usize} types will be 301sized according to the target machine's pointer size just like in compiled Rust. The \rust{char} and 302float types (\rust{f32}, \rust{f64}) are not supported yet, but there are no known barriers to doing 303so. 304 305When examining a boolean in an \rust{if} condition, Miri will report an error if its byte 306representation is not precisely 0 or 1, since having any other value for a boolean is undefined 307behaviour in Rust. The \rust{char} type will have similar restrictions once it is implemented. 308 309\subsection{Pointers} 310 311Both references and raw pointers are supported, with essentially no difference between them in Miri. 312It is also possible to do pointer comparisons and math. However, a few operations are considered 313errors and a few require special support. 314 315Firstly, pointers into the same allocations may be compared for ordering, but pointers into 316different allocations are considered unordered and Miri will complain if you attempt this. The 317reasoning is that different allocations may have different orderings in the global address space at 318runtime, making this non-deterministic. However, pointers into different allocations \emph{may} be 319compared for direct equality (they are always unequal). 320 321Secondly, pointers represented using relocations may be compared against pointers casted from 322integers (e.g.\ \rust{0 as *const i32}) for things like null pointer checks. To handle these cases, 323Miri has a concept of ``integer pointers'' which are always unequal to abstract pointers. Integer 324pointers can be compared and operated upon freely. However, note that it is impossible to go from an 325integer pointer to an abstract pointer backed by a relocation. It is not valid to dereference an 326integer pointer. 327 328\subsubsection{Slice pointers} 329 330Rust supports pointers to ``dynamically-sized types'' such as \rust{[T]} and \rust{str} which 331represent arrays of indeterminate size. Pointers to such types contain an address \emph{and} the 332length of the referenced array. Miri supports these fully. 333 334\subsubsection{Trait objects} 335 336Rust also supports pointers to ``trait objects'' which represent some type that implements a trait, 337with the specific type unknown at compile-time. These are implemented using virtual dispatch with a 338vtable, similar to virtual methods in C++. Miri does not currently support these at all. 339 340\subsection{Aggregates} 341 342Aggregates include types declared with \rust{struct} or \rust{enum} as well as tuples, arrays, and 343closures. Miri supports all common usage of all of these types. The main missing piece is to handle 344\texttt{\#[repr(..)]} annotations which adjust the layout of a \rust{struct} or \rust{enum}. 345 346\subsection{Place projections} 347 348This category includes field accesses, dereferencing, accessing data in an \rust{enum} variant, and 349indexing arrays. Miri supports all of these, including nested projections such as 350\rust{*foo.bar[2]}. 351 352\subsection{Control flow} 353 354All of Rust's standard control flow features, including \rust{loop}, \rust{while}, \rust{for}, 355\rust{if}, \rust{if let}, \rust{while let}, \rust{match}, \rust{break}, \rust{continue}, and 356\rust{return} are supported. In fact, supporting these was quite easy since the Rust compiler 357reduces them all down to a small set of control-flow graph primitives in MIR. 358 359\subsection{Function calls} 360 361As previously described, Miri supports arbitrary function calls without growing the native stack 362(only its virtual call stack). It is somewhat limited by the fact that cross-crate\footnote{A crate 363is a single Rust library (or executable).} calls only work for functions whose MIR is stored in 364crate metadata. This is currently true for \rust{const}, generic, and inline functions. 365A branch of the compiler could be made that stores MIR for all functions. This would be a non-issue 366for a compile-time evaluator based on Miri, since it would only call \rust{const fn}s. 367 368\subsubsection{Method calls} 369 370Miri supports trait method calls, including invoking all the compiler-internal lookup needed to find 371the correct implementation of the method. 372 373\subsubsection{Closures} 374 375Calls to closures are also supported with the exception of one edge case\footnote{Calling a closure 376that takes a reference to its captures via a closure interface that passes the captures by value is 377not yet supported.}. The value part of a closure that holds the captured variables is handled as an 378aggregate and the function call part is mostly the same as a trait method call, but with the added 379complication that closures use a separate calling convention within the compiler. 380 381\subsubsection{Function pointers} 382 383Function pointers are not currently supported by Miri, but there is a relatively simple way they 384could be encoded using a relocation with a special reserved allocation identifier. The offset of the 385relocation would determine which function it points to in a special array of functions in the 386interpreter. 387 388\subsubsection{Intrinsics} 389 390To support unsafe code, and in particular to support Rust's standard library, it became clear that 391Miri would have to support calls to compiler 392intrinsics\footnote{\url{https://doc.rust-lang.org/stable/std/intrinsics/index.html}}. Intrinsics 393are function calls which cause the Rust compiler to produce special-purpose code instead of a 394regular function call. Miri simply recognizes intrinsic calls by their unique 395ABI\footnote{Application Binary Interface, which defines calling conventions. Includes ``C'', 396``Rust'', and ``rust-intrinsic''.} and name and runs special-purpose code to handle them. 397 398An example of an important intrinsic is \rust{size_of} which will cause Miri to write the size of 399the type in question to the return value location. The Rust standard library uses intrinsics heavily 400to implement various data structures, so this was a major step toward supporting them. Intrinsics 401have been implemented on a case-by-case basis as tests which required them were written, and not all 402intrinsics are supported yet. 403 404\subsubsection{Generic function calls} 405 406Miri needs special support for generic function calls since Rust is a \emph{monomorphizing} 407compiler, meaning it generates a special version of each function for each distinct set of type 408parameters it gets called with. Since functions in MIR are still polymorphic, Miri has to do the 409same thing and substitute function type parameters into all types it encounters to get fully 410concrete, monomorphized types. For example, in\ldots 411 412\begin{minted}[autogobble]{rust} 413 fn some<T>(t: T) -> Option<T> { Some(t) } 414\end{minted} 415 416\ldots{}Miri needs to know the size of \rust{T} to copy the right amount of bytes from the argument 417to the return value. If we call \rust{some(10i32)} Miri will execute \rust{some} knowing that 418\rust{T = i32} and generate a representation for \rust{Option<i32>}. 419 420Miri currently does this monomorphization lazily on-demand unlike the Rust back-end which does it 421all ahead of time. 422 423\subsection{Heap allocations} 424 425The next piece of the puzzle for supporting interesting programs (and the standard library) was heap 426allocations. There are two main interfaces for heap allocation in Rust: the built-in \rust{Box} 427rvalue in MIR and a set of C ABI foreign functions including \rust{__rust_allocate}, 428\rust{__rust_reallocate}, and \rust{__rust_deallocate}. These correspond approximately to 429\mintinline{c}{malloc}, \mintinline{c}{realloc}, and \mintinline{c}{free} in C. 430 431The \rust{Box} rvalue allocates enough space for a single value of a given type. This was easy to 432support in Miri. It simply creates a new abstract allocation in the same manner as for 433stack-allocated values, since there's no major difference between them in Miri. 434 435The allocator functions, which are used to implement things like Rust's standard \rust{Vec<T>} type, 436were a bit trickier. Rust declares them as \rust{extern "C" fn} so that different allocator 437libraries can be linked in at the user's option. Since Miri doesn't actually support FFI and wants 438full control of allocations for safety, it ``cheats'' and recognizes these allocator functions in 439essentially the same way it recognizes compiler intrinsics. Then, a call to \rust{__rust_allocate} 440simply creates another abstract allocation with the requested size and \rust{__rust_reallocate} 441grows one. 442 443In the future, Miri should also track which allocations came from \rust{__rust_allocate} so it can 444reject reallocate or deallocate calls on stack allocations. 445 446\subsection{Destructors} 447 448When a value which ``owns'' some resource (like a heap allocation or file handle) goes out of scope, 449Rust inserts \emph{drop glue} that calls the user-defined destructor for the type if it has one, and 450then drops all of the subfields. Destructors for types like \rust{Box<T>} and \rust{Vec<T>} 451deallocate heap memory. 452 453Miri doesn't yet support calling user-defined destructors, but it has most of the machinery in place 454to do so already. There \emph{is} support for dropping \rust{Box<T>} types, including deallocating 455their associated allocations. This is enough to properly execute the dangling pointer example in 456\autoref{sec:deterministic}. 457 458\subsection{Constants} 459 460Only basic integer, boolean, string, and byte-string literals are currently supported. Evaluating 461more complicated constant expressions in their current form would be a somewhat pointless exercise 462for Miri. Instead, we should lower constant expressions to MIR so Miri can run them directly, which 463is precisely what would need be done to use Miri as the compiler's constant evaluator. 464 465\subsection{Static variables} 466 467Miri doesn't currently support statics, but they would need support similar to constants. Also note 468that while it would be invalid to write to static (i.e.\ global) variables in Miri executions, it 469would probably be fine to allow reads. 470 471\subsection{Standard library} 472 473Throughout the implementation of the above features, I often followed this process: 474 475\begin{enumerate} 476 \item Try using a feature from the standard library. 477 \item See where Miri runs into stuff it can't handle. 478 \item Fix the problem. 479 \item Go to 1. 480\end{enumerate} 481 482At present, Miri supports a number of major non-trivial features from the standard library along 483with tons of minor features. Smart pointer types such as \rust{Box}, \rust{Rc}\footnote{Reference 484counted shared pointer} and \rust{Arc}\footnote{Atomically reference-counted thread-safe shared 485pointer} all seem to work. I've also tested using the shared smart pointer types with \rust{Cell} 486and \rust{RefCell}\footnote{\href{https://doc.rust-lang.org/stable/std/cell/index.html}{Rust 487documentation for cell types}} for internal mutability, and that works as well, although 488\rust{RefCell} can't ever be borrowed twice until I implement destructor calls, since a destructor 489is what releases the borrow. 490 491But the standard library collection I spent the most time on was \rust{Vec}, the standard 492dynamically-growable array type, similar to C++'s \texttt{std::vector} or Java's 493\texttt{java.util.ArrayList}. In Rust, \rust{Vec} is an extremely pervasive collection, so 494supporting it is a big win for supporting a larger swath of Rust programs in Miri. 495 496See \autoref{fig:vec} for an example (working in Miri today) of initializing a \rust{Vec} with a 497small amount of space on the heap and then pushing enough elements to force it to reallocate its 498data array. This involves cross-crate generic function calls, unsafe code using raw pointers, heap 499allocation, handling of uninitialized memory, compiler intrinsics, and more. 500 501\begin{figure}[t] 502 \begin{minted}[autogobble]{rust} 503 struct Vec<T> { 504 data: *mut T, // 4 byte pointer 505 capacity: usize, // 4 byte integer 506 length: usize, // 4 byte integer 507 } 508 509 let mut v: Vec<u8> = 510 Vec::with_capacity(2); 511 // v: 00 00 00 00 02 00 00 00 00 00 00 00 512 // └─(data)──┘ 513 // data: __ __ 514 515 v.push(1); 516 // v: 00 00 00 00 02 00 00 00 01 00 00 00 517 // └─(data)──┘ 518 // data: 01 __ 519 520 v.push(2); 521 // v: 00 00 00 00 02 00 00 00 02 00 00 00 522 // └─(data)──┘ 523 // data: 01 02 524 525 v.push(3); 526 // v: 00 00 00 00 04 00 00 00 03 00 00 00 527 // └─(data)──┘ 528 // data: 01 02 03 __ 529 \end{minted} 530 \caption{\rust{Vec} example on 32-bit little-endian} 531 \label{fig:vec} 532\end{figure} 533 534Miri supports unsafe operations on \rust{Vec} like \rust{v.set_len(10)} or 535\rust{v.get_unchecked(2)}, provided that such calls do no invoke undefined behaviour. If a call 536\emph{does} invoke undefined behaviour, Miri will abort with an appropriate error message (see 537\autoref{fig:vec-error}). 538 539\begin{figure}[t] 540 \begin{minted}[autogobble]{rust} 541 fn out_of_bounds() -> u8 { 542 let v = vec![1, 2]; 543 let p = unsafe { v.get_unchecked(5) }; 544 *p + 10 545 // ~~ error: pointer offset outside 546 // bounds of allocation 547 } 548 549 fn undefined_bytes() -> u8 { 550 let v = Vec::<u8>::with_capacity(10); 551 let p = unsafe { v.get_unchecked(5) }; 552 *p + 10 553 // ~~~~~~~ error: attempted to read 554 // undefined bytes 555 } 556 \end{minted} 557 \caption{\rust{Vec} examples with undefined behaviour} 558 \label{fig:vec-error} 559\end{figure} 560 561\newpage 562 563Here is one final code sample Miri can execute that demonstrates many features at once, including 564vectors, heap allocation, iterators, closures, raw pointers, and math: 565 566\begin{minted}[autogobble]{rust} 567 let x: u8 = vec![1, 2, 3, 4] 568 .into_iter() 569 .map(|x| x * x) 570 .fold(0, |x, y| x + y); 571 // x: 1e (that is, the hex value 572 // 0x1e = 30 = 1 + 4 + 9 + 16) 573\end{minted} 574 575%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 576 577\section{Future directions} 578 579\subsection{Finishing the implementation} 580 581There are a number of pressing items on my to-do list for Miri, including: 582 583\begin{itemize} 584 \item A much more comprehensive and automated test suite. 585 \item User-defined destructor calls. 586 \item Non-trivial casts between primitive types like integers and pointers. 587 \item Handling statics and global memory. 588 \item Reporting errors for all undefined behaviour.\footnote{\href{https://doc.rust-lang.org/reference.html\#behavior-considered-undefined}{The Rust reference on what is considered undefined behaviour}} 589 \item Function pointers. 590 \item Accounting for target machine primitive type alignment and endianness. 591 \item Optimizations (undefined byte masks, tail-calls). 592 \item Benchmarking Miri vs. unoptimized Rust. 593 \item Various \texttt{TODO}s and \texttt{FIXME}s left in the code. 594 \item Integrating into the compiler proper. 595\end{itemize} 596 597\subsection{Future projects} 598 599Other possible Miri-related projects include: 600 601\begin{itemize} 602 \item A read-eval-print-loop (REPL) for Rust, which may be easier to implement on top of Miri than 603 the usual LLVM back-end. 604 \item A graphical or text-mode debugger that steps through MIR execution one statement at a time, 605 for figuring out why some compile-time execution is raising an error or simply learning how Rust 606 works at a low level. 607 \item A less restricted version of Miri that is able to run foreign functions from C/C++ and 608 generally has full access to the operating system. Such an interpreter could be used to more 609 quickly prototype changes to the Rust language that would otherwise require changes to the LLVM 610 back-end. 611 \item Unit-testing the compiler by comparing the results of Miri's execution against the results 612 of LLVM-compiled machine code's execution. This would help to guarantee that compile-time 613 execution works the same as runtime execution. 614 \item Some kind of Miri-based symbolic evaluator that examines multiple possible code paths at 615 once to determine if undefined behaviour could be observed on any of them. 616\end{itemize} 617 618%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 619 620\section{Final thoughts} 621 622Writing an interpreter which models values of varying sizes, stack and heap allocation, unsafe 623memory operations, and more requires some unconventional techniques compared to conventional 624interpreters targeting dynamically-typed languages. However, aside from the somewhat complicated 625abstract memory model, making Miri work was primarily a software engineering problem, and not a 626particularly tricky one. This is a testament to MIR's suitability as an intermediate representation 627for Rust---removing enough unnecessary abstraction to keep it simple. For example, Miri doesn't even 628need to know that there are different kinds of loops, or how to match patterns in a \rust{match} 629expression. 630 631Another advantage to targeting MIR is that any new features at the syntax-level or type-level 632generally require little to no change in Miri. For example, when the new ``question mark'' syntax 633for error handling\footnote{ 634 \href{https://github.com/rust-lang/rfcs/blob/master/text/0243-trait-based-exception-handling.md} 635 {Question mark syntax RFC}} 636was added to rustc, Miri required no change to support it. 637When specialization\footnote{ 638 \href{https://github.com/rust-lang/rfcs/blob/master/text/1210-impl-specialization.md} 639 {Specialization RFC}} 640was added, Miri supported it with just minor changes to trait method lookup. 641 642Of course, Miri also has limitations. The inability to execute FFI and inline assembly reduces the 643amount of Rust programs Miri could ever execute. The good news is that in the constant evaluator, 644FFI can be stubbed out in cases where it makes sense, like I did with \rust{__rust_allocate}. For a 645version of Miri not intended for constant evaluation, it may be possible to use libffi to call C 646functions from the interpreter. 647 648In conclusion, Miri is a surprisingly effective project, and a lot of fun to implement. Due to MIR's 649tendency to collapse multiple source-level features into one, I often ended up supporting features I 650hadn't explicitly intended to. I am excited to work with the compiler team going forward to try to 651make Miri useful for constant evaluation in Rust. 652 653%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 654 655\section{Thanks} 656 657A big thanks goes to Eduard Burtescu for writing the abstract machine specification and answering my 658incessant questions on IRC, to Niko Matsakis for coming up with the idea for Miri and supporting my 659desire to work with the Rust compiler, and to my research supervisor Christopher Dutchyn. Thanks 660also to everyone else on the compiler team and on Mozilla IRC who helped me figure stuff out. 661Finally, thanks to Daniel Keep and everyone else who helped fix my numerous writing mistakes. 662 663\end{document} 664