1# Architecture 2 3This document describes the high-level architecture of rust-analyzer. 4If you want to familiarize yourself with the code base, you are just in the right place! 5 6You might also enjoy ["Explaining Rust Analyzer"](https://www.youtube.com/playlist?list=PLhb66M_x9UmrqXhQuIpWC5VgTdrGxMx3y) series on YouTube. 7It goes deeper than what is covered in this document, but will take some time to watch. 8 9See also these implementation-related blog posts: 10 11* https://rust-analyzer.github.io/blog/2019/11/13/find-usages.html 12* https://rust-analyzer.github.io/blog/2020/07/20/three-architectures-for-responsive-ide.html 13* https://rust-analyzer.github.io/blog/2020/09/16/challeging-LR-parsing.html 14* https://rust-analyzer.github.io/blog/2020/09/28/how-to-make-a-light-bulb.html 15* https://rust-analyzer.github.io/blog/2020/10/24/introducing-ungrammar.html 16 17For older, by now mostly outdated stuff, see the [guide](./guide.md) and [another playlist](https://www.youtube.com/playlist?list=PL85XCvVPmGQho7MZkdW-wtPtuJcFpzycE). 18 19 20## Bird's Eye View 21 22![](https://user-images.githubusercontent.com/4789492/107129398-0ab70f00-687a-11eb-9bfc-d4eb023aec06.png) 23 24On the highest level, rust-analyzer is a thing which accepts input source code from the client and produces a structured semantic model of the code. 25 26More specifically, input data consists of a set of test files (`(PathBuf, String)` pairs) and information about project structure, captured in the so called `CrateGraph`. 27The crate graph specifies which files are crate roots, which cfg flags are specified for each crate and what dependencies exist between the crates. 28This is the input (ground) state. 29The analyzer keeps all this input data in memory and never does any IO. 30Because the input data is source code, which typically measures in tens of megabytes at most, keeping everything in memory is OK. 31 32A "structured semantic model" is basically an object-oriented representation of modules, functions and types which appear in the source code. 33This representation is fully "resolved": all expressions have types, all references are bound to declarations, etc. 34This is derived state. 35 36The client can submit a small delta of input data (typically, a change to a single file) and get a fresh code model which accounts for changes. 37 38The underlying engine makes sure that model is computed lazily (on-demand) and can be quickly updated for small modifications. 39 40## Entry Points 41 42`crates/rust-analyzer/src/bin/main.rs` contains the main function which spawns LSP. 43This is *the* entry point, but it front-loads a lot of complexity, so it's fine to just skim through it. 44 45`crates/rust-analyzer/src/handlers.rs` implements all LSP requests and is a great place to start if you are already familiar with LSP. 46 47`Analysis` and `AnalysisHost` types define the main API for consumers of IDE services. 48 49## Code Map 50 51This section talks briefly about various important directories and data structures. 52Pay attention to the **Architecture Invariant** sections. 53They often talk about things which are deliberately absent in the source code. 54 55Note also which crates are **API Boundaries**. 56Remember, [rules at the boundary are different](https://www.tedinski.com/2018/02/06/system-boundaries.html). 57 58### `xtask` 59 60This is rust-analyzer's "build system". 61We use cargo to compile rust code, but there are also various other tasks, like release management or local installation. 62They are handled by Rust code in the xtask directory. 63 64### `editors/code` 65 66VS Code plugin. 67 68### `lib/` 69 70rust-analyzer independent libraries which we publish to crates.io. 71It's not heavily utilized at the moment. 72 73### `crates/parser` 74 75It is a hand-written recursive descent parser, which produces a sequence of events like "start node X", "finish node Y". 76It works similarly to 77[kotlin's parser](https://github.com/JetBrains/kotlin/blob/4d951de616b20feca92f3e9cc9679b2de9e65195/compiler/frontend/src/org/jetbrains/kotlin/parsing/KotlinParsing.java), 78which is a good source of inspiration for dealing with syntax errors and incomplete input. 79Original [libsyntax parser](https://github.com/rust-lang/rust/blob/6b99adeb11313197f409b4f7c4083c2ceca8a4fe/src/libsyntax/parse/parser.rs) is what we use for the definition of the Rust language. 80`TreeSink` and `TokenSource` traits bridge the tree-agnostic parser from `grammar` with `rowan` trees. 81 82**Architecture Invariant:** the parser is independent of the particular tree structure and particular representation of the tokens. 83It transforms one flat stream of events into another flat stream of events. 84Token independence allows us to parse out both text-based source code and `tt`-based macro input. 85Tree independence allows us to more easily vary the syntax tree implementation. 86It should also unlock efficient light-parsing approaches. 87For example, you can extract the set of names defined in a file (for typo correction) without building a syntax tree. 88 89**Architecture Invariant:** parsing never fails, the parser produces `(T, Vec<Error>)` rather than `Result<T, Error>`. 90 91### `crates/syntax` 92 93Rust syntax tree structure and parser. 94See [RFC](https://github.com/rust-lang/rfcs/pull/2256) and [./syntax.md](./syntax.md) for some design notes. 95 96- [rowan](https://github.com/rust-analyzer/rowan) library is used for constructing syntax trees. 97- `ast` provides a type safe API on top of the raw `rowan` tree. 98- `ungrammar` description of the grammar, which is used to generate `syntax_kinds` and `ast` modules, using `cargo test -p xtask` command. 99 100Tests for ra_syntax are mostly data-driven. 101`test_data/parser` contains subdirectories with a bunch of `.rs` (test vectors) and `.txt` files with corresponding syntax trees. 102During testing, we check `.rs` against `.txt`. 103If the `.txt` file is missing, it is created (this is how you update tests). 104Additionally, running the xtask test suite with `cargo test -p xtask` will walk the grammar module and collect all `// test test_name` comments into files inside `test_data/parser/inline` directory. 105 106To update test data, run with `UPDATE_EXPECT` variable: 107 108```bash 109env UPDATE_EXPECT=1 cargo qt 110``` 111 112After adding a new inline test you need to run `cargo test -p xtask` and also update the test data as described above. 113 114Note [`api_walkthrough`](https://github.com/rust-analyzer/rust-analyzer/blob/2fb6af89eb794f775de60b82afe56b6f986c2a40/crates/ra_syntax/src/lib.rs#L190-L348) 115in particular: it shows off various methods of working with syntax tree. 116 117See [#93](https://github.com/rust-analyzer/rust-analyzer/pull/93) for an example PR which fixes a bug in the grammar. 118 119**Architecture Invariant:** `syntax` crate is completely independent from the rest of rust-analyzer. It knows nothing about salsa or LSP. 120This is important because it is possible to make useful tooling using only the syntax tree. 121Without semantic information, you don't need to be able to _build_ code, which makes the tooling more robust. 122See also https://web.stanford.edu/~mlfbrown/paper.pdf. 123You can view the `syntax` crate as an entry point to rust-analyzer. 124`syntax` crate is an **API Boundary**. 125 126**Architecture Invariant:** syntax tree is a value type. 127The tree is fully determined by the contents of its syntax nodes, it doesn't need global context (like an interner) and doesn't store semantic info. 128Using the tree as a store for semantic info is convenient in traditional compilers, but doesn't work nicely in the IDE. 129Specifically, assists and refactors require transforming syntax trees, and that becomes awkward if you need to do something with the semantic info. 130 131**Architecture Invariant:** syntax tree is built for a single file. 132This is to enable parallel parsing of all files. 133 134**Architecture Invariant:** Syntax trees are by design incomplete and do not enforce well-formedness. 135If an AST method returns an `Option`, it *can* be `None` at runtime, even if this is forbidden by the grammar. 136 137### `crates/base_db` 138 139We use the [salsa](https://github.com/salsa-rs/salsa) crate for incremental and on-demand computation. 140Roughly, you can think of salsa as a key-value store, but it can also compute derived values using specified functions. 141The `base_db` crate provides basic infrastructure for interacting with salsa. 142Crucially, it defines most of the "input" queries: facts supplied by the client of the analyzer. 143Reading the docs of the `base_db::input` module should be useful: everything else is strictly derived from those inputs. 144 145**Architecture Invariant:** particularities of the build system are *not* the part of the ground state. 146In particular, `base_db` knows nothing about cargo. 147For example, `cfg` flags are a part of `base_db`, but `feature`s are not. 148A `foo` feature is a Cargo-level concept, which is lowered by Cargo to `--cfg feature=foo` argument on the command line. 149The `CrateGraph` structure is used to represent the dependencies between the crates abstractly. 150 151**Architecture Invariant:** `base_db` doesn't know about file system and file paths. 152Files are represented with opaque `FileId`, there's no operation to get an `std::path::Path` out of the `FileId`. 153 154### `crates/hir_expand`, `crates/hir_def`, `crates/hir_ty` 155 156These crates are the *brain* of rust-analyzer. 157This is the compiler part of the IDE. 158 159`hir_xxx` crates have a strong [ECS](https://en.wikipedia.org/wiki/Entity_component_system) flavor, in that they work with raw ids and directly query the database. 160There's little abstraction here. 161These crates integrate deeply with salsa and chalk. 162 163Name resolution, macro expansion and type inference all happen here. 164These crates also define various intermediate representations of the core. 165 166`ItemTree` condenses a single `SyntaxTree` into a "summary" data structure, which is stable over modifications to function bodies. 167 168`DefMap` contains the module tree of a crate and stores module scopes. 169 170`Body` stores information about expressions. 171 172**Architecture Invariant:** these crates are not, and will never be, an api boundary. 173 174**Architecture Invariant:** these crates explicitly care about being incremental. 175The core invariant we maintain is "typing inside a function's body never invalidates global derived data". 176i.e., if you change the body of `foo`, all facts about `bar` should remain intact. 177 178**Architecture Invariant:** hir exists only in context of particular crate instance with specific CFG flags. 179The same syntax may produce several instances of HIR if the crate participates in the crate graph more than once. 180 181### `crates/hir` 182 183The top-level `hir` crate is an **API Boundary**. 184If you think about "using rust-analyzer as a library", `hir` crate is most likely the façade you'll be talking to. 185 186It wraps ECS-style internal API into a more OO-flavored API (with an extra `db` argument for each call). 187 188**Architecture Invariant:** `hir` provides a static, fully resolved view of the code. 189While internal `hir_*` crates _compute_ things, `hir`, from the outside, looks like an inert data structure. 190 191`hir` also handles the delicate task of going from syntax to the corresponding `hir`. 192Remember that the mapping here is one-to-many. 193See `Semantics` type and `source_to_def` module. 194 195Note in particular a curious recursive structure in `source_to_def`. 196We first resolve the parent _syntax_ node to the parent _hir_ element. 197Then we ask the _hir_ parent what _syntax_ children does it have. 198Then we look for our node in the set of children. 199 200This is the heart of many IDE features, like goto definition, which start with figuring out the hir node at the cursor. 201This is some kind of (yet unnamed) uber-IDE pattern, as it is present in Roslyn and Kotlin as well. 202 203### `crates/ide` 204 205The `ide` crate builds on top of `hir` semantic model to provide high-level IDE features like completion or goto definition. 206It is an **API Boundary**. 207If you want to use IDE parts of rust-analyzer via LSP, custom flatbuffers-based protocol or just as a library in your text editor, this is the right API. 208 209**Architecture Invariant:** `ide` crate's API is build out of POD types with public fields. 210The API uses editor's terminology, it talks about offsets and string labels rather than in terms of definitions or types. 211It is effectively the view in MVC and viewmodel in [MVVM](https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93viewmodel). 212All arguments and return types are conceptually serializable. 213In particular, syntax trees and hir types are generally absent from the API (but are used heavily in the implementation). 214Shout outs to LSP developers for popularizing the idea that "UI" is a good place to draw a boundary at. 215 216`ide` is also the first crate which has the notion of change over time. 217`AnalysisHost` is a state to which you can transactionally `apply_change`. 218`Analysis` is an immutable snapshot of the state. 219 220Internally, `ide` is split across several crates. `ide_assists`, `ide_completion` and `ide_ssr` implement large isolated features. 221`ide_db` implements common IDE functionality (notably, reference search is implemented here). 222The `ide` contains a public API/façade, as well as implementation for a plethora of smaller features. 223 224**Architecture Invariant:** `ide` crate strives to provide a _perfect_ API. 225Although at the moment it has only one consumer, the LSP server, LSP *does not* influence its API design. 226Instead, we keep in mind a hypothetical _ideal_ client -- an IDE tailored specifically for rust, every nook and cranny of which is packed with Rust-specific goodies. 227 228### `crates/rust-analyzer` 229 230This crate defines the `rust-analyzer` binary, so it is the **entry point**. 231It implements the language server. 232 233**Architecture Invariant:** `rust-analyzer` is the only crate that knows about LSP and JSON serialization. 234If you want to expose a data structure `X` from ide to LSP, don't make it serializable. 235Instead, create a serializable counterpart in `rust-analyzer` crate and manually convert between the two. 236 237`GlobalState` is the state of the server. 238The `main_loop` defines the server event loop which accepts requests and sends responses. 239Requests that modify the state or might block user's typing are handled on the main thread. 240All other requests are processed in background. 241 242**Architecture Invariant:** the server is stateless, a-la HTTP. 243Sometimes state needs to be preserved between requests. 244For example, "what is the `edit` for the fifth completion item of the last completion edit?". 245For this, the second request should include enough info to re-create the context from scratch. 246This generally means including all the parameters of the original request. 247 248`reload` module contains the code that handles configuration and Cargo.toml changes. 249This is a tricky business. 250 251**Architecture Invariant:** `rust-analyzer` should be partially available even when the build is broken. 252Reloading process should not prevent IDE features from working. 253 254### `crates/toolchain`, `crates/project_model`, `crates/flycheck` 255 256These crates deal with invoking `cargo` to learn about project structure and get compiler errors for the "check on save" feature. 257 258They use `crates/path` heavily instead of `std::path`. 259A single `rust-analyzer` process can serve many projects, so it is important that server's current directory does not leak. 260 261### `crates/mbe`, `crates/tt`, `crates/proc_macro_api`, `crates/proc_macro_srv` 262 263These crates implement macros as token tree -> token tree transforms. 264They are independent from the rest of the code. 265 266`tt` crate defined `TokenTree`, a single token or a delimited sequence of token trees. 267`mbe` crate contains tools for transforming between syntax trees and token tree. 268And it also handles the actual parsing and expansion of declarative macro (a-la "Macros By Example" or mbe). 269 270For proc macros, the client-server model are used. 271We pass an argument `--proc-macro` to `rust-analyzer` binary to start a separate process (`proc_macro_srv`). 272And the client (`proc_macro_api`) provides an interface to talk to that server separately. 273 274And then token trees are passed from client, and the server will load the corresponding dynamic library (which built by `cargo`). 275And due to the fact the api for getting result from proc macro are always unstable in `rustc`, 276we maintain our own copy (and paste) of that part of code to allow us to build the whole thing in stable rust. 277 278 **Architecture Invariant:** 279Bad proc macros may panic or segfault accidentally. So we run it in another process and recover it from fatal error. 280And they may be non-deterministic which conflict how `salsa` works, so special attention is required. 281 282### `crates/cfg` 283 284This crate is responsible for parsing, evaluation and general definition of `cfg` attributes. 285 286### `crates/vfs`, `crates/vfs-notify` 287 288These crates implement a virtual file system. 289They provide consistent snapshots of the underlying file system and insulate messy OS paths. 290 291**Architecture Invariant:** vfs doesn't assume a single unified file system. 292i.e., a single rust-analyzer process can act as a remote server for two different machines, where the same `/tmp/foo.rs` path points to different files. 293For this reason, all path APIs generally take some existing path as a "file system witness". 294 295### `crates/stdx` 296 297This crate contains various non-rust-analyzer specific utils, which could have been in std, as well 298as copies of unstable std items we would like to make use of already, like `std::str::split_once`. 299 300### `crates/profile` 301 302This crate contains utilities for CPU and memory profiling. 303 304 305## Cross-Cutting Concerns 306 307This sections talks about the things which are everywhere and nowhere in particular. 308 309### Code generation 310 311Some components in this repository are generated through automatic processes. 312Generated code is updated automatically on `cargo test`. 313Generated code is generally committed to the git repository. 314 315In particular, we generate: 316 317* API for working with syntax trees (`syntax::ast`, the [`ungrammar`](https://github.com/rust-analyzer/ungrammar) crate). 318* Various sections of the manual: 319 320 * features 321 * assists 322 * config 323 324* Documentation tests for assists 325 326See the `sourcegen` crate for details. 327 328**Architecture Invariant:** we avoid bootstrapping. 329For codegen we need to parse Rust code. 330Using rust-analyzer for that would work and would be fun, but it would also complicate the build process a lot. 331For that reason, we use syn and manual string parsing. 332 333### Cancellation 334 335Let's say that the IDE is in the process of computing syntax highlighting, when the user types `foo`. 336What should happen? 337`rust-analyzer`s answer is that the highlighting process should be cancelled -- its results are now stale, and it also blocks modification of the inputs. 338 339The salsa database maintains a global revision counter. 340When applying a change, salsa bumps this counter and waits until all other threads using salsa finish. 341If a thread does salsa-based computation and notices that the counter is incremented, it panics with a special value (see `Canceled::throw`). 342That is, rust-analyzer requires unwinding. 343 344`ide` is the boundary where the panic is caught and transformed into a `Result<T, Cancelled>`. 345 346### Testing 347 348Rust Analyzer has three interesting [system boundaries](https://www.tedinski.com/2018/04/10/making-tests-a-positive-influence-on-design.html) to concentrate tests on. 349 350The outermost boundary is the `rust-analyzer` crate, which defines an LSP interface in terms of stdio. 351We do integration testing of this component, by feeding it with a stream of LSP requests and checking responses. 352These tests are known as "heavy", because they interact with Cargo and read real files from disk. 353For this reason, we try to avoid writing too many tests on this boundary: in a statically typed language, it's hard to make an error in the protocol itself if messages are themselves typed. 354Heavy tests are only run when `RUN_SLOW_TESTS` env var is set. 355 356The middle, and most important, boundary is `ide`. 357Unlike `rust-analyzer`, which exposes API, `ide` uses Rust API and is intended for use by various tools. 358A typical test creates an `AnalysisHost`, calls some `Analysis` functions and compares the results against expectation. 359 360The innermost and most elaborate boundary is `hir`. 361It has a much richer vocabulary of types than `ide`, but the basic testing setup is the same: we create a database, run some queries, assert result. 362 363For comparisons, we use the `expect` crate for snapshot testing. 364 365To test various analysis corner cases and avoid forgetting about old tests, we use so-called marks. 366See the `marks` module in the `test_utils` crate for more. 367 368**Architecture Invariant:** rust-analyzer tests do not use libcore or libstd. 369All required library code must be a part of the tests. 370This ensures fast test execution. 371 372**Architecture Invariant:** tests are data driven and do not test the API. 373Tests which directly call various API functions are a liability, because they make refactoring the API significantly more complicated. 374So most of the tests look like this: 375 376```rust 377#[track_caller] 378fn check(input: &str, expect: expect_test::Expect) { 379 // The single place that actually exercises a particular API 380} 381 382#[test] 383fn foo() { 384 check("foo", expect![["bar"]]); 385} 386 387#[test] 388fn spam() { 389 check("spam", expect![["eggs"]]); 390} 391// ...and a hundred more tests that don't care about the specific API at all. 392``` 393 394To specify input data, we use a single string literal in a special format, which can describe a set of rust files. 395See the `Fixture` its module for fixture examples and documentation. 396 397**Architecture Invariant:** all code invariants are tested by `#[test]` tests. 398There's no additional checks in CI, formatting and tidy tests are run with `cargo test`. 399 400**Architecture Invariant:** tests do not depend on any kind of external resources, they are perfectly reproducible. 401 402 403### Performance Testing 404 405TBA, take a look at the `metrics` xtask and `#[test] fn benchmark_xxx()` functions. 406 407### Error Handling 408 409**Architecture Invariant:** core parts of rust-analyzer (`ide`/`hir`) don't interact with the outside world and thus can't fail. 410Only parts touching LSP are allowed to do IO. 411 412Internals of rust-analyzer need to deal with broken code, but this is not an error condition. 413rust-analyzer is robust: various analysis compute `(T, Vec<Error>)` rather than `Result<T, Error>`. 414 415rust-analyzer is a complex long-running process. 416It will always have bugs and panics. 417But a panic in an isolated feature should not bring down the whole process. 418Each LSP-request is protected by a `catch_unwind`. 419We use `always` and `never` macros instead of `assert` to gracefully recover from impossible conditions. 420 421### Observability 422 423rust-analyzer is a long-running process, so it is important to understand what's going on inside. 424We have several instruments for that. 425 426The event loop that runs rust-analyzer is very explicit. 427Rather than spawning futures or scheduling callbacks (open), the event loop accepts an `enum` of possible events (closed). 428It's easy to see all the things that trigger rust-analyzer processing, together with their performance 429 430rust-analyzer includes a simple hierarchical profiler (`hprof`). 431It is enabled with `RA_PROFILE='*>50'` env var (log all (`*`) actions which take more than `50` ms) and produces output like: 432 433``` 43485ms - handle_completion 435 68ms - import_on_the_fly 436 67ms - import_assets::search_for_relative_paths 437 0ms - crate_def_map:wait (804 calls) 438 0ms - find_path (16 calls) 439 2ms - find_similar_imports (1 calls) 440 0ms - generic_params_query (334 calls) 441 59ms - trait_solve_query (186 calls) 442 0ms - Semantics::analyze_impl (1 calls) 443 1ms - render_resolution (8 calls) 444 0ms - Semantics::analyze_impl (5 calls) 445``` 446 447This is cheap enough to enable in production. 448 449 450Similarly, we save live object counting (`RA_COUNT=1`). 451It is not cheap enough to enable in prod, and this is a bug which should be fixed. 452 453### Configurability 454 455rust-analyzer strives to be as configurable as possible while offering reasonable defaults where no configuration exists yet. 456There will always be features that some people find more annoying than helpful, so giving the users the ability to tweak or disable these is a big part of offering a good user experience. 457Mind the code--architecture gap: at the moment, we are using fewer feature flags than we really should. 458 459### Serialization 460 461In Rust, it is easy (often too easy) to add serialization to any type by adding `#[derive(Serialize)]`. 462This easiness is misleading -- serializable types impose significant backwards compatability constraints. 463If a type is serializable, then it is a part of some IPC boundary. 464You often don't control the other side of this boundary, so changing serializable types is hard. 465 466For this reason, the types in `ide`, `base_db` and below are not serializable by design. 467If such types need to cross an IPC boundary, then the client of rust-analyzer needs to provide custom, client-specific serialization format. 468This isolates backwards compatibility and migration concerns to a specific client. 469 470For example, `rust-project.json` is it's own format -- it doesn't include `CrateGraph` as is. 471Instead, it creates a `CrateGraph` by calling appropriate constructing functions. 472