1# The Query Evaluation Model in Detail 2 3<!-- toc --> 4 5This chapter provides a deeper dive into the abstract model queries are built on. 6It does not go into implementation details but tries to explain 7the underlying logic. The examples here, therefore, have been stripped down and 8simplified and don't directly reflect the compilers internal APIs. 9 10## What is a query? 11 12Abstractly we view the compiler's knowledge about a given crate as a "database" 13and queries are the way of asking the compiler questions about it, i.e. 14we "query" the compiler's "database" for facts. 15 16However, there's something special to this compiler database: It starts out empty 17and is filled on-demand when queries are executed. Consequently, a query must 18know how to compute its result if the database does not contain it yet. For 19doing so, it can access other queries and certain input values that the database 20is pre-filled with on creation. 21 22A query thus consists of the following things: 23 24 - A name that identifies the query 25 - A "key" that specifies what we want to look up 26 - A result type that specifies what kind of result it yields 27 - A "provider" which is a function that specifies how the result is to be 28 computed if it isn't already present in the database. 29 30As an example, the name of the `type_of` query is `type_of`, its query key is a 31`DefId` identifying the item we want to know the type of, the result type is 32`Ty<'tcx>`, and the provider is a function that, given the query key and access 33to the rest of the database, can compute the type of the item identified by the 34key. 35 36So in some sense a query is just a function that maps the query key to the 37corresponding result. However, we have to apply some restrictions in order for 38this to be sound: 39 40 - The key and result must be immutable values. 41 - The provider function must be a pure function in the sense that for the same 42 key it must always yield the same result. 43 - The only parameters a provider function takes are the key and a reference to 44 the "query context" (which provides access to the rest of the "database"). 45 46The database is built up lazily by invoking queries. The query providers will 47invoke other queries, for which the result is either already cached or computed 48by calling another query provider. These query provider invocations 49conceptually form a directed acyclic graph (DAG) at the leaves of which are 50input values that are already known when the query context is created. 51 52 53 54## Caching/Memoization 55 56Results of query invocations are "memoized" which means that the query context 57will cache the result in an internal table and, when the query is invoked with 58the same query key again, will return the result from the cache instead of 59running the provider again. 60 61This caching is crucial for making the query engine efficient. Without 62memoization the system would still be sound (that is, it would yield the same 63results) but the same computations would be done over and over again. 64 65Memoization is one of the main reasons why query providers have to be pure 66functions. If calling a provider function could yield different results for 67each invocation (because it accesses some global mutable state) then we could 68not memoize the result. 69 70 71 72## Input data 73 74When the query context is created, it is still empty: No queries have been 75executed, no results are cached. But the context already provides access to 76"input" data, i.e. pieces of immutable data that were computed before the 77context was created and that queries can access to do their computations. 78 79As of <!-- date: 2021-01 --> January 2021, this input data consists mainly of 80the HIR map, upstream crate metadata, and the command-line options the compiler 81was invoked with; but in the future inputs will just consist of command-line 82options and a list of source files -- the HIR map will itself be provided by a 83query which processes these source files. 84 85Without inputs, queries would live in a void without anything to compute their 86result from (remember, query providers only have access to other queries and 87the context but not any other outside state or information). 88 89For a query provider, input data and results of other queries look exactly the 90same: It just tells the context "give me the value of X". Because input data 91is immutable, the provider can rely on it being the same across 92different query invocations, just as is the case for query results. 93 94 95 96## An example execution trace of some queries 97 98How does this DAG of query invocations come into existence? At some point 99the compiler driver will create the, as yet empty, query context. It will then, 100from outside of the query system, invoke the queries it needs to perform its 101task. This looks something like the following: 102 103```rust,ignore 104fn compile_crate() { 105 let cli_options = ...; 106 let hir_map = ...; 107 108 // Create the query context `tcx` 109 let tcx = TyCtxt::new(cli_options, hir_map); 110 111 // Do type checking by invoking the type check query 112 tcx.type_check_crate(); 113} 114``` 115 116The `type_check_crate` query provider would look something like the following: 117 118```rust,ignore 119fn type_check_crate_provider(tcx, _key: ()) { 120 let list_of_hir_items = tcx.hir_map.list_of_items(); 121 122 for item_def_id in list_of_hir_items { 123 tcx.type_check_item(item_def_id); 124 } 125} 126``` 127 128We see that the `type_check_crate` query accesses input data 129(`tcx.hir_map.list_of_items()`) and invokes other queries 130(`type_check_item`). The `type_check_item` 131invocations will themselves access input data and/or invoke other queries, 132so that in the end the DAG of query invocations will be built up backwards 133from the node that was initially executed: 134 135```ignore 136 (2) (1) 137 list_of_all_hir_items <----------------------------- type_check_crate() 138 | 139 (5) (4) (3) | 140 Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+ 141 | | 142 +-----------------+ | 143 | | 144 (7) v (6) (8) | 145 Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+ 146 147// (x) denotes invocation order 148``` 149 150We also see that often a query result can be read from the cache: 151`type_of(bar)` was computed for `type_check_item(foo)` so when 152`type_check_item(bar)` needs it, it is already in the cache. 153 154Query results stay cached in the query context as long as the context lives. 155So if the compiler driver invoked another query later on, the above graph 156would still exist and already executed queries would not have to be re-done. 157 158 159 160## Cycles 161 162Earlier we stated that query invocations form a DAG. However, it would be easy 163to form a cyclic graph by, for example, having a query provider like the 164following: 165 166```rust,ignore 167fn cyclic_query_provider(tcx, key) -> u32 { 168 // Invoke the same query with the same key again 169 tcx.cyclic_query(key) 170} 171``` 172 173Since query providers are regular functions, this would behave much as expected: 174Evaluation would get stuck in an infinite recursion. A query like this would not 175be very useful either. However, sometimes certain kinds of invalid user input 176can result in queries being called in a cyclic way. The query engine includes 177a check for cyclic invocations and, because cycles are an irrecoverable error, 178will abort execution with a "cycle error" messages that tries to be human 179readable. 180 181At some point the compiler had a notion of "cycle recovery", that is, one could 182"try" to execute a query and if it ended up causing a cycle, proceed in some 183other fashion. However, this was later removed because it is not entirely 184clear what the theoretical consequences of this are, especially regarding 185incremental compilation. 186 187 188## "Steal" Queries 189 190Some queries have their result wrapped in a `Steal<T>` struct. These queries 191behave exactly the same as regular with one exception: Their result is expected 192to be "stolen" out of the cache at some point, meaning some other part of the 193program is taking ownership of it and the result cannot be accessed anymore. 194 195This stealing mechanism exists purely as a performance optimization because some 196result values are too costly to clone (e.g. the MIR of a function). It seems 197like result stealing would violate the condition that query results must be 198immutable (after all we are moving the result value out of the cache) but it is 199OK as long as the mutation is not observable. This is achieved by two things: 200 201- Before a result is stolen, we make sure to eagerly run all queries that 202 might ever need to read that result. This has to be done manually by calling 203 those queries. 204- Whenever a query tries to access a stolen result, we make the compiler ICE so 205 that such a condition cannot go unnoticed. 206 207This is not an ideal setup because of the manual intervention needed, so it 208should be used sparingly and only when it is well known which queries might 209access a given result. In practice, however, stealing has not turned out to be 210much of a maintenance burden. 211 212To summarize: "Steal queries" break some of the rules in a controlled way. 213There are checks in place that make sure that nothing can go silently wrong. 214