# The Query Evaluation Model in Detail

<!-- toc -->

This chapter provides a deeper dive into the abstract model queries are built on.
It does not go into implementation details but tries to explain
the underlying logic. The examples here, therefore, have been stripped down and
simplified and don't directly reflect the compiler's internal APIs.

## What is a query?

Abstractly we view the compiler's knowledge about a given crate as a "database"
and queries are the way of asking the compiler questions about it, i.e.
we "query" the compiler's "database" for facts.

However, there's something special about this compiler database: It starts out
empty and is filled on-demand when queries are executed. Consequently, a query
must know how to compute its result if the database does not contain it yet. To
do so, it can access other queries and certain input values that the database
is pre-filled with on creation.

A query thus consists of the following things:

 - A name that identifies the query
 - A "key" that specifies what we want to look up
 - A result type that specifies what kind of result it yields
 - A "provider" which is a function that specifies how the result is to be
   computed if it isn't already present in the database.

As an example, the name of the `type_of` query is `type_of`, its query key is a
`DefId` identifying the item we want to know the type of, the result type is
`Ty<'tcx>`, and the provider is a function that, given the query key and access
to the rest of the database, can compute the type of the item identified by the
key.
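
To make the four pieces concrete, here is a rough sketch of a `type_of`-like
query in plain Rust. Everything in it (`DefId`, `Ty`, `QueryContext`,
`type_of_provider`) is invented for illustration and does not reflect the
compiler's actual API:

```rust,ignore
// Toy model of a query's pieces; the names are made up for this sketch.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct DefId(u32);      // the query key: which item are we asking about?

#[derive(Clone, Debug)]
struct Ty(String);      // the result type

struct QueryContext { /* caches and pre-filled inputs live here */ }

impl QueryContext {
    // The provider: given the key and access to the rest of the
    // "database" (`self`), compute the result.
    fn type_of_provider(&self, key: DefId) -> Ty {
        // A real provider would consult other queries and inputs here.
        Ty(format!("type of item {}", key.0))
    }
}
```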
35
So in some sense a query is just a function that maps the query key to the
corresponding result. However, we have to apply some restrictions in order for
this to be sound:

 - The key and result must be immutable values.
 - The provider function must be a pure function in the sense that for the same
   key it must always yield the same result.
 - The only parameters a provider function takes are the key and a reference to
   the "query context" (which provides access to the rest of the "database").

The database is built up lazily by invoking queries. The query providers will
invoke other queries, for which the result is either already cached or computed
by calling another query provider. These query provider invocations
conceptually form a directed acyclic graph (DAG) at the leaves of which are
input values that are already known when the query context is created.



## Caching/Memoization

Results of query invocations are "memoized" which means that the query context
will cache the result in an internal table and, when the query is invoked with
the same query key again, will return the result from the cache instead of
running the provider again.
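
A minimal sketch of what this memoization could look like, using a plain
`HashMap` as the cache and invented names (`QueryContext`, `type_of`); the
compiler's real query caches are considerably more sophisticated:

```rust,ignore
use std::cell::RefCell;
use std::collections::HashMap;

struct QueryContext {
    // Maps query keys to already computed results.
    type_of_cache: RefCell<HashMap<u32, String>>,
}

impl QueryContext {
    fn type_of(&self, key: u32) -> String {
        // Return the cached result if we already have one ...
        if let Some(cached) = self.type_of_cache.borrow().get(&key) {
            return cached.clone();
        }
        // ... otherwise run the provider and remember its result.
        let result = self.type_of_provider(key);
        self.type_of_cache.borrow_mut().insert(key, result.clone());
        result
    }

    fn type_of_provider(&self, key: u32) -> String {
        // Pure computation based only on the key and `self`.
        format!("type of item {key}")
    }
}
```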
60
This caching is crucial for making the query engine efficient. Without
memoization the system would still be sound (that is, it would yield the same
results) but the same computations would be done over and over again.

Memoization is one of the main reasons why query providers have to be pure
functions. If calling a provider function could yield different results for
each invocation (because it accesses some global mutable state) then we could
not memoize the result.



## Input data

When the query context is created, it is still empty: No queries have been
executed, no results are cached. But the context already provides access to
"input" data, i.e. pieces of immutable data that were computed before the
context was created and that queries can access to do their computations.

As of <!-- date: 2021-01 --> January 2021, this input data consists mainly of
the HIR map, upstream crate metadata, and the command-line options the compiler
was invoked with; but in the future inputs will just consist of command-line
options and a list of source files -- the HIR map will itself be provided by a
query which processes these source files.

Without inputs, queries would live in a void without anything to compute their
result from (remember, query providers only have access to other queries and
the context but not any other outside state or information).

For a query provider, input data and results of other queries look exactly the
same: It just tells the context "give me the value of X". Because input data
is immutable, the provider can rely on it being the same across
different query invocations, just as is the case for query results.
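
One way to picture this, again with names invented for the sketch
(`CliOptions`, `HirMap`, `QueryContext`): the context is constructed from its
immutable inputs, and a provider reads an input through the context just like
it would read another query's result:

```rust,ignore
// Made-up sketch: inputs are immutable values the context is created with.
struct CliOptions { optimize: bool }
struct HirMap { /* ... */ }

struct QueryContext {
    cli_options: CliOptions, // input, known before any query runs
    hir_map: HirMap,         // input, known before any query runs
    // ... query caches ...
}

impl QueryContext {
    fn new(cli_options: CliOptions, hir_map: HirMap) -> Self {
        QueryContext { cli_options, hir_map }
    }

    // From a provider's point of view, asking for an input ...
    fn optimization_enabled(&self) -> bool {
        self.cli_options.optimize
    }
    // ... looks just like invoking a memoized query on `self`.
}
```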
93


## An example execution trace of some queries

How does this DAG of query invocations come into existence? At some point
the compiler driver will create the, as yet empty, query context. It will then,
from outside of the query system, invoke the queries it needs to perform its
task. This looks something like the following:

```rust,ignore
fn compile_crate() {
    let cli_options = ...;
    let hir_map = ...;

    // Create the query context `tcx`
    let tcx = TyCtxt::new(cli_options, hir_map);

    // Do type checking by invoking the type check query
    tcx.type_check_crate();
}
```

The `type_check_crate` query provider would look something like the following:

```rust,ignore
fn type_check_crate_provider(tcx, _key: ()) {
    let list_of_hir_items = tcx.hir_map.list_of_items();

    for item_def_id in list_of_hir_items {
        tcx.type_check_item(item_def_id);
    }
}
```

We see that the `type_check_crate` query accesses input data
(`tcx.hir_map.list_of_items()`) and invokes other queries
(`type_check_item`). The `type_check_item`
invocations will themselves access input data and/or invoke other queries,
so that in the end the DAG of query invocations will be built up backwards
from the node that was initially executed:

```ignore
         (2)                                                 (1)
  list_of_all_hir_items <----------------------------- type_check_crate()
                                                               |
    (5)             (4)                  (3)                   |
  Hir(foo) <--- type_of(foo) <--- type_check_item(foo) <-------+
                                      |                        |
                    +-----------------+                        |
                    |                                          |
    (7)             v  (6)                  (8)                |
  Hir(bar) <--- type_of(bar) <--- type_check_item(bar) <-------+

// (x) denotes invocation order
```
149
We also see that often a query result can be read from the cache:
`type_of(bar)` was computed for `type_check_item(foo)` so when
`type_check_item(bar)` needs it, it is already in the cache.

Query results stay cached in the query context as long as the context lives.
So if the compiler driver invoked another query later on, the above graph
would still exist and already executed queries would not have to be re-done.



## Cycles

Earlier we stated that query invocations form a DAG. However, it would be easy
to form a cyclic graph by, for example, having a query provider like the
following:

```rust,ignore
fn cyclic_query_provider(tcx, key) -> u32 {
  // Invoke the same query with the same key again
  tcx.cyclic_query(key)
}
```

Since query providers are regular functions, this would behave much as expected:
Evaluation would get stuck in an infinite recursion. A query like this would not
be very useful either. However, sometimes certain kinds of invalid user input
can result in queries being called in a cyclic way. The query engine includes
a check for cyclic invocations and, because cycles are an irrecoverable error,
will abort execution with a "cycle error" message that tries to be
human-readable.
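
Conceptually, the cycle check can be implemented by tracking which queries are
currently being computed. A minimal sketch with invented names, panicking where
the real compiler would emit a proper cycle error:

```rust,ignore
use std::cell::RefCell;
use std::collections::HashSet;

struct QueryContext {
    // Keys of queries whose providers are currently on the call stack.
    in_progress: RefCell<HashSet<u32>>,
}

impl QueryContext {
    fn cyclic_query(&self, key: u32) -> u32 {
        // If this key is already being computed, we have found a cycle.
        if !self.in_progress.borrow_mut().insert(key) {
            panic!("cycle error when computing query for key {key}");
        }
        let result = self.cyclic_query_provider(key);
        self.in_progress.borrow_mut().remove(&key);
        result
    }

    fn cyclic_query_provider(&self, key: u32) -> u32 {
        // Re-invoking the same query with the same key triggers the check.
        self.cyclic_query(key)
    }
}
```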
180
At some point the compiler had a notion of "cycle recovery", that is, one could
"try" to execute a query and if it ended up causing a cycle, proceed in some
other fashion. However, this was later removed because it is not entirely
clear what the theoretical consequences of this are, especially regarding
incremental compilation.


## "Steal" Queries

Some queries have their result wrapped in a `Steal<T>` struct. These queries
behave exactly the same as regular queries, with one exception: Their result is
expected to be "stolen" out of the cache at some point, meaning some other part
of the program is taking ownership of it and the result cannot be accessed
anymore.

This stealing mechanism exists purely as a performance optimization because some
result values are too costly to clone (e.g. the MIR of a function). It seems
like result stealing would violate the condition that query results must be
immutable (after all we are moving the result value out of the cache) but it is
OK as long as the mutation is not observable. This is achieved by two things:

- Before a result is stolen, we make sure to eagerly run all queries that
  might ever need to read that result. This has to be done manually by calling
  those queries.
- Whenever a query tries to access a stolen result, we make the compiler ICE so
  that such a condition cannot go unnoticed.
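
As an illustration of the second point, a toy `Steal`-like wrapper could look
as follows; the actual `Steal<T>` in rustc differs in its details, and the
`expect` calls stand in for what would be an ICE:

```rust,ignore
use std::cell::{Ref, RefCell};

// Toy version of a steal-able query result.
struct Steal<T> {
    value: RefCell<Option<T>>,
}

impl<T> Steal<T> {
    fn new(value: T) -> Self {
        Steal { value: RefCell::new(Some(value)) }
    }

    // Take ownership of the result; afterwards it is gone from the cache.
    fn steal(&self) -> T {
        self.value.borrow_mut().take().expect("value already stolen")
    }

    // Reading a stolen result is a compiler bug, so fail loudly.
    fn borrow(&self) -> Ref<'_, T> {
        Ref::map(self.value.borrow(), |opt| {
            opt.as_ref().expect("attempted to read a stolen query result")
        })
    }
}
```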
206
This is not an ideal setup because of the manual intervention needed, so it
should be used sparingly and only when it is well known which queries might
access a given result. In practice, however, stealing has not turned out to be
much of a maintenance burden.

To summarize: "Steal queries" break some of the rules in a controlled way.
There are checks in place that make sure that nothing can go silently wrong.