What is Just-in-Time Compilation?
=================================

Just-in-Time compilation (JIT) is the process of turning some form of
interpreted program evaluation into a native program, and doing so at
runtime.

For example, instead of using a facility that can evaluate arbitrary
SQL expressions to evaluate an SQL predicate like WHERE a.col = 3, it
is possible to generate a function that can be natively executed by
the CPU and that handles just that expression, yielding a speedup.

This is JIT, rather than ahead-of-time (AOT) compilation, because it
is done at query execution time, and perhaps only in cases where the
relevant task is repeated a number of times. Given the way JIT
compilation is used in PostgreSQL, the lines between interpretation,
AOT and JIT are somewhat blurry.

Note that the interpreted program turned into a native program does
not necessarily have to be a program in the classical sense. E.g. it
is highly beneficial to JIT compile tuple deforming into a native
function just handling a specific type of table, despite tuple
deforming not commonly being understood as a "program".


Why JIT?
========

Parts of PostgreSQL are commonly bottlenecked by comparatively small
pieces of CPU intensive code. In a number of cases that is because the
relevant code has to be very generic (e.g. handling arbitrary SQL
level expressions, over arbitrary tables, with arbitrary extensions
installed). This often leads to a large number of indirect jumps and
unpredictable branches, and generally a high number of instructions
for a given task. E.g. just evaluating an expression comparing a
column in a database to an integer ends up needing several hundred
cycles.

By generating native code, large numbers of indirect jumps can be
removed, either by turning them into direct branches (e.g. replacing
the indirect call to an SQL operator's implementation with a direct
call to that function), or by removing them entirely (e.g. by
evaluating the branch at compile time because the input is
constant). Similarly, many branches can be removed outright by
evaluating the branch condition at compile time. The latter is
particularly beneficial for removing branches during tuple deforming.
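The contrast above can be sketched in plain C. This is a toy
illustration, not PostgreSQL code: all names except int8eq are
hypothetical. The generic path reaches the operator through an
indirect call per row; the specialized path has the column number,
operator and constant baked in, as JITed code for WHERE a.col = 3
would.

```c
#include <stdbool.h>
#include <stdint.h>

/* Generic path: the operator is resolved at runtime and reached via
 * an indirect call, since the interpreter must handle any operator
 * over any column. */
typedef bool (*sql_op_fn)(int64_t left, int64_t right);

static bool
int8eq(int64_t left, int64_t right)
{
	return left == right;
}

static bool
eval_generic(sql_op_fn op, const int64_t *columns, int colno, int64_t constant)
{
	return op(columns[colno], constant);	/* indirect jump per row */
}

/* Specialized path: column, operator and constant are known at code
 * generation time, so the whole predicate compiles to a direct
 * comparison with no indirect jumps. */
static bool
eval_specialized(const int64_t *columns)
{
	return columns[0] == 3;
}
```

Both functions compute the same predicate; the specialized one simply
has nothing left to dispatch on at runtime.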


How to JIT
==========

PostgreSQL, by default, uses LLVM to perform JIT. LLVM was chosen
because it is developed by several large corporations and therefore
unlikely to be discontinued, because it has a license compatible with
PostgreSQL, and because its IR can be generated from C using the Clang
compiler.


Shared Library Separation
-------------------------

To avoid the main PostgreSQL binary directly depending on LLVM, which
would prevent LLVM support being independently installed by OS package
managers, the LLVM dependent code is located in a shared library that
is loaded on-demand.

An additional benefit of doing so is that it is relatively easy to
evaluate JIT compilation that does not use LLVM, by swapping out the
shared library used to provide JIT compilation.

To achieve this, code intending to perform JIT (e.g. expression evaluation)
calls an LLVM independent wrapper located in jit.c to do so. If the
shared library providing JIT support can be loaded (i.e. PostgreSQL was
compiled with LLVM support and the shared library is installed), the task
of JIT compiling an expression gets handed off to the shared library. This
obviously requires that the function in jit.c is allowed to fail in case
no JIT provider can be loaded.

Which shared library is loaded is determined by the jit_provider GUC,
defaulting to "llvmjit".
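The wrapper pattern can be sketched as follows. This is a simplified
illustration with hypothetical names and no actual dlopen() of the
provider library; it only shows the shape of a provider independent
entry point that is allowed to fail.

```c
#include <stdbool.h>
#include <stddef.h>

/* Callbacks a JIT provider fills in when its shared library loads. */
typedef struct JitProviderCallbacks
{
	bool		(*compile_expr) (void *expr_state);
} JitProviderCallbacks;

static JitProviderCallbacks provider;
static bool provider_loaded = false;

/* Stand-in for what the real "llvmjit" provider would register. */
static bool
fake_provider_compile(void *expr_state)
{
	(void) expr_state;
	return true;
}

/* Stand-in for loading the library named by the jit_provider GUC. */
static void
load_provider(bool available)
{
	if (available)
	{
		provider.compile_expr = fake_provider_compile;
		provider_loaded = true;
	}
}

/* Provider independent entry point: returning false just means "no
 * JIT", and the caller falls back to interpreted execution. */
static bool
jit_compile_expr(void *expr_state)
{
	if (!provider_loaded)
		return false;
	return provider.compile_expr(expr_state);
}
```

The important property is that callers never need to know whether a
provider was loaded; a failed compilation attempt is not an error.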

Cloistering code performing JIT into a shared library unfortunately
also means that code doing JIT compilation for various parts of the
system has to be located separately from the code doing the same work
without JIT. E.g. the JIT version of execExprInterp.c is located in
jit/llvm/ rather than executor/.


JIT Context
-----------

For performance and convenience reasons it is useful to allow JITed
functions to be emitted and deallocated together. It is e.g. very
common to create a number of functions at query initialization time,
use them during query execution, and then deallocate all of them
together at the end of the query.

Lifetimes of JITed functions are managed via JITContext. Exactly one
such context should be created for work in which all created JITed
functions should have the same lifetime. E.g. there's exactly one
JITContext for each query executed, in the query's EState.  Only the
release of a JITContext is exposed to the provider independent
facility, as the creation of one is done on-demand by the JIT
implementations.

Emitting individual functions separately is more expensive than
emitting several functions at once, and emitting them together can
provide additional optimization opportunities. To facilitate that, the
LLVM provider separates defining functions from optimizing and
emitting functions in an executable manner.

Creating functions into the current mutable module (a module
essentially is LLVM's equivalent of a translation unit in C) is done
using
  extern LLVMModuleRef llvm_mutable_module(LLVMJitContext *context);
into which the caller can then emit as much code using the LLVM APIs
as it wants. Whenever a function actually needs to be called
  extern void *llvm_get_function(LLVMJitContext *context, const char *funcname);
returns a pointer to it.

E.g. in the expression evaluation case this setup allows most
functions in a query to be defined during ExecInitNode(), delaying
function emission until the first time a function is actually used.
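The define-now, emit-on-first-use pattern can be sketched without any
LLVM at all. In this toy model (all names hypothetical), functions
accumulate in a pending module and the whole module is "compiled" on
the first lookup, so the per-emission cost is paid once for many
functions rather than once per function.

```c
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 8

typedef struct ToyJitContext
{
	const char *pending[MAX_FUNCS];	/* defined, not yet emitted */
	const char *emitted[MAX_FUNCS];
	int			npending;
	int			nemitted;
	int			compile_calls;	/* how often compilation cost was paid */
} ToyJitContext;

/* Analogous to building a function in the current mutable module. */
static void
toy_define_function(ToyJitContext *ctx, const char *name)
{
	ctx->pending[ctx->npending++] = name;
}

/* Analogous to llvm_get_function(): emit everything pending in one
 * batch, then look the requested function up. */
static void *
toy_get_function(ToyJitContext *ctx, const char *name)
{
	if (ctx->npending > 0)
	{
		for (int i = 0; i < ctx->npending; i++)
			ctx->emitted[ctx->nemitted++] = ctx->pending[i];
		ctx->npending = 0;
		ctx->compile_calls++;
	}
	for (int i = 0; i < ctx->nemitted; i++)
		if (strcmp(ctx->emitted[i], name) == 0)
			return (void *) ctx->emitted[i];
	return NULL;
}
```

Defining several functions and then fetching them triggers exactly one
compilation, which is the amortization the section describes.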


Error Handling
--------------

There are two aspects of error handling.  Firstly, generated (LLVM IR)
and emitted functions (mmap()ed segments) need to be cleaned up both
after a successful query execution and after an error. This is done by
registering each created JITContext with the current resource owner,
and cleaning it up on error / end of transaction. If it is desirable
to release resources earlier, jit_release_context() can be used.

The second, less pretty, aspect of error handling is OOM handling
inside LLVM itself. The above resowner based mechanism takes care of
cleaning up emitted code upon ERROR, but there's also the chance that
LLVM itself runs out of memory. LLVM by default does *not* use any C++
exceptions. Its allocations are primarily funneled through the
standard "new" handlers, and some direct use of malloc() and
mmap(). For the former a 'new handler' exists:
http://en.cppreference.com/w/cpp/memory/new/set_new_handler
For the latter LLVM provides callbacks that get called upon failure
(unfortunately mmap() failures are treated as fatal rather than OOM errors).
What we've chosen to do for now is have two functions that LLVM using code
must use:
  extern void llvm_enter_fatal_on_oom(void);
  extern void llvm_leave_fatal_on_oom(void);
before interacting with LLVM code.

When a libstdc++ new or LLVM error occurs, the handlers set up by the
above functions trigger a FATAL error. We have to use FATAL rather
than ERROR, as we *cannot* reliably throw ERROR inside a foreign
library without risking corrupting its internal state.

Users of the above sections do *not* have to use PG_TRY/CATCH blocks;
instead the handlers are reset at the toplevel sigsetjmp() level.

Using a relatively small enter/leave protected section of code, rather
than setting up these handlers globally, avoids negative interactions
with extensions that might use C++ such as PostGIS. As LLVM code
generation should never execute arbitrary code, just setting these
handlers temporarily ought to suffice.
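The enter/leave bracket amounts to saving the current handler,
installing a fatal one, and restoring the old one afterwards. C has no
std::set_new_handler, so in this sketch a plain function pointer
stands in for it; all names are hypothetical, and a counter stands in
for elog(FATAL, ...).

```c
#include <stddef.h>

typedef void (*oom_handler) (void);

static oom_handler current_handler = NULL;
static oom_handler saved_handler = NULL;
static int	fatal_reports = 0;

/* In PostgreSQL this would raise FATAL: a recoverable ERROR cannot be
 * thrown safely from inside a foreign library. */
static void
fatal_on_oom(void)
{
	fatal_reports++;
}

/* Install the fatal handler, remembering whatever was there before,
 * so nesting and foreign C++ code keep working. */
static void
toy_enter_fatal_on_oom(void)
{
	saved_handler = current_handler;
	current_handler = fatal_on_oom;
}

/* Restore the previous handler: the fatal behavior is confined to the
 * bracketed section instead of being installed globally. */
static void
toy_leave_fatal_on_oom(void)
{
	current_handler = saved_handler;
}
```

Keeping the bracket small is what avoids clashing with extensions that
install their own C++ handlers.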


Type Synchronization
--------------------

To be able to generate code that can perform tasks done by "interpreted"
PostgreSQL, it obviously is required that code generation knows about at
least a few PostgreSQL types.  While it is possible to inform LLVM about
type definitions by recreating them manually in C code, that is failure
prone and labor intensive.

Instead there is one small file (llvmjit_types.c) which references each of
the types required for JITing. That file is translated to bitcode at
compile time, and loaded when LLVM is initialized in a backend.

That works very well to synchronize the type definition, but unfortunately
it does *not* synchronize offsets as the IR level representation doesn't
know field names.  Instead, required offsets are maintained as defines in
the original struct definition, like so:
#define FIELDNO_TUPLETABLESLOT_NVALID 9
        int                     tts_nvalid;             /* # of valid values in tts_values */
While that still needs to be defined, it's only required for a
relatively small number of fields, and it's bunched together with the
struct definition, so it's easily kept synchronized.
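Such a FIELDNO define can be sanity-checked against the real layout
with offsetof(). The struct below is a stand-in, not the real
TupleTableSlot; the point is only that the define and the member
position can be verified to agree.

```c
#include <stddef.h>

/* Toy stand-in for a struct whose fields generated code accesses. */
typedef struct ToySlot
{
	int			tts_flags;		/* field 0 */
	int			tts_nvalid;		/* field 1 */
} ToySlot;

/* Kept right next to the member, as in the real struct definitions,
 * so any reordering of fields is immediately visible. */
#define FIELDNO_TOYSLOT_NVALID 1

static size_t
toyslot_nvalid_offset(void)
{
	return offsetof(ToySlot, tts_nvalid);
}
```

Generated IR addresses fields by such an index, so a mismatch between
the define and the member's actual position would corrupt the emitted
code; keeping both adjacent makes drift easy to spot.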


Inlining
--------

One big advantage of JITing expressions is that it can significantly
reduce the overhead of PostgreSQL's extensible function/operator
mechanism, by inlining the body of called functions/operators.

It obviously is undesirable to maintain a second implementation of
commonly used functions, just for inlining purposes. Instead we take
advantage of the fact that the Clang compiler can emit LLVM IR.

The ability to do so allows us to get the LLVM IR for all operators
(e.g. int8eq, float8pl etc), without maintaining two copies.  These
bitcode files get installed into the server's
  $pkglibdir/bitcode/postgres/
Using existing LLVM functionality (for parallel LTO compilation),
additionally an index over these files is stored to
  $pkglibdir/bitcode/postgres.index.bc

Similarly extensions can install code into
  $pkglibdir/bitcode/[extension]/
accompanied by
  $pkglibdir/bitcode/[extension].index.bc
just alongside the actual library.  An extension's index will be used
to look up symbols located in the corresponding shared
library. Symbols that are used inside the extension, when inlined,
will first be looked up in the main binary and then in the extension's.


Caching
-------

Currently it is not yet possible to cache generated functions, even
though that'd be desirable from a performance point of view. The
problem is that the generated functions commonly contain pointers into
per-execution memory. The expression evaluation machinery needs to
be redesigned a bit to avoid that. Basically all per-execution memory
needs to be referenced as an offset to one block of memory stored in
an ExprState, rather than as absolute pointers into memory.
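The difference can be sketched in a few lines. In this hypothetical
example the generated code only bakes in a slot index, fixed at code
generation time, so the same function works against any execution's
state block; code that instead embedded the absolute address of one
execution's memory could never be reused.

```c
#include <stdint.h>

/* Fixed at code generation time; valid for every execution. */
enum { RESULT_SLOT = 2 };

/* Cacheable: depends only on the offset, never on a concrete
 * per-execution address. */
static void
store_result(int64_t *state, int64_t value)
{
	state[RESULT_SLOT] = value;
}

static int64_t
load_result(const int64_t *state)
{
	return state[RESULT_SLOT];
}
```

With all per-execution references expressed this way, identical LLVM
IR is produced for identical expressions, which is what would make an
IR-keyed cache work.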

Once that is addressed, adding an LRU cache that's keyed by the
generated LLVM IR will allow the usage of optimized functions even for
faster queries.

A longer term project is to move expression compilation to the planner
stage, allowing e.g. to tie compiled expressions to prepared
statements.

An even more advanced approach would be to use JIT with few
optimizations initially, and build an optimized version in the
background. But that's even further off.


What to JIT
===========

Currently expression evaluation and tuple deforming are JITed. Those
were chosen because they commonly are major CPU bottlenecks in
analytics queries, but are by no means the only potentially beneficial cases.

For JITing to be beneficial a piece of code first and foremost has to
be a CPU bottleneck. But also importantly, JITing can only be
beneficial if overhead can be removed by doing so. E.g. in the tuple
deforming case the knowledge about the number of columns and their
types can remove a significant number of branches, and in the
expression evaluation case a lot of indirect jumps/calls can be
removed.  If neither of these is the case, JITing is a waste of
resources.

Future avenues for JITing are tuple sorting, COPY parsing/output
generation, and later compiling larger parts of queries.


When to JIT
===========

Currently there are a number of GUCs that influence JITing:

- jit_above_cost = -1, 0-DBL_MAX - all queries with a higher total cost
  get JITed, *without* optimization (the expensive part), corresponding to
  -O0. This commonly already results in significant speedups if
  expression evaluation/deforming is a bottleneck (mostly by removing
  dynamic branches).
- jit_optimize_above_cost = -1, 0-DBL_MAX - all queries with a higher total
  cost get JITed, *with* optimization (the expensive part).
- jit_inline_above_cost = -1, 0-DBL_MAX - inlining is tried if the query
  has a higher cost.

Whenever a query's total cost is above these limits, JITing is
performed.
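How the three thresholds interact can be sketched as a small decision
helper. This is an illustrative model, not the actual planner code:
the names and the convention that a negative setting disables a tier
mirror the GUC descriptions above, and the sample costs in the usage
are arbitrary.

```c
#include <stdbool.h>

typedef struct JitFlags
{
	bool		jit;			/* generate native code at all (-O0 tier) */
	bool		optimize;		/* also run expensive optimization passes */
	bool		inlining;		/* also attempt inlining of function bodies */
} JitFlags;

static JitFlags
decide_jit(double total_cost,
		   double jit_above_cost,
		   double jit_optimize_above_cost,
		   double jit_inline_above_cost)
{
	JitFlags	flags = {false, false, false};

	/* A negative threshold disables the corresponding tier. */
	flags.jit = (jit_above_cost >= 0 && total_cost > jit_above_cost);
	if (flags.jit)
	{
		flags.optimize = (jit_optimize_above_cost >= 0 &&
						  total_cost > jit_optimize_above_cost);
		flags.inlining = (jit_inline_above_cost >= 0 &&
						  total_cost > jit_inline_above_cost);
	}
	return flags;
}
```

A moderately expensive query thus gets unoptimized JIT only, while a
very expensive one additionally gets optimization and inlining.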

Alternative costing models, e.g. by generating separate paths for
parts of a query with lower cpu_* costs, are also a possibility, but
it's doubtful that doing so is worth the overhead.  Another
alternative would be to count the number of times individual
expressions are estimated to be evaluated, and perform JITing of those
individual expressions.

The obvious seeming approach of JITing expressions individually after
a number of executions turns out not to work too well, primarily
because emitting many small functions individually has significant
overhead, and secondarily because the time until JITing occurs causes
relative slowdowns that eat into the gain of JIT compilation.