• Home
  • History
  • Annotate
Name Date Size #Lines LOC

..04-Jun-2020-

crates/H04-Jun-2020-7,0615,063

examples/H03-May-2022-3020

src/H04-Jun-2020-5,0323,949

Cargo.tomlH A D04-Jun-2020677 1815

LICENSEH A D04-Jun-202012 KiB221182

README.mdH A D04-Jun-202013.5 KiB424309

README.md

1<div align="center">
2  <h1><code>peepmatic</code></h1>
3
4  <p>
5    <b>
6      <code>peepmatic</code> is a DSL and compiler for peephole optimizers for
7      <a href="https://github.com/bytecodealliance/wasmtime/tree/master/cranelift#readme">Cranelift</a>.
8    </b>
9  </p>
10
11  <img src="https://github.com/fitzgen/peepmatic/workflows/Rust/badge.svg"/>
12
13</div>
14
15<!-- START doctoc generated TOC please keep comment here to allow auto update -->
16<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
17
18
19- [About](#about)
20- [Example](#example)
21- [A DSL for Optimizations](#a-dsl-for-optimizations)
22  - [Variables](#variables)
23  - [Constants](#constants)
24  - [Nested Patterns](#nested-patterns)
25  - [Preconditions and Unquoting](#preconditions-and-unquoting)
26  - [Bit Widths](#bit-widths)
27- [Implementation](#implementation)
28  - [Parsing](#parsing)
29  - [Type Checking](#type-checking)
30  - [Linearization](#linearization)
31  - [Automatization](#automatization)
32- [References](#references)
33- [Acknowledgments](#acknowledgments)
34
35<!-- END doctoc generated TOC please keep comment here to allow auto update -->
36
37## About
38
39Peepmatic is a DSL for peephole optimizations and compiler for generating
40peephole optimizers from them. The user writes a set of optimizations in the
41DSL, and then `peepmatic` compiles the set of optimizations into an efficient
42peephole optimizer:
43
44```
45DSL ----peepmatic----> Peephole Optimizer
46```
47
48The generated peephole optimizer has all of its optimizations' left-hand sides
49collapsed into a compact automata that makes matching candidate instruction
50sequences fast.
51
52The DSL's optimizations may be written by hand or discovered mechanically with a
53superoptimizer like [Souper][]. Eventually, `peepmatic` should have a verifier
54that ensures that the DSL's optimizations are sound, similar to what [Alive][]
55does for LLVM optimizations.
56
57Currently, `peepmatic` is targeting peephole optimizers that operate on
58Cranelift's clif intermediate representation. The intended next target is
59Cranelift's new backend's "vcode" intermediate representation. Supporting
60non-Cranelift targets is not a goal.
61
62[Cranelift]: https://github.com/bytecodealliance/wasmtime/tree/master/cranelift#readme
63[Souper]: https://github.com/google/souper
64[Alive]: https://github.com/AliveToolkit/alive2
65
66## Example
67
68This snippet of our DSL describes optimizations for removing redundant
69bitwise-or instructions that are no-ops:
70
71```lisp
72(=> (bor $x (bor $x $y))
73    (bor $x $y))
74
75(=> (bor $y (bor $x $y))
76    (bor $x $y))
77
78(=> (bor (bor $x $y) $x)
79    (bor $x $y))
80
81(=> (bor (bor $x $y) $y)
82    (bor $x $y))
83```
84
85When compiled into a peephole optimizer automaton, they look like this:
86
87![](examples/redundant-bor.png)
88
89## A DSL for Optimizations
90
91A single peephole optimization has two parts:
92
931. A **left-hand side** that describes candidate instruction sequences that the
94   optimization applies to.
952. A **right-hand side** that contains the new instruction sequence that
96   replaces old instruction sequences that the left-hand side matched.
97
98A left-hand side may bind sub-expressions to variables and the right-hand side
99may contain those bound variables to reuse the sub-expressions. The operations
100inside the left-hand and right-hand sides are a subset of clif operations.
101
102Let's take a look at an example:
103
104```lisp
105(=> (imul $x 2)
106    (ishl $x 1))
107```
108
109As you can see, the DSL uses S-expressions. (S-expressions are easy to parse and
110we also have a bunch of nice parsing infrastructure for S-expressions already
111for our [`wat`][wat] and [`wast`][wast] crates.)
112
113[wat]: https://crates.io/crates/wat
114[wast]: https://crates.io/crates/wast
115
116The left-hand side of this optimization is `(imul $x 2)`. It matches integer
117multiplication operations where a value is multiplied by the constant two. The
118value multiplied by two is bound to the variable `$x`.
119
120The right-hand side of this optimization is `(ishl $x 1)`. It reuses the `$x`
121variable that was bound in the left-hand side.
122
123This optimization replaces expressions of the form `x * 2` with `x << 1`. This
124is sound because multiplication by two is the same as shifting left by one for
125binary integers, and it is desirable because a shift-left instruction executes
126in fewer cycles than a multiplication.
127
128The general form of an optimization is:
129
130```lisp
131(=> <left-hand-side> <right-hand-side>)
132```
133
134### Variables
135
136Variables begin with a dollar sign and are followed by lowercase letters,
137numbers, hyphens, and underscores: `$x`, `$y`, `$my-var`, `$operand2`.
138
139Left-hand side patterns may contain variables that match any kind of
140sub-expression and give it a name so that it may be reused in the right-hand
141side.
142
143```lisp
144;; Replace `x + 0` with simply `x`.
145(=> (iadd $x 0)
146    $x)
147```
148
149Within a pattern, every occurrence of a variable with the same name must match
150the same value. That is `(iadd $x $x)` matches `(iadd 1 1)` but does not match
151`(iadd 1 2)`. This lets us write optimizations such as this:
152
153```lisp
154;; Xor'ing a value with itself is always zero.
155(=> (bxor $x $x)
156    (iconst 0))
157```
158
159### Constants
160
161We've already seen specific integer literals and wildcard variables in patterns,
162but we can also match any constant. These are written similar to variables, but
163use uppercase letters rather than lowercase: `$C`, `$MY-CONST`, and `$OPERAND2`.
164
165For example, we can use constant patterns to combine an `iconst` and `iadd` into
166a single `iadd_imm` instruction:
167
168```lisp
169(=> (iadd (iconst $C) $x)
170    (iadd_imm $C $x))
171```
172
173### Nested Patterns
174
175Patterns can also match nested operations with their own nesting:
176
177```lisp
178(=> (bor $x (bor $x $y))
179    (bor $x $y))
180```
181
182### Preconditions and Unquoting
183
184Let's reconsider our first example optimization:
185
186```lisp
187(=> (imul $x 2)
188    (ishl $x 1))
189```
190
191This optimization is a little too specific. Here is another version of this
192optimization that we'd like to support:
193
194```lisp
195(=> (imul $x 4)
196    (ishl $x 2))
197```
198
199We don't want to have to write out all instances of this general class of
200optimizations! That would be a lot of repetition and could also bloat the size
201of our generated peephole optimizer's matching automata.
202
203Instead, we can generalize this optimization by matching any multiplication by a
204power of two constant `C` and replacing it with a shift left of `log2(C)`.
205
206First, rather than match `2` directly, we want to match any constant variable `C`:
207
208```lisp
209(imul $x $C)
210```
211
212Note that variables begin with lowercase letters, while constants begin with
213uppercase letters. Both the constant pattern `$C` and variable pattern `$x` will
214match `5`, but only the variable pattern `$x` will match a whole sub-expression
215like `(iadd 1 2)`. The constant pattern `$C` only matches constant values.
216
217Next, we augment our left-hand side's pattern with a **precondition** that the
218constant `$C` must be a power of two. Preconditions are introduced by wrapping
219a pattern in the `when` form:
220
221```lisp
222;; Our new left-hand side, augmenting a pattern with a precondition!
223(when
224  ;; The pattern matching multiplication by a constant value.
225  (imul $x $C)
226
227  ;; The precondition that $C must be a power of two.
228  (is-power-of-two $C))
229 ```
230
231In the right-hand side, we use **unquoting** to perform compile-time evaluation
232of `log2($C)`. Unquoting is done with the `$(...)` form:
233
234```lisp
235;; Our new right-hand side, using unqouting to do compile-time evaluation of
236;; constants that were matched and bound in the left-hand side!
237(ishl $x $(log2 $C))
238```
239
240Finally, here is the general optimization putting our new left-hand and
241right-hand sides together:
242
243```lisp
244(=> (when (imul $x $C)
245          (is-power-of-two $C))
246    (ishl $x $(log2 $C)))
247```
248
249### Bit Widths
250
251Similar to how Cranelift's instructions are bit-width polymorphic, `peepmatic`
252optimizations are also bit-width polymorphic. Unless otherwise specified, a
253pattern will match expressions manipulating `i32`s just the same as expressions
254manipulating `i64`s, etc... An optimization that doesn't constrain its pattern's
255bit widths must be valid for all bit widths:
256
257* 1
258* 8
259* 16
260* 32
261* 64
262* 128
263
264To constrain an optimization to only match `i32`s, for example, you can use the
265`bit-width` precondition:
266
267```lisp
268(=> (when (iadd $x $y)
269          (bit-width $x 32)
270          (bit-width $y 32))
271    ...)
272```
273
274Alternatively, you can ascribe a type to an operation by putting the type inside
275curly brackets after the operator, like this:
276
277```lisp
278(=> (when (sextend{i64} (ireduce{i32} $x))
279          (bit-width $x 64))
280    (sshr (ishl $x 32) 32))
281```
282
283## Implementation
284
285Peepmatic has roughly four phases:
286
2871. Parsing
2882. Type Checking
2893. Linearization
2904. Automatization
291
292(I say "roughly" because there are a couple micro-passes that happen after
293linearization and before automatization. But those are the four main phases.)
294
295### Parsing
296
297Parsing transforms the DSL source text into an abstract syntax tree (AST).
298
299We use [the `wast` crate][wast]. It gives us nicely formatted errors with source
300context, as well as some other generally nice-to-have parsing infrastructure.
301
302Relevant source files:
303
304* `src/parser.rs`
305* `src/ast.rs`
306
307[wast]: https://crates.io/crates/wast
308
309### Type Checking
310
311Type checking operates on the AST. It checks that types and bit widths in the
312optimizations are all valid. For example, it ensures that the type and bit width
313of an optimization's left-hand side is the same as its right-hand side, because
314it doesn't make sense to replace an integer expression with a boolean
315expression.
316
317After type checking is complete, certain AST nodes are assigned a type and bit
318width, that are later used in linearization and when matching and applying
319optimizations.
320
321We walk the AST and gather type constraints. Every constraint is associated with
322a span in the source file. We hand these constraints off to Z3. In the case that
323there are type errors (i.e. Z3 returns `unsat`), we get the constraints that are
324in conflict with each other via `z3::Solver::get_unsat_core` and report the type
325errors to the user, with the source context, thanks to the constraints'
326associated spans.
327
328Using Z3 not only makes implementing type checking easier than it otherwise
329would be, but makes it that much easier to extend type checking with searching
330for counterexample inputs in the future. That is, inputs for which the RHS is
331not equivalent to the LHS, implying that the optimization is unsound.
332
333Relevant source files:
334
335* `src/verify.rs`
336
337### Linearization
338
339Linearization takes the AST of optimizations and converts each optimization into
340a linear form. The goal is to make automaton construction easier in the
341automatization step, as well as simplifying the language to make matching and
342applying optimizations easier.
343
344Each optimization's left-hand side is converted into a sequence of
345
346* match operation,
347* path to the instruction/value/immediate to which the operation is applied, and
348* expected result of the operation.
349
350All match operations must have the expected result for the optimization to be
351applicable to an instruction sequence.
352
353Each optimization's right-hand side is converted into a sequence of build
354actions. These are commands that describe how to construct the right-hand side,
355given that the left-hand side has been matched.
356
357Relevant source files:
358
359* `src/linearize.rs`
360* `src/linear_passes.rs`
361* `crates/runtime/src/linear.rs`
362
363### Automatization
364
365Automatization takes a set of linear optimizations and combines them into a
366transducer automaton. This automaton is the final, compiled peephole
367optimizations. The goal is to de-duplicate as much as we can from all the linear
368optimizations, producing as compact and cache-friendly a representation as we
369can.
370
371Plain automata can tell you whether it matches an input string. It can be
372thought of as a compact representation of a set of strings. A transducer is a
373type of automaton that doesn't just match input strings, but can map them to
374output values. It can be thought of as a compact representation of a dictionary
375or map. By using transducers, we de-duplicate not only the prefix and suffix of
376the match operations, but also the right-hand side build actions.
377
378Each state in the emitted transducers is associated with a match operation and
379path. The transitions out of that state are over the result of the match
380operation. Each transition optionally accumulates some RHS build actions. By the
381time we reach a final state, the RHS build actions are complete and can be
382interpreted to apply the matched optimization.
383
384The relevant source files for constructing the transducer automaton are:
385
386* `src/automatize.rs`
387* `crates/automata/src/lib.rs`
388
389The relevant source files for the runtime that interprets the transducers and
390applies optimizations are:
391
392* `crates/runtime/src/optimizations.rs`
393* `crates/runtime/src/optimizer.rs`
394
395## References
396
397I found these resources helpful when designing `peepmatic`:
398
399* [Extending tree pattern matching for application to peephole
400  optimizations](https://pure.tue.nl/ws/portalfiles/portal/125543109/Thesis_JanekvOirschot.pdf)
401  by van Oirschot
402
403* [Interpreted Pattern Match Execution for
404  MLIR](https://drive.google.com/drive/folders/1hb_sXbdMbIz95X-aaa6Vf5wSYRwsJuve)
405  by Jeff Niu
406
407* [Direct Construction of Minimal Acyclic Subsequential
408  Transducers](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.24.3698&rep=rep1&type=pdf)
409  by Mihov and Maurel
410
411* [Index 1,600,000,000 Keys with Automata and
412  Rust](https://blog.burntsushi.net/transducers/) and [the `fst`
413  crate](https://crates.io/crates/fst) by Andrew Gallant
414
415## Acknowledgments
416
417Thanks to [Jubi Taneja], [Dan Gohman], [John Regehr], and [Nuno Lopes] for their
418input in design discussions and for sharing helpful resources!
419
420[Jubi Taneja]: https://www.cs.utah.edu/~jubi/
421[Dan Gohman]: https://github.com/sunfishcode
422[John Regehr]: https://www.cs.utah.edu/~regehr/
423[Nuno Lopes]: http://web.ist.utl.pt/nuno.lopes/
424