1Internals — Warts
2=================
3This document covers some design flaws in Rumur that it would be nice to find a
4solution to at some point.
5
6Non-minimal traces when running multithreaded
7---------------------------------------------
8* `Github issue #131 “minimal trace mode”`_
9
10Rumur defaults to generating a multithreaded verifier (``--threads 0``) that
11aborts as soon as it finds an error (``--max-errors 1``). This was seen to be
12pragmatic and aligning with what users typically want to do when bringing up a
13new model. That is, run as fast as possible and stop as soon as we know an
14exhaustive proof is impossible.
15
16When examining counterexample traces, you generally want to be presented with
17the shortest possible example. This happens naturally in a single-threaded
18verifier as it performs a breadth-first search. The multithreaded verifier is
19only approximately breadth-first because all the threads are essentially racing
20with each other in their exploration. As a result, the first problem found may
21not be the shallowest and may result in a non-minimal counterexample trace.
22
23Each thread encounters states in greater-than-or-equal-depth order, so a naive
24attempt at forcing a minimal trace is to temporarily stall a thread when it
25finds an error. We can unblock this thread and allow it to report its error when
26all other threads have either (a) finished exploration or (b) exceeded the depth
27of the error location found by the first thread. If a second thread finds an
28error at a shallower depth than the first, it can simply kill the first thread
29and then itself stall as described.
30
31Alas, this is insufficient. Taking the example from the Github issue, the state
32machine may look as follows:
33
34.. code-block::
35
36  A -> B -> C -> ...
37   \        ^
38    \______/
39
40Thread 0 might discover ``C`` first via the path ``A -> B -> C``. Later thread 1
41might rediscover ``C`` via the path ``A -> C``. Thread 1 will notice its ``C``
42is a duplicate and discard it. Now suppose one of the threads finds a deeper
43error whose counterexample traces back to ``C``. This counterexample will trace
44back through the ``A -> B`` prefix rather than the shorter ``A``.
45
46So far I have not come up with a design that would result in a minimal
47counterexample trace in a multithreaded verifier without degrading current
48performance. Solving this would, I believe, be of significant value to users.
49
50.. _`Github issue #131 “minimal trace mode”`: https://github.com/Smattr/rumur/issues/131
51
52Incomplete AST in recursive functions
53-------------------------------------
54The librumur Abstract Syntax Tree (AST) represents child nodes using smart
55pointers, ``rumur::Ptr``. This has constructor and assignment operator
56implementations that allow the programmer to avoid thinking too much about
57memory management. The general pattern is to freely copy the pointed-to object
58into a new ``rumur::Ptr``. This isn’t very efficient, but has the advantage of
59being simple. Moreover, there is little point tightly optimising AST
60manipulation when any AST that is large enough to see benefit from this most
61likely results from a model that is too large to verify.
62
63A side-effect of this “deep copy” pointer design is that the AST is inherently a
64Directed Acyclic Graph (DAG). It is not possible for an AST node to contain a
65reference to something above itself in the AST because assigning this reference
66takes a copy of the parent node.
67
68This means that during symbol resolution, a recursive function call has no way
69to store a fully resolved reference to its callee. To see why this is, follow
70through the symbol resolution process. To begin with, the containing function
71has a statement that includes an unresolved function call. The symbol resolution
72traversal finds this unresolved function call and resolves it to the containing
73function by copying it. However, this copy still contains the unresolved version
74of the function call as a child. We can descend into the callee function
75(``rumur::FunctionCall::function``) but we’ll merely repeat the process and have
76a deeper unresolved function call.
77
78The current design merely avoids descending into the callees of function calls.
79This is acceptable, but it leaves an AST containing unresolved function calls
80that may come as a surprise to later consumers.
81
82Obtaining a fully resolved AST while supporting recursion seems inherently at
83odds with the current design. It seems like it would be a net loss to permit
84cycles in the AST (we would have to manually manage pointers and memory) but
85perhaps there is another compromise we could find.
86