rumur-2021.09.29/doc/internals-warts.rst

Internals — Warts
=================
This document covers some design flaws in Rumur that it would be nice to find a
solution to at some point.

Non-minimal traces when running multithreaded
---------------------------------------------
* `Github issue #131 “minimal trace mode”`_

Rumur defaults to generating a multithreaded verifier (``--threads 0``) that
aborts as soon as it finds an error (``--max-errors 1``). This was seen to be
pragmatic and aligning with what users typically want to do when bringing up a
new model. That is, run as fast as possible and stop as soon as we know an
exhaustive proof is impossible.

When examining counterexample traces, you generally want to be presented with
the shortest possible example. This happens naturally in a single-threaded
verifier as it performs a breadth-first search. The multithreaded verifier is
only approximately breadth-first because all the threads are essentially racing
with each other in their exploration. As a result, the first problem found may
not be the shallowest and may result in a non-minimal counterexample trace.

Each thread encounters states in greater-than-or-equal-depth order, so a naive
attempt at forcing a minimal trace is to temporarily stall a thread when it
finds an error. We can unblock this thread and allow it to report its error when
all other threads have either (a) finished exploration or (b) exceeded the depth
of the error location found by the first thread. If a second thread finds an
error at a shallower depth than the first, it can simply kill the first thread
and then itself stall as described.

Alas, this is insufficient. Taking the example from the Github issue, the state
machine may look as follows:

.. code-block::

  A -> B -> C -> ...
   \        ^
    \______/

Thread 0 might discover ``C`` first via the path ``A -> B -> C``. Later thread 1
might rediscover ``C`` via the path ``A -> C``. Thread 1 will notice its ``C``
is a duplicate and discard it. Now suppose one of the threads finds a deeper
error whose counterexample traces back to ``C``. This counterexample will trace
back through the ``A -> B`` prefix rather than the shorter ``A``.

So far I have not come up with a design that would result in a minimal
counterexample trace in a multithreaded verifier without degrading current
performance. Solving this would, I believe, be of significant value to users.

.. _`Github issue #131 “minimal trace mode”`: https://github.com/Smattr/rumur/issues/131

Incomplete AST in recursive functions
-------------------------------------
The librumur Abstract Syntax Tree (AST) represents child nodes using smart
pointers, ``rumur::Ptr``. This has constructor and assignment operator
implementations that allow the programmer to avoid thinking too much about
memory management. The general pattern is to freely copy the pointed-to object
into a new ``rumur::Ptr``. This isn’t very efficient, but has the advantage of
being simple. Moreover, there is little point tightly optimising AST
manipulation when any AST that is large enough to see benefit from this most
likely results from a model that is too large to verify.

A side-effect of this “deep copy” pointer design is that the AST is inherently a
Directed Acyclic Graph (DAG). It is not possible for an AST node to contain a
reference to something above itself in the AST because assigning this reference
takes a copy of the parent node.

This means that during symbol resolution, a recursive function call has no way
to store a fully resolved reference to its callee. To see why this is, follow
through the symbol resolution process. To begin with, the containing function
has a statement that includes an unresolved function call. The symbol resolution
traversal finds this unresolved function call and resolves it to the containing
function by copying it. However, this copy still contains the unresolved version
of the function call as a child. We can descend into the callee function
(``rumur::FunctionCall::function``) but we’ll merely repeat the process and have
a deeper unresolved function call.

The current design merely avoids descending into the callees of function calls.
This is acceptable, but it leaves an AST containing unresolved function calls
that may come as a surprise to later consumers.

Obtaining a fully resolved AST while supporting recursion seems inherently at
odds with the current design. It seems like it would be a net loss to permit
cycles in the AST (we would have to manually manage pointers and memory) but
perhaps there is another compromise we could find.