@c Copyright (C) 2019 Free Software Foundation, Inc.
@c This is part of the GCC manual.
@c For copying conditions, see the file gcc.texi.
@c Contributed by David Malcolm <dmalcolm@redhat.com>.

@node Static Analyzer
@chapter Static Analyzer
@cindex analyzer
@cindex static analysis
@cindex static analyzer

@menu
* Analyzer Internals::       Analyzer Internals
* Debugging the Analyzer::   Useful debugging tips
@end menu

@node Analyzer Internals
@section Analyzer Internals
@cindex analyzer, internals
@cindex static analyzer, internals

@subsection Overview

The analyzer implementation works on the gimple-SSA representation.
(I chose this in the hopes of making it easy to work with LTO to
do whole-program analysis).

The implementation is read-only: it doesn't attempt to change anything,
just emit warnings.

The gimple representation can be seen using @option{-fdump-ipa-analyzer}.

First, we build a @code{supergraph} which combines the callgraph and all
of the CFGs into a single directed graph, with both interprocedural and
intraprocedural edges.  The nodes and edges in the supergraph are called
``supernodes'' and ``superedges'', and often referred to in code as
@code{snodes} and @code{sedges}.  Basic blocks in the CFGs are split at
interprocedural calls, so there can be more than one supernode per
basic block.  Most statements will be in just one supernode, but a call
statement can appear in two supernodes: at the end of one for the call,
and again at the start of another for the return.
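
As a rough illustration of the splitting rule just described, the
following C sketch assigns the statements of one basic block to
supernodes, starting a new supernode after every call.  It is a toy
model rather than the analyzer's code: @code{stmt_kind},
@code{split_block} and @code{in_two_snodes_p} are names invented for
this example, and every call is assumed to be an interprocedural call
that the supergraph splits on.

```c
#include <assert.h>
#include <stddef.h>

/* Toy statement classification (hypothetical; not GCC's gimple codes).  */
enum stmt_kind { STMT_PLAIN, STMT_CALL };

/* Assign each statement of one basic block to the supernode in which it
   first appears, writing the supernode index into FIRST_SNODE.  The block
   is split after every call: the call ends one supernode and, for its
   return, starts the next.  Returns the number of supernodes.  */
static size_t
split_block (const enum stmt_kind *stmts, size_t num_stmts,
             size_t *first_snode)
{
  size_t cur = 0;

  assert (num_stmts == 0 || (stmts != NULL && first_snode != NULL));
  for (size_t i = 0; i < num_stmts; i++)
    {
      first_snode[i] = cur;
      if (stmts[i] == STMT_CALL)
        cur++;  /* The return half of the call opens the next supernode.  */
    }
  return cur + 1;
}

/* Nonzero if statement I appears in two supernodes; in this model that
   is true exactly for calls.  */
static int
in_two_snodes_p (const enum stmt_kind *stmts, size_t i)
{
  return stmts[i] == STMT_CALL;
}
```

For a block of the form plain, call, plain, call, plain this yields
three supernodes, with each call closing one supernode and its return
opening the next.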

The supergraph can be seen using @option{-fdump-analyzer-supergraph}.

We then build an @code{analysis_plan} which walks the callgraph to
determine which calls might be suitable for being summarized (rather
than fully explored) and thus in what order to explore the functions.

Next is the heart of the analyzer: we use a worklist to explore state
within the supergraph, building an ``exploded graph''.
Nodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
``Precise Interprocedural Dataflow Analysis via Graph Reachability''
(Thomas Reps, Susan Horwitz and Mooly Sagiv).

We reuse nodes for <point,@w{ }state> pairs we've already seen, and avoid
tracking state too closely, so that (hopefully) we rapidly converge
on a final exploded graph, and terminate the analysis.  We also bail
out if the number of exploded <end-of-basic-block,@w{ }state> nodes gets
larger than a particular multiple of the total number of basic blocks
(to ensure termination in the face of pathological state-explosion
cases, or bugs).  We also stop exploring a point once we hit a limit
of states for that point.
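
The worklist exploration and its safety limit can be sketched as
follows.  This is a simplified model under stated assumptions: a state
is a small bitmask, each program point may set some bits on the way
out, and the limit is a plain cap on the number of exploded nodes
rather than a multiple of the number of basic blocks.  The
@code{toy_cfg} structure and the @code{explore} function are
hypothetical and do not correspond to analyzer classes.

```c
#include <assert.h>
#include <string.h>

#define MAX_POINTS 8
#define MAX_STATES 4   /* Toy abstraction: a state is a 2-bit mask.  */

struct toy_cfg
{
  int num_points;
  int succ[MAX_POINTS][2];   /* Successor points, or -1 for "none".  */
  int or_bits[MAX_POINTS];   /* Bits set in the state on leaving a point.  */
};

/* Explore <point, state> pairs starting from <0, 0>.  Returns the number
   of exploded nodes created, or -1 if more than ENODE_LIMIT nodes would
   be needed.  */
static int
explore (const struct toy_cfg *cfg, int enode_limit)
{
  unsigned char seen[MAX_POINTS][MAX_STATES];
  int wl_point[MAX_POINTS * MAX_STATES];
  int wl_state[MAX_POINTS * MAX_STATES];
  int head = 0, tail = 0, num_enodes = 0;

  assert (cfg->num_points > 0 && cfg->num_points <= MAX_POINTS);
  memset (seen, 0, sizeof seen);

  /* Seed the worklist with the entry node.  */
  seen[0][0] = 1;
  wl_point[tail] = 0;
  wl_state[tail] = 0;
  tail++;
  num_enodes++;

  while (head < tail)
    {
      int point = wl_point[head];
      int state = wl_state[head];
      int out = (state | cfg->or_bits[point]) & (MAX_STATES - 1);
      head++;

      for (int i = 0; i < 2; i++)
        {
          int next = cfg->succ[point][i];
          if (next < 0 || seen[next][out])
            continue;   /* No edge, or reuse the node we already have.  */
          if (num_enodes >= enode_limit)
            return -1;  /* Bail out rather than keep growing.  */
          seen[next][out] = 1;
          wl_point[tail] = next;
          wl_state[tail] = out;
          tail++;
          num_enodes++;
        }
    }
  return num_enodes;
}
```

On a seven-point CFG made of two consecutive diamonds, where one arm of
each diamond changes the state, this toy explores 13 <point, state>
pairs instead of 7, and gives up early if the cap is set lower.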

We can identify problems directly when processing a <point,@w{ }state>
instance.  For example, if we're finding the successors of

@smallexample
   <point: before-stmt: "free (ptr);",
    state: @{"ptr": freed@}>
@end smallexample

then we can detect a double-free of @code{ptr}.  We can then emit a path
to reach the problem by finding the simplest route through the graph.
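
A checker's view of this can be sketched as a small state machine.  The
code below is an assumption-laden miniature, not the analyzer's
@code{malloc} checker: it tracks a single pointer with three invented
states and reports a double-free when a @code{free} event arrives while
the pointer is already in the freed state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pointer states and events for this sketch only.  */
enum ptr_state { PTR_START, PTR_ALLOCATED, PTR_FREED };
enum ptr_event { EV_MALLOC, EV_FREE };
enum diag { DIAG_NONE, DIAG_DOUBLE_FREE };

/* Process one event for the pointer whose state is *STATE, updating the
   state and returning any diagnostic found at this point.  */
static enum diag
on_stmt (enum ptr_state *state, enum ptr_event ev)
{
  assert (state != NULL);
  switch (ev)
    {
    case EV_MALLOC:
      *state = PTR_ALLOCATED;
      return DIAG_NONE;
    case EV_FREE:
      if (*state == PTR_FREED)
        return DIAG_DOUBLE_FREE;  /* <before "free (ptr)", ptr: freed>  */
      *state = PTR_FREED;
      return DIAG_NONE;
    }
  return DIAG_NONE;
}
```

Feeding it the events malloc, free, free produces no diagnostic for the
first two and a double-free for the third.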

Program points in the analysis are much more fine-grained than in the
CFG and supergraph, with points (and thus potentially exploded nodes)
for various events, including before individual statements.
By default the exploded graph merges multiple consecutive statements
in a supernode into one exploded edge to minimize the size of the
exploded graph.  This can be suppressed via
@option{-fanalyzer-fine-grained}.
The fine-grained approach seems to make things simpler and more debuggable
than other approaches I tried, in that each point is responsible for one
thing.

Program points in the analysis also have a ``call string'' identifying the
stack of callsites below them, so that paths in the exploded graph
correspond to interprocedurally valid paths: we always return to the
correct call site, propagating state information accordingly.
We avoid infinite recursion by stopping the analysis if a callsite
appears more than @code{analyzer-max-recursion-depth} times in a call
string (defaulting to 2).
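
The recursion limit on call strings can be sketched like this.  The
@code{call_string} structure, the integer callsite IDs and the helper
names below are invented for the example, and the exact counting rule
used by the analyzer may differ; the sketch simply refuses a push that
would make a callsite appear more than the limit allows.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALL_DEPTH 16
#define MAX_RECURSION_DEPTH 2   /* Mirrors the documented default.  */

/* Toy call string: a stack of callsite IDs, innermost call last.  */
struct call_string
{
  int callsites[MAX_CALL_DEPTH];
  size_t length;
};

/* Count how often CALLSITE already appears in CS.  */
static size_t
count_occurrences (const struct call_string *cs, int callsite)
{
  size_t n = 0;
  for (size_t i = 0; i < cs->length; i++)
    if (cs->callsites[i] == callsite)
      n++;
  return n;
}

/* Try to enter a call.  Returns 1 on success, or 0 if exploration
   should stop here (stack full, or recursion limit reached).  */
static int
push_call (struct call_string *cs, int callsite)
{
  if (cs->length == MAX_CALL_DEPTH)
    return 0;
  if (count_occurrences (cs, callsite) + 1 > MAX_RECURSION_DEPTH)
    return 0;
  cs->callsites[cs->length++] = callsite;
  return 1;
}

/* Return from the innermost call, yielding the callsite to resume at.  */
static int
pop_call (struct call_string *cs)
{
  assert (cs->length > 0);
  return cs->callsites[--cs->length];
}
```

Because a return pops the most recent callsite, a path built this way
always resumes at the call that was actually made.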

@subsection Graphs

Nodes and edges in the exploded graph are called ``exploded nodes'' and
``exploded edges'' and often referred to in the code as
@code{enodes} and @code{eedges} (especially when distinguishing them
from the @code{snodes} and @code{sedges} in the supergraph).

Each graph numbers its nodes, giving unique identifiers: supernodes
are referred to throughout dumps in the form @samp{SN: @var{index}} and
exploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
@samp{EN: 29}).

The supergraph can be seen using @option{-fdump-analyzer-supergraph}.

The exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
and other dump options.  Exploded nodes are color-coded in the .dot output
based on state-machine states to make it easier to see state changes at
a glance.

@subsection State Tracking

There's a tension between:
@itemize @bullet
@item
precision of analysis in the straight-line case, vs
@item
exponential blow-up in the face of control flow.
@end itemize

For example, in general, given this CFG:

@smallexample
      A
     / \
    B   C
     \ /
      D
     / \
    E   F
     \ /
      G
@end smallexample

we want to avoid differences in state-tracking in B and C from
leading to blow-up.  If we don't prevent state blowup, we end up
with exponential growth of the exploded graph like this:

@smallexample

           1:A
          /   \
         /     \
        /       \
      2:B       3:C
       |         |
      4:D       5:D        (2 exploded nodes for D)
     /   \     /   \
   6:E   7:F 8:E   9:F
    |     |   |     |
   10:G 11:G 12:G  13:G    (4 exploded nodes for G)

@end smallexample

Similar issues arise with loops.

To prevent this, we follow various approaches:

@enumerate a
@item
state pruning, which tries to discard state that won't be relevant
later on within the function.
This can be disabled via @option{-fno-analyzer-state-purge}.

@item
state merging.  We can try to find the commonality between two
@code{program_state} instances to make a third, simpler @code{program_state}.
We have two strategies here:

  @enumerate
  @item
     the worklist keeps new nodes for the same @code{program_point} together,
     and tries to merge them before processing, and thus before they have
     successors.  Hence, in the above, the two nodes for D (4 and 5) reach
     the front of the worklist together, and we create a node for D with
     the merger of the incoming states.

  @item
     try merging with the state of existing enodes for the @code{program_point}
     (which may have already been explored).  There will be duplication,
     but only one set of duplication; subsequent duplicates are more likely
     to hit the cache.  In particular, (hopefully) all merger chains are
     finite, and so we guarantee termination.
     This is intended to help with loops: we ought to explore the first
     iteration, and then have a ``subsequent iterations'' exploration,
     which uses a state merged from that of the first, to be more abstract.
  @end enumerate

We avoid merging pairs of states that have state-machine differences,
as these are the kinds of differences that are likely to be most
interesting.  So, for example, given:

@smallexample
      if (condition)
        ptr = malloc (size);
      else
        ptr = local_buf;

      .... do things with 'ptr'

      if (condition)
        free (ptr);

      ...etc
@end smallexample

then we end up with an exploded graph that looks like this:

@smallexample

                   if (condition)
                     / T      \ F
            ---------          ----------
           /                             \
      ptr = malloc (size)             ptr = local_buf
          |                               |
      copy of                         copy of
        "do things with 'ptr'"          "do things with 'ptr'"
      with ptr: heap-allocated        with ptr: stack-allocated
          |                               |
      if (condition)                  if (condition)
          | known to be T                 | known to be F
      free (ptr);                         |
           \                             /
            -----------------------------
                         | ('ptr' is pruned, so states can be merged)
                        etc

@end smallexample

where some duplication has occurred, but only for the places where the
different paths are worth exploring separately.

Merging can be disabled via @option{-fno-analyzer-state-merge}.
@end enumerate
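
The merging rule described in the list above can be sketched as follows.
This is a deliberately small model, not the analyzer's
@code{program_state}: the @code{toy_state} structure, its fixed set of
variables and the @code{try_merge} function are assumptions made for the
example.  Two states merge only if their state-machine state agrees;
values that differ between the inputs become unknown in the result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_VARS 3

/* A variable's value: either a known constant or unknown.  */
struct toy_value
{
  bool known;
  int constant;
};

struct toy_state
{
  int sm_state;                    /* State-machine state, e.g. of a pointer.  */
  struct toy_value vars[NUM_VARS];
};

/* Try to merge A and B into OUT.  Returns false, leaving OUT untouched,
   if the state-machine states differ; such states are kept separate.  */
static bool
try_merge (const struct toy_state *a, const struct toy_state *b,
           struct toy_state *out)
{
  assert (a != NULL && b != NULL && out != NULL);
  if (a->sm_state != b->sm_state)
    return false;

  out->sm_state = a->sm_state;
  for (int i = 0; i < NUM_VARS; i++)
    {
      if (a->vars[i].known && b->vars[i].known
          && a->vars[i].constant == b->vars[i].constant)
        out->vars[i] = a->vars[i];   /* Common knowledge survives.  */
      else
        {
          out->vars[i].known = false;  /* Anything else is forgotten.  */
          out->vars[i].constant = 0;
        }
    }
  return true;
}
```

The merged state is simpler than either input, which is what lets two
exploded nodes at the same program point collapse into one.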

@subsection Region Model

Part of the state stored at an @code{exploded_node} is a @code{region_model}.
This is an implementation of the region-based ternary model described in
@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf,
"A Memory Model for Static Analysis of C Programs"}
(Zhongxing Xu, Ted Kremenek, and Jian Zhang).

A @code{region_model} encapsulates a representation of the state of
memory, with a tree of @code{region} instances, along with their associated
values.  The representation is graph-like because values can be pointers
to regions.  It also stores a @code{constraint_manager}, capturing
relationships between the values.

Because each node in the @code{exploded_graph} has a @code{region_model},
and each of the latter is graph-like, the @code{exploded_graph} is in some
ways a graph of graphs.

Here's an example of printing a @code{region_model}, showing the ASCII-art
used to visualize the region hierarchy (colorized when printing to stderr):

@smallexample
(gdb) call debug (*this)
r0: @{kind: 'root', parent: null, sval: null@}
|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@}
|  |: sval: sv1: @{poisoned: uninit@}
|  |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@}
|  |  `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@}
|  |    |: sval: sv3: @{type: 'void *', unknown@}
|  |    |: type: 'void *'
|  `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@}
|    |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
|    |  |: sval: sv4: @{type: 'void *', &r6@}
|    |  |: type: 'void *'
|    |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
|    |  |: sval: sv4: @{type: 'void *', &r6@}
|    |  |: type: 'void *'
|    `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
|      |: sval: sv4: @{type: 'void *', &r6@}
|      |: type: 'void *'
`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@}
  |: sval: sv2: @{poisoned: uninit@}
  `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@}
svalues:
  sv0: @{type: 'size_t', '1024'@}
  sv1: @{poisoned: uninit@}
  sv2: @{poisoned: uninit@}
  sv3: @{type: 'void *', unknown@}
  sv4: @{type: 'void *', &r6@}
constraint manager:
  equiv classes:
    ec0: @{sv0 == '1024'@}
    ec1: @{sv4@}
  constraints:
@end smallexample

This is the state at the point of returning from @code{calls_malloc} back
to @code{test} in the following:

@smallexample
#include <stdlib.h>

void *
calls_malloc (void)
@{
  void *result = malloc (1024);
  return result;
@}

void test (void)
@{
  void *ptr = calls_malloc ();
  /* etc.  */
@}
@end smallexample

The ``root'' region (``r0'') has a ``stack'' child (``r1''), with two
children: a frame for @code{test} (``r2''), and a frame for
@code{calls_malloc} (``r4'').  These frame regions have child regions for
storing their local variables.  For example, the return region
and various other regions within the @code{calls_malloc} frame all have
value ``sv4'', a pointer to a heap-allocated region ``r6''.  Within the parent
frame, @code{ptr_3} has value ``sv3'', an unknown @code{void *}.

@subsection Analyzer Paths

We need to explain to the user what the problem is, and to persuade them
that there really is a problem.  Hence having a @code{diagnostic_path}
isn't just an incidental detail of the analyzer; it's required.

Paths ought to be:
@itemize @bullet
@item
interprocedurally-valid
@item
feasible
@end itemize

Without state-merging, all paths in the exploded graph are feasible
(in terms of constraints being satisfied).
With state-merging, paths in the exploded graph can be infeasible.

We collate warnings and only emit them for the simplest path:
e.g. for a bug in a utility function, with lots of routes to calling it,
we only emit the simplest path (which could be intraprocedural, if
it can be reproduced without a caller).  We apply a check that
each duplicate warning's shortest path is feasible, rejecting any
warnings for which the shortest path is infeasible (which could lead to
false negatives).

We use the shortest feasible @code{exploded_path} through the
@code{exploded_graph} (a list of @code{exploded_edge *}) to build a
@code{diagnostic_path} (a list of events for the diagnostic subsystem),
specifically a @code{checker_path}.
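
Finding the shortest path can be sketched with a breadth-first search.
This toy version works on an adjacency matrix and ignores feasibility
checking entirely; @code{toy_egraph} and @code{shortest_path} are
hypothetical names, and the analyzer's own path-finding code is not
shown in this section.

```c
#include <assert.h>

#define MAX_NODES 16

/* Toy exploded graph: adj[u][v] is nonzero if there is an edge u -> v.  */
struct toy_egraph
{
  int num_nodes;
  int adj[MAX_NODES][MAX_NODES];
};

/* Breadth-first search from SRC to DST.  On success, writes the node
   indices of a shortest path into PATH (SRC first; PATH must have room
   for MAX_NODES entries) and returns the number of nodes on it.
   Returns 0 if DST is unreachable.  */
static int
shortest_path (const struct toy_egraph *g, int src, int dst, int *path)
{
  int prev[MAX_NODES], queue[MAX_NODES];
  int head = 0, tail = 0, len = 0;

  assert (g->num_nodes > 0 && g->num_nodes <= MAX_NODES);
  assert (src >= 0 && src < g->num_nodes);
  assert (dst >= 0 && dst < g->num_nodes);

  for (int i = 0; i < g->num_nodes; i++)
    prev[i] = -1;
  prev[src] = src;
  queue[tail++] = src;

  while (head < tail)
    {
      int u = queue[head++];
      if (u == dst)
        break;
      for (int v = 0; v < g->num_nodes; v++)
        if (g->adj[u][v] && prev[v] == -1)
          {
            prev[v] = u;       /* First visit is via a shortest route.  */
            queue[tail++] = v;
          }
    }
  if (prev[dst] == -1)
    return 0;

  /* Walk predecessors back from DST, then reverse into SRC-first order.  */
  for (int n = dst; ; n = prev[n])
    {
      path[len++] = n;
      if (n == src)
        break;
    }
  for (int i = 0; i < len / 2; i++)
    {
      int tmp = path[i];
      path[i] = path[len - 1 - i];
      path[len - 1 - i] = tmp;
    }
  return len;
}
```

A feasibility check, as described above, would then replay the
constraints along the returned path and reject it if they conflict.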

Having built the @code{checker_path}, we prune it to try to eliminate
events that aren't relevant, to minimize how much the user has to read.

After pruning, we notify each event in the path of its ID and record the
IDs of interesting events, allowing for events to refer to other events
in their descriptions.  The @code{pending_diagnostic} class has various
vfuncs to support emitting more precise descriptions, so that e.g.

@itemize @bullet
@item
a deref-of-unchecked-malloc diagnostic might use:
@smallexample
  returning possibly-NULL pointer to 'make_obj' from 'allocator'
@end smallexample
for a @code{return_event} to make it clearer how the unchecked value moves
from callee back to caller
@item
a double-free diagnostic might use:
@smallexample
  second 'free' here; first 'free' was at (3)
@end smallexample
and a use-after-free might use
@smallexample
  use after 'free' here; memory was freed at (2)
@end smallexample
@end itemize

At this point we can emit the diagnostic.

@subsection Limitations

@itemize @bullet
@item
Only for C so far
@item
The implementation of call summaries is currently very simplistic.
@item
Lack of function pointer analysis
@item
The constraint-handling code assumes reflexivity in some places
(that values are equal to themselves), which is not the case for NaN.
As a simple workaround, constraints on floating-point values are
currently ignored.
@item
The region model code creates lots of little mutable objects at each
@code{region_model} (and thus per @code{exploded_node}) rather than
sharing immutable objects and having the mutable state in the
@code{program_state} or @code{region_model}.  The latter approach might be
more efficient, and might avoid dealing with IDs rather than pointers
(which requires us to impose an ordering to get meaningful equality).
@item
The region model code doesn't yet support @code{memcpy}.  At the
gimple-ssa level these have been optimized to statements like this:
@smallexample
_10 = MEM <long unsigned int> [(char * @{ref-all@})&c];
MEM <long unsigned int> [(char * @{ref-all@})&d] = _10;
@end smallexample
Perhaps they could be supported via a new @code{compound_svalue} type.
@item
There are various other limitations in the region model (grep for TODO/xfail
in the testsuite).
@item
The @code{constraint_manager}'s implementation of transitivity is currently
too expensive to enable by default and so must be manually enabled via
@option{-fanalyzer-transitivity}.
@item
The checkers are currently hardcoded and don't allow for user extensibility
(e.g. adding allocate/release pairs).
@item
Although the analyzer's test suite has a proof-of-concept test case for
LTO, LTO support hasn't had extensive testing.  There are various
lang-specific things in the analyzer that assume C rather than LTO.
For example, SSA names are printed to the user in ``raw'' form, rather
than printing the underlying variable name.
@end itemize
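
The NaN caveat in the list above is easy to demonstrate in plain C:
under IEEE 754 semantics a NaN compares unequal to itself, so
@code{x == x} is not always true for floating-point values.  The helper
name is invented for this illustration, and the result assumes the
compiler is not run with options such as @option{-ffast-math} that relax
NaN handling.

```c
#include <math.h>

/* Return nonzero if X compares equal to itself.  This holds for every
   ordinary value and for infinities, but not for NaN.  */
static int
reflexive_p (double x)
{
  return x == x;
}
```

A constraint solver that treats equality as reflexive would therefore
draw a wrong conclusion for a value that might be NaN, which is why
ignoring floating-point constraints is the conservative choice.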

Some ideas for other checkers:
@itemize @bullet
@item
File-descriptor-based APIs
@item
Linux kernel internal APIs
@item
Signal handling
@end itemize

@node Debugging the Analyzer
@section Debugging the Analyzer
@cindex analyzer, debugging
@cindex static analyzer, debugging

@subsection Special Functions for Debugging the Analyzer

The analyzer recognizes various special functions by name, for use
in debugging the analyzer.  Declarations can be seen in the testsuite
in @file{analyzer-decls.h}.  None of these functions are actually
implemented.

Add:
@smallexample
  __analyzer_break ();
@end smallexample
to the source being analyzed to trigger a breakpoint in the analyzer when
that source is reached.  By putting a series of these in the source, it's
much easier to effectively step through the program state as it's analyzed.

@smallexample
__analyzer_dump ();
@end smallexample

will dump copious information about the analyzer's state each time it
reaches the call in its traversal of the source.

@smallexample
__analyzer_dump_path ();
@end smallexample

will emit a placeholder ``note'' diagnostic with a path to that call site,
if the analyzer finds a feasible path to it.

The builtin @code{__analyzer_dump_exploded_nodes} will emit a warning
after analysis containing information on all of the exploded nodes at that
program point.  A call of the form

@smallexample
  __analyzer_dump_exploded_nodes (0);
@end smallexample

will output the number of ``processed'' nodes, and the IDs of
both ``processed'' and ``merger'' nodes, such as:

@smallexample
warning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59]
@end smallexample

With a non-zero argument

@smallexample
  __analyzer_dump_exploded_nodes (1);
@end smallexample

it will also dump all of the states within the ``processed'' nodes.

@smallexample
   __analyzer_dump_region_model ();
@end smallexample
will dump the @code{region_model}'s state to stderr.

@smallexample
__analyzer_eval (expr);
@end smallexample
will emit a warning with text ``TRUE'', ``FALSE'' or ``UNKNOWN'' based on the
truthfulness of the argument.  This is useful for writing DejaGnu tests.

@subsection Other Debugging Techniques

One approach when tracking down where a particular bogus state is
introduced into the @code{exploded_graph} is to add custom code to
@code{region_model::validate}.

For example, this custom code (added to @code{region_model::validate})
breaks with an assertion failure when a variable called @code{ptr}
acquires a value that's unknown, using
@code{region_model::get_value_by_name} to locate the variable:

@smallexample
    /* Find a variable matching "ptr".  */
    svalue_id sid = get_value_by_name ("ptr");
    if (!sid.null_p ())
      @{
        svalue *sval = get_svalue (sid);
        gcc_assert (sval->get_kind () != SK_UNKNOWN);
      @}
@end smallexample

making it easier to investigate further in a debugger when this occurs.