1*0bfacb9bSmrg@c Copyright (C) 2019 Free Software Foundation, Inc.
2*0bfacb9bSmrg@c This is part of the GCC manual.
3*0bfacb9bSmrg@c For copying conditions, see the file gcc.texi.
4*0bfacb9bSmrg@c Contributed by David Malcolm <dmalcolm@redhat.com>.
5*0bfacb9bSmrg
6*0bfacb9bSmrg@node Static Analyzer
7*0bfacb9bSmrg@chapter Static Analyzer
8*0bfacb9bSmrg@cindex analyzer
9*0bfacb9bSmrg@cindex static analysis
10*0bfacb9bSmrg@cindex static analyzer
11*0bfacb9bSmrg
12*0bfacb9bSmrg@menu
13*0bfacb9bSmrg* Analyzer Internals::       Analyzer Internals
14*0bfacb9bSmrg* Debugging the Analyzer::   Useful debugging tips
15*0bfacb9bSmrg@end menu
16*0bfacb9bSmrg
17*0bfacb9bSmrg@node Analyzer Internals
18*0bfacb9bSmrg@section Analyzer Internals
19*0bfacb9bSmrg@cindex analyzer, internals
20*0bfacb9bSmrg@cindex static analyzer, internals
21*0bfacb9bSmrg
22*0bfacb9bSmrg@subsection Overview
23*0bfacb9bSmrg
24*0bfacb9bSmrgThe analyzer implementation works on the gimple-SSA representation.
25*0bfacb9bSmrg(I chose this in the hopes of making it easy to work with LTO to
26*0bfacb9bSmrgdo whole-program analysis).
27*0bfacb9bSmrg
28*0bfacb9bSmrgThe implementation is read-only: it doesn't attempt to change anything,
29*0bfacb9bSmrgjust emit warnings.
30*0bfacb9bSmrg
31*0bfacb9bSmrgThe gimple representation can be seen using @option{-fdump-ipa-analyzer}.
32*0bfacb9bSmrg
33*0bfacb9bSmrgFirst, we build a @code{supergraph} which combines the callgraph and all
34*0bfacb9bSmrgof the CFGs into a single directed graph, with both interprocedural and
35*0bfacb9bSmrgintraprocedural edges.  The nodes and edges in the supergraph are called
36*0bfacb9bSmrg``supernodes'' and ``superedges'', and often referred to in code as
37*0bfacb9bSmrg@code{snodes} and @code{sedges}.  Basic blocks in the CFGs are split at
38*0bfacb9bSmrginterprocedural calls, so there can be more than one supernode per
39*0bfacb9bSmrgbasic block.  Most statements will be in just one supernode, but a call
40*0bfacb9bSmrgstatement can appear in two supernodes: at the end of one for the call,
41*0bfacb9bSmrgand again at the start of another for the return.
42*0bfacb9bSmrg
43*0bfacb9bSmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph}.
44*0bfacb9bSmrg
45*0bfacb9bSmrgWe then build an @code{analysis_plan} which walks the callgraph to
46*0bfacb9bSmrgdetermine which calls might be suitable for being summarized (rather
47*0bfacb9bSmrgthan fully explored) and thus in what order to explore the functions.
48*0bfacb9bSmrg
49*0bfacb9bSmrgNext is the heart of the analyzer: we use a worklist to explore state
50*0bfacb9bSmrgwithin the supergraph, building an "exploded graph".
51*0bfacb9bSmrgNodes in the exploded graph correspond to <point,@w{ }state> pairs, as in
52*0bfacb9bSmrg     "Precise Interprocedural Dataflow Analysis via Graph Reachability"
53*0bfacb9bSmrg     (Thomas Reps, Susan Horwitz and Mooly Sagiv).
54*0bfacb9bSmrg
55*0bfacb9bSmrgWe reuse nodes for <point, state> pairs we've already seen, and avoid
56*0bfacb9bSmrgtracking state too closely, so that (hopefully) we rapidly converge
57*0bfacb9bSmrgon a final exploded graph, and terminate the analysis.  We also bail
58*0bfacb9bSmrgout if the number of exploded <end-of-basic-block, state> nodes gets
59*0bfacb9bSmrglarger than a particular multiple of the total number of basic blocks
60*0bfacb9bSmrg(to ensure termination in the face of pathological state-explosion
61*0bfacb9bSmrgcases, or bugs).  We also stop exploring a point once we hit a limit
62*0bfacb9bSmrgof states for that point.
63*0bfacb9bSmrg
64*0bfacb9bSmrgWe can identify problems directly when processing a <point,@w{ }state>
65*0bfacb9bSmrginstance.  For example, if we're finding the successors of
66*0bfacb9bSmrg
67*0bfacb9bSmrg@smallexample
68*0bfacb9bSmrg   <point: before-stmt: "free (ptr);",
69*0bfacb9bSmrg    state: @{"ptr": freed@}>
70*0bfacb9bSmrg@end smallexample
71*0bfacb9bSmrg
72*0bfacb9bSmrgthen we can detect a double-free of "ptr".  We can then emit a path
73*0bfacb9bSmrgto reach the problem by finding the simplest route through the graph.
74*0bfacb9bSmrg
75*0bfacb9bSmrgProgram points in the analysis are much more fine-grained than in the
76*0bfacb9bSmrgCFG and supergraph, with points (and thus potentially exploded nodes)
77*0bfacb9bSmrgfor various events, including before individual statements.
78*0bfacb9bSmrgBy default the exploded graph merges multiple consecutive statements
79*0bfacb9bSmrgin a supernode into one exploded edge to minimize the size of the
80*0bfacb9bSmrgexploded graph.  This can be suppressed via
81*0bfacb9bSmrg@option{-fanalyzer-fine-grained}.
82*0bfacb9bSmrgThe fine-grained approach seems to make things simpler and more debuggable
83*0bfacb9bSmrgthat other approaches I tried, in that each point is responsible for one
84*0bfacb9bSmrgthing.
85*0bfacb9bSmrg
86*0bfacb9bSmrgProgram points in the analysis also have a "call string" identifying the
87*0bfacb9bSmrgstack of callsites below them, so that paths in the exploded graph
88*0bfacb9bSmrgcorrespond to interprocedurally valid paths: we always return to the
89*0bfacb9bSmrgcorrect call site, propagating state information accordingly.
90*0bfacb9bSmrgWe avoid infinite recursion by stopping the analysis if a callsite
91*0bfacb9bSmrgappears more than @code{analyzer-max-recursion-depth} in a callstring
92*0bfacb9bSmrg(defaulting to 2).
93*0bfacb9bSmrg
94*0bfacb9bSmrg@subsection Graphs
95*0bfacb9bSmrg
96*0bfacb9bSmrgNodes and edges in the exploded graph are called ``exploded nodes'' and
97*0bfacb9bSmrg``exploded edges'' and often referred to in the code as
98*0bfacb9bSmrg@code{enodes} and @code{eedges} (especially when distinguishing them
99*0bfacb9bSmrgfrom the @code{snodes} and @code{sedges} in the supergraph).
100*0bfacb9bSmrg
101*0bfacb9bSmrgEach graph numbers its nodes, giving unique identifiers - supernodes
102*0bfacb9bSmrgare referred to throughout dumps in the form @samp{SN': @var{index}} and
103*0bfacb9bSmrgexploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and
104*0bfacb9bSmrg@samp{EN:29}).
105*0bfacb9bSmrg
106*0bfacb9bSmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph-graph}.
107*0bfacb9bSmrg
108*0bfacb9bSmrgThe exploded graph can be seen using @option{-fdump-analyzer-exploded-graph}
109*0bfacb9bSmrgand other dump options.  Exploded nodes are color-coded in the .dot output
110*0bfacb9bSmrgbased on state-machine states to make it easier to see state changes at
111*0bfacb9bSmrga glance.
112*0bfacb9bSmrg
113*0bfacb9bSmrg@subsection State Tracking
114*0bfacb9bSmrg
115*0bfacb9bSmrgThere's a tension between:
116*0bfacb9bSmrg@itemize @bullet
117*0bfacb9bSmrg@item
118*0bfacb9bSmrgprecision of analysis in the straight-line case, vs
119*0bfacb9bSmrg@item
120*0bfacb9bSmrgexponential blow-up in the face of control flow.
121*0bfacb9bSmrg@end itemize
122*0bfacb9bSmrg
123*0bfacb9bSmrgFor example, in general, given this CFG:
124*0bfacb9bSmrg
125*0bfacb9bSmrg@smallexample
126*0bfacb9bSmrg      A
127*0bfacb9bSmrg     / \
128*0bfacb9bSmrg    B   C
129*0bfacb9bSmrg     \ /
130*0bfacb9bSmrg      D
131*0bfacb9bSmrg     / \
132*0bfacb9bSmrg    E   F
133*0bfacb9bSmrg     \ /
134*0bfacb9bSmrg      G
135*0bfacb9bSmrg@end smallexample
136*0bfacb9bSmrg
137*0bfacb9bSmrgwe want to avoid differences in state-tracking in B and C from
138*0bfacb9bSmrgleading to blow-up.  If we don't prevent state blowup, we end up
139*0bfacb9bSmrgwith exponential growth of the exploded graph like this:
140*0bfacb9bSmrg
141*0bfacb9bSmrg@smallexample
142*0bfacb9bSmrg
143*0bfacb9bSmrg           1:A
144*0bfacb9bSmrg          /   \
145*0bfacb9bSmrg         /     \
146*0bfacb9bSmrg        /       \
147*0bfacb9bSmrg      2:B       3:C
148*0bfacb9bSmrg       |         |
149*0bfacb9bSmrg      4:D       5:D        (2 exploded nodes for D)
150*0bfacb9bSmrg     /   \     /   \
151*0bfacb9bSmrg   6:E   7:F 8:E   9:F
152*0bfacb9bSmrg    |     |   |     |
153*0bfacb9bSmrg   10:G 11:G 12:G  13:G    (4 exploded nodes for G)
154*0bfacb9bSmrg
155*0bfacb9bSmrg@end smallexample
156*0bfacb9bSmrg
157*0bfacb9bSmrgSimilar issues arise with loops.
158*0bfacb9bSmrg
159*0bfacb9bSmrgTo prevent this, we follow various approaches:
160*0bfacb9bSmrg
161*0bfacb9bSmrg@enumerate a
162*0bfacb9bSmrg@item
163*0bfacb9bSmrgstate pruning: which tries to discard state that won't be relevant
164*0bfacb9bSmrglater on withing the function.
165*0bfacb9bSmrgThis can be disabled via @option{-fno-analyzer-state-purge}.
166*0bfacb9bSmrg
167*0bfacb9bSmrg@item
168*0bfacb9bSmrgstate merging.  We can try to find the commonality between two
169*0bfacb9bSmrgprogram_state instances to make a third, simpler program_state.
170*0bfacb9bSmrgWe have two strategies here:
171*0bfacb9bSmrg
172*0bfacb9bSmrg  @enumerate
173*0bfacb9bSmrg  @item
174*0bfacb9bSmrg     the worklist keeps new nodes for the same program_point together,
175*0bfacb9bSmrg     and tries to merge them before processing, and thus before they have
176*0bfacb9bSmrg     successors.  Hence, in the above, the two nodes for D (4 and 5) reach
177*0bfacb9bSmrg     the front of the worklist together, and we create a node for D with
178*0bfacb9bSmrg     the merger of the incoming states.
179*0bfacb9bSmrg
180*0bfacb9bSmrg  @item
181*0bfacb9bSmrg     try merging with the state of existing enodes for the program_point
182*0bfacb9bSmrg     (which may have already been explored).  There will be duplication,
183*0bfacb9bSmrg     but only one set of duplication; subsequent duplicates are more likely
184*0bfacb9bSmrg     to hit the cache.  In particular, (hopefully) all merger chains are
185*0bfacb9bSmrg     finite, and so we guarantee termination.
186*0bfacb9bSmrg     This is intended to help with loops: we ought to explore the first
187*0bfacb9bSmrg     iteration, and then have a "subsequent iterations" exploration,
188*0bfacb9bSmrg     which uses a state merged from that of the first, to be more abstract.
189*0bfacb9bSmrg  @end enumerate
190*0bfacb9bSmrg
191*0bfacb9bSmrgWe avoid merging pairs of states that have state-machine differences,
192*0bfacb9bSmrgas these are the kinds of differences that are likely to be most
193*0bfacb9bSmrginteresting.  So, for example, given:
194*0bfacb9bSmrg
195*0bfacb9bSmrg@smallexample
196*0bfacb9bSmrg      if (condition)
197*0bfacb9bSmrg        ptr = malloc (size);
198*0bfacb9bSmrg      else
199*0bfacb9bSmrg        ptr = local_buf;
200*0bfacb9bSmrg
201*0bfacb9bSmrg      .... do things with 'ptr'
202*0bfacb9bSmrg
203*0bfacb9bSmrg      if (condition)
204*0bfacb9bSmrg        free (ptr);
205*0bfacb9bSmrg
206*0bfacb9bSmrg      ...etc
207*0bfacb9bSmrg@end smallexample
208*0bfacb9bSmrg
209*0bfacb9bSmrgthen we end up with an exploded graph that looks like this:
210*0bfacb9bSmrg
211*0bfacb9bSmrg@smallexample
212*0bfacb9bSmrg
213*0bfacb9bSmrg                   if (condition)
214*0bfacb9bSmrg                     / T      \ F
215*0bfacb9bSmrg            ---------          ----------
216*0bfacb9bSmrg           /                             \
217*0bfacb9bSmrg      ptr = malloc (size)             ptr = local_buf
218*0bfacb9bSmrg          |                               |
219*0bfacb9bSmrg      copy of                         copy of
220*0bfacb9bSmrg        "do things with 'ptr'"          "do things with 'ptr'"
221*0bfacb9bSmrg      with ptr: heap-allocated        with ptr: stack-allocated
222*0bfacb9bSmrg          |                               |
223*0bfacb9bSmrg      if (condition)                  if (condition)
224*0bfacb9bSmrg          | known to be T                 | known to be F
225*0bfacb9bSmrg      free (ptr);                         |
226*0bfacb9bSmrg           \                             /
227*0bfacb9bSmrg            -----------------------------
228*0bfacb9bSmrg                         | ('ptr' is pruned, so states can be merged)
229*0bfacb9bSmrg                        etc
230*0bfacb9bSmrg
231*0bfacb9bSmrg@end smallexample
232*0bfacb9bSmrg
233*0bfacb9bSmrgwhere some duplication has occurred, but only for the places where the
234*0bfacb9bSmrgthe different paths are worth exploringly separately.
235*0bfacb9bSmrg
236*0bfacb9bSmrgMerging can be disabled via @option{-fno-analyzer-state-merge}.
237*0bfacb9bSmrg@end enumerate
238*0bfacb9bSmrg
239*0bfacb9bSmrg@subsection Region Model
240*0bfacb9bSmrg
241*0bfacb9bSmrgPart of the state stored at a @code{exploded_node} is a @code{region_model}.
242*0bfacb9bSmrgThis is an implementation of the region-based ternary model described in
243*0bfacb9bSmrg@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf,
244*0bfacb9bSmrg"A Memory Model for Static Analysis of C Programs"}
245*0bfacb9bSmrg(Zhongxing Xu, Ted Kremenek, and Jian Zhang).
246*0bfacb9bSmrg
247*0bfacb9bSmrgA @code{region_model} encapsulates a representation of the state of
248*0bfacb9bSmrgmemory, with a tree of @code{region} instances, along with their associated
249*0bfacb9bSmrgvalues.  The representation is graph-like because values can be pointers
250*0bfacb9bSmrgto regions.  It also stores a constraint_manager, capturing relationships
251*0bfacb9bSmrgbetween the values.
252*0bfacb9bSmrg
253*0bfacb9bSmrgBecause each node in the @code{exploded_graph} has a @code{region_model},
254*0bfacb9bSmrgand each of the latter is graph-like, the @code{exploded_graph} is in some
255*0bfacb9bSmrgways a graph of graphs.
256*0bfacb9bSmrg
257*0bfacb9bSmrgHere's an example of printing a @code{region_model}, showing the ASCII-art
258*0bfacb9bSmrgused to visualize the region hierarchy (colorized when printing to stderr):
259*0bfacb9bSmrg
260*0bfacb9bSmrg@smallexample
261*0bfacb9bSmrg(gdb) call debug (*this)
262*0bfacb9bSmrgr0: @{kind: 'root', parent: null, sval: null@}
263*0bfacb9bSmrg|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@}
264*0bfacb9bSmrg|  |: sval: sv1: @{poisoned: uninit@}
265*0bfacb9bSmrg|  |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@}
266*0bfacb9bSmrg|  |  `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@}
267*0bfacb9bSmrg|  |    |: sval: sv3: @{type: 'void *', unknown@}
268*0bfacb9bSmrg|  |    |: type: 'void *'
269*0bfacb9bSmrg|  `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@}
270*0bfacb9bSmrg|    |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
271*0bfacb9bSmrg|    |  |: sval: sv4: @{type: 'void *', &r6@}
272*0bfacb9bSmrg|    |  |: type: 'void *'
273*0bfacb9bSmrg|    |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
274*0bfacb9bSmrg|    |  |: sval: sv4: @{type: 'void *', &r6@}
275*0bfacb9bSmrg|    |  |: type: 'void *'
276*0bfacb9bSmrg|    `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@}
277*0bfacb9bSmrg|      |: sval: sv4: @{type: 'void *', &r6@}
278*0bfacb9bSmrg|      |: type: 'void *'
279*0bfacb9bSmrg`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@}
280*0bfacb9bSmrg  |: sval: sv2: @{poisoned: uninit@}
281*0bfacb9bSmrg  `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@}
282*0bfacb9bSmrgsvalues:
283*0bfacb9bSmrg  sv0: @{type: 'size_t', '1024'@}
284*0bfacb9bSmrg  sv1: @{poisoned: uninit@}
285*0bfacb9bSmrg  sv2: @{poisoned: uninit@}
286*0bfacb9bSmrg  sv3: @{type: 'void *', unknown@}
287*0bfacb9bSmrg  sv4: @{type: 'void *', &r6@}
288*0bfacb9bSmrgconstraint manager:
289*0bfacb9bSmrg  equiv classes:
290*0bfacb9bSmrg    ec0: @{sv0 == '1024'@}
291*0bfacb9bSmrg    ec1: @{sv4@}
292*0bfacb9bSmrg  constraints:
293*0bfacb9bSmrg@end smallexample
294*0bfacb9bSmrg
295*0bfacb9bSmrgThis is the state at the point of returning from @code{calls_malloc} back
296*0bfacb9bSmrgto @code{test} in the following:
297*0bfacb9bSmrg
298*0bfacb9bSmrg@smallexample
299*0bfacb9bSmrgvoid *
300*0bfacb9bSmrgcalls_malloc (void)
301*0bfacb9bSmrg@{
302*0bfacb9bSmrg  void *result = malloc (1024);
303*0bfacb9bSmrg  return result;
304*0bfacb9bSmrg@}
305*0bfacb9bSmrg
306*0bfacb9bSmrgvoid test (void)
307*0bfacb9bSmrg@{
308*0bfacb9bSmrg  void *ptr = calls_malloc ();
309*0bfacb9bSmrg  /* etc.  */
310*0bfacb9bSmrg@}
311*0bfacb9bSmrg@end smallexample
312*0bfacb9bSmrg
313*0bfacb9bSmrgThe ``root'' region (``r0'') has a ``stack'' child (``r1''), with two
314*0bfacb9bSmrgchildren: a frame for @code{test} (``r2''), and a frame for
315*0bfacb9bSmrg@code{calls_malloc} (``r4'').  These frame regions have child regions for
316*0bfacb9bSmrgstoring their local variables.  For example, the return region
317*0bfacb9bSmrgand that of various other regions within the ``calls_malloc'' frame all have
318*0bfacb9bSmrgvalue ``sv4'', a pointer to a heap-allocated region ``r6''.  Within the parent
319*0bfacb9bSmrgframe, @code{ptr_3} has value ``sv3'', an unknown @code{void *}.
320*0bfacb9bSmrg
321*0bfacb9bSmrg@subsection Analyzer Paths
322*0bfacb9bSmrg
323*0bfacb9bSmrgWe need to explain to the user what the problem is, and to persuade them
324*0bfacb9bSmrgthat there really is a problem.  Hence having a @code{diagnostic_path}
325*0bfacb9bSmrgisn't just an incidental detail of the analyzer; it's required.
326*0bfacb9bSmrg
327*0bfacb9bSmrgPaths ought to be:
328*0bfacb9bSmrg@itemize @bullet
329*0bfacb9bSmrg@item
330*0bfacb9bSmrginterprocedurally-valid
331*0bfacb9bSmrg@item
332*0bfacb9bSmrgfeasible
333*0bfacb9bSmrg@end itemize
334*0bfacb9bSmrg
335*0bfacb9bSmrgWithout state-merging, all paths in the exploded graph are feasible
336*0bfacb9bSmrg(in terms of constraints being satisified).
337*0bfacb9bSmrgWith state-merging, paths in the exploded graph can be infeasible.
338*0bfacb9bSmrg
339*0bfacb9bSmrgWe collate warnings and only emit them for the simplest path
340*0bfacb9bSmrge.g. for a bug in a utility function, with lots of routes to calling it,
341*0bfacb9bSmrgwe only emit the simplest path (which could be intraprocedural, if
342*0bfacb9bSmrgit can be reproduced without a caller).  We apply a check that
343*0bfacb9bSmrgeach duplicate warning's shortest path is feasible, rejecting any
344*0bfacb9bSmrgwarnings for which the shortest path is infeasible (which could lead to
345*0bfacb9bSmrgfalse negatives).
346*0bfacb9bSmrg
347*0bfacb9bSmrgWe use the shortest feasible @code{exploded_path} through the
348*0bfacb9bSmrg@code{exploded_graph} (a list of @code{exploded_edge *}) to build a
349*0bfacb9bSmrg@code{diagnostic_path} (a list of events for the diagnostic subsystem) -
350*0bfacb9bSmrgspecifically a @code{checker_path}.
351*0bfacb9bSmrg
352*0bfacb9bSmrgHaving built the @code{checker_path}, we prune it to try to eliminate
353*0bfacb9bSmrgevents that aren't relevant, to minimize how much the user has to read.
354*0bfacb9bSmrg
355*0bfacb9bSmrgAfter pruning, we notify each event in the path of its ID and record the
356*0bfacb9bSmrgIDs of interesting events, allowing for events to refer to other events
357*0bfacb9bSmrgin their descriptions.  The @code{pending_diagnostic} class has various
358*0bfacb9bSmrgvfuncs to support emitting more precise descriptions, so that e.g.
359*0bfacb9bSmrg
360*0bfacb9bSmrg@itemize @bullet
361*0bfacb9bSmrg@item
362*0bfacb9bSmrga deref-of-unchecked-malloc diagnostic might use:
363*0bfacb9bSmrg@smallexample
364*0bfacb9bSmrg  returning possibly-NULL pointer to 'make_obj' from 'allocator'
365*0bfacb9bSmrg@end smallexample
366*0bfacb9bSmrgfor a @code{return_event} to make it clearer how the unchecked value moves
367*0bfacb9bSmrgfrom callee back to caller
368*0bfacb9bSmrg@item
369*0bfacb9bSmrga double-free diagnostic might use:
370*0bfacb9bSmrg@smallexample
371*0bfacb9bSmrg  second 'free' here; first 'free' was at (3)
372*0bfacb9bSmrg@end smallexample
373*0bfacb9bSmrgand a use-after-free might use
374*0bfacb9bSmrg@smallexample
375*0bfacb9bSmrg  use after 'free' here; memory was freed at (2)
376*0bfacb9bSmrg@end smallexample
377*0bfacb9bSmrg@end itemize
378*0bfacb9bSmrg
379*0bfacb9bSmrgAt this point we can emit the diagnostic.
380*0bfacb9bSmrg
381*0bfacb9bSmrg@subsection Limitations
382*0bfacb9bSmrg
383*0bfacb9bSmrg@itemize @bullet
384*0bfacb9bSmrg@item
385*0bfacb9bSmrgOnly for C so far
386*0bfacb9bSmrg@item
387*0bfacb9bSmrgThe implementation of call summaries is currently very simplistic.
388*0bfacb9bSmrg@item
389*0bfacb9bSmrgLack of function pointer analysis
390*0bfacb9bSmrg@item
391*0bfacb9bSmrgThe constraint-handling code assumes reflexivity in some places
392*0bfacb9bSmrg(that values are equal to themselves), which is not the case for NaN.
393*0bfacb9bSmrgAs a simple workaround, constraints on floating-point values are
394*0bfacb9bSmrgcurrently ignored.
395*0bfacb9bSmrg@item
396*0bfacb9bSmrgThe region model code creates lots of little mutable objects at each
397*0bfacb9bSmrg@code{region_model} (and thus per @code{exploded_node}) rather than
398*0bfacb9bSmrgsharing immutable objects and having the mutable state in the
399*0bfacb9bSmrg@code{program_state} or @code{region_model}.  The latter approach might be
400*0bfacb9bSmrgmore efficient, and might avoid dealing with IDs rather than pointers
401*0bfacb9bSmrg(which requires us to impose an ordering to get meaningful equality).
402*0bfacb9bSmrg@item
403*0bfacb9bSmrgThe region model code doesn't yet support @code{memcpy}.  At the
404*0bfacb9bSmrggimple-ssa level these have been optimized to statements like this:
405*0bfacb9bSmrg@smallexample
406*0bfacb9bSmrg_10 = MEM <long unsigned int> [(char * @{ref-all@})&c]
407*0bfacb9bSmrgMEM <long unsigned int> [(char * @{ref-all@})&d] = _10;
408*0bfacb9bSmrg@end smallexample
409*0bfacb9bSmrgPerhaps they could be supported via a new @code{compound_svalue} type.
410*0bfacb9bSmrg@item
411*0bfacb9bSmrgThere are various other limitations in the region model (grep for TODO/xfail
412*0bfacb9bSmrgin the testsuite).
413*0bfacb9bSmrg@item
414*0bfacb9bSmrgThe constraint_manager's implementation of transitivity is currently too
415*0bfacb9bSmrgexpensive to enable by default and so must be manually enabled via
416*0bfacb9bSmrg@option{-fanalyzer-transitivity}).
417*0bfacb9bSmrg@item
418*0bfacb9bSmrgThe checkers are currently hardcoded and don't allow for user extensibility
419*0bfacb9bSmrg(e.g. adding allocate/release pairs).
420*0bfacb9bSmrg@item
421*0bfacb9bSmrgAlthough the analyzer's test suite has a proof-of-concept test case for
422*0bfacb9bSmrgLTO, LTO support hasn't had extensive testing.  There are various
423*0bfacb9bSmrglang-specific things in the analyzer that assume C rather than LTO.
424*0bfacb9bSmrgFor example, SSA names are printed to the user in ``raw'' form, rather
425*0bfacb9bSmrgthan printing the underlying variable name.
426*0bfacb9bSmrg@end itemize
427*0bfacb9bSmrg
428*0bfacb9bSmrgSome ideas for other checkers
429*0bfacb9bSmrg@itemize @bullet
430*0bfacb9bSmrg@item
431*0bfacb9bSmrgFile-descriptor-based APIs
432*0bfacb9bSmrg@item
433*0bfacb9bSmrgLinux kernel internal APIs
434*0bfacb9bSmrg@item
435*0bfacb9bSmrgSignal handling
436*0bfacb9bSmrg@end itemize
437*0bfacb9bSmrg
438*0bfacb9bSmrg@node Debugging the Analyzer
439*0bfacb9bSmrg@section Debugging the Analyzer
440*0bfacb9bSmrg@cindex analyzer, debugging
441*0bfacb9bSmrg@cindex static analyzer, debugging
442*0bfacb9bSmrg
443*0bfacb9bSmrg@subsection Special Functions for Debugging the Analyzer
444*0bfacb9bSmrg
445*0bfacb9bSmrgThe analyzer recognizes various special functions by name, for use
446*0bfacb9bSmrgin debugging the analyzer.  Declarations can be seen in the testsuite
447*0bfacb9bSmrgin @file{analyzer-decls.h}.  None of these functions are actually
448*0bfacb9bSmrgimplemented.
449*0bfacb9bSmrg
450*0bfacb9bSmrgAdd:
451*0bfacb9bSmrg@smallexample
452*0bfacb9bSmrg  __analyzer_break ();
453*0bfacb9bSmrg@end smallexample
454*0bfacb9bSmrgto the source being analyzed to trigger a breakpoint in the analyzer when
455*0bfacb9bSmrgthat source is reached.  By putting a series of these in the source, it's
456*0bfacb9bSmrgmuch easier to effectively step through the program state as it's analyzed.
457*0bfacb9bSmrg
458*0bfacb9bSmrg@smallexample
459*0bfacb9bSmrg__analyzer_dump ();
460*0bfacb9bSmrg@end smallexample
461*0bfacb9bSmrg
462*0bfacb9bSmrgwill dump the copious information about the analyzer's state each time it
463*0bfacb9bSmrgreaches the call in its traversal of the source.
464*0bfacb9bSmrg
465*0bfacb9bSmrg@smallexample
466*0bfacb9bSmrg__analyzer_dump_path ();
467*0bfacb9bSmrg@end smallexample
468*0bfacb9bSmrg
469*0bfacb9bSmrgwill emit a placeholder ``note'' diagnostic with a path to that call site,
470*0bfacb9bSmrgif the analyzer finds a feasible path to it.
471*0bfacb9bSmrg
472*0bfacb9bSmrgThe builtin @code{__analyzer_dump_exploded_nodes} will emit a warning
473*0bfacb9bSmrgafter analysis containing information on all of the exploded nodes at that
474*0bfacb9bSmrgprogram point:
475*0bfacb9bSmrg
476*0bfacb9bSmrg@smallexample
477*0bfacb9bSmrg  __analyzer_dump_exploded_nodes (0);
478*0bfacb9bSmrg@end smallexample
479*0bfacb9bSmrg
480*0bfacb9bSmrgwill output the number of ``processed'' nodes, and the IDs of
481*0bfacb9bSmrgboth ``processed'' and ``merger'' nodes, such as:
482*0bfacb9bSmrg
483*0bfacb9bSmrg@smallexample
484*0bfacb9bSmrgwarning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59]
485*0bfacb9bSmrg@end smallexample
486*0bfacb9bSmrg
487*0bfacb9bSmrgWith a non-zero argument
488*0bfacb9bSmrg
489*0bfacb9bSmrg@smallexample
490*0bfacb9bSmrg  __analyzer_dump_exploded_nodes (1);
491*0bfacb9bSmrg@end smallexample
492*0bfacb9bSmrg
493*0bfacb9bSmrgit will also dump all of the states within the ``processed'' nodes.
494*0bfacb9bSmrg
495*0bfacb9bSmrg@smallexample
496*0bfacb9bSmrg   __analyzer_dump_region_model ();
497*0bfacb9bSmrg@end smallexample
498*0bfacb9bSmrgwill dump the region_model's state to stderr.
499*0bfacb9bSmrg
500*0bfacb9bSmrg@smallexample
501*0bfacb9bSmrg__analyzer_eval (expr);
502*0bfacb9bSmrg@end smallexample
503*0bfacb9bSmrgwill emit a warning with text "TRUE", FALSE" or "UNKNOWN" based on the
504*0bfacb9bSmrgtruthfulness of the argument.  This is useful for writing DejaGnu tests.
505*0bfacb9bSmrg
506*0bfacb9bSmrg
507*0bfacb9bSmrg@subsection Other Debugging Techniques
508*0bfacb9bSmrg
509*0bfacb9bSmrgOne approach when tracking down where a particular bogus state is
510*0bfacb9bSmrgintroduced into the @code{exploded_graph} is to add custom code to
511*0bfacb9bSmrg@code{region_model::validate}.
512*0bfacb9bSmrg
513*0bfacb9bSmrgFor example, this custom code (added to @code{region_model::validate})
514*0bfacb9bSmrgbreaks with an assertion failure when a variable called @code{ptr}
515*0bfacb9bSmrgacquires a value that's unknown, using
516*0bfacb9bSmrg@code{region_model::get_value_by_name} to locate the variable
517*0bfacb9bSmrg
518*0bfacb9bSmrg@smallexample
519*0bfacb9bSmrg    /* Find a variable matching "ptr".  */
520*0bfacb9bSmrg    svalue_id sid = get_value_by_name ("ptr");
521*0bfacb9bSmrg    if (!sid.null_p ())
522*0bfacb9bSmrg      @{
523*0bfacb9bSmrg	svalue *sval = get_svalue (sid);
524*0bfacb9bSmrg	gcc_assert (sval->get_kind () != SK_UNKNOWN);
525*0bfacb9bSmrg      @}
526*0bfacb9bSmrg@end smallexample
527*0bfacb9bSmrg
528*0bfacb9bSmrgmaking it easier to investigate further in a debugger when this occurs.
529