1*0bfacb9bSmrg@c Copyright (C) 2019 Free Software Foundation, Inc. 2*0bfacb9bSmrg@c This is part of the GCC manual. 3*0bfacb9bSmrg@c For copying conditions, see the file gcc.texi. 4*0bfacb9bSmrg@c Contributed by David Malcolm <dmalcolm@redhat.com>. 5*0bfacb9bSmrg 6*0bfacb9bSmrg@node Static Analyzer 7*0bfacb9bSmrg@chapter Static Analyzer 8*0bfacb9bSmrg@cindex analyzer 9*0bfacb9bSmrg@cindex static analysis 10*0bfacb9bSmrg@cindex static analyzer 11*0bfacb9bSmrg 12*0bfacb9bSmrg@menu 13*0bfacb9bSmrg* Analyzer Internals:: Analyzer Internals 14*0bfacb9bSmrg* Debugging the Analyzer:: Useful debugging tips 15*0bfacb9bSmrg@end menu 16*0bfacb9bSmrg 17*0bfacb9bSmrg@node Analyzer Internals 18*0bfacb9bSmrg@section Analyzer Internals 19*0bfacb9bSmrg@cindex analyzer, internals 20*0bfacb9bSmrg@cindex static analyzer, internals 21*0bfacb9bSmrg 22*0bfacb9bSmrg@subsection Overview 23*0bfacb9bSmrg 24*0bfacb9bSmrgThe analyzer implementation works on the gimple-SSA representation. 25*0bfacb9bSmrg(I chose this in the hopes of making it easy to work with LTO to 26*0bfacb9bSmrgdo whole-program analysis). 27*0bfacb9bSmrg 28*0bfacb9bSmrgThe implementation is read-only: it doesn't attempt to change anything, 29*0bfacb9bSmrgjust emit warnings. 30*0bfacb9bSmrg 31*0bfacb9bSmrgThe gimple representation can be seen using @option{-fdump-ipa-analyzer}. 32*0bfacb9bSmrg 33*0bfacb9bSmrgFirst, we build a @code{supergraph} which combines the callgraph and all 34*0bfacb9bSmrgof the CFGs into a single directed graph, with both interprocedural and 35*0bfacb9bSmrgintraprocedural edges. The nodes and edges in the supergraph are called 36*0bfacb9bSmrg``supernodes'' and ``superedges'', and often referred to in code as 37*0bfacb9bSmrg@code{snodes} and @code{sedges}. Basic blocks in the CFGs are split at 38*0bfacb9bSmrginterprocedural calls, so there can be more than one supernode per 39*0bfacb9bSmrgbasic block. Most statements will be in just one supernode, but a call 40*0bfacb9bSmrgstatement can appear in two supernodes: at the end of one for the call, 41*0bfacb9bSmrgand again at the start of another for the return. 42*0bfacb9bSmrg 43*0bfacb9bSmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph}. 44*0bfacb9bSmrg 45*0bfacb9bSmrgWe then build an @code{analysis_plan} which walks the callgraph to 46*0bfacb9bSmrgdetermine which calls might be suitable for being summarized (rather 47*0bfacb9bSmrgthan fully explored) and thus in what order to explore the functions. 48*0bfacb9bSmrg 49*0bfacb9bSmrgNext is the heart of the analyzer: we use a worklist to explore state 50*0bfacb9bSmrgwithin the supergraph, building an "exploded graph". 51*0bfacb9bSmrgNodes in the exploded graph correspond to <point,@w{ }state> pairs, as in 52*0bfacb9bSmrg "Precise Interprocedural Dataflow Analysis via Graph Reachability" 53*0bfacb9bSmrg (Thomas Reps, Susan Horwitz and Mooly Sagiv). 54*0bfacb9bSmrg 55*0bfacb9bSmrgWe reuse nodes for <point, state> pairs we've already seen, and avoid 56*0bfacb9bSmrgtracking state too closely, so that (hopefully) we rapidly converge 57*0bfacb9bSmrgon a final exploded graph, and terminate the analysis. We also bail 58*0bfacb9bSmrgout if the number of exploded <end-of-basic-block, state> nodes gets 59*0bfacb9bSmrglarger than a particular multiple of the total number of basic blocks 60*0bfacb9bSmrg(to ensure termination in the face of pathological state-explosion 61*0bfacb9bSmrgcases, or bugs). We also stop exploring a point once we hit a limit 62*0bfacb9bSmrgof states for that point. 63*0bfacb9bSmrg 64*0bfacb9bSmrgWe can identify problems directly when processing a <point,@w{ }state> 65*0bfacb9bSmrginstance. For example, if we're finding the successors of 66*0bfacb9bSmrg 67*0bfacb9bSmrg@smallexample 68*0bfacb9bSmrg <point: before-stmt: "free (ptr);", 69*0bfacb9bSmrg state: @{"ptr": freed@}> 70*0bfacb9bSmrg@end smallexample 71*0bfacb9bSmrg 72*0bfacb9bSmrgthen we can detect a double-free of "ptr". We can then emit a path 73*0bfacb9bSmrgto reach the problem by finding the simplest route through the graph. 74*0bfacb9bSmrg 75*0bfacb9bSmrgProgram points in the analysis are much more fine-grained than in the 76*0bfacb9bSmrgCFG and supergraph, with points (and thus potentially exploded nodes) 77*0bfacb9bSmrgfor various events, including before individual statements. 78*0bfacb9bSmrgBy default the exploded graph merges multiple consecutive statements 79*0bfacb9bSmrgin a supernode into one exploded edge to minimize the size of the 80*0bfacb9bSmrgexploded graph. This can be suppressed via 81*0bfacb9bSmrg@option{-fanalyzer-fine-grained}. 82*0bfacb9bSmrgThe fine-grained approach seems to make things simpler and more debuggable 83*0bfacb9bSmrgthat other approaches I tried, in that each point is responsible for one 84*0bfacb9bSmrgthing. 85*0bfacb9bSmrg 86*0bfacb9bSmrgProgram points in the analysis also have a "call string" identifying the 87*0bfacb9bSmrgstack of callsites below them, so that paths in the exploded graph 88*0bfacb9bSmrgcorrespond to interprocedurally valid paths: we always return to the 89*0bfacb9bSmrgcorrect call site, propagating state information accordingly. 90*0bfacb9bSmrgWe avoid infinite recursion by stopping the analysis if a callsite 91*0bfacb9bSmrgappears more than @code{analyzer-max-recursion-depth} in a callstring 92*0bfacb9bSmrg(defaulting to 2). 93*0bfacb9bSmrg 94*0bfacb9bSmrg@subsection Graphs 95*0bfacb9bSmrg 96*0bfacb9bSmrgNodes and edges in the exploded graph are called ``exploded nodes'' and 97*0bfacb9bSmrg``exploded edges'' and often referred to in the code as 98*0bfacb9bSmrg@code{enodes} and @code{eedges} (especially when distinguishing them 99*0bfacb9bSmrgfrom the @code{snodes} and @code{sedges} in the supergraph). 100*0bfacb9bSmrg 101*0bfacb9bSmrgEach graph numbers its nodes, giving unique identifiers - supernodes 102*0bfacb9bSmrgare referred to throughout dumps in the form @samp{SN': @var{index}} and 103*0bfacb9bSmrgexploded nodes in the form @samp{EN: @var{index}} (e.g. @samp{SN: 2} and 104*0bfacb9bSmrg@samp{EN:29}). 105*0bfacb9bSmrg 106*0bfacb9bSmrgThe supergraph can be seen using @option{-fdump-analyzer-supergraph-graph}. 107*0bfacb9bSmrg 108*0bfacb9bSmrgThe exploded graph can be seen using @option{-fdump-analyzer-exploded-graph} 109*0bfacb9bSmrgand other dump options. Exploded nodes are color-coded in the .dot output 110*0bfacb9bSmrgbased on state-machine states to make it easier to see state changes at 111*0bfacb9bSmrga glance. 112*0bfacb9bSmrg 113*0bfacb9bSmrg@subsection State Tracking 114*0bfacb9bSmrg 115*0bfacb9bSmrgThere's a tension between: 116*0bfacb9bSmrg@itemize @bullet 117*0bfacb9bSmrg@item 118*0bfacb9bSmrgprecision of analysis in the straight-line case, vs 119*0bfacb9bSmrg@item 120*0bfacb9bSmrgexponential blow-up in the face of control flow. 121*0bfacb9bSmrg@end itemize 122*0bfacb9bSmrg 123*0bfacb9bSmrgFor example, in general, given this CFG: 124*0bfacb9bSmrg 125*0bfacb9bSmrg@smallexample 126*0bfacb9bSmrg A 127*0bfacb9bSmrg / \ 128*0bfacb9bSmrg B C 129*0bfacb9bSmrg \ / 130*0bfacb9bSmrg D 131*0bfacb9bSmrg / \ 132*0bfacb9bSmrg E F 133*0bfacb9bSmrg \ / 134*0bfacb9bSmrg G 135*0bfacb9bSmrg@end smallexample 136*0bfacb9bSmrg 137*0bfacb9bSmrgwe want to avoid differences in state-tracking in B and C from 138*0bfacb9bSmrgleading to blow-up. If we don't prevent state blowup, we end up 139*0bfacb9bSmrgwith exponential growth of the exploded graph like this: 140*0bfacb9bSmrg 141*0bfacb9bSmrg@smallexample 142*0bfacb9bSmrg 143*0bfacb9bSmrg 1:A 144*0bfacb9bSmrg / \ 145*0bfacb9bSmrg / \ 146*0bfacb9bSmrg / \ 147*0bfacb9bSmrg 2:B 3:C 148*0bfacb9bSmrg | | 149*0bfacb9bSmrg 4:D 5:D (2 exploded nodes for D) 150*0bfacb9bSmrg / \ / \ 151*0bfacb9bSmrg 6:E 7:F 8:E 9:F 152*0bfacb9bSmrg | | | | 153*0bfacb9bSmrg 10:G 11:G 12:G 13:G (4 exploded nodes for G) 154*0bfacb9bSmrg 155*0bfacb9bSmrg@end smallexample 156*0bfacb9bSmrg 157*0bfacb9bSmrgSimilar issues arise with loops. 158*0bfacb9bSmrg 159*0bfacb9bSmrgTo prevent this, we follow various approaches: 160*0bfacb9bSmrg 161*0bfacb9bSmrg@enumerate a 162*0bfacb9bSmrg@item 163*0bfacb9bSmrgstate pruning: which tries to discard state that won't be relevant 164*0bfacb9bSmrglater on withing the function. 165*0bfacb9bSmrgThis can be disabled via @option{-fno-analyzer-state-purge}. 166*0bfacb9bSmrg 167*0bfacb9bSmrg@item 168*0bfacb9bSmrgstate merging. We can try to find the commonality between two 169*0bfacb9bSmrgprogram_state instances to make a third, simpler program_state. 170*0bfacb9bSmrgWe have two strategies here: 171*0bfacb9bSmrg 172*0bfacb9bSmrg @enumerate 173*0bfacb9bSmrg @item 174*0bfacb9bSmrg the worklist keeps new nodes for the same program_point together, 175*0bfacb9bSmrg and tries to merge them before processing, and thus before they have 176*0bfacb9bSmrg successors. Hence, in the above, the two nodes for D (4 and 5) reach 177*0bfacb9bSmrg the front of the worklist together, and we create a node for D with 178*0bfacb9bSmrg the merger of the incoming states. 179*0bfacb9bSmrg 180*0bfacb9bSmrg @item 181*0bfacb9bSmrg try merging with the state of existing enodes for the program_point 182*0bfacb9bSmrg (which may have already been explored). There will be duplication, 183*0bfacb9bSmrg but only one set of duplication; subsequent duplicates are more likely 184*0bfacb9bSmrg to hit the cache. In particular, (hopefully) all merger chains are 185*0bfacb9bSmrg finite, and so we guarantee termination. 186*0bfacb9bSmrg This is intended to help with loops: we ought to explore the first 187*0bfacb9bSmrg iteration, and then have a "subsequent iterations" exploration, 188*0bfacb9bSmrg which uses a state merged from that of the first, to be more abstract. 189*0bfacb9bSmrg @end enumerate 190*0bfacb9bSmrg 191*0bfacb9bSmrgWe avoid merging pairs of states that have state-machine differences, 192*0bfacb9bSmrgas these are the kinds of differences that are likely to be most 193*0bfacb9bSmrginteresting. So, for example, given: 194*0bfacb9bSmrg 195*0bfacb9bSmrg@smallexample 196*0bfacb9bSmrg if (condition) 197*0bfacb9bSmrg ptr = malloc (size); 198*0bfacb9bSmrg else 199*0bfacb9bSmrg ptr = local_buf; 200*0bfacb9bSmrg 201*0bfacb9bSmrg .... do things with 'ptr' 202*0bfacb9bSmrg 203*0bfacb9bSmrg if (condition) 204*0bfacb9bSmrg free (ptr); 205*0bfacb9bSmrg 206*0bfacb9bSmrg ...etc 207*0bfacb9bSmrg@end smallexample 208*0bfacb9bSmrg 209*0bfacb9bSmrgthen we end up with an exploded graph that looks like this: 210*0bfacb9bSmrg 211*0bfacb9bSmrg@smallexample 212*0bfacb9bSmrg 213*0bfacb9bSmrg if (condition) 214*0bfacb9bSmrg / T \ F 215*0bfacb9bSmrg --------- ---------- 216*0bfacb9bSmrg / \ 217*0bfacb9bSmrg ptr = malloc (size) ptr = local_buf 218*0bfacb9bSmrg | | 219*0bfacb9bSmrg copy of copy of 220*0bfacb9bSmrg "do things with 'ptr'" "do things with 'ptr'" 221*0bfacb9bSmrg with ptr: heap-allocated with ptr: stack-allocated 222*0bfacb9bSmrg | | 223*0bfacb9bSmrg if (condition) if (condition) 224*0bfacb9bSmrg | known to be T | known to be F 225*0bfacb9bSmrg free (ptr); | 226*0bfacb9bSmrg \ / 227*0bfacb9bSmrg ----------------------------- 228*0bfacb9bSmrg | ('ptr' is pruned, so states can be merged) 229*0bfacb9bSmrg etc 230*0bfacb9bSmrg 231*0bfacb9bSmrg@end smallexample 232*0bfacb9bSmrg 233*0bfacb9bSmrgwhere some duplication has occurred, but only for the places where the 234*0bfacb9bSmrgthe different paths are worth exploringly separately. 235*0bfacb9bSmrg 236*0bfacb9bSmrgMerging can be disabled via @option{-fno-analyzer-state-merge}. 237*0bfacb9bSmrg@end enumerate 238*0bfacb9bSmrg 239*0bfacb9bSmrg@subsection Region Model 240*0bfacb9bSmrg 241*0bfacb9bSmrgPart of the state stored at a @code{exploded_node} is a @code{region_model}. 242*0bfacb9bSmrgThis is an implementation of the region-based ternary model described in 243*0bfacb9bSmrg@url{http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf, 244*0bfacb9bSmrg"A Memory Model for Static Analysis of C Programs"} 245*0bfacb9bSmrg(Zhongxing Xu, Ted Kremenek, and Jian Zhang). 246*0bfacb9bSmrg 247*0bfacb9bSmrgA @code{region_model} encapsulates a representation of the state of 248*0bfacb9bSmrgmemory, with a tree of @code{region} instances, along with their associated 249*0bfacb9bSmrgvalues. The representation is graph-like because values can be pointers 250*0bfacb9bSmrgto regions. It also stores a constraint_manager, capturing relationships 251*0bfacb9bSmrgbetween the values. 252*0bfacb9bSmrg 253*0bfacb9bSmrgBecause each node in the @code{exploded_graph} has a @code{region_model}, 254*0bfacb9bSmrgand each of the latter is graph-like, the @code{exploded_graph} is in some 255*0bfacb9bSmrgways a graph of graphs. 256*0bfacb9bSmrg 257*0bfacb9bSmrgHere's an example of printing a @code{region_model}, showing the ASCII-art 258*0bfacb9bSmrgused to visualize the region hierarchy (colorized when printing to stderr): 259*0bfacb9bSmrg 260*0bfacb9bSmrg@smallexample 261*0bfacb9bSmrg(gdb) call debug (*this) 262*0bfacb9bSmrgr0: @{kind: 'root', parent: null, sval: null@} 263*0bfacb9bSmrg|-stack: r1: @{kind: 'stack', parent: r0, sval: sv1@} 264*0bfacb9bSmrg| |: sval: sv1: @{poisoned: uninit@} 265*0bfacb9bSmrg| |-frame for 'test': r2: @{kind: 'frame', parent: r1, sval: null, map: @{'ptr_3': r3@}, function: 'test', depth: 0@} 266*0bfacb9bSmrg| | `-'ptr_3': r3: @{kind: 'map', parent: r2, sval: sv3, type: 'void *', map: @{@}@} 267*0bfacb9bSmrg| | |: sval: sv3: @{type: 'void *', unknown@} 268*0bfacb9bSmrg| | |: type: 'void *' 269*0bfacb9bSmrg| `-frame for 'calls_malloc': r4: @{kind: 'frame', parent: r1, sval: null, map: @{'result_3': r7, '_4': r8, '<anonymous>': r5@}, function: 'calls_malloc', depth: 1@} 270*0bfacb9bSmrg| |-'<anonymous>': r5: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@} 271*0bfacb9bSmrg| | |: sval: sv4: @{type: 'void *', &r6@} 272*0bfacb9bSmrg| | |: type: 'void *' 273*0bfacb9bSmrg| |-'result_3': r7: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@} 274*0bfacb9bSmrg| | |: sval: sv4: @{type: 'void *', &r6@} 275*0bfacb9bSmrg| | |: type: 'void *' 276*0bfacb9bSmrg| `-'_4': r8: @{kind: 'map', parent: r4, sval: sv4, type: 'void *', map: @{@}@} 277*0bfacb9bSmrg| |: sval: sv4: @{type: 'void *', &r6@} 278*0bfacb9bSmrg| |: type: 'void *' 279*0bfacb9bSmrg`-heap: r9: @{kind: 'heap', parent: r0, sval: sv2@} 280*0bfacb9bSmrg |: sval: sv2: @{poisoned: uninit@} 281*0bfacb9bSmrg `-r6: @{kind: 'symbolic', parent: r9, sval: null, map: @{@}@} 282*0bfacb9bSmrgsvalues: 283*0bfacb9bSmrg sv0: @{type: 'size_t', '1024'@} 284*0bfacb9bSmrg sv1: @{poisoned: uninit@} 285*0bfacb9bSmrg sv2: @{poisoned: uninit@} 286*0bfacb9bSmrg sv3: @{type: 'void *', unknown@} 287*0bfacb9bSmrg sv4: @{type: 'void *', &r6@} 288*0bfacb9bSmrgconstraint manager: 289*0bfacb9bSmrg equiv classes: 290*0bfacb9bSmrg ec0: @{sv0 == '1024'@} 291*0bfacb9bSmrg ec1: @{sv4@} 292*0bfacb9bSmrg constraints: 293*0bfacb9bSmrg@end smallexample 294*0bfacb9bSmrg 295*0bfacb9bSmrgThis is the state at the point of returning from @code{calls_malloc} back 296*0bfacb9bSmrgto @code{test} in the following: 297*0bfacb9bSmrg 298*0bfacb9bSmrg@smallexample 299*0bfacb9bSmrgvoid * 300*0bfacb9bSmrgcalls_malloc (void) 301*0bfacb9bSmrg@{ 302*0bfacb9bSmrg void *result = malloc (1024); 303*0bfacb9bSmrg return result; 304*0bfacb9bSmrg@} 305*0bfacb9bSmrg 306*0bfacb9bSmrgvoid test (void) 307*0bfacb9bSmrg@{ 308*0bfacb9bSmrg void *ptr = calls_malloc (); 309*0bfacb9bSmrg /* etc. */ 310*0bfacb9bSmrg@} 311*0bfacb9bSmrg@end smallexample 312*0bfacb9bSmrg 313*0bfacb9bSmrgThe ``root'' region (``r0'') has a ``stack'' child (``r1''), with two 314*0bfacb9bSmrgchildren: a frame for @code{test} (``r2''), and a frame for 315*0bfacb9bSmrg@code{calls_malloc} (``r4''). These frame regions have child regions for 316*0bfacb9bSmrgstoring their local variables. For example, the return region 317*0bfacb9bSmrgand that of various other regions within the ``calls_malloc'' frame all have 318*0bfacb9bSmrgvalue ``sv4'', a pointer to a heap-allocated region ``r6''. Within the parent 319*0bfacb9bSmrgframe, @code{ptr_3} has value ``sv3'', an unknown @code{void *}. 320*0bfacb9bSmrg 321*0bfacb9bSmrg@subsection Analyzer Paths 322*0bfacb9bSmrg 323*0bfacb9bSmrgWe need to explain to the user what the problem is, and to persuade them 324*0bfacb9bSmrgthat there really is a problem. Hence having a @code{diagnostic_path} 325*0bfacb9bSmrgisn't just an incidental detail of the analyzer; it's required. 326*0bfacb9bSmrg 327*0bfacb9bSmrgPaths ought to be: 328*0bfacb9bSmrg@itemize @bullet 329*0bfacb9bSmrg@item 330*0bfacb9bSmrginterprocedurally-valid 331*0bfacb9bSmrg@item 332*0bfacb9bSmrgfeasible 333*0bfacb9bSmrg@end itemize 334*0bfacb9bSmrg 335*0bfacb9bSmrgWithout state-merging, all paths in the exploded graph are feasible 336*0bfacb9bSmrg(in terms of constraints being satisified). 337*0bfacb9bSmrgWith state-merging, paths in the exploded graph can be infeasible. 338*0bfacb9bSmrg 339*0bfacb9bSmrgWe collate warnings and only emit them for the simplest path 340*0bfacb9bSmrge.g. for a bug in a utility function, with lots of routes to calling it, 341*0bfacb9bSmrgwe only emit the simplest path (which could be intraprocedural, if 342*0bfacb9bSmrgit can be reproduced without a caller). We apply a check that 343*0bfacb9bSmrgeach duplicate warning's shortest path is feasible, rejecting any 344*0bfacb9bSmrgwarnings for which the shortest path is infeasible (which could lead to 345*0bfacb9bSmrgfalse negatives). 346*0bfacb9bSmrg 347*0bfacb9bSmrgWe use the shortest feasible @code{exploded_path} through the 348*0bfacb9bSmrg@code{exploded_graph} (a list of @code{exploded_edge *}) to build a 349*0bfacb9bSmrg@code{diagnostic_path} (a list of events for the diagnostic subsystem) - 350*0bfacb9bSmrgspecifically a @code{checker_path}. 351*0bfacb9bSmrg 352*0bfacb9bSmrgHaving built the @code{checker_path}, we prune it to try to eliminate 353*0bfacb9bSmrgevents that aren't relevant, to minimize how much the user has to read. 354*0bfacb9bSmrg 355*0bfacb9bSmrgAfter pruning, we notify each event in the path of its ID and record the 356*0bfacb9bSmrgIDs of interesting events, allowing for events to refer to other events 357*0bfacb9bSmrgin their descriptions. The @code{pending_diagnostic} class has various 358*0bfacb9bSmrgvfuncs to support emitting more precise descriptions, so that e.g. 359*0bfacb9bSmrg 360*0bfacb9bSmrg@itemize @bullet 361*0bfacb9bSmrg@item 362*0bfacb9bSmrga deref-of-unchecked-malloc diagnostic might use: 363*0bfacb9bSmrg@smallexample 364*0bfacb9bSmrg returning possibly-NULL pointer to 'make_obj' from 'allocator' 365*0bfacb9bSmrg@end smallexample 366*0bfacb9bSmrgfor a @code{return_event} to make it clearer how the unchecked value moves 367*0bfacb9bSmrgfrom callee back to caller 368*0bfacb9bSmrg@item 369*0bfacb9bSmrga double-free diagnostic might use: 370*0bfacb9bSmrg@smallexample 371*0bfacb9bSmrg second 'free' here; first 'free' was at (3) 372*0bfacb9bSmrg@end smallexample 373*0bfacb9bSmrgand a use-after-free might use 374*0bfacb9bSmrg@smallexample 375*0bfacb9bSmrg use after 'free' here; memory was freed at (2) 376*0bfacb9bSmrg@end smallexample 377*0bfacb9bSmrg@end itemize 378*0bfacb9bSmrg 379*0bfacb9bSmrgAt this point we can emit the diagnostic. 380*0bfacb9bSmrg 381*0bfacb9bSmrg@subsection Limitations 382*0bfacb9bSmrg 383*0bfacb9bSmrg@itemize @bullet 384*0bfacb9bSmrg@item 385*0bfacb9bSmrgOnly for C so far 386*0bfacb9bSmrg@item 387*0bfacb9bSmrgThe implementation of call summaries is currently very simplistic. 388*0bfacb9bSmrg@item 389*0bfacb9bSmrgLack of function pointer analysis 390*0bfacb9bSmrg@item 391*0bfacb9bSmrgThe constraint-handling code assumes reflexivity in some places 392*0bfacb9bSmrg(that values are equal to themselves), which is not the case for NaN. 393*0bfacb9bSmrgAs a simple workaround, constraints on floating-point values are 394*0bfacb9bSmrgcurrently ignored. 395*0bfacb9bSmrg@item 396*0bfacb9bSmrgThe region model code creates lots of little mutable objects at each 397*0bfacb9bSmrg@code{region_model} (and thus per @code{exploded_node}) rather than 398*0bfacb9bSmrgsharing immutable objects and having the mutable state in the 399*0bfacb9bSmrg@code{program_state} or @code{region_model}. The latter approach might be 400*0bfacb9bSmrgmore efficient, and might avoid dealing with IDs rather than pointers 401*0bfacb9bSmrg(which requires us to impose an ordering to get meaningful equality). 402*0bfacb9bSmrg@item 403*0bfacb9bSmrgThe region model code doesn't yet support @code{memcpy}. At the 404*0bfacb9bSmrggimple-ssa level these have been optimized to statements like this: 405*0bfacb9bSmrg@smallexample 406*0bfacb9bSmrg_10 = MEM <long unsigned int> [(char * @{ref-all@})&c] 407*0bfacb9bSmrgMEM <long unsigned int> [(char * @{ref-all@})&d] = _10; 408*0bfacb9bSmrg@end smallexample 409*0bfacb9bSmrgPerhaps they could be supported via a new @code{compound_svalue} type. 410*0bfacb9bSmrg@item 411*0bfacb9bSmrgThere are various other limitations in the region model (grep for TODO/xfail 412*0bfacb9bSmrgin the testsuite). 413*0bfacb9bSmrg@item 414*0bfacb9bSmrgThe constraint_manager's implementation of transitivity is currently too 415*0bfacb9bSmrgexpensive to enable by default and so must be manually enabled via 416*0bfacb9bSmrg@option{-fanalyzer-transitivity}). 417*0bfacb9bSmrg@item 418*0bfacb9bSmrgThe checkers are currently hardcoded and don't allow for user extensibility 419*0bfacb9bSmrg(e.g. adding allocate/release pairs). 420*0bfacb9bSmrg@item 421*0bfacb9bSmrgAlthough the analyzer's test suite has a proof-of-concept test case for 422*0bfacb9bSmrgLTO, LTO support hasn't had extensive testing. There are various 423*0bfacb9bSmrglang-specific things in the analyzer that assume C rather than LTO. 424*0bfacb9bSmrgFor example, SSA names are printed to the user in ``raw'' form, rather 425*0bfacb9bSmrgthan printing the underlying variable name. 426*0bfacb9bSmrg@end itemize 427*0bfacb9bSmrg 428*0bfacb9bSmrgSome ideas for other checkers 429*0bfacb9bSmrg@itemize @bullet 430*0bfacb9bSmrg@item 431*0bfacb9bSmrgFile-descriptor-based APIs 432*0bfacb9bSmrg@item 433*0bfacb9bSmrgLinux kernel internal APIs 434*0bfacb9bSmrg@item 435*0bfacb9bSmrgSignal handling 436*0bfacb9bSmrg@end itemize 437*0bfacb9bSmrg 438*0bfacb9bSmrg@node Debugging the Analyzer 439*0bfacb9bSmrg@section Debugging the Analyzer 440*0bfacb9bSmrg@cindex analyzer, debugging 441*0bfacb9bSmrg@cindex static analyzer, debugging 442*0bfacb9bSmrg 443*0bfacb9bSmrg@subsection Special Functions for Debugging the Analyzer 444*0bfacb9bSmrg 445*0bfacb9bSmrgThe analyzer recognizes various special functions by name, for use 446*0bfacb9bSmrgin debugging the analyzer. Declarations can be seen in the testsuite 447*0bfacb9bSmrgin @file{analyzer-decls.h}. None of these functions are actually 448*0bfacb9bSmrgimplemented. 449*0bfacb9bSmrg 450*0bfacb9bSmrgAdd: 451*0bfacb9bSmrg@smallexample 452*0bfacb9bSmrg __analyzer_break (); 453*0bfacb9bSmrg@end smallexample 454*0bfacb9bSmrgto the source being analyzed to trigger a breakpoint in the analyzer when 455*0bfacb9bSmrgthat source is reached. By putting a series of these in the source, it's 456*0bfacb9bSmrgmuch easier to effectively step through the program state as it's analyzed. 457*0bfacb9bSmrg 458*0bfacb9bSmrg@smallexample 459*0bfacb9bSmrg__analyzer_dump (); 460*0bfacb9bSmrg@end smallexample 461*0bfacb9bSmrg 462*0bfacb9bSmrgwill dump the copious information about the analyzer's state each time it 463*0bfacb9bSmrgreaches the call in its traversal of the source. 464*0bfacb9bSmrg 465*0bfacb9bSmrg@smallexample 466*0bfacb9bSmrg__analyzer_dump_path (); 467*0bfacb9bSmrg@end smallexample 468*0bfacb9bSmrg 469*0bfacb9bSmrgwill emit a placeholder ``note'' diagnostic with a path to that call site, 470*0bfacb9bSmrgif the analyzer finds a feasible path to it. 471*0bfacb9bSmrg 472*0bfacb9bSmrgThe builtin @code{__analyzer_dump_exploded_nodes} will emit a warning 473*0bfacb9bSmrgafter analysis containing information on all of the exploded nodes at that 474*0bfacb9bSmrgprogram point: 475*0bfacb9bSmrg 476*0bfacb9bSmrg@smallexample 477*0bfacb9bSmrg __analyzer_dump_exploded_nodes (0); 478*0bfacb9bSmrg@end smallexample 479*0bfacb9bSmrg 480*0bfacb9bSmrgwill output the number of ``processed'' nodes, and the IDs of 481*0bfacb9bSmrgboth ``processed'' and ``merger'' nodes, such as: 482*0bfacb9bSmrg 483*0bfacb9bSmrg@smallexample 484*0bfacb9bSmrgwarning: 2 processed enodes: [EN: 56, EN: 58] merger(s): [EN: 54-55, EN: 57, EN: 59] 485*0bfacb9bSmrg@end smallexample 486*0bfacb9bSmrg 487*0bfacb9bSmrgWith a non-zero argument 488*0bfacb9bSmrg 489*0bfacb9bSmrg@smallexample 490*0bfacb9bSmrg __analyzer_dump_exploded_nodes (1); 491*0bfacb9bSmrg@end smallexample 492*0bfacb9bSmrg 493*0bfacb9bSmrgit will also dump all of the states within the ``processed'' nodes. 494*0bfacb9bSmrg 495*0bfacb9bSmrg@smallexample 496*0bfacb9bSmrg __analyzer_dump_region_model (); 497*0bfacb9bSmrg@end smallexample 498*0bfacb9bSmrgwill dump the region_model's state to stderr. 499*0bfacb9bSmrg 500*0bfacb9bSmrg@smallexample 501*0bfacb9bSmrg__analyzer_eval (expr); 502*0bfacb9bSmrg@end smallexample 503*0bfacb9bSmrgwill emit a warning with text "TRUE", FALSE" or "UNKNOWN" based on the 504*0bfacb9bSmrgtruthfulness of the argument. This is useful for writing DejaGnu tests. 505*0bfacb9bSmrg 506*0bfacb9bSmrg 507*0bfacb9bSmrg@subsection Other Debugging Techniques 508*0bfacb9bSmrg 509*0bfacb9bSmrgOne approach when tracking down where a particular bogus state is 510*0bfacb9bSmrgintroduced into the @code{exploded_graph} is to add custom code to 511*0bfacb9bSmrg@code{region_model::validate}. 512*0bfacb9bSmrg 513*0bfacb9bSmrgFor example, this custom code (added to @code{region_model::validate}) 514*0bfacb9bSmrgbreaks with an assertion failure when a variable called @code{ptr} 515*0bfacb9bSmrgacquires a value that's unknown, using 516*0bfacb9bSmrg@code{region_model::get_value_by_name} to locate the variable 517*0bfacb9bSmrg 518*0bfacb9bSmrg@smallexample 519*0bfacb9bSmrg /* Find a variable matching "ptr". */ 520*0bfacb9bSmrg svalue_id sid = get_value_by_name ("ptr"); 521*0bfacb9bSmrg if (!sid.null_p ()) 522*0bfacb9bSmrg @{ 523*0bfacb9bSmrg svalue *sval = get_svalue (sid); 524*0bfacb9bSmrg gcc_assert (sval->get_kind () != SK_UNKNOWN); 525*0bfacb9bSmrg @} 526*0bfacb9bSmrg@end smallexample 527*0bfacb9bSmrg 528*0bfacb9bSmrgmaking it easier to investigate further in a debugger when this occurs. 529