<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Checker Developer Manual</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->

<div id="content">

<h3 style="color:red">This Page Is Under Construction</h3>

<h1>Checker Developer Manual</h1>

<p>The static analyzer engine performs path-sensitive exploration of the program and
relies on a set of checkers to implement the logic for detecting and
constructing specific bug reports. Anyone who is interested in implementing their own
checker should check out the Building a Checker in 24 Hours talk
(<a href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
<a href="https://youtu.be/kdxlsP5QVPw">video</a>)
and refer to this page for additional information on writing a checker. The static analyzer is
part of the Clang project, so consult <a href="https://clang.llvm.org/hacking.html">Hacking on Clang</a>
and the <a href="https://llvm.org/docs/ProgrammersManual.html">LLVM Programmer's Manual</a>
for developer guidelines, and send your questions and proposals to the
<a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">cfe-dev mailing list</a>.
</p>

<ul>
  <li><a href="#start">Getting Started</a></li>
  <li><a href="#analyzer">Static Analyzer Overview</a>
    <ul>
      <li><a href="#interaction">Interaction with Checkers</a></li>
      <li><a href="#values">Representing Values</a></li>
    </ul></li>
  <li><a href="#idea">Idea for a Checker</a></li>
  <li><a href="#registration">Checker Registration</a></li>
  <li><a href="#events_callbacks">Events, Callbacks, and Checker Class Structure</a></li>
  <li><a href="#extendingstates">Custom Program States</a></li>
  <li><a href="#bugs">Bug Reports</a></li>
  <li><a href="#ast">AST Visitors</a></li>
  <li><a href="#testing">Testing</a></li>
  <li><a href="#commands">Useful Commands/Debugging Hints</a>
    <ul>
      <li><a href="#attaching">Attaching the Debugger</a></li>
      <li><a href="#narrowing">Narrowing Down the Problem</a></li>
      <li><a href="#visualizing">Visualizing the Analysis</a></li>
      <li><a href="#debugprints">Debug Prints and Tricks</a></li>
    </ul></li>
  <li><a href="#additioninformation">Additional Sources of Information</a></li>
  <li><a href="#links">Useful Links</a></li>
</ul>

<h2 id=start>Getting Started</h2>
<ul>
  <li>To check out the source code and build the project, follow steps 1-4 of
  the <a href="https://clang.llvm.org/get_started.html">Clang Getting Started</a>
  page.</li>

  <li>The analyzer source code is located under the Clang source tree:
  <br><tt>
  $ <b>cd llvm/tools/clang</b>
  </tt>
  <br>See: <tt>include/clang/StaticAnalyzer</tt>, <tt>lib/StaticAnalyzer</tt>,
  <tt>test/Analysis</tt>.</li>

  <li>The analyzer regression tests can be executed from the Clang build
  directory:
  <br><tt>
  $ <b>cd ../../../; cd build/tools/clang; TESTDIRS=Analysis make test</b>
  </tt></li>

  <li>Analyze a file with the specified checker:
  <br><tt>
  $ <b>clang -cc1 -analyze -analyzer-checker=core.DivideZero test.c</b>
  </tt></li>

  <li>List the available checkers:
  <br><tt>
  $ <b>clang -cc1 -analyzer-checker-help</b>
  </tt></li>

  <li>See the analyzer help for different output formats, fine tuning, and
  debug options:
  <br><tt>
  $ <b>clang -cc1 -help | grep "analyzer"</b>
  </tt></li>

</ul>

<h2 id=analyzer>Static Analyzer Overview</h2>
  The analyzer core performs symbolic execution of the given program. All the
  input values are represented with symbolic values; further, the engine deduces
  the values of all the expressions in the program based on the input symbols
  and the path. The execution is path-sensitive, and the engine tries to explore
  every feasible path through the program (within its resource limits). The
  explored execution traces are represented with an
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedGraph.html">ExplodedGraph</a> object.
  Each node of the graph is an
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ExplodedNode.html">ExplodedNode</a>,
  which consists of a <tt>ProgramPoint</tt> and a <tt>ProgramState</tt>.
  <p>
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ProgramPoint.html">ProgramPoint</a>
  represents the corresponding location in the program (or the CFG).
  <tt>ProgramPoint</tt> is also used to record additional information on
  when and how the state was added. For example, a program point of kind
  <tt>PostPurgeDeadSymbolsKind</tt> means that the state is the result of purging
  dead symbols, the analyzer's equivalent of garbage collection.
  <p>
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1ProgramState.html">ProgramState</a>
  represents the abstract state of the program.
It consists of:
  <ul>
    <li><tt>Environment</tt> - a mapping from source code expressions to symbolic
    values
    <li><tt>Store</tt> - a mapping from memory locations to symbolic values
    <li><tt>GenericDataMap</tt> - constraints on symbolic values, along with any
    checker-defined data
  </ul>

  <h3 id=interaction>Interaction with Checkers</h3>

  <p>
  Checkers are not merely passive receivers of changes made by the analyzer
  core; they actively participate in the <tt>ProgramState</tt> construction
  through the <tt>GenericDataMap</tt>, which can be used to store the
  checker-defined part of the state. Each time the analyzer engine explores a
  new statement, it notifies each checker registered to listen for that
  statement, giving it an opportunity to either report a bug or modify the
  state. (As a rule of thumb, the checker itself should be stateless.) The
  checkers are called one after another in a predefined order; thus, calling
  all the checkers adds a chain of nodes to the <tt>ExplodedGraph</tt>.
  </p>

  <h3 id=values>Representing Values</h3>

  <p>
  During symbolic execution, <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SVal.html">SVal</a>
  objects are used to represent the semantic evaluation of expressions.
  They can represent things like concrete
  integers, symbolic values, or memory locations (which are memory regions).
  They are a discriminated union of "values", symbolic and otherwise.
  If a value isn't symbolic, that usually means there is no symbolic
  information to track. For example, if the value was an integer, such as
  <tt>42</tt>, it would be a <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1nonloc_1_1ConcreteInt.html">ConcreteInt</a>,
  and the checker doesn't usually need to track any state with the concrete
  number. In some cases, an <tt>SVal</tt> is not a symbol even though it really
  should be a symbolic value.
This happens when the analyzer cannot reason about something
  (yet). An example is floating-point numbers. In such cases, the
  <tt>SVal</tt> will evaluate to <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1UnknownVal.html">UnknownVal</a>.
  This represents a case that is outside the realm of the analyzer's reasoning
  capabilities. <tt>SVals</tt> are value objects, and their values can be viewed
  using the <tt>.dump()</tt> method. Often they wrap persistent objects such as
  symbols or regions.
  </p>

  <p>
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html">SymExpr</a> (symbol)
  is meant to represent an abstract, but named, symbolic value. Symbols represent
  an actual (immutable) value. We might not know what its specific value is, but
  we can associate constraints with that value as we analyze a path. For
  example, we might record that the value of a symbol is greater than
  <tt>0</tt>, etc.
  </p>

  <p>
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1MemRegion.html">MemRegion</a> is similar to a symbol.
  It is used to provide a lexicon of how to describe abstract memory. Regions can
  layer on top of other regions, providing a layered approach to representing memory.
  For example, a struct object on the stack might be represented by a <tt>VarRegion</tt>,
  but a <tt>FieldRegion</tt>, which is a subregion of the <tt>VarRegion</tt>, could
  be used to represent the memory associated with a specific field of that object.
  So how do we represent symbolic memory regions? That's what
  <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymbolicRegion.html">SymbolicRegion</a>
  is for. It is a <tt>MemRegion</tt> that has an associated symbol. Since the
  symbol is unique and has a unique name, that symbol names the region.
  </p>

  <p>
  Let's see how the analyzer processes the expressions in the following example:
  </p>

  <p>
  <pre class="code_example">
int foo(int x) {
  int y = x * 2;
  int z = x;
  ...
}
  </pre>
  </p>

  <p>
Let's look at how <tt>x*2</tt> gets evaluated. When <tt>x</tt> is evaluated,
we first construct an <tt>SVal</tt> that represents the lvalue of <tt>x</tt>; in
this case, it is an <tt>SVal</tt> that references the <tt>MemRegion</tt> for <tt>x</tt>.
Afterwards, when we do the lvalue-to-rvalue conversion, we get a new <tt>SVal</tt>,
which references the value <b>currently bound</b> to <tt>x</tt>. That value is
symbolic; it's whatever <tt>x</tt> was bound to at the start of the function.
Let's call that symbol <tt>$0</tt>. Similarly, we evaluate the expression for <tt>2</tt>
and get an <tt>SVal</tt> that references the concrete number <tt>2</tt>. When
we evaluate <tt>x*2</tt>, we take the two <tt>SVals</tt> of the subexpressions
and create a new <tt>SVal</tt> that represents their multiplication (which in
this case is a new symbolic expression, which we might call <tt>$1</tt>). When we
evaluate the assignment to <tt>y</tt>, we again compute its lvalue (a <tt>MemRegion</tt>),
and then bind the <tt>SVal</tt> for the RHS (which references the symbolic value <tt>$1</tt>)
to the <tt>MemRegion</tt> in the symbolic store.
<br>
The second line is similar. When we evaluate <tt>x</tt> again, we do the same
dance and create an <tt>SVal</tt> that references the symbol <tt>$0</tt>. Note that two
different <tt>SVals</tt> might reference the same underlying value.
  </p>

<p>
To summarize, MemRegions are unique names for blocks of memory. Symbols are
unique names for abstract symbolic values. Some MemRegions represent abstract
symbolic chunks of memory, and thus are also based on symbols.
SVals are just
references to values, and can reference either MemRegions, Symbols, or concrete
values (e.g., the number 1).
</p>

  <!--
  TODO: Add a picture.
  <br>
  Symbols<br>
  FunctionalObjects are used throughout.
  -->

<h2 id=idea>Idea for a Checker</h2>
  Here are several questions which you should consider when evaluating your
  checker idea:
  <ul>
    <li>Can the check be effectively implemented without path-sensitive
    analysis? See <a href="#ast">AST Visitors</a>.</li>

    <li>How high is the false positive rate going to be? Looking at the occurrences
    of the issue you want to write a checker for in existing code bases might
    give you some ideas.</li>

    <li>How will the current limitations of the analysis affect the false alarm
    rate? For example, inter-procedural reasoning is limited: calls are modeled by
    inlining, which by default requires the callee's body to be available in the
    translation unit being analyzed. Also, the analyzer uses a simple range-based
    constraint solver to reason about symbolic values.</li>

    <li>Consult the <a
    href="https://bugs.llvm.org/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=REOPENED&version=trunk&component=Static%20Analyzer&product=clang">Bugzilla database</a>
    to get some ideas for new checkers, and consider starting with improving or fixing
    bugs in the existing checkers.</li>
  </ul>

<p>Once an idea for a checker has been chosen, there are two key decisions that
need to be made:
  <ul>
    <li>Which events the checker should be tracking. This is discussed in more
    detail in the section <a href="#events_callbacks">Events, Callbacks, and
    Checker Class Structure</a>.
    <li>What checker-specific data needs to be stored as part of the program
    state (if any). This should be minimized as much as possible. More detail about
    implementing custom program state is given in the section <a
    href="#extendingstates">Custom Program States</a>.
  </ul>


<h2 id=registration>Checker Registration</h2>
  All checker implementation files are located in the
  <tt>clang/lib/StaticAnalyzer/Checkers</tt> folder. The steps below describe
  how the checker <tt>SimpleStreamChecker</tt>, which checks for misuses of
  stream APIs, was registered with the analyzer.
  Similar steps should be followed for a new checker.
<ol>
  <li>A new checker implementation file, <tt>SimpleStreamChecker.cpp</tt>, was
  created in the directory <tt>lib/StaticAnalyzer/Checkers</tt>.
  <li>The following registration code was added to the implementation file:
<pre class="code_example">
// Called by the analyzer to create and register the checker when it is enabled.
void ento::registerSimpleStreamChecker(CheckerManager &amp;mgr) {
  mgr.registerChecker&lt;SimpleStreamChecker&gt;();
}
</pre>
  <li>A package was selected for the checker, and the checker was defined in the
  table of checkers at <tt>include/clang/StaticAnalyzer/Checkers/Checkers.td</tt>.
  Since all checkers should first be developed as "alpha", and the SimpleStreamChecker
  performs UNIX API checks, the correct package is "alpha.unix", and the following
  was added to the corresponding <tt>UnixAlpha</tt> section of <tt>Checkers.td</tt>:
<pre class="code_example">
let ParentPackage = UnixAlpha in {
...
def SimpleStreamChecker : Checker&lt;"SimpleStream"&gt;,
  HelpText&lt;"Check for misuses of stream APIs"&gt;,
  DescFile&lt;"SimpleStreamChecker.cpp"&gt;;
...
} // end "alpha.unix"
</pre>

  <li>The source code file was made visible to CMake by adding it to
  <tt>lib/StaticAnalyzer/Checkers/CMakeLists.txt</tt>.

</ol>

After adding a new checker to the analyzer, one can verify that the new checker
was successfully added by seeing if it appears in the list of available checkers:
<br> <tt><b>$ clang -cc1 -analyzer-checker-help</b></tt>

<h2 id=events_callbacks>Events, Callbacks, and Checker Class Structure</h2>

<p> All checkers inherit from the <tt><a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1Checker.html">
Checker</a></tt> template class; the template parameter(s) describe the types of
events that the checker is interested in processing. The various types of events
that are available are described in the file <a
href="https://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a>.

<p> For each event type requested, a corresponding callback function must be
defined in the checker class (<a
href="https://clang.llvm.org/doxygen/CheckerDocumentation_8cpp_source.html">
CheckerDocumentation.cpp</a> shows the
correct function name and signature for each event type).

<p> As an example, consider <tt>SimpleStreamChecker</tt>. This checker needs to
take action at the following times:

<ul>
<li>Before a function is called, check whether the function is <tt>fclose</tt>.
If so, check the argument being passed.
<li>After a function call, check whether the function is <tt>fopen</tt>. If
so, process the return value.
<li>When values go out of scope, check whether they are still-open file
descriptors, and report a bug if so. In addition, remove any information about
them from the program state in order to keep the state as small as possible.
<li>When file pointers "escape" (are used in a way that the analyzer can no longer
track them), mark them as such. This prevents false positives in the cases where
the analyzer cannot be sure whether the file was closed or not.
</ul>

<p>The events that will be used for each of these actions are, respectively, <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PreCall.html">PreCall</a>,
<a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PostCall.html">PostCall</a>,
<a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1DeadSymbols.html">DeadSymbols</a>,
and <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1check_1_1PointerEscape.html">PointerEscape</a>.
The high-level structure of the checker's class is thus:

<pre class="code_example">
class SimpleStreamChecker : public Checker&lt;check::PreCall,
                                           check::PostCall,
                                           check::DeadSymbols,
                                           check::PointerEscape&gt; {
public:
  // Before a call: if the callee is fclose, check the stream argument.
  void checkPreCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  // After a call: if the callee is fopen, start tracking the returned stream.
  void checkPostCall(const CallEvent &amp;Call, CheckerContext &amp;C) const;

  // Report streams that are still open when their symbols die, then clean up.
  void checkDeadSymbols(SymbolReaper &amp;SR, CheckerContext &amp;C) const;

  // Stop tracking streams that escape to code the analyzer cannot follow.
  ProgramStateRef checkPointerEscape(ProgramStateRef State,
                                     const InvalidatedSymbols &amp;Escaped,
                                     const CallEvent *Call,
                                     PointerEscapeKind Kind) const;
};
</pre>

<h2 id=extendingstates>Custom Program States</h2>

<p> Checkers often need to keep track of information specific to the checks they
perform. However, since checkers have no guarantee about the order in which the
program will be explored, or even that all possible paths will be explored, this
state information cannot be kept within individual checkers. Therefore, if
checkers need to store custom information, they need to add new categories of
data to the <tt>ProgramState</tt>. The preferred way to do so is to use one of
several macros designed for this purpose.
They are:

<ul>
<li><a
href="https://clang.llvm.org/doxygen/ProgramStateTrait_8h.html#ae4cddb54383cd702a045d7c61b009147">REGISTER_TRAIT_WITH_PROGRAMSTATE</a>:
Used when the state information is a single value. The methods available for
state types declared with this macro are <tt>get</tt>, <tt>set</tt>, and
<tt>remove</tt>.
<li><a
href="https://clang.llvm.org/doxygen/CheckerContext_8h.html#aa27656fa0ce65b0d9ba12eb3c02e8be9">REGISTER_LIST_WITH_PROGRAMSTATE</a>:
Used when the state information is a list of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="https://clang.llvm.org/doxygen/CheckerContext_8h.html#ad90f9387b94b344eaaf499afec05f4d1">REGISTER_SET_WITH_PROGRAMSTATE</a>:
Used when the state information is a set of values. The methods available for
state types declared with this macro are <tt>add</tt>, <tt>get</tt>,
<tt>remove</tt>, and <tt>contains</tt>.
<li><a
href="https://clang.llvm.org/doxygen/CheckerContext_8h.html#a6d1893bb8c18543337b6c363c1319fcf">REGISTER_MAP_WITH_PROGRAMSTATE</a>:
Used when the state information is a map from a key to a value. The methods
available for state types declared with this macro are <tt>add</tt>,
<tt>set</tt>, <tt>get</tt>, <tt>remove</tt>, and <tt>contains</tt>.
</ul>

<p>All of these macros take as parameters the name to be used for the custom
category of state information and the data type(s) to be used for storage. The
data type(s) specified will become the parameter type and/or return type of the
methods that manipulate the new category of state information. Each of these
methods is templated with the name of the custom data type.

<p>For example, a common case is the need to track data associated with a
symbolic expression; a map type is the most logical way to implement this.
The
key for this map will be a pointer to a symbolic expression
(<tt>SymbolRef</tt>). If the data type to be associated with the symbolic
expression is an integer, then the custom category of state information would be
declared as

<pre class="code_example">
REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int)
</pre>

The data would be accessed with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
...
// For a map trait, get returns a pointer to the stored value,
// or a null pointer if no value is bound to Sym in this state.
const int *currentValue = state-&gt;get&lt;ExampleDataType&gt;(Sym);
</pre>

and set with the function

<pre class="code_example">
ProgramStateRef state;
SymbolRef Sym;
int newValue;
...
// set does not modify state; it returns a new state containing the binding.
ProgramStateRef newState = state-&gt;set&lt;ExampleDataType&gt;(Sym, newValue);
</pre>

<p>In addition, the macros define a data type used for storing the data of the
new data category; the name of this type is the name of the data category with
"Ty" appended. For <tt>REGISTER_TRAIT_WITH_PROGRAMSTATE</tt>, this will simply
be the passed data type; for the other three macros, this will be a specialized
version of the <a
href="https://llvm.org/doxygen/classllvm_1_1ImmutableList.html">llvm::ImmutableList</a>,
<a
href="https://llvm.org/doxygen/classllvm_1_1ImmutableSet.html">llvm::ImmutableSet</a>,
or <a
href="https://llvm.org/doxygen/classllvm_1_1ImmutableMap.html">llvm::ImmutableMap</a>
templated class. For the <tt>ExampleDataType</tt> example above, the type
created would be equivalent to writing the declaration:

<pre class="code_example">
typedef llvm::ImmutableMap&lt;SymbolRef, int&gt; ExampleDataTypeTy;
</pre>

<p>These macros will cover a majority of use cases; however, they still have a
few limitations. They cannot be used inside namespaces (since they expand to
contain top-level namespace references), and the data types that they define
cannot be referenced from more than one file.
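<p>The <tt>get</tt>/<tt>set</tt> semantics above can be illustrated without any
Clang headers. The following standalone C++ sketch is a toy model, not the
analyzer's API: <tt>ToyState</tt>, <tt>ToySymbol</tt>, <tt>getValue</tt>,
<tt>setValue</tt>, and <tt>removeValue</tt> are hypothetical names, an integer
stands in for <tt>SymbolRef</tt>, and a <tt>std::map</tt> copied on every update
stands in for <tt>llvm::ImmutableMap</tt>, which shares unchanged nodes instead of
copying them. Like <tt>llvm::ImmutableMap::lookup</tt>, the lookup returns a null
pointer when no value is bound, and every update returns a new state while the
original stays untouched.</p>

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;map&gt;

// Hypothetical stand-in for SymbolRef (the analyzer uses pointers to uniqued
// SymExpr objects; an integer id is enough for this toy model).
using ToySymbol = int;

// Toy program state carrying one checker-defined map from symbols to ints,
// similar in spirit to REGISTER_MAP_WITH_PROGRAMSTATE(ExampleDataType, SymbolRef, int).
class ToyState {
public:
  // Returns a pointer to the value bound to Sym, or nullptr if none is bound.
  const int *getValue(ToySymbol Sym) const {
    auto It = Data.find(Sym);
    return It == Data.end() ? nullptr : &amp;It-&gt;second;
  }

  // Returns a new state in which Sym is bound to Value; *this is unchanged.
  ToyState setValue(ToySymbol Sym, int Value) const {
    ToyState Next = *this;  // copy the existing bindings
    Next.Data[Sym] = Value; // update only the copy
    return Next;
  }

  // Returns a new state without a binding for Sym; *this is unchanged.
  ToyState removeValue(ToySymbol Sym) const {
    ToyState Next = *this;
    Next.Data.erase(Sym);
    return Next;
  }

private:
  std::map&lt;ToySymbol, int&gt; Data;
};

// Self-check: deriving new states never mutates the state they came from.
void checkToyStateIsImmutable() {
  const ToyState Initial{};
  const ToyState Updated = Initial.setValue(/*Sym=*/1, /*Value=*/42);
  assert(Initial.getValue(1) == nullptr); // the original state is unchanged
  assert(Updated.getValue(1) != nullptr);
  assert(*Updated.getValue(1) == 42);
  const ToyState Cleared = Updated.removeValue(1);
  assert(Cleared.getValue(1) == nullptr);
  assert(*Updated.getValue(1) == 42);     // still unchanged after removal
}
</pre>

<p>Because every update produces a fresh state, nodes on different paths can
hold different bindings for the same symbol without interfering with each other,
which is why checker data belongs in the state rather than in the checker object.</p>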

<p>Note that <tt>ProgramStates</tt> are immutable; instead of modifying an existing
one, functions that modify the state return a new state with the change applied,
leaving the previous state untouched. This updated state must then be provided to the
analyzer core by calling the <tt>CheckerContext::addTransition</tt> function.

<h2 id=bugs>Bug Reports</h2>


<p> When a checker detects a mistake in the analyzed code, it needs a way to
report it to the analyzer core so that it can be displayed. The two classes used
to construct this report are <tt><a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugType.html">BugType</a></tt>
and <tt><a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1BugReport.html">
BugReport</a></tt>.

<p>
<tt>BugType</tt>, as the name would suggest, represents a type of bug. The
constructor for <tt>BugType</tt> takes two parameters: the name of the bug
type and the name of the category of the bug. These are used, for example, in the
summary page generated by the scan-build tool.

<p>
  The <tt>BugReport</tt> class represents a specific occurrence of a bug. In
  the most common case, three parameters are used to form a <tt>BugReport</tt>:
<ol>
<li>The type of bug, specified as an instance of the <tt>BugType</tt> class.
<li>A short descriptive string. This is placed at the location of the bug in
the detailed line-by-line output generated by scan-build.
<li>The context in which the bug occurred. This includes both the location of
the bug in the program and the program's state when the location is reached. These are
both encapsulated in an <tt>ExplodedNode</tt>.
</ol>

<p>In order to obtain the correct <tt>ExplodedNode</tt>, a decision must be made
as to whether or not analysis can continue along the current path. This decision
is based on whether the detected bug is one that would prevent the program under
analysis from continuing.
For example, leaking a resource should not stop
analysis, as the program can continue to run after the leak. Dereferencing a
null pointer, on the other hand, should stop analysis, as there is no way for
the program to meaningfully continue after such an error.

<p>If analysis can continue, then the most recent <tt>ExplodedNode</tt>
generated by the checker can be passed to the <tt>BugReport</tt> constructor
without additional modification. This <tt>ExplodedNode</tt> will be the one
returned by the most recent call to <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition</a>.
If no transition has been performed during the current callback, the checker should call <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#a264f48d97809707049689c37aa35af78">CheckerContext::addTransition()</a>
and use the returned node for bug reporting.

<p>If analysis cannot continue, then the current state should be transitioned
into a so-called <i>sink node</i>, a node from which no further analysis will be
performed. This is done by calling the <a
href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#adeea33a5a2bed190210c4a2bb807a6f0">
CheckerContext::generateSink</a> function; this function is the same as the
<tt>addTransition</tt> function, but marks the state as a sink node. Like
<tt>addTransition</tt>, this returns an <tt>ExplodedNode</tt> with the updated
state, which can then be passed to the <tt>BugReport</tt> constructor.

<p>
After a <tt>BugReport</tt> is created, it should be passed to the analyzer core
by calling <a href="https://clang.llvm.org/doxygen/classclang_1_1ento_1_1CheckerContext.html#ae7738af2cbfd1d713edec33d3203dff5">CheckerContext::emitReport</a>.
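<p>The sink-versus-transition decision can be pictured with a small standalone
model. The C++ below is a hypothetical sketch, not Clang code: <tt>ToyGraph</tt>,
<tt>ToyNode</tt>, and <tt>reportBug</tt> are invented names, program points and
states are plain integers, and only the behavior described above is modeled. A
non-fatal bug (such as a leak) is attached to an ordinary node that stays on the
worklist, a fatal bug (such as a null dereference) is attached to a sink that is
never explored further, and asking for a node identical to an existing one
yields a null pointer, in which case no report is emitted.</p>

<pre class="code_example">
#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;map&gt;
#include &lt;memory&gt;
#include &lt;utility&gt;
#include &lt;vector&gt;

// Toy exploded-graph node: a program point and a program state (both plain
// integers here) plus a flag marking the node as a sink.
struct ToyNode {
  int Point;
  int State;
  bool IsSink;
};

class ToyGraph {
public:
  // Creates the node (Point, State) and returns it, or returns nullptr if an
  // identical node already exists, because a visited node is never re-explored.
  ToyNode *addTransition(int Point, int State, bool IsSink = false) {
    auto Key = std::make_pair(Point, State);
    if (Nodes.count(Key) != 0)
      return nullptr;
    auto Owned = std::make_unique&lt;ToyNode&gt;(ToyNode{Point, State, IsSink});
    ToyNode *Node = Owned.get();
    Nodes.emplace(Key, std::move(Owned));
    if (!IsSink)
      Worklist.push_back(Node); // only non-sink nodes are explored further
    return Node;
  }

  // Same as addTransition, but the new node ends the path.
  ToyNode *generateSink(int Point, int State) {
    return addTransition(Point, State, /*IsSink=*/true);
  }

  std::size_t pendingNodes() const { return Worklist.size(); }

private:
  std::map&lt;std::pair&lt;int, int&gt;, std::unique_ptr&lt;ToyNode&gt;&gt; Nodes;
  std::vector&lt;ToyNode *&gt; Worklist;
};

// Hypothetical reporting helper: fatal bugs get a sink node, non-fatal bugs an
// ordinary node. Returns false (and records nothing) if no node was created.
bool reportBug(ToyGraph &amp;G, int Point, int State, bool Fatal,
               std::vector&lt;const ToyNode *&gt; &amp;Reports) {
  ToyNode *Node = Fatal ? G.generateSink(Point, State)
                        : G.addTransition(Point, State);
  if (!Node)
    return false;
  Reports.push_back(Node);
  return true;
}
</pre>

<p>For example, reporting a leak at point 1 leaves one node pending on the
worklist, reporting a null dereference at point 2 adds a sink without growing the
worklist, and a second attempt to create that same sink fails, so no duplicate
report is recorded.</p>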

<h2 id=ast>AST Visitors</h2>
  Some checks might not require path-sensitivity to be effective; a simple AST walk
  might be sufficient. If that is the case, consider implementing a Clang
  compiler warning. On the other hand, a check might not be acceptable as a compiler
  warning; for example, because of a relatively high false positive rate. In this
  situation, the AST callbacks <tt><b>checkASTDecl</b></tt> and
  <tt><b>checkASTCodeBody</b></tt> are your best friends.

<h2 id=testing>Testing</h2>
  Every patch should be well tested with Clang regression tests. The checker tests
  live in the <tt>clang/test/Analysis</tt> folder. To run all of the analyzer tests,
  execute the following from the <tt>clang</tt> build directory:
  <pre class="code">
  $ <b>bin/llvm-lit -sv ../llvm/tools/clang/test/Analysis</b>
  </pre>

<h2 id=commands>Useful Commands/Debugging Hints</h2>

<h3 id=attaching>Attaching the Debugger</h3>

<p>When your command contains the <tt><b>-cc1</b></tt> flag, you can attach the
debugger to it directly:</p>

<pre class="code">
  $ <b>gdb --args clang -cc1 -analyze -analyzer-checker=core test.c</b>
  $ <b>lldb -- clang -cc1 -analyze -analyzer-checker=core test.c</b>
</pre>

<p>
Otherwise, if your command line contains <tt><b>--analyze</b></tt>,
the actual clang instance is run in a separate process. In
order to debug it, use the <tt><b>-###</b></tt> flag to obtain
the command line of the child process:
</p>

<pre class="code">
  $ <b>clang --analyze test.c -\#\#\#</b>
</pre>

<p>
Below we describe a few useful command line arguments, all of which assume that
you are running <tt><b>clang -cc1</b></tt>.
</p>

<h3 id=narrowing>Narrowing Down the Problem</h3>

<p>While investigating a checker-related issue, instruct the analyzer to only
execute a single checker:
</p>
<pre class="code">
  $ <b>clang -cc1 -analyze -analyzer-checker=osx.KeychainAPI test.c</b>
</pre>

<p>If you are experiencing a crash while processing a large file, use the
<tt><b>-analyzer-display-progress</b></tt> option to see which function is
failing.</p>

<p>To selectively analyze only the given function, use the
<tt><b>-analyze-function</b></tt> option:</p>
<pre class="code">
  $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress</b>
  ANALYZE (Syntax): test.c foo
  ANALYZE (Syntax): test.c bar
  ANALYZE (Path, Inline_Regular): test.c bar
  ANALYZE (Path, Inline_Regular): test.c foo
  $ <b>clang -cc1 -analyze -analyzer-checker=core test.c -analyzer-display-progress -analyze-function=foo</b>
  ANALYZE (Syntax): test.c foo
  ANALYZE (Path, Inline_Regular): test.c foo
</pre>

<b>Note: </b> a fully qualified function name has to be used when selecting
C++ functions and methods, Objective-C methods and blocks, e.g.:

<pre class="code">
  $ <b>clang -cc1 -analyze -analyzer-checker=core test.cc -analyze-function='foo(int)'</b>
</pre>

The fully qualified name can be found in the
<tt><b>-analyzer-display-progress</b></tt> output.

<p>The bug reporter mechanism removes path diagnostics inside intermediate
function calls that have returned by the time the bug was found and contain
no interesting pieces. Usually it is up to the checkers to produce more
interesting pieces by adding custom <tt>BugReporterVisitor</tt> objects.
However, you can disable path pruning while debugging with the
<tt><b>-analyzer-config prune-paths=false</b></tt> option.

<h3 id=visualizing>Visualizing the Analysis</h3>

<p>To dump the AST, which often helps in understanding how the program should
behave:</p>
<pre class="code">
  $ <b>clang -cc1 -ast-dump test.c</b>
</pre>

<p>To view or dump the CFG, use the <tt>debug.ViewCFG</tt> or <tt>debug.DumpCFG</tt>
checkers:</p>
<pre class="code">
  $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c</b>
</pre>

<p><tt>ExplodedGraph</tt> (the state graph explored by the analyzer) can be
visualized with another debug checker:</p>
<pre class="code">
  $ <b>clang -cc1 -analyze -analyzer-checker=debug.ViewExplodedGraph test.c</b>
</pre>
<p>Or, equivalently, with the <tt><b>-analyzer-viz-egraph-graphviz</b></tt>
option, which does the same thing: it dumps the exploded graph in graphviz
<tt><b>.dot</b></tt> format.</p>

<p>You can convert <tt><b>.dot</b></tt> files into other formats; in
particular, converting to <tt><b>.svg</b></tt> and viewing the result in your web
browser might be more comfortable than using a <tt><b>.dot</b></tt> viewer:</p>
<pre class="code">
  $ <b>dot -Tsvg ExprEngine-501e2e.dot -o ExprEngine-501e2e.svg</b>
</pre>

<p>The <tt><b>-trim-egraph</b></tt> option removes all paths except those
leading to bug reports from the exploded graph dump. This is useful
because exploded graphs are often huge and hard to navigate.</p>

<p>Viewing <tt>ExplodedGraph</tt> is your most powerful tool for understanding
the analyzer's false positives, because it gives comprehensive information
on every decision made by the analyzer across all analysis paths.</p>

<p>There are more debug checkers available.
To see all available debug checkers:
</p>
<pre class="code">
  $ <b>clang -cc1 -analyzer-checker-help | grep "debug"</b>
</pre>

<h3 id=debugprints>Debug Prints and Tricks</h3>

<p>To view a "half-baked" <tt>ExplodedGraph</tt> while debugging, jump to a frame
that has a <tt>clang::ento::ExprEngine</tt> object and execute:</p>
<pre class="code">
  (gdb) <b>p ViewGraph(0)</b>
</pre>

<p>To see the <tt>ProgramState</tt> while debugging, use the following command.
<pre class="code">
  (gdb) <b>p State-&gt;dump()</b>
</pre>

<p>To see a <tt>clang::Expr</tt> while debugging, use the following command. If you
pass in a <tt>SourceManager</tt> object, it will also dump the corresponding line in the
source code.</p>
<pre class="code">
  (gdb) <b>p E-&gt;dump()</b>
</pre>

<p>To dump the AST of the method that the current <tt>ExplodedNode</tt> belongs
to:</p>
<pre class="code">
  (gdb) <b>p C.getPredecessor()-&gt;getCodeDecl().getBody()-&gt;dump()</b>
</pre>

<h2 id=links>Making Your Checker Better</h2>
<ul>
<li>User-facing documentation is important for adoption! Make sure the <a href="/available_checks.html">checker list</a> is updated
  on the homepage of the analyzer. Also ensure that the description in <tt>Checkers.td</tt>
  is clear to non-analyzer-developers.</li>
<li>Warning and note messages should be clear and easy to understand, even if a bit long.</li>
<ul>
  <li>Messages should start with a capital letter (unlike Clang warnings!) and should not
  end with <tt>.</tt>.</li>
  <li>Articles are usually omitted, e.g. <tt>Dereference of a null pointer</tt> -&gt;
  <tt>Dereference of null pointer</tt>.</li>
  <li>Introduce <tt>BugReporterVisitor</tt>s to emit additional notes that explain the warning
  to the user better. There are some existing visitors that might be useful for your check,
  e.g. <tt>trackNullOrUndefValue</tt>.
For example, <tt>SimpleStreamChecker</tt> should highlight
 the point where the file was opened when reporting a file descriptor leak.</li>
</ul>
<li>If the check tracks anything in the program state, it needs to implement the
 <tt>checkDeadSymbols</tt> callback to clean the state up.</li>
<li>The check should conservatively assume that the program is correct when a tracked symbol
 is passed to a function that is unknown to the analyzer.
 The <tt>checkPointerEscape</tt> callback can help you handle that case.</li>
<li>Use safe and convenient APIs!</li>
<ul>
 <li>Always use <tt>CheckerContext::generateErrorNode</tt> and
 <tt>CheckerContext::generateNonFatalErrorNode</tt> for emitting bug reports.
 Most importantly, never emit a report against <tt>CheckerContext::getPredecessor</tt>.</li>
 <li>Prefer <tt>checkPreCall</tt> and <tt>checkPostCall</tt> to
 <tt>checkPreStmt&lt;CallExpr&gt;</tt> and <tt>checkPostStmt&lt;CallExpr&gt;</tt>.</li>
 <li>Use <tt>CallDescription</tt> to detect hardcoded API calls in the program.</li>
 <li>Simplify <tt>C.getState()->getSVal(E, C.getLocationContext())</tt> to <tt>C.getSVal(E)</tt>.</li>
</ul>
<li>Common sources of crashes:</li>
<ul>
 <li><tt>CallEvent::getOriginExpr</tt> is nullable: for example, it returns null for an
 automatic destructor of a variable. The same applies to some values generated while the
 call was modeled, e.g. <tt>SymbolConjured::getStmt</tt> is nullable.</li>
 <li><tt>CallEvent::getDecl</tt> is nullable: for example, it returns null for a
 call through a symbolic function pointer.</li>
 <li><tt>addTransition</tt>, <tt>generateSink</tt>, <tt>generateNonFatalErrorNode</tt>, and
 <tt>generateErrorNode</tt> can return null, because you may transition to a node that has already been visited.</li>
 <li>Methods of <tt>CallExpr</tt>/<tt>FunctionDecl</tt>/<tt>CallEvent</tt> that
 return arguments crash when the argument index is out of bounds.
Even if you have checked the function name,
 that doesn't mean the function has the expected number of arguments!
 This is why you should use <tt>CallDescription</tt>.</li>
 <li>Nullability of different entities within different kinds of symbols and regions is usually
 documented via assertions in their constructors.</li>
 <li><tt>NamedDecl::getName</tt> will fail if the name of the declaration is not a simple identifier,
 e.g. for destructors. You could use <tt>NamedDecl::getNameAsString</tt> for those cases.
 Note that this method is much slower and should be used sparingly, e.g. only when generating reports
 rather than during analysis.</li>
 <li>Is <tt>-analyzer-checker=core</tt> included in all test <tt>RUN:</tt> lines? Running the analyzer
 with the core checks disabled has never been supported and may cause unexpected behavior and
 crashes. You should do all your testing with the core checks enabled.</li>
</ul>
<li>Patterns that you should most likely avoid even if they're not technically wrong:</li>
<ul>
 <li>A <tt>BugReporterVisitor</tt> should most likely not match the AST of the current program point
 to decide when to emit a note. It is much easier to determine that by observing changes in
 the program state.</li>
 <li>In <tt>State->getSVal(Region)</tt>, if <tt>Region</tt> is not known to be a <tt>TypedValueRegion</tt>
 and the optional type argument is not specified, the checker may accidentally try to dereference a
 void pointer.</li>
 <li>Checker logic should not depend on whether a certain value is a <tt>Loc</tt> or <tt>NonLoc</tt>.
 It should be immediately obvious from the AST being checked whether the <tt>SVal</tt> is a
 <tt>Loc</tt> or a <tt>NonLoc</tt>. Checking whether a value
 is a <tt>Loc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt>, or whether the value is a
 <tt>NonLoc</tt> or <tt>Unknown</tt>/<tt>Undefined</tt>, is totally fine.</li>
 <li>New symbols should not be constructed in the checker via direct calls to <tt>SymbolManager</tt>,
 unless they are <tt>SymbolMetadata</tt> symbols tagged by the checker,
 or they represent newly created values, such as the return value in <tt>evalCall</tt>.
 For modeling arithmetic, bitwise, or comparison operations, use <tt>SValBuilder</tt>.</li>
 <li>Custom <tt>ProgramPointTag</tt>s should not be created within the checker. There is usually
 no good reason for a checker to chain multiple nodes together, because checkers aren't worklists.</li>
</ul>
<li>Checkers are encouraged to actively participate in the analysis by sharing
 their knowledge about the program state with the rest of the analyzer,
 but they should not disrupt the analysis unnecessarily:</li>
<ul>
 <li>If a checker splits the program state, this must be based on knowledge that
 the newly appearing branches are definitely possible and worth exploring
 from the user's perspective. Otherwise, the state split should be delayed
 until there's an indication that one of the paths is taken, or one of the
 paths needs to be dropped entirely. For example, it is fine to eagerly split
 paths while modeling <tt>isalpha(x)</tt>, as long as <tt>x</tt> is constrained accordingly on
 each path. On the other hand, it is not a good idea to split paths over the
 return value of <tt>printf()</tt> while modeling the call, because nobody ever checks
 for errors in <tt>printf</tt>; at best, it would just double the remaining analysis time.
 </li>
 <li>Caution is advised when using <tt>CheckerContext::generateNonFatalErrorNode</tt>,
 because it generates an independent transition, much like <tt>addTransition</tt>.
 It is easy to accidentally split paths while using it. Ideally, try to
 structure the code so that it is obvious that every <tt>addTransition</tt> or
 <tt>generateNonFatalErrorNode</tt> (or sequence of such calls, if the split is intended) is
 immediately followed by a return from the checker callback.</li>
 <li>Multiple implementations of <tt>evalCall</tt> in different checkers should not conflict.</li>
 <li>When implementing <tt>evalAssume</tt>, the checker should always return a non-null state
 for either the true assumption or the false assumption (or both).</li>
 <li>Checkers shall not mutate the values of expressions (i.e. use the <tt>ProgramState::BindExpr</tt> API)
 unless they are fully responsible for computing the value.
 Under no circumstances should they change non-<tt>Unknown</tt> values of expressions.
 Currently, the only valid use case for this API in checkers is to model the return value in the <tt>evalCall</tt> callback.
 If expression values are incorrect, <tt>ExprEngine</tt> needs to be fixed instead.</li>
</ul>
</ul>

<h2 id=additioninformation>Additional Sources of Information</h2>

Here are some additional resources that are useful when working on the Clang
Static Analyzer:

<ul>
<li><a href="http://lcs.ios.ac.cn/~xuzb/canalyze/memmodel.pdf">Xu, Zhongxing &amp;
Kremenek, Ted &amp; Zhang, Jian. (2010).
A Memory Model for Static Analysis of C
Programs.</a></li>
<li><a href="https://github.com/llvm/llvm-project/blob/main/clang/lib/StaticAnalyzer/README.txt">
The Clang Static Analyzer README</a></li>
<li><a href="https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/RegionStore.txt">
Documentation for how the Store works</a></li>
<li><a href="https://github.com/llvm/llvm-project/blob/main/clang/docs/analyzer/IPA.txt">
Documentation about inlining</a></li>
<li> The "Building a Checker in 24 hours" presentation given at the <a
href="https://llvm.org/devmtg/2012-11">November 2012 LLVM Developers'
Meeting</a>. It describes the construction of SimpleStreamChecker. <a
href="https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">Slides</a>
and a <a
href="https://youtu.be/kdxlsP5QVPw">video</a>
are available.</li>
<li>
<a href="https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf">
Artem Dergachev: Clang Static Analyzer: A Checker Developer's Guide
</a> (reading the previous items first might be a good idea)</li>
<li>The list of <a href="implicit_checks.html">Implicit Checkers</a></li>
<li> <a href="https://clang.llvm.org/doxygen">Clang doxygen</a>. Contains
up-to-date documentation about the APIs available in Clang. Relevant entries
have been linked throughout this page. Also of use is the
<a href="https://llvm.org/doxygen">LLVM doxygen</a>, when dealing with classes
from LLVM.</li>
<li> The <a href="https://lists.llvm.org/mailman/listinfo/cfe-dev">
cfe-dev mailing list</a>. This is the primary mailing list used for
discussion of Clang development (including static code analysis). The
<a href="https://lists.llvm.org/pipermail/cfe-dev">archive</a> also contains
a lot of information.</li>
</ul>

</div>
</div>
</body>
</html>