<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
          "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
  <title>Open Projects</title>
  <link type="text/css" rel="stylesheet" href="menu.css">
  <link type="text/css" rel="stylesheet" href="content.css">
  <script type="text/javascript" src="scripts/menu.js"></script>
</head>
<body>

<div id="page">
<!--#include virtual="menu.html.incl"-->
<div id="content">

<h1>Open Projects</h1>

<p>This page lists several projects that would boost the analyzer's usability
and power. Most of the projects listed here are infrastructure-related, so this
list is an addition to the <a href="potential_checkers.html">potential checkers
list</a>. If you are interested in tackling one of these, please send an email
to the <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev">cfe-dev
mailing list</a> to notify other members of the community.</p>

<ul>
  <li>Core Analyzer Infrastructure
  <ul>
    <li>Explicitly model standard library functions with <tt>BodyFarm</tt>.
    <p><tt><a href="http://clang.llvm.org/doxygen/classclang_1_1BodyFarm.html">BodyFarm</a></tt>
    allows the analyzer to explicitly model functions whose definitions are
    not available during analysis. Modeling more of the widely used functions
    (such as the members of <tt>std::string</tt>) will improve the precision of
    the analysis.
    <i>(Difficulty: Easy, ongoing)</i></p>
    </li>

    <li>Handle floating-point values.
    <p>Currently, the analyzer treats all floating-point values as unknown.
    However, we already have most of the infrastructure we need to handle
    floats: RangeConstraintManager. This would involve adding a new SVal kind
    for constant floats, generalizing the constraint manager to handle floats
    and integers equally, and auditing existing code to make sure it doesn't
    make untoward assumptions.
    <i>(Difficulty: Medium)</i></p>
    </li>

    <li>Implement generalized loop execution modeling.
    <p>Currently, the analyzer simply unrolls each loop <tt>N</tt> times. This
    means that it will not execute any code after the loop if the loop is
    guaranteed to execute more than <tt>N</tt> times. This results in lost
    basic block coverage. We could continue exploring the path if we could
    model a generic <tt>i</tt>-th iteration of a loop.
    <i>(Difficulty: Hard)</i></p>
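    <p>The coverage loss can be illustrated with a small Python model. This is
    only a sketch under simplifying assumptions, not the analyzer's actual
    exploration logic: <tt>explore_loop</tt>, <tt>trip_count</tt>, and
    <tt>max_unroll</tt> are hypothetical names, and the loop is reduced to a
    known iteration count.</p>
<pre>
def explore_loop(trip_count, max_unroll):
    """Model one path through a loop that runs trip_count times.

    The explorer gives up once it has unrolled max_unroll iterations,
    so the block after the loop is reached only when the loop finishes
    within that budget.
    """
    visited = []
    iterations = 0
    while iterations != trip_count:
        if iterations == max_unroll:
            # Budget exhausted: the path is dropped before the loop exits.
            visited.append("path_abandoned")
            return visited
        visited.append("loop_body")
        iterations += 1
    visited.append("after_loop")
    return visited


short_loop = explore_loop(trip_count=3, max_unroll=4)
long_loop = explore_loop(trip_count=100, max_unroll=4)
</pre>
    <p>With a budget of 4, the 3-iteration loop reaches <tt>after_loop</tt>,
    while the 100-iteration loop ends in <tt>path_abandoned</tt>, so the code
    after it is never visited.</p>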
    </li>

    <li>Enhance CFG to model C++ temporaries properly.
    <p>There is an existing implementation of this, but it's not complete and
    is disabled in the analyzer.
    <i>(Difficulty: Medium)</i></p>
    </li>

    <li>Enhance CFG to model exception-handling properly.
    <p>Currently exceptions are treated as "black holes", and exception-handling
    control structures are poorly modeled (to be conservative). This could be
    much improved for both C++ and Objective-C exceptions.
    <i>(Difficulty: Medium)</i></p>
    </li>

    <li>Enhance CFG to model C++ <code>new</code> more precisely.
    <p>The current representation of <code>new</code> does not provide an easy
    way for the analyzer to model the call to a memory allocation function
    (<code>operator new</code>) and then the initialization of the result with
    a constructor call. The problem is discussed at length in
    <a href="http://llvm.org/bugs/show_bug.cgi?id=12014">PR12014</a>.
    <i>(Difficulty: Easy)</i></p>
    </li>

    <li>Enhance CFG to model C++ <code>delete</code> more precisely.
    <p>Similarly, the representation of <code>delete</code> does not include
    the call to the destructor, followed by the call to the deallocation
    function (<code>operator delete</code>). One particular issue
    (<tt>noreturn</tt> destructors) is discussed in
    <a href="http://llvm.org/bugs/show_bug.cgi?id=15599">PR15599</a>.
    <i>(Difficulty: Easy)</i></p>
    </li>

    <li>Track type info through casts more precisely.
    <p>The DynamicTypePropagation checker is in charge of inferring a region's
    dynamic type based on what operations the code is performing. Casts are a
    rich source of type information that the analyzer currently ignores. They
    are tricky to get right, but might have very useful consequences.
    <i>(Difficulty: Medium)</i></p>
    </li>

    <li>Design and implement alpha-renaming.
    <p>Implement unification of two symbolic values along a path after they
    are determined to be equal via comparison. This would allow us to reduce
    the number of false positives and would be a stepping stone toward more
    advanced analyses, such as summary-based interprocedural and
    cross-translation-unit analysis.
    <i>(Difficulty: Hard)</i></p>
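    <p>One possible way to picture unifying symbols is a union-find structure
    in which facts are stored per equivalence class. The Python sketch below is
    an illustrative assumption, not the analyzer's design: the
    <tt>SymbolUnifier</tt> class, its methods, and the string-valued facts are
    all hypothetical.</p>
<pre>
class SymbolUnifier:
    """Toy union-find over symbol names, with facts kept per equivalence class."""

    def __init__(self):
        self.parent = {}
        self.facts = {}

    def find(self, sym):
        """Return the representative symbol for sym's equivalence class."""
        if sym not in self.parent:
            self.parent[sym] = sym
            self.facts[sym] = set()
        while self.parent[sym] != sym:
            # Path halving keeps later lookups short.
            self.parent[sym] = self.parent[self.parent[sym]]
            sym = self.parent[sym]
        return sym

    def add_fact(self, sym, fact):
        """Attach a fact to the whole equivalence class of sym."""
        self.facts[self.find(sym)].add(fact)

    def assume_equal(self, a, b):
        """Record that a comparison proved a == b on the current path."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_a] = root_b
            self.facts[root_b] |= self.facts.pop(root_a)

    def facts_for(self, sym):
        """Return a copy of the facts known for sym's equivalence class."""
        return set(self.facts[self.find(sym)])


unifier = SymbolUnifier()
unifier.add_fact("$a", "nonzero")
unifier.assume_equal("$a", "$b")
</pre>
    <p>After the equality is assumed, <tt>unifier.facts_for("$b")</tt> includes
    the fact recorded for <tt>$a</tt>, which is the kind of sharing that could
    remove a false positive about <tt>$b</tt>.</p>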
    </li>
  </ul>
  </li>

  <li>Bug Reporting
  <ul>
    <li>Add support for displaying cross-file diagnostic paths in HTML output
    (used by <tt>scan-build</tt>).
    <p>Currently <tt>scan-build</tt> output does not display reports that span
    multiple files. The main problem is that we do not have a good format to
    display such paths in HTML output. <i>(Difficulty: Medium)</i></p>
    </li>

    <li>Relate bugs to checkers / "bug types".
    <p>We need to come up with an API that relates bug reports
    to the checkers that produce them, and refactor the existing code to use the
    new API. This would allow us to identify the checker from the bug report,
    which paves the way for selective control of certain checks.
    <i>(Difficulty: Easy-Medium)</i></p>
    </li>

    <li>Refactor path diagnostic generation in <a href="http://clang.llvm.org/doxygen/BugReporter_8cpp_source.html">BugReporter.cpp</a>.
    <p>It would be great to have more code reuse between the "Minimal" and
    "Extensive" PathDiagnostic generation algorithms. One idea is to create an
    IR for representing path diagnostics, which would later be used to
    generate minimal or extensive report output. <i>(Difficulty: Medium)</i></p>
    </li>
  </ul>
  </li>

  <li>Other Infrastructure
  <ul>
    <li>Rewrite <tt>scan-build</tt> (in Python).
    <p><i>(Difficulty: Easy)</i></p>
    </li>

    <li>Do a better job interposing on a compilation.
    <p>Currently, <tt>scan-build</tt> just sets the <tt>CC</tt> and <tt>CXX</tt>
    environment variables to its wrapper scripts, which then call into an
    underlying platform compiler. This is problematic for any project that
    doesn't exclusively use <tt>CC</tt> and <tt>CXX</tt> to control its
    compilers.
    <i>(Difficulty: Medium-Hard)</i></p>
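    <p>The limitation can be shown with a toy Python model of environment-based
    interposition. It is a sketch only: the wrapper paths are placeholders,
    <tt>interposed_env</tt> and <tt>compiler_for_rule</tt> are hypothetical
    helpers, and real build systems resolve compilers in more varied ways.</p>
<pre>
def interposed_env(base_env, cc_wrapper, cxx_wrapper):
    """Return a copy of base_env with CC and CXX set to the wrappers."""
    env = dict(base_env)
    env["CC"] = cc_wrapper
    env["CXX"] = cxx_wrapper
    return env


def compiler_for_rule(rule_compiler, env):
    """Resolve the compiler a build rule would run.

    A rule written as "$(CC)" or "$(CXX)" consults the environment;
    anything else is treated as a hard-coded compiler name.
    """
    if rule_compiler.startswith("$(") and rule_compiler.endswith(")"):
        return env[rule_compiler[2:-1]]
    return rule_compiler


env = interposed_env({"PATH": "/usr/bin"},
                     "/hypothetical/cc-wrapper",
                     "/hypothetical/cxx-wrapper")
</pre>
    <p>In this model a rule that uses <tt>$(CC)</tt> resolves to the wrapper,
    while a rule that hard-codes <tt>gcc</tt> bypasses it entirely, so those
    compilations would never be analyzed.</p>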
    </li>

    <li>Create an <tt>analyzer_annotate</tt> attribute for the analyzer
    annotations.
    <p>We would like to put all analyzer attributes behind a fence so that we
    could add/remove them without worrying that compiler (not analyzer) users
    depend on them. Design and implement such a generic analyzer attribute in
    the compiler. <i>(Difficulty: Medium)</i></p>
    </li>
  </ul>
  </li>

  <li>Enhanced Checks
  <ul>
    <li>Implement a production-ready StreamChecker.
    <p>A SimpleStreamChecker was presented in the "Building a Checker in 24
    Hours" talk
    (<a href="http://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf">slides</a>,
    <a href="http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4">video</a>).
    We need to implement a production version of the checker with a richer set
    of APIs and evaluate it by running it on real codebases.
    <i>(Difficulty: Easy)</i></p>
    </li>

    <li>Extend the Malloc checker with reasoning about custom allocator,
    deallocator, and ownership-transfer functions.
    <p>This would require extending the MallocPessimistic checker to reason
    about annotated functions. The implementation should preferably rely on
    the <tt>analyzer_annotate</tt> attribute, as described above.
    <i>(Difficulty: Easy)</i></p>
    </li>

    <li>Implement a BitwiseMaskingChecker to handle <a href="http://llvm.org/bugs/show_bug.cgi?id=16615">PR16615</a>.
    <p>Symbolic expressions of the form <code>$sym &amp; CONSTANT</code> can
    range from 0 to <code>CONSTANT</code> when <code>CONSTANT</code> is
    <code>2^n-1</code>, e.g. 0xFF (0b11111111), 0x7F (0b01111111), 0x3 (0b0011),
    0xFFFF, etc. Even without handling general bitwise operations on symbols, we
    can at least bound the value of the resulting expression. Bonus points for
    handling masks followed by shifts, e.g.
    <code>($sym &amp; 0b1100) &gt;&gt; 2</code>.
    <i>(Difficulty: Easy)</i></p>
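    <p>The bounding argument can be checked with a short Python sketch. It
    assumes a non-negative mask and uses hypothetical helper names
    (<tt>is_low_bit_mask</tt>, <tt>masked_range</tt>,
    <tt>masked_shift_range</tt>); it illustrates the arithmetic only and says
    nothing about how a checker would be wired into the analyzer. The result of
    <code>$sym &amp; CONSTANT</code> lies between 0 and the mask itself.</p>
<pre>
import operator


def is_low_bit_mask(constant):
    """True when constant has the form 2**n - 1, such as 0xFF or 0x7F."""
    return constant == 2 ** constant.bit_length() - 1


def masked_range(constant):
    """Inclusive (low, high) bounds for a symbol ANDed with constant.

    ANDing can only clear bits, so the result never exceeds constant.
    For masks of the form 2**n - 1 every value in the range is reachable.
    """
    return (0, constant)


def masked_shift_range(constant, shift):
    """Bounds for a mask followed by a right shift by shift bits."""
    return (0, constant // 2 ** shift)


# Brute-force check of the bound for one mask; operator.and_ is bitwise AND.
observed = {operator.and_(value, 0xFF) for value in range(-300, 300)}
</pre>
    <p>Here <tt>observed</tt> contains exactly the values 0 through 255, which
    matches <tt>masked_range(0xFF)</tt>, and
    <tt>masked_shift_range(0b1100, 2)</tt> gives the bounds 0 to 3 for the
    mask-then-shift case.</p>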
    </li>

    <li>Implement an iterator invalidation checker.
    <p><i>(Difficulty: Easy)</i></p>
    </li>

    <li>Write checkers that catch copy-and-paste errors.
    <p>Take a look at the
    <a href="http://pages.cs.wisc.edu/~shanlu/paper/TSE-CPMiner.pdf">CP-Miner</a>
    paper for inspiration.
    <i>(Difficulty: Medium-Hard)</i></p>
    </li>
  </ul>
  </li>
</ul>

</div>
</div>
</body>
</html>