=====================================
Cross Translation Unit (CTU) Analysis
=====================================

Normally, static analysis works within the boundary of a single translation unit (TU).
However, with additional steps and configuration, we can enable the analysis to inline the definition of a function from
another TU.

.. contents::
   :local:

Overview
________
CTU analysis can be used in a variety of ways. External TU definitions can be imported either from pre-dumped PCH
files or by generating the necessary AST structure on demand, during the analysis of the main TU. Driving the static
analysis can also be implemented in multiple ways. The most direct way is to specify the necessary command-line options
of the Clang frontend manually (and to generate the prerequisite dependencies of the specific import method by hand). This
process can be automated by other tools, such as `CodeChecker <https://github.com/Ericsson/codechecker>`_ and
scan-build-py (the former is preferred).

PCH-based analysis
__________________
The analysis needs the PCH dumps of all the translation units used in the project.
These can be generated by the Clang frontend itself, and must be arranged in a specific way in the filesystem.
The index, which maps symbols' USR names to the PCH dumps containing them, must also be generated, using
`clang-extdef-mapping`. Entries in the index *must* have an `.ast` suffix if the goal
is to use PCH-based analysis, as the lack of that extension signals that the entry is to be used as a source file and parsed on demand.
`clang-extdef-mapping` uses a :doc:`compilation database <../../JSONCompilationDatabase>` to
determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the dumps and the mapping files.


Manual CTU Analysis
###################
Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

And a compilation database (`compile_commands.json`):

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

We'd like to analyze `main.cpp` and discover the division by zero bug.
To be able to inline the definition of `foo` from `foo.cpp`, we first have to generate the `AST` (or `PCH`) file
of `foo.cpp`:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ -emit-ast -o foo.cpp.ast foo.cpp
  $ # Check that the .ast file is generated:
  $ ls
  compile_commands.json  foo.cpp.ast  foo.cpp  main.cpp
  $

The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the
source files, in the format `<USR-Length>:<USR> <File-Path>`:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  9:c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

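The length prefix makes each index line self-delimiting: the USR is exactly the `<USR-Length>` characters that follow
the first colon. As a minimal sketch (the helper name `check_index_line` is hypothetical and not part of any Clang tool,
and it assumes ASCII-only USRs), a line can be sanity-checked in `bash` like this:

.. code-block:: bash

  # check_index_line LINE   (hypothetical helper, for illustration only)
  # Succeeds when LINE has the shape "<USR-Length>:<USR> <File-Path>" and the
  # numeric prefix equals the number of characters in the USR.
  check_index_line() {
    local line=$1
    local len=${line%%:*}   # text before the first ':'
    local rest=${line#*:}   # text after the first ':'

    # The prefix must be a non-empty decimal number.
    case $len in
      ''|*[!0-9]*) return 1 ;;
    esac
    len=$((10#$len))        # force base 10 so a leading zero is not read as octal

    local usr=${rest:0:len}    # the USR itself
    local sep=${rest:len:1}    # the single separating space
    local path=${rest:len+1}   # everything after it is the file path

    [ "${#usr}" -eq "$len" ] && [ "$sep" = " " ] && [ -n "$path" ]
  }

For the entry shown above, `check_index_line "9:c:@F@foo# /path/to/your/project/foo.cpp"` succeeds, because `c:@F@foo#`
is nine characters long and is followed by one space and a non-empty path.
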
We have to modify `externalDefMap.txt` to contain the names of the `.ast` files instead of the source files:

.. code-block:: bash

  $ sed -i -e 's/\.cpp$/.cpp.ast/' externalDefMap.txt

We still have to further modify the `externalDefMap.txt` file to contain relative paths:

.. code-block:: bash

  $ sed -i -e "s|$(pwd)/||g" externalDefMap.txt

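Both rewrites can also be done in one pass that touches only the path field. The following is a sketch under stated
assumptions: `to_pch_index` is a hypothetical helper name, every input line is assumed to be well formed
(`<USR-Length>:<USR> <File-Path>`), and USRs are assumed to be ASCII.

.. code-block:: bash

  # to_pch_index PROJECT_DIR   (hypothetical helper, for illustration only)
  # Reads index lines on stdin and writes them to stdout with the PROJECT_DIR
  # prefix removed from the path and ".ast" appended to it.
  to_pch_index() {
    local root=${1%/}/   # project directory with exactly one trailing '/'
    local line len rest head path
    while IFS= read -r line; do
      len=$((10#${line%%:*}))          # decimal length prefix
      rest=${line#*:}                  # "<USR> <File-Path>"
      head=${line%%:*}:${rest:0:len}   # "<USR-Length>:<USR>", kept verbatim
      path=${rest:len+1}               # skip the USR and the separating space
      path=${path#"$root"}             # drop the absolute prefix, if present
      printf '%s %s.ast\n' "$head" "$path"
    done
  }

It could be used as `clang-extdef-mapping -p . foo.cpp | to_pch_index "$(pwd)" > externalDefMap.txt`. Because the USR is
skipped by length rather than by pattern matching, a `.cpp` substring inside a USR or a directory name is left alone.
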
Now everything is available for the CTU analysis.
We have to pass the CTU-specific extra arguments to Clang:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json  externalDefMap.txt  foo.cpp  foo.cpp.ast  main.cpp  main.plist
  $

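In the command above, each analyzer option is passed to the frontend as two `-Xclang` pairs: one for `-analyzer-config`
and one for its `key=value` argument. A small sketch of how that repetition could be generated (the function name
`analyzer_config_args` is hypothetical; only the flags already shown above are used):

.. code-block:: bash

  # analyzer_config_args KEY=VALUE...   (hypothetical helper, for illustration only)
  # Prints the four tokens "-Xclang -analyzer-config -Xclang KEY=VALUE" for each
  # argument, one token per line, so that the output can be read into an array.
  analyzer_config_args() {
    local opt
    for opt in "$@"; do
      printf '%s\n' -Xclang -analyzer-config -Xclang "$opt"
    done
  }

  # Possible usage (requires bash 4 or newer for mapfile):
  #   mapfile -t ctu_args < <(analyzer_config_args \
  #       experimental-enable-naive-ctu-analysis=true ctu-dir=.)
  #   clang++ --analyze "${ctu_args[@]}" main.cpp

Printing one token per line keeps values that contain spaces intact when they are read back with `mapfile -t`.
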
This manual procedure is error-prone and does not scale, so to analyze real projects it is recommended to use
`CodeChecker` or, with the caveats described below, `scan-build-py`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once we have set up the `PATH` environment variable and activated the Python `venv`, this is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu compile_commands.json -o reports
  $ ls -F
  compile_commands.json  foo.cpp  foo.cpp.ast  main.cpp  reports/
  $ tree reports
  reports
  ├── compile_cmd.json
  ├── compiler_info.json
  ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
  ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
  ├── metadata.json
  └── unique_compile_commands.json

  0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
For example, one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^

  Found 1 defect(s) in main.cpp


  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp |            1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH     |            1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Example usage of scan-build-py:

.. code-block:: bash

  $ /your/path/to/llvm-project/clang/tools/scan-build-py/bin/analyze-build --ctu
  analyze-build: Run 'scan-view /tmp/scan-build-2019-07-17-17-53-33-810365-7fqgWk' to examine bug reports.
  $ /your/path/to/llvm-project/clang/tools/scan-view/bin/scan-view /tmp/scan-build-2019-07-17-17-53-33-810365-7fqgWk
  Starting scan-view at: http://127.0.0.1:8181
    Use Ctrl-C to exit.
  [6336:6431:0717/175357.633914:ERROR:browser_process_sub_thread.cc(209)] Waited 5 ms for network service
  Opening in existing browser session.
  ^C
  $

.. _ctu-on-demand:

On-demand analysis
__________________
The analysis produces the necessary AST structure of external TUs during analysis. This requires the
exact compiler invocations for each TU, which can be generated by hand, or by tools driving the analyzer.
The compiler invocation is a shell command that could be used to compile the TU's main source file.
The mapping from the absolute source file path of a TU to the list of compilation command segments used to
compile said TU is given in YAML format, referred to as the `invocation list`, and must be passed as an
analyzer-config argument (`ctu-invocation-list`).
The index, which maps function USR names to the source files containing them, must also be generated, using
`clang-extdef-mapping`. Entries in the index must *not* have an `.ast` suffix if the goal
is to use on-demand analysis, as that extension signals that the entry is to be used as a PCH dump.
The mapping of external definitions implicitly uses a
:doc:`compilation database <../../JSONCompilationDatabase>` to determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the mapping
files, and the `invocation list` which is used to determine compiler flags.

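Since the `.ast` suffix alone decides how an index entry is treated, it can help to check an index before running the
analysis. The sketch below only mirrors the suffix rule described above; it is not the analyzer's own code, and the
helper names `index_entry_kind` and `count_entry_kinds` are hypothetical.

.. code-block:: bash

  # index_entry_kind LINE   (hypothetical helper, for illustration only)
  # Prints "pch" when the path of an index line ends in ".ast" and
  # "on-demand" otherwise.
  index_entry_kind() {
    case $1 in
      *.ast) echo pch ;;
      *)     echo on-demand ;;
    esac
  }

  # count_entry_kinds   (hypothetical helper, for illustration only)
  # Reads index lines on stdin and prints how many entries of each kind
  # were seen, e.g.:  count_entry_kinds < externalDefMap.txt
  count_entry_kinds() {
    local line pch=0 src=0
    while IFS= read -r line; do
      if [ "$(index_entry_kind "$line")" = pch ]; then
        pch=$((pch + 1))
      else
        src=$((src + 1))
      fi
    done
    printf 'pch=%d on-demand=%d\n' "$pch" "$src"
  }

An index intended for on-demand analysis would be expected to report `pch=0`.
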

Manual CTU Analysis
###################

Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

The compilation database:

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

The `invocation list` (saved as `invocations.yaml`, the file name passed to the analyzer below):

.. code-block:: yaml

  "/path/to/your/project/foo.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/foo.cpp"
    - "-o"
    - "/path/to/your/project/foo.o"

  "/path/to/your/project/main.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/main.cpp"
    - "-o"
    - "/path/to/your/project/main.o"

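Writing such a list by hand quickly becomes tedious. As a rough sketch, an entry in the layout shown above can be
printed from a source path and its command segments. The helper name `invocation_entry` is hypothetical, and the sketch
assumes that no argument contains a double quote or a backslash, since it performs no YAML escaping.

.. code-block:: bash

  # invocation_entry SOURCE ARG...   (hypothetical helper, for illustration only)
  # Prints one YAML mapping entry: the absolute SOURCE path as the key and each
  # ARG as one list item, followed by an empty line.
  # Assumption: no argument contains '"' or '\' (no YAML escaping is done).
  invocation_entry() {
    local src=$1
    shift
    printf '"%s":\n' "$src"
    printf '  - "%s"\n' "$@"   # printf repeats the format once per argument
    printf '\n'
  }

  # Possible usage, reproducing the list above from the project directory:
  #   {
  #     invocation_entry "$PWD/foo.cpp"  clang++ -c "$PWD/foo.cpp"  -o "$PWD/foo.o"
  #     invocation_entry "$PWD/main.cpp" clang++ -c "$PWD/main.cpp" -o "$PWD/main.o"
  #   } > invocations.yaml

Passing the command as separate arguments, rather than as one string, matches the list-of-segments shape of the
`invocation list` and avoids having to split a shell command line.
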
We'd like to analyze `main.cpp` and discover the division by zero bug.
As we are using on-demand mode, we only need to create a CTU index file which holds the `USR` name and location of
external definitions in the source files, in the format `<USR-Length>:<USR> <File-Path>`:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  9:c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

Now everything is available for the CTU analysis.
We have to pass the CTU-specific extra arguments to Clang:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-config -Xclang ctu-invocation-list=invocations.yaml \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json  externalDefMap.txt  foo.cpp  invocations.yaml  main.cpp  main.plist
  $

This manual procedure is error-prone and does not scale, so to analyze real projects it is recommended to use
`CodeChecker`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once we have set up the `PATH` environment variable and activated the Python `venv`, this is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu --ctu-ast-loading-mode on-demand compile_commands.json -o reports
  $ ls -F
  compile_commands.json  foo.cpp  main.cpp  reports/
  $ tree reports
  reports
  ├── compile_cmd.json
  ├── compiler_info.json
  ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
  ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
  ├── metadata.json
  └── unique_compile_commands.json

  0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
For example, one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^

  Found 1 defect(s) in main.cpp


  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp |            1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH     |            1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Currently, on-demand analysis is not supported with `scan-build-py`.
377