=====================================
Cross Translation Unit (CTU) Analysis
=====================================

Normally, static analysis works within the boundary of one translation unit (TU).
However, with additional steps and configuration we can enable the analysis to inline the definition of a function from
another TU.

.. contents::
   :local:

Overview
________
CTU analysis can be used in a variety of ways. The importing of external TU definitions can work with pre-dumped PCH
files or by generating the necessary AST structure on demand, during the analysis of the main TU. Driving the static
analysis can also be implemented in multiple ways. The most direct way is to specify the necessary command-line options
of the Clang frontend manually (and generate the prerequisite dependencies of the specific import method by hand). This
process can be automated by other tools, like `CodeChecker <https://github.com/Ericsson/codechecker>`_ and scan-build-py
(preference for the former).

PCH-based analysis
__________________
The analysis needs the PCH dumps of all the translation units used in the project.
These can be generated by the Clang frontend itself, and must be arranged in a specific way in the filesystem.
The index, which maps symbols' USR names to the PCH dumps containing them, must also be generated by
`clang-extdef-mapping`. Entries in the index *must* have an `.ast` suffix if the goal
is to use PCH-based analysis, as the lack of that extension signals that the entry is to be used as a source file and
parsed on demand.
`clang-extdef-mapping` uses a :doc:`compilation database <../../JSONCompilationDatabase>` to
determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the dumps and the mapping files.


Manual CTU Analysis
###################
Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

And a compilation database:

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

We'd like to analyze `main.cpp` and discover the division by zero bug.
In order to be able to inline the definition of `foo` from `foo.cpp`, we first have to generate the `AST` (or `PCH`) file
of `foo.cpp`:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ -emit-ast -o foo.cpp.ast foo.cpp
  $ # Check that the .ast file is generated:
  $ ls
  compile_commands.json  foo.cpp.ast  foo.cpp  main.cpp
  $

The next step is to create a CTU index file which holds the `USR` name and location of external definitions in the
source files in the format `<USR-Length>:<USR> <File-Path>`:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  9:c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

We have to modify `externalDefMap.txt` to contain the names of the `.ast` files instead of the source files:

.. code-block:: bash

  $ sed -i -e 's/\.cpp$/.cpp.ast/' externalDefMap.txt

We still have to further modify the `externalDefMap.txt` file to contain relative paths:

.. code-block:: bash

  $ sed -i -e "s|$(pwd)/||g" externalDefMap.txt

Now everything is available for the CTU analysis.
We have to feed Clang with CTU-specific extra arguments:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json  externalDefMap.txt  foo.ast  foo.cpp  foo.cpp.ast  main.cpp  main.plist
  $

This manual procedure is error-prone and does not scale; therefore, to analyze real projects it is recommended to use
`CodeChecker` or `scan-build-py`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once we have set up the `PATH` environment variable and activated the Python `venv`, this is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu compile_commands.json -o reports
  $ ls -F
  compile_commands.json  foo.cpp  foo.cpp.ast  main.cpp  reports/
  $ tree reports
    reports
    ├── compile_cmd.json
    ├── compiler_info.json
    ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
    ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
    ├── metadata.json
    └── unique_compile_commands.json

    0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
E.g. one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^

  Found 1 defect(s) in main.cpp


  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp |            1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH     |            1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Example usage of scan-build-py:

.. code-block:: bash

  $ /your/path/to/llvm-project/clang/tools/scan-build-py/bin/analyze-build --ctu
  analyze-build: Run 'scan-view /tmp/scan-build-2019-07-17-17-53-33-810365-7fqgWk' to examine bug reports.
  $ /your/path/to/llvm-project/clang/tools/scan-view/bin/scan-view /tmp/scan-build-2019-07-17-17-53-33-810365-7fqgWk
  Starting scan-view at: http://127.0.0.1:8181
    Use Ctrl-C to exit.
  [6336:6431:0717/175357.633914:ERROR:browser_process_sub_thread.cc(209)] Waited 5 ms for network service
  Opening in existing browser session.
  ^C
  $

.. _ctu-on-demand:

On-demand analysis
__________________
The analysis produces the necessary AST structure of external TUs during analysis. This requires the
exact compiler invocations for each TU, which can be generated by hand, or by tools driving the analyzer.
The compiler invocation is a shell command that could be used to compile the TU's main source file.
The mapping from the absolute source file path of a TU to the list of compilation command segments used to
compile said TU is given in YAML format, referred to as the `invocation list`, and must be passed as an
analyzer-config argument.
The index, which maps function USR names to the source files containing them, must also be generated by
`clang-extdef-mapping`. Entries in the index must *not* have an `.ast` suffix if the goal
is to use On-demand analysis, as that extension signals that the entry is to be used as a PCH dump.
The mapping of external definitions implicitly uses a
:doc:`compilation database <../../JSONCompilationDatabase>` to determine the compilation flags used.
The analysis invocation must be provided with the directory which contains the mapping
files, and the `invocation list` which is used to determine compiler flags.


Manual CTU Analysis
###################

Let's consider these source files in our minimal example:

.. code-block:: cpp

  // main.cpp
  int foo();

  int main() {
    return 3 / foo();
  }

.. code-block:: cpp

  // foo.cpp
  int foo() {
    return 0;
  }

The compilation database:

.. code-block:: json

  [
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c foo.cpp -o foo.o",
      "file": "foo.cpp"
    },
    {
      "directory": "/path/to/your/project",
      "command": "clang++ -c main.cpp -o main.o",
      "file": "main.cpp"
    }
  ]

The `invocation list` (saved as `invocations.yaml` in this example):

.. code-block:: yaml

  "/path/to/your/project/foo.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/foo.cpp"
    - "-o"
    - "/path/to/your/project/foo.o"

  "/path/to/your/project/main.cpp":
    - "clang++"
    - "-c"
    - "/path/to/your/project/main.cpp"
    - "-o"
    - "/path/to/your/project/main.o"

We'd like to analyze `main.cpp` and discover the division by zero bug.
As we are using On-demand mode, we only need to create a CTU index file which holds the `USR` name and location of
external definitions in the source files in the format `<USR-Length>:<USR> <File-Path>`:

.. code-block:: bash

  $ clang-extdef-mapping -p . foo.cpp
  9:c:@F@foo# /path/to/your/project/foo.cpp
  $ clang-extdef-mapping -p . foo.cpp > externalDefMap.txt

Now everything is available for the CTU analysis.
We have to feed Clang with CTU-specific extra arguments:

.. code-block:: bash

  $ pwd
  /path/to/your/project
  $ clang++ --analyze \
      -Xclang -analyzer-config -Xclang experimental-enable-naive-ctu-analysis=true \
      -Xclang -analyzer-config -Xclang ctu-dir=. \
      -Xclang -analyzer-config -Xclang ctu-invocation-list=invocations.yaml \
      -Xclang -analyzer-output=plist-multi-file \
      main.cpp
  main.cpp:5:12: warning: Division by zero
    return 3 / foo();
           ~~^~~~~~~
  1 warning generated.
  $ # The plist file with the result is generated.
  $ ls -F
  compile_commands.json  externalDefMap.txt  foo.cpp  main.cpp  main.plist
  $

This manual procedure is error-prone and does not scale; therefore, to analyze real projects it is recommended to use
`CodeChecker`.

Automated CTU Analysis with CodeChecker
#######################################
The `CodeChecker <https://github.com/Ericsson/codechecker>`_ project fully supports automated CTU analysis with Clang.
Once we have set up the `PATH` environment variable and activated the Python `venv`, this is all it takes:

.. code-block:: bash

  $ CodeChecker analyze --ctu --ctu-ast-loading-mode on-demand compile_commands.json -o reports
  $ ls -F
  compile_commands.json  foo.cpp  main.cpp  reports/
  $ tree reports
    reports
    ├── compile_cmd.json
    ├── compiler_info.json
    ├── foo.cpp_53f6fbf7ab7ec9931301524b551959e2.plist
    ├── main.cpp_23db3d8df52ff0812e6e5a03071c8337.plist
    ├── metadata.json
    └── unique_compile_commands.json

    0 directories, 6 files
  $

The `plist` files contain the results of the analysis, which may be viewed with the regular analysis tools.
E.g. one may use `CodeChecker parse` to view the results on the command line:

.. code-block:: bash

  $ CodeChecker parse reports
  [HIGH] /home/egbomrt/ctu_mini_raw_project/main.cpp:5:12: Division by zero [core.DivideZero]
    return 3 / foo();
             ^

  Found 1 defect(s) in main.cpp


  ----==== Summary ====----
  -----------------------
  Filename | Report count
  -----------------------
  main.cpp |            1
  -----------------------
  -----------------------
  Severity | Report count
  -----------------------
  HIGH     |            1
  -----------------------
  ----=================----
  Total number of reports: 1
  ----=================----

Or we can use `CodeChecker parse -e html` to export the results into HTML format:

.. code-block:: bash

  $ CodeChecker parse -e html -o html_out reports
  $ firefox html_out/index.html

Automated CTU Analysis with scan-build-py (don't do it)
#######################################################
We actively develop CTU with CodeChecker as the driver for this feature; `scan-build-py` is not actively developed for CTU.
`scan-build-py` has various errors and issues; expect it to work only with very basic projects.

Currently On-demand analysis is not supported with `scan-build-py`.
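
For the manual On-demand workflow, the `invocation list` can in principle be derived from the compilation database
instead of being written by hand. The following is a minimal sketch of that idea, not a tool shipped with Clang or
CodeChecker. The function names `build_invocation_list` and `to_yaml` are hypothetical, and the script makes two
simplifying assumptions: only the source file argument and the argument following `-o` are made absolute (other
relative paths, such as include directories, are left untouched), and each string is written as a JSON string literal,
which is also a valid YAML double-quoted scalar for ordinary paths and flags.

.. code-block:: python

  import json
  import os
  import shlex


  def absolutize(path, directory):
      """Resolve a possibly relative path against the entry's working directory."""
      return os.path.normpath(os.path.join(directory, path))


  def build_invocation_list(compile_db):
      """Map each TU's absolute source path to its list of compiler arguments."""
      invocations = {}
      for entry in compile_db:
          directory = entry["directory"]
          # An entry carries either an "arguments" list or a "command" string.
          if "arguments" in entry:
              args = list(entry["arguments"])
          else:
              args = shlex.split(entry["command"])
          resolved = []
          for i, arg in enumerate(args):
              # Assumption: only the source file and the -o output need rewriting.
              if arg == entry["file"] or (i > 0 and args[i - 1] == "-o"):
                  resolved.append(absolutize(arg, directory))
              else:
                  resolved.append(arg)
          invocations[absolutize(entry["file"], directory)] = resolved
      return invocations


  def to_yaml(invocations):
      """Render the mapping in the layout used by the example invocation list."""
      lines = []
      for source, args in invocations.items():
          lines.append(json.dumps(source) + ":")
          lines.extend("  - " + json.dumps(arg) for arg in args)
          lines.append("")
      return "\n".join(lines)


  if __name__ == "__main__":
      # The compilation database from the minimal example, inlined for illustration.
      example_db = [
          {"directory": "/path/to/your/project",
           "command": "clang++ -c foo.cpp -o foo.o",
           "file": "foo.cpp"},
          {"directory": "/path/to/your/project",
           "command": "clang++ -c main.cpp -o main.o",
           "file": "main.cpp"},
      ]
      print(to_yaml(build_invocation_list(example_db)), end="")

In a real project the list passed to `build_invocation_list` would be loaded from `compile_commands.json` with
`json.load`, and the output redirected to the file named in the `ctu-invocation-list` analyzer-config option.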